The most challenging part of vision-based place recognition is the wide variation in the appearance of places. However, temporal information between consecutive frames can be used to infer a vehicle's next locations and to obtain information about its ego-motion. Using this temporal information effectively narrows the search range for the next locations, which makes an efficient place recognition system possible. This paper presents a robust vision-based place recognition method that uses recent discriminative ConvNet features, and proposes a flexible tubing strategy that groups consecutive frames based on their similarities. The tubing strategy enables effective pair searching. We also propose adding further variations in the appearance of places to increase the variety of the training data, and fine-tuning an off-the-shelf network model, CALC, so that its extracted features generalize better. Experimental results show that the proposed temporal-correlation-based recognition strategy with the fine-tuned model achieves the best F1 score improvement (0.572) over the original CALC model. The proposed place recognition method is also 2.15 times faster than a linear full search.
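
As a rough illustration of grouping consecutive frames by similarity, the following minimal sketch starts a new tube whenever the cosine similarity between a frame's feature vector and the previous frame's falls below a threshold. The function names `cosine_similarity` and `group_into_tubes`, the cosine measure, and the threshold value of 0.9 are assumptions made for illustration; they are not taken from the method described above, whose actual tubing rule may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def group_into_tubes(features, threshold=0.9):
    """Group consecutive frame indices into tubes.

    Hypothetical rule (an assumption, not the paper's definition):
    a frame joins the current tube when its similarity to the
    immediately preceding frame is at least `threshold`;
    otherwise it opens a new tube.
    """
    tubes = []
    for i, feat in enumerate(features):
        if tubes and cosine_similarity(features[i - 1], feat) >= threshold:
            tubes[-1].append(i)
        else:
            tubes.append([i])
    return tubes
```

Under these assumptions, a query would first be compared against one representative per tube and only then against the frames inside the best-matching tubes, which is one way a grouping of this kind could narrow the search range relative to a linear full search.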