Manually annotating a dataset for scene text detection is extremely time-consuming. In this paper, we propose a new semi-automatic annotation model that produces tight polygonal annotations for text instances in scene images from manually annotated text center lines. Our approach first generates multiple candidate boundaries that share the same input center line. A fastidious content recognizer is then trained and used to select the optimal boundary: the bounded text region that achieves the smallest recognition loss is selected as the tightest boundary of the text. Because this optimal boundary estimation is guided by semantic recognition, we call our method Semantic Boundary Estimation. Experimental results show that annotating a center line requires only about half as many clicks as manually annotating a polygon, after which a precise polygonal text region annotation is produced automatically. A high recall of more than 95% at IoU > 0.5 and 80% at IoU > 0.7 is achieved, demonstrating high agreement with the original ground truth. In addition, state-of-the-art detectors trained with the generated annotations on benchmarks such as Total-Text, CTW1500 and ICDAR2015 achieve performance similar to that of detectors trained with manual annotations, which further verifies the quality of the generated annotations. An annotation toolkit based on the proposed model is available at CenterlineAnnotation.
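The selection step described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: candidates are built by shifting the center line along its local normals by a set of hypothetical half-widths, and `toy_recognition_loss` is a stand-in for the trained recognizer that simply prefers a polygon of a given area. The names `offset_polygon`, `select_boundary` and `toy_recognition_loss` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def offset_polygon(center_line: Sequence[Point], half_width: float) -> List[Point]:
    """Build a candidate polygon by shifting each center-line point
    along its local normal by +/- half_width (an assumed scheme)."""
    upper, lower = [], []
    n = len(center_line)
    for i, (x, y) in enumerate(center_line):
        # Local tangent from neighbouring points (one-sided at the ends).
        x0, y0 = center_line[max(i - 1, 0)]
        x1, y1 = center_line[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty) or 1.0
        nx, ny = -ty / norm, tx / norm
        upper.append((x + half_width * nx, y + half_width * ny))
        lower.append((x - half_width * nx, y - half_width * ny))
    # Walk one side forward and the other side backward to close the polygon.
    return upper + lower[::-1]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    pts = list(poly)
    s = sum(xa * yb - xb * ya for (xa, ya), (xb, yb) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2.0


def select_boundary(
    center_line: Sequence[Point],
    half_widths: Sequence[float],
    recognition_loss: Callable[[List[Point]], float],
) -> Tuple[List[Point], float]:
    """Return the candidate polygon with the smallest recognition loss."""
    candidates = [offset_polygon(center_line, w) for w in half_widths]
    losses = [recognition_loss(p) for p in candidates]
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best], losses[best]


def toy_recognition_loss(poly: List[Point]) -> float:
    """Placeholder for a trained recognizer: loss is the distance of the
    polygon area from an assumed 'true' text area of 2000 square pixels."""
    return abs(polygon_area(poly) - 2000.0)


center_line = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
half_widths = [5.0, 10.0, 15.0, 20.0]
best_poly, best_loss = select_boundary(center_line, half_widths, toy_recognition_loss)
```

In a real pipeline the loss callable would crop or rectify the image region inside each candidate polygon and score it with the recognizer; only the arg-min selection over candidates sharing one center line mirrors the description above.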