The growing availability of surveillance cameras makes it convenient to collect large-scale videos that record human mobility in a wide range of urban situations. Surveillance videos have thus become a new source of spatiotemporal trajectory data. However, a typical trajectory semantic enrichment process takes spatiotemporal trajectories as input, so its methods cannot be applied directly to video data. In this paper, we propose a semantic enrichment framework for human trajectories in surveillance videos. It consists of four phases: trajectory identification in videos, trajectory transformation, sub-trajectory segmentation, and segment annotation. Through these four phases, semantic trajectories can be derived from surveillance videos. Having observed that individual trajectories are often similar to one another, we further propose a grid index-based method that searches for similar pre-annotated sub-trajectory segments in pixel space when retrieving semantic trajectories, in order to improve the performance of the approach. Finally, we demonstrate the effectiveness and efficiency of the proposed approach on a real-world data set.
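To make the idea of a grid index over pixel space concrete, the following is a minimal illustrative sketch, not the implementation described in this paper. It assumes that a sub-trajectory segment is a list of (x, y) pixel coordinates, that the image plane is partitioned into square cells of a chosen size, and that similarity is measured by a symmetric mean closest-point distance. The names `GridIndex`, `cell_size`, `max_dist`, and `mean_closest_distance`, as well as the example labels, are hypothetical.

```python
from collections import defaultdict
from math import hypot


def mean_closest_distance(a, b):
    """Symmetric mean closest-point distance between two point lists.

    This distance is an assumption made for illustration only.
    """
    def one_way(p, q):
        return sum(min(hypot(px - qx, py - qy) for qx, qy in q)
                   for px, py in p) / len(p)
    return max(one_way(a, b), one_way(b, a))


class GridIndex:
    """Hypothetical grid index over pre-annotated segments in pixel space."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # grid cell -> ids of segments touching it
        self.segments = {}             # segment id -> (points, annotation)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, seg_id, points, annotation):
        """Register a pre-annotated segment under every cell it passes through."""
        self.segments[seg_id] = (points, annotation)
        for x, y in points:
            self.cells[self._cell(x, y)].add(seg_id)

    def candidates(self, points):
        """Collect segments sharing a cell, or a neighbouring cell, with the query."""
        ids = set()
        for x, y in points:
            cx, cy = self._cell(x, y)
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    ids |= self.cells.get((cx + dx, cy + dy), set())
        return ids

    def most_similar(self, points, max_dist):
        """Return (distance, seg_id, annotation) of the closest candidate, or None."""
        best = None
        for seg_id in self.candidates(points):
            ref, annotation = self.segments[seg_id]
            d = mean_closest_distance(points, ref)
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, seg_id, annotation)
        return best


# Usage: index two pre-annotated segments, then look up a new, unannotated one.
index = GridIndex(cell_size=20)
index.add("s1", [(10, 10), (30, 10), (50, 10)], "walking along corridor")
index.add("s2", [(200, 200), (220, 220)], "waiting at entrance")
match = index.most_similar([(12, 11), (31, 12), (49, 9)], max_dist=5.0)
```

In this sketch the grid only prunes the search: the exact distance is computed solely for segments that share or neighbour a cell with the query, so a new segment that closely follows an already annotated one can reuse its annotation without a scan over all stored segments.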