TY - JOUR
T1 - A scene text detector based on deep feature merging
AU - Zhang, Yong
AU - Huang, Yubei
AU - Zhao, Donning
AU - Wu, Chun Ho
AU - Ip, Wai Hung
AU - Yung, Kai Leung
N1 - Funding Information:
This work was supported by the Science and Technology Plan Projects of Shenzhen (No. JSGG20200807171601010, JSGG20191127151401743), the Graduate Education Reform Project of Shenzhen University (SZUGS2020JG11) and the grants from the Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, China (H-ZG3K).
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/6
Y1 - 2021/6
N2 - Scene text detection has become an important research topic. It can be applied broadly to industrial equipment such as smartphones, intelligent scanners, and IoT devices. Many existing scene text detection methods have achieved strong performance. However, text in scene images appears in differing orientations and varying shapes, which makes scene text detection a challenging task. This paper proposes a method for detecting text in scene images. First, four stages of low-level features are extracted using DenseNet121. These features are then merged by transposed convolution and skip connections. Second, the merged feature map is used to generate a score map, a box map, and an angle map. Finally, Locality-Aware Non-Maximum Suppression (LANMS) is applied as post-processing to generate the final bounding boxes. The proposed method achieves an F-measure of 0.826 on ICDAR 2015 and 0.761 on MSRA-TD500.
AB - Scene text detection has become an important research topic. It can be applied broadly to industrial equipment such as smartphones, intelligent scanners, and IoT devices. Many existing scene text detection methods have achieved strong performance. However, text in scene images appears in differing orientations and varying shapes, which makes scene text detection a challenging task. This paper proposes a method for detecting text in scene images. First, four stages of low-level features are extracted using DenseNet121. These features are then merged by transposed convolution and skip connections. Second, the merged feature map is used to generate a score map, a box map, and an angle map. Finally, Locality-Aware Non-Maximum Suppression (LANMS) is applied as post-processing to generate the final bounding boxes. The proposed method achieves an F-measure of 0.826 on ICDAR 2015 and 0.761 on MSRA-TD500.
KW - Convolutional neural network
KW - Deep feature merging
KW - DenseNet121
KW - Scene text detector
UR - http://www.scopus.com/inward/record.url?scp=85108016779&partnerID=8YFLogxK
U2 - 10.1007/s11042-021-11101-w
DO - 10.1007/s11042-021-11101-w
M3 - Journal article
AN - SCOPUS:85108016779
SN - 1380-7501
VL - 80
SP - 29005
EP - 29016
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 19
ER -