Abstract
Scene text detection has become an important research topic. It can be broadly applied in industrial equipment such as smartphones, intelligent scanners, and IoT devices. Many existing scene text detection methods have achieved advanced performance. However, text in scene images appears with differing orientations and varying shapes, making scene text detection a challenging task. This paper proposes a method for detecting text in scene images. First, four stages of low-level features are extracted using DenseNet121; these features are then merged by transposed convolution and skip connections. Second, the merged feature map is used to generate a score map, a box map, and an angle map. Finally, Locality-Aware Non-Maximum Suppression (LANMS) is applied as post-processing to generate the final bounding boxes. The proposed method achieves F-measures of 0.826 on ICDAR 2015 and 0.761 on MSRA-TD500.
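The LANMS post-processing step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the original EAST-style LANMS operates on rotated quadrilaterals, while the toy version below uses axis-aligned boxes `[x1, y1, x2, y2, score]` for clarity, and all function names are illustrative. The key idea survives, though: a cheap first pass merges geometrically adjacent detections by score-weighted averaging, so the second, quadratic-cost standard NMS pass runs on a much smaller set.

```python
import numpy as np

def iou(a, b):
    # intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2, ...]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def weighted_merge(a, b):
    # average the coordinates weighted by detection score; scores accumulate
    merged = np.empty(5)
    merged[:4] = (a[4] * a[:4] + b[4] * b[:4]) / (a[4] + b[4])
    merged[4] = a[4] + b[4]
    return merged

def standard_nms(boxes, thresh):
    # ordinary greedy NMS: keep the best-scoring box, drop overlapping ones
    order = np.argsort(-boxes[:, 4])
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        order = order[1:][[iou(boxes[i], boxes[j]) <= thresh
                           for j in order[1:]]]
    return boxes[keep]

def locality_aware_nms(boxes, thresh=0.3):
    # pass 1: scan boxes in order (row by row in the score map) and merge
    # each box into its predecessor when they overlap strongly
    merged, prev = [], None
    for box in boxes:
        if prev is not None and iou(box, prev) > thresh:
            prev = weighted_merge(box, prev)
        else:
            if prev is not None:
                merged.append(prev)
            prev = box
    if prev is not None:
        merged.append(prev)
    # pass 2: standard NMS on the (much smaller) merged set
    return standard_nms(np.array(merged), thresh)

# two heavily overlapping detections collapse into one merged box,
# while the distant third detection survives on its own
dets = np.array([[0., 0., 10., 10., 0.9],
                 [1., 1., 11., 11., 0.8],
                 [50., 50., 60., 60., 0.7]])
final = locality_aware_nms(dets)  # 2 boxes remain
```

Because dense per-pixel detectors emit thousands of near-duplicate boxes along each text line, the linear merging pass is what keeps post-processing tractable; the weighted average also tends to produce a more stable final box than simply keeping the single highest-scoring candidate.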
| Original language | English |
|---|---|
| Pages (from-to) | 29005-29016 |
| Number of pages | 12 |
| Journal | Multimedia Tools and Applications |
| Volume | 80 |
| Issue number | 19 |
| DOIs | |
| Publication status | Published - Jun 2021 |
Keywords
- Convolutional neural network
- Deep feature merging
- DenseNet121
- Scene text detector
ASJC Scopus subject areas
- Software
- Media Technology
- Hardware and Architecture
- Computer Networks and Communications