Abstract
In this work, a novel robust natural text recognition network (RNTR-Net) is proposed that combines a convolutional neural network (CNN) for feature extraction with a recurrent neural network (RNN) for sequence recognition. The pipeline uses an improved residual learning block together with a general residual block to extract feature maps. Two bidirectional Long Short-Term Memory (LSTM) networks are used for sequence recognition, and a transcription layer is used for decoding. The proposed network can handle text images suffering from distortion or other degradations. Compared with previous algorithms, it achieves superior results on general datasets, including the IIIT-5K, Street View Text and ICDAR datasets. Moreover, its performance is highly competitive or even state-of-the-art on the highly challenging SVT-Perspective and CUTE80 datasets. We obtain 84.7% and 62.6% on the lexicon-free IIIT-5K and CUTE80 datasets, respectively. The experimental results demonstrate the effectiveness of our network.
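As an illustration of the decoding step, the sketch below shows how a transcription layer could turn per-frame outputs of the recurrent layers into a string. It assumes a CTC-style transcription with greedy decoding, as is common in CNN plus RNN text recognizers; the abstract does not state which decoding scheme RNTR-Net uses. The names `greedy_ctc_decode`, `one_hot`, and `BLANK`, the three-letter alphabet, and the frame scores are all hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the blank symbol (an assumption, not from the paper)


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC-style decoding: take the best class per frame,
    merge consecutive repeats, then drop blanks."""
    decoded: List[str] = []
    previous = BLANK
    for scores in frame_scores:
        # Index of the highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen for the previous frame.
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])  # class k maps to alphabet[k - 1]
        previous = best
    return "".join(decoded)


def one_hot(index: int, size: int) -> List[float]:
    """Toy per-frame score vector with all mass on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Eight frames over the classes {blank, 'a', 'b', 'c'}.
frames = [one_hot(i, 4) for i in [1, 1, 0, 1, 2, 2, 0, 3]]
print(greedy_ctc_decode(frames, "abc"))  # prints "aabc"
```

The blank between the two runs of class 1 is what lets the decoder emit the letter "a" twice; without it, the repeated frames would collapse into a single character.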
Original language | English |
---|---|
Article number | 8950043 |
Pages (from-to) | 7719-7730 |
Number of pages | 12 |
Journal | IEEE Access |
Volume | 8 |
Publication status | Published - 2020 |
Externally published | Yes |
Keywords
- bidirectional LSTMs
- CNN
- residual learning
- Robust natural text recognition network
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering