Abstract
This paper presents a neural network approach for classifying grey-scale Chinese and English document images. The approach consists of three steps (preprocessing, feature extraction, and classification) and can successfully handle Chinese and English document images with different densities, fonts, sizes, and styles of characters. Two neural networks are employed. The first is used to derive a set of 15 masks for extracting features. The mask coefficients are approximated by a set of computationally simple values, which significantly reduces the computational complexity of feature extraction. The second, smaller neural network is then trained on the 15 extracted features to perform the language separation. Experimental results on a set of 40 document images, comprising 20 Chinese and 20 English document images, show that a 100% correct classification rate can be achieved. The approach compares favorably with an existing language separation method.
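The mask-approximation and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the mask size, the set of simple values, or the feature statistic, so the 5x5 masks, the values in `SIMPLE_VALUES`, and the mean absolute response are all assumptions made for illustration. Random numbers stand in for the masks that the first neural network would learn.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical set of "computationally simple" values (assumption:
# the abstract does not list the values actually used).
SIMPLE_VALUES = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])


def approximate_mask(mask):
    """Snap every coefficient to the nearest value in SIMPLE_VALUES."""
    mask = np.asarray(mask, dtype=float)
    nearest = np.abs(mask[..., None] - SIMPLE_VALUES).argmin(axis=-1)
    return SIMPLE_VALUES[nearest]


def extract_features(image, masks):
    """Return one feature per mask: the mean absolute mask response.

    The mean absolute response is an assumed texture statistic,
    chosen only to make the sketch concrete.
    """
    image = np.asarray(image, dtype=float)
    k = masks.shape[-1]
    # All k x k patches of the image, shape (H-k+1, W-k+1, k, k).
    windows = sliding_window_view(image, (k, k))
    # Response of every mask at every patch position, shape (M, H-k+1, W-k+1).
    responses = np.einsum("ijkl,mkl->mij", windows, masks)
    return np.abs(responses).mean(axis=(1, 2))


rng = np.random.default_rng(0)
# Stand-in for the 15 masks derived by the first network (assumed 5x5).
learned_masks = rng.normal(scale=0.5, size=(15, 5, 5))
simple_masks = approximate_mask(learned_masks)

# Stand-in for a preprocessed grey-scale document image.
image = rng.integers(0, 256, size=(64, 64))
features = extract_features(image, simple_masks)  # 15 values for the classifier
```

In the approach as described, a 15-element feature vector of this kind would be the input to the second, smaller network that separates the two languages. Restricting coefficients to values such as 0, ±0.5, and ±1 lets the multiplications be replaced by shifts, sign changes, or skipped terms, which is one plausible reading of "computationally simple".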
Original language | English |
---|---|
Pages (from-to) | 381-386 |
Number of pages | 6 |
Journal | Chinese Journal of Electronics |
Volume | 7 |
Issue number | 4 |
Publication status | Published - 1 Oct 1998 |
Keywords
- Document image processing
- Neural network applications
- Texture analysis
- Written language separation
ASJC Scopus subject areas
- Electrical and Electronic Engineering