Script determination of mixed Chinese/English document images using Kolmogorov complexity measure

Zheru Chi, Qing Wang

Research output: Journal article publication › Conference article › Academic research › peer-review

Abstract

In this paper, we propose an approach based on the Kolmogorov complexity (KC) measure for determining script classes in mixed Chinese (complex characters)/English document images. The approach consists of two main steps, document image preprocessing and KC measurement, and can successfully separate Chinese text lines from English ones. It is robust and reliable across document images of different appearances and densities, and across the various fonts, sizes and styles of characters used in documents. Experimental results on a set of 40 text line images (20 English text lines and 20 complex Chinese text lines) drawn from various document images show that a 100% correct classification rate can be achieved.
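The abstract does not state how the KC measure is computed. As an illustration only, the sketch below assumes a common computable stand-in: the compressed size of a binarized text line image relative to its raw size. The function names (`pack_bits`, `kc_estimate`, `classify_line`), the use of zlib, the threshold value, and the synthetic images are all assumptions made for this example and are not taken from the paper. It also assumes that lines with denser, more intricate strokes compress less well and would therefore score higher.

```python
import random
import zlib


def pack_bits(rows):
    """Pack a binarized image (rows of 0/1 pixels) into bytes, 8 pixels per byte."""
    flat = [px for row in rows for px in row]
    flat += [0] * (-len(flat) % 8)  # pad to a whole number of bytes
    out = bytearray()
    for i in range(0, len(flat), 8):
        byte = 0
        for bit in flat[i:i + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        out.append(byte)
    return bytes(out)


def kc_estimate(rows):
    """Compressed size / raw size: an assumed, computable proxy for KC."""
    raw = pack_bits(rows)
    if not raw:
        raise ValueError("image has no pixels")
    return len(zlib.compress(raw, 9)) / len(raw)


def classify_line(rows, threshold):
    """Label a text line by comparing its estimate with a hypothetical threshold."""
    return "Chinese" if kc_estimate(rows) > threshold else "English"


# Synthetic stand-ins for binarized text lines (not real document data).
rng = random.Random(0)
simple_line = [[1 if x % 16 < 2 else 0 for x in range(512)] for _ in range(32)]
complex_line = [[rng.randint(0, 1) for _ in range(512)] for _ in range(32)]
```

In practice a threshold would have to be chosen from labeled text lines after preprocessing; the regular stripes and random noise used here only show that the estimate separates highly regular patterns from irregular ones.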
Original language: English
Pages (from-to): 686-692
Number of pages: 7
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 4875
Issue number: 2
Publication status: Published - 1 Jan 2002
Event: Second International Conference on Image and Graphics - Hefei, China
Duration: 16 Aug 2002 to 18 Aug 2002

Keywords

  • Document image processing
  • Kolmogorov complexity
  • Script determination

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Condensed Matter Physics
