Difficulties in the application of “Chinese character component standard of GB 13000.1 character set for information processing” for Chinese character input

Xiaoheng Zhang

Research output: Journal article publication › Journal article › Academic research

Abstract

The Chinese Character Component Standard of GB 13000.1 Character Set for Information Processing is an important document for standardizing shape-based Chinese character input methods. In practice, however, applying it to the design and implementation of a nontrivial Chinese character input system runs into several difficulties: the number of components is too large to memorize, the definition of basic components is hard to put into operation, and the rules for disassembling and assembling components are hard to master. These difficulties stem mainly from four sources: (1) basic components are defined mainly by enumeration in a list; (2) component segmentation and combination emphasize etymology and the formation of characters; (3) there is excessive reliance on judging the character-forming capability of candidate components; and (4) too much weight is placed on limiting the number of basic components. To overcome these difficulties, simple and reliable rules for component identification and segmentation should be built on the basis of the existing standard, drawing on the form features of Chinese characters. The feasibility and effectiveness of this approach have been verified by the successful development of the ZYQ Chinese character input system.
Original language: Chinese (Simplified)
Pages (from-to): 60-65
Number of pages: 6
Journal: 中文信息学报 (Journal of Chinese Information Processing)
Volume: 18
Issue number: 4
Publication status: Published - 2004

Keywords

  • Computer application
  • Chinese information processing
  • Chinese character input
  • Chinese character component
  • Standard
