Abstract
Personal names and terms of address are an important class of named entities, and recognizing them is an essential task in natural language processing and its applications. From the perspective of named entity recognition and information extraction, this paper presents a classification and annotation scheme for personal names, courtesy and style names, and terms of address, built on an annotated corpus of four classical Chinese novels from the Ming and Qing dynasties. Personal names and terms of address are divided into simple and compound types. Compound types are further divided, according to their internal components and how those components combine, into four subtypes: fixed expressions, appositive constructions, subordinate constructions of affiliation, and other subordinate constructions. Based on full statistics from the annotated corpus, the paper also gives a comparative analysis of these types and describes how each of the four novels uses personal names and terms of address.
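The two-level classification described in the abstract can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and string labels are hypothetical and are not the tag set used in the paper, and the sample mention is illustrative rather than drawn from the annotated corpus.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Structure(Enum):
    """Top-level split named in the abstract."""
    SIMPLE = "simple"
    COMPOUND = "compound"


class CompoundSubtype(Enum):
    """The four compound subtypes named in the abstract (labels are hypothetical)."""
    FIXED = "fixed expression"
    APPOSITIVE = "appositive construction"
    AFFILIATION = "subordinate construction of affiliation"
    OTHER_SUBORDINATE = "other subordinate construction"


@dataclass(frozen=True)
class Mention:
    """One annotated personal name or term of address (hypothetical record layout)."""
    text: str
    structure: Structure
    subtype: Optional[CompoundSubtype] = None

    def __post_init__(self) -> None:
        # A compound mention must carry exactly one of the four subtypes;
        # a simple mention must carry none.
        if self.structure is Structure.COMPOUND and self.subtype is None:
            raise ValueError("compound mentions need a subtype")
        if self.structure is Structure.SIMPLE and self.subtype is not None:
            raise ValueError("simple mentions take no subtype")


def describe(mention: Mention) -> str:
    """Return a readable label such as 'compound/appositive construction'."""
    if mention.subtype is None:
        return mention.structure.value
    return f"{mention.structure.value}/{mention.subtype.value}"


# Illustrative only: treating this name as a simple-type mention is an assumption.
example = Mention(text="宝玉", structure=Structure.SIMPLE)
print(describe(example))
```

Keeping the subtype as a separate optional field, rather than flattening everything into five labels, mirrors the abstract's wording that compound types are "further" divided.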
Original language | Chinese (Simplified) |
---|---|
Pages (from-to) | 19-27 |
Number of pages | 9 |
Journal | 中文信息学报 (Journal of Chinese Information Processing) |
Volume | 29 |
Issue number | 1 |
Publication status | Published - 2015 |
Keywords
- Named entity annotation
- Classification of personal names and terms of address
- Corpus construction