Abstract
Company names play a very important role in information extraction for the financial domain, so correctly identifying company names in running text is an important research problem. Based on a thorough analysis of financial news texts, this paper summarizes the structural features of company names and their contextual information, builds six knowledge bases for company name identification, and proposes an identification strategy based on a two-pass scanning process. Preliminary experimental results show that the system achieves 97.3% precision and 89.3% recall in the closed test, and 62.8% precision and 62.1% recall in the open test.
Original language | Chinese (Simplified) |
---|---|
Pages (from-to) | 1-6 |
Number of pages | 6 |
Journal | 中文信息学报 (Journal of Chinese Information Processing) |
Volume | 16 |
Issue number | 2 |
Publication status | Published - 2002 |
Keywords
- Company name
- Financial domain
- Named entity identification
- Information extraction