Abstract
Web Person Disambiguation (WPD) is often done by clustering web documents to identify the different namesakes for a given name. This paper presents a clustering algorithm that uses key phrases as the basic feature. Key phrases are used in two different forms: to represent the document as a whole, and to represent the context information surrounding the name mentions in a document. In the vector space model, key phrases extracted from the documents are used as the document representation. Context information of name mentions is represented by skip bigrams of the key phrase sequences surrounding the name mentions. The two components are then aggregated into the vector space model for clustering. Experiments on the WePS2 datasets show that the proposed approach achieves results comparable to those of the top-ranked system. This indicates that key phrases can be a very effective feature for WPD, both at the document level and at the sentential level near the name mentions.
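The two feature types described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `max_skip` window, the raw-count weighting, and the use of cosine similarity are all assumptions made for illustration, and key phrase extraction is taken as already done.

```python
import math
from collections import Counter


def skip_bigrams(phrases, max_skip=2):
    """Ordered pairs (a, b) where b follows a with at most max_skip phrases between them."""
    pairs = []
    for i, left in enumerate(phrases):
        for j in range(i + 1, min(i + 2 + max_skip, len(phrases))):
            pairs.append((left, phrases[j]))
    return pairs


def build_features(doc_phrases, context_phrases, max_skip=2):
    """Aggregate document-level key phrases and context skip bigrams into one sparse vector.

    Hypothetical scheme: raw counts, with a tag in each key to keep the two feature types apart.
    """
    features = Counter(("kp", p) for p in doc_phrases)
    features.update(("sb", a, b) for a, b in skip_bigrams(context_phrases, max_skip))
    return features


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy documents mentioning the same name (invented data, for illustration only).
doc_a = build_features(
    ["machine learning", "stanford university", "computer science"],
    ["professor", "computer science", "stanford university"],
)
doc_b = build_features(
    ["computer science", "stanford university", "research group"],
    ["professor", "stanford university"],
)
similarity = cosine(doc_a, doc_b)
```

A clustering step could then group documents whose pairwise similarity exceeds some threshold. The choice of clustering method and threshold is left open here, since the abstract does not specify them.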
Original language | English |
---|---|
Title of host publication | 11th Conference on Natural Language Processing, KONVENS 2012 |
Subtitle of host publication | Empirical Methods in Natural Language Processing - Proceedings of the Conference on Natural Language Processing 2012 |
Pages | 108-117 |
Number of pages | 10 |
Volume | 5 |
Publication status | Published - 1 Dec 2012 |
Event | 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012 - Vienna, Austria; Duration: 19 Sept 2012 → 21 Sept 2012 |
Conference
Conference | 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012 |
---|---|
Country/Territory | Austria |
City | Vienna |
Period | 19/09/12 → 21/09/12 |
ASJC Scopus subject areas
- Software