Aggregating skip bigrams into key phrase-based vector space model for Web Person Disambiguation

Jian Xu, Qin Lu, Zhengzhong Liu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

3 Citations (Scopus)

Abstract

Web Person Disambiguation (WPD) is often done by clustering web documents to identify the different namesakes for a given name. This paper presents a clustering algorithm that uses key phrases as its basic feature. Key phrases are used in two different forms: to represent the document, and to represent the context information surrounding the name mentions in a document. In the vector space model, the key phrases extracted from a document serve as its document representation. The context information of name mentions is represented by skip bigrams of the key phrase sequences surrounding those mentions. The two components are then aggregated into the vector space model for clustering. Experiments on the WePS2 datasets show that the proposed approach achieved results comparable to those of the top-ranked system. This indicates that key phrases can be a very effective feature for WPD, both at the document level and at the sentential level near the name mentions.
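The abstract does not specify the skip distance, the weighting between the two components, or the clustering algorithm. The following is therefore only a minimal Python sketch of the pipeline it describes, built on several assumptions:

- Key phrases have already been extracted; the extraction step is not shown.
- The "context" is the list of key phrases near a name mention.
- Skip bigrams allow at most `max_skip` intervening phrases.
- Both feature sets are merged into one sparse vector using a hypothetical weight `alpha`.
- Documents are grouped by single-link clustering at a hypothetical cosine `threshold`.

None of these function names or parameter values come from the paper.

```python
from collections import Counter
from itertools import combinations
import math


def skip_bigrams(phrases, max_skip=2):
    """Count ordered key phrase pairs (a, b) where b follows a with at most
    max_skip phrases in between (max_skip=0 gives ordinary bigrams)."""
    pairs = Counter()
    for i, a in enumerate(phrases):
        for j in range(i + 1, min(len(phrases), i + max_skip + 2)):
            pairs[(a, phrases[j])] += 1
    return pairs


def doc_vector(doc_phrases, context_phrases, alpha=0.5, max_skip=2):
    """Aggregate both components into one sparse vector. Keys are prefixed
    ("kp" / "sb") so the two feature spaces do not collide; alpha is a
    hypothetical weight on the skip-bigram component."""
    vec = Counter()
    for phrase, count in Counter(doc_phrases).items():
        vec[("kp", phrase)] += count
    for bigram, count in skip_bigrams(context_phrases, max_skip).items():
        vec[("sb", bigram)] += alpha * count
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold=0.2):
    """Single-link clustering at a fixed cosine threshold (an assumption, not
    the paper's stated method): any pair at or above threshold is merged."""
    parent = list(range(len(vectors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy input: (key phrases of the whole page, key phrases near the name mention).
docs = [
    (["machine learning", "professor", "university"],
     ["professor", "machine learning", "university"]),
    (["machine learning", "university", "research"],
     ["professor", "machine learning", "research"]),
    (["football", "striker", "club"],
     ["striker", "club", "goal"]),
]
vectors = [doc_vector(d, c) for d, c in docs]
print(cluster(vectors))  # → [[0, 1], [2]]
```

Prefixing the keys keeps a document-level key phrase from colliding with a context skip bigram, so the two components remain distinguishable within one vector. Single-link clustering is used here only for brevity. Other agglomerative linkages, or a different weighting such as TF-IDF, would fit the same aggregated representation.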
Original language: English
Title of host publication: 11th Conference on Natural Language Processing, KONVENS 2012
Subtitle of host publication: Empirical Methods in Natural Language Processing - Proceedings of the Conference on Natural Language Processing 2012
Pages: 108-117
Number of pages: 10
Volume: 5
Publication status: Published - 1 Dec 2012
Event: 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012 - Vienna, Austria
Duration: 19 Sep 2012 to 21 Sep 2012

Conference

Conference: 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012
Country: Austria
City: Vienna
Period: 19/09/12 to 21/09/12

ASJC Scopus subject areas

  • Software
