Combining classification with clustering for Web Person Disambiguation

Jian Xu, Qin Lu, Zhengzhong Liu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

8 Citations (Scopus)


Web Person Disambiguation (WPD) is often performed by clustering web documents to identify the different namesakes that share a given name. This paper presents a new key phrase-based clustering method combined with a second-step reclassification of outliers to improve clustering performance. For document clustering, a hierarchical agglomerative approach is applied to a vector space model that uses key phrases as its main features. Outliers in the clustering results are then identified through a centroid-based method and reclassified by an SVM classifier into more appropriate clusters, using a key phrase-based string kernel model as the feature space. The reclassification uses the first-step clustering result as its training data, which avoids the separate training data required by most classification algorithms. Experiments on the WePS-2 dataset show that the key phrase-based algorithm is effective in improving WPD performance.
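The first two steps of the pipeline described in the abstract (agglomerative clustering over key-phrase vectors, then centroid-based outlier detection) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the average-link criterion, the cosine similarity measure, the thresholds `stop_sim` and `min_sim`, and the toy key phrases are all assumptions, and the SVM reclassification with a string kernel is not shown.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse key-phrase count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return {k: c / len(vectors) for k, c in total.items()}

def hac(docs, stop_sim=0.2):
    """Average-link agglomerative clustering (assumed linkage).

    Repeatedly merges the two most similar clusters and stops once the
    best available similarity drops below stop_sim (assumed threshold).
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), link(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best < stop_sim:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def find_outliers(docs, clusters, min_sim=0.5):
    """Flag documents whose similarity to their own cluster centroid is below min_sim."""
    outliers = []
    for members in clusters:
        if len(members) < 2:
            continue  # a singleton is trivially its own centroid
        cen = centroid([docs[i] for i in members])
        outliers += [i for i in members if cosine(docs[i], cen) < min_sim]
    return outliers

# Toy documents for one ambiguous name, each reduced to key-phrase counts
# (hypothetical phrases, for illustration only).
docs = [
    Counter({"machine learning": 2, "stanford university": 1}),
    Counter({"machine learning": 1, "stanford university": 2}),
    Counter({"jazz guitar": 2, "new album": 1}),
    Counter({"jazz guitar": 1, "new album": 1}),
]
clusters = hac(docs)
outliers = find_outliers(docs, clusters)
```

In the approach the abstract describes, the documents that are not flagged as outliers would then serve as training data for the SVM, labelled with their cluster assignments, and each outlier would be reassigned to the cluster the classifier predicts.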
Original language: English
Title of host publication: WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion
Number of pages: 2
Publication status: Published - 21 May 2012
Event: 21st Annual Conference on World Wide Web, WWW'12 - Lyon, France
Duration: 16 Apr 2012 to 20 Apr 2012


Conference: 21st Annual Conference on World Wide Web, WWW'12


  • Key phrase
  • String kernel
  • SVM
  • Web Person Disambiguation

ASJC Scopus subject areas

  • Computer Networks and Communications
