Distributional consistency: As a general method for defining a core lexicon

Huarui Zhang, Churen Huang, Shiwen Yu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

11 Citations (Scopus)

Abstract

We propose Distributional Consistency (DC) as a general method for defining a Core Lexicon. The properties of DC are investigated theoretically and empirically, showing that it is clearly distinguishable from word frequency and range of distribution. DC is also shown to reflect intuitive interpretations, especially when its value is close to 1. Its immediate applications in NLP include defining a core lexicon in a language and identifying topical words in a document. We also categorize the existing measures of dispersion into three groups via ratio of norm or entropy, and propose a simplified measure and a combined kind of measure. These new measures can be used as a virtual prototype or medium type for the study and comparison of existing measures in the future.

Keywords: Distributional Consistency; Lexical Usuality; Measure of Dispersion; Square Mean Root (SMR); Modified Frequency; Core Lexicon.
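The abstract does not state the DC formula itself. As a rough sketch, assuming the formulation commonly attributed to this paper in later surveys of dispersion measures, DC can be read as the ratio of the square mean root (SMR) to the arithmetic mean of a word's frequencies across n equally sized corpus parts. The function name and the sample frequency lists below are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def distributional_consistency(freqs):
    """Assumed formulation: DC = SMR / mean.

    SMR (square mean root) = (sum(sqrt(f_i)) / n) ** 2
    mean                   = sum(f_i) / n
    where f_i is the word's frequency in corpus part i of n equal-sized parts.
    """
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        # No parts or no occurrences: treat consistency as 0 (a convention chosen here).
        return 0.0
    smr = (sum(sqrt(f) for f in freqs) / n) ** 2
    mean = total / n
    return smr / mean


# Hypothetical data: one word spread evenly over 4 parts, one confined to a single part.
even = distributional_consistency([10, 10, 10, 10])
concentrated = distributional_consistency([40, 0, 0, 0])
print(even, concentrated)
```

Under this assumed definition, an evenly distributed word scores approximately 1, while a word confined to a single part scores 1/n (here about 0.25), even though both words have the same total frequency. This matches the abstract's remark that values close to 1 have the most intuitive interpretation.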

Original language: English
Title of host publication: Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
Editors: Maria Francisca Xavier, Rute Costa, Fatima Ferreira, Maria Teresa Lino, Raquel Silva
Publisher: European Language Resources Association (ELRA)
Pages: 1119-1122
Number of pages: 4
ISBN (Electronic): 2951740816, 9782951740815
Publication status: Published - 1 Jan 2004
Externally published: Yes
Event: 4th International Conference on Language Resources and Evaluation, LREC 2004 - Lisbon, Portugal
Duration: 26 May 2004 to 28 May 2004

Publication series

Name: Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004

Conference

Conference: 4th International Conference on Language Resources and Evaluation, LREC 2004
Country/Territory: Portugal
City: Lisbon
Period: 26/05/04 to 28/05/04

ASJC Scopus subject areas

  • Library and Information Sciences
  • Education
  • Language and Linguistics
  • Linguistics and Language
