Abstract
In this article, we introduce a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm relies on analyses of contextual information extracted from a text corpus, specifically analyses of word co-occurrences in a large-scale electronic database of text. Here, a target word is represented as the combination of the average of all words preceding the target and the average of all words following it in a text corpus. The semantic representation of the target words can be further processed by a self-organizing map (SOM; Kohonen, Self-organizing maps, 2001), an unsupervised neural network model that provides efficient data extraction and representation. Due to its topography-preserving features, the SOM projects the statistical structure of the context onto a 2-D space, such that words with similar meanings cluster together, forming groups that correspond to lexically meaningful categories. Such a representation system has applications in a variety of contexts, including computational modeling of language acquisition and processing. In this report, we present specific examples from two languages (English and Chinese) to demonstrate how the method is applied to extract the semantic representations of words. © 2010 Psychonomic Society, Inc.
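The representation described in the abstract can be illustrated with a minimal Python sketch. It is not the package's implementation. It assumes that "preceding" and "following" mean the immediately adjacent word, and that each word is coded as a one-hot vector over the vocabulary; the abstract specifies neither choice. The function name `context_vectors` is hypothetical.

```python
from collections import Counter, defaultdict


def context_vectors(tokens):
    """Map each word to [average preceding-word code] + [average following-word code].

    Assumption: each word is coded as a one-hot vector over the sorted
    vocabulary, so each average is the relative frequency of neighbours.
    """
    vocab = sorted(set(tokens))
    before = defaultdict(Counter)  # before[w][p]: how often p immediately precedes w
    after = defaultdict(Counter)   # after[w][n]: how often n immediately follows w

    for prev, nxt in zip(tokens, tokens[1:]):
        before[nxt][prev] += 1
        after[prev][nxt] += 1

    def average(counts):
        # Average of one-hot codes = neighbour counts divided by their total.
        total = sum(counts.values())
        return [counts[w] / total if total else 0.0 for w in vocab]

    # Concatenate the two averages into one vector of length 2 * len(vocab).
    vectors = {w: average(before[w]) + average(after[w]) for w in vocab}
    return vocab, vectors


# Toy corpus: "cat" and "dog" occur in identical contexts.
tokens = "the cat sat the dog sat".split()
vocab, vectors = context_vectors(tokens)
same_context = vectors["cat"] == vectors["dog"]  # True for this corpus
```

In this toy corpus, "cat" and "dog" receive identical vectors because they share the same neighbours. A SOM trained on such vectors would be expected to place them close together on the map.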
Original language | English |
---|---|
Pages (from-to) | 77-88 |
Number of pages | 12 |
Journal | Behavior Research Methods |
Volume | 43 |
Issue number | 1 |
Publication status | Published - 1 Mar 2011 |
Externally published | Yes |
Keywords
- Contextual self-organizing map
- Corpus analysis
- Distributed semantic representation
- Semantic vectors
ASJC Scopus subject areas
- Experimental and Cognitive Psychology
- Developmental and Educational Psychology
- Arts and Humanities (miscellaneous)
- Psychology (miscellaneous)
- General Psychology