TY - JOUR
T1 - Understand Short Texts by Harvesting and Analyzing Semantic Knowledge
AU - Hua, Wen
AU - Wang, Zhongyuan
AU - Wang, Haixun
AU - Zheng, Kai
AU - Zhou, Xiaofang
N1 - Funding Information:
This paper has been funded by research grants from CONACyT, Grant 81965, and from PAPIIT–UNAM, Grant 104408.
Publisher Copyright:
© 2016 IEEE.
PY - 2017/3/1
Y1 - 2017/3/1
AB - Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing tools, ranging from part-of-speech tagging to dependency parsing, cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text mining such as topic modeling. Third, short texts are more ambiguous and noisy, and are generated in an enormous volume, which further increases the difficulty to handle them. We argue that semantic knowledge is required in order to better understand short texts. In this work, we build a prototype system for short text understanding which exploits semantic knowledge provided by a well-known knowledgebase and automatically harvested from a web corpus. Our knowledge-intensive approaches disrupt traditional methods for tasks such as text segmentation, part-of-speech tagging, and concept labeling, in the sense that we focus on semantics in all these tasks. We conduct a comprehensive performance evaluation on real-life data. The results show that semantic knowledge is indispensable for short text understanding, and our knowledge-intensive approaches are both effective and efficient in discovering semantics of short texts.
KW - concept labeling
KW - semantic knowledge
KW - short text understanding
KW - text segmentation
KW - type detection
UR - http://www.scopus.com/inward/record.url?scp=85012273882&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2016.2571687
DO - 10.1109/TKDE.2016.2571687
M3 - Journal article
AN - SCOPUS:85012273882
SN - 1041-4347
VL - 29
SP - 499
EP - 512
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 3
M1 - 7476863
ER -