TY - GEN
T1 - A weighted word embedding model for text classification
AU - Ren, Haopeng
AU - Zeng, Ze Quan
AU - Cai, Yi
AU - Du, Qing
AU - Li, Qing
AU - Xie, Haoran
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Neural bag-of-words (NBOW) models have achieved great success in text classification. They compute a sentence or document representation through simple mathematical operations, such as adding or averaging the word embeddings of each sequence element. As a result, NBOW models have few parameters and a low computation cost. Intuitively, considering the importance of each word and word-order information is beneficial for obtaining an informative sentence or document representation for text classification. However, NBOW models hardly consider these two factors when generating a representation. Meanwhile, term weighting schemes, which assign relatively high weights to important words, have performed well in traditional bag-of-words models, yet they are still seldom used in neural models. In addition, n-grams capture word-order information in short contexts. In this paper, we propose the weighted word embedding model (WWEM), a variant of the NBOW model that introduces term weighting schemes and n-grams. Our model generates informative sentence or document representations by considering the importance of words and word-order information. We compare our proposed model with other popular neural models on five text classification datasets. The experimental results show that our model achieves comparable or even superior performance.
KW - N-grams
KW - Neural bag-of-words models
KW - Term weighting schemes
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85065515641&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-18576-3_25
DO - 10.1007/978-3-030-18576-3_25
M3 - Conference article published in proceedings or book
AN - SCOPUS:85065515641
SN - 9783030185756
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 419
EP - 434
BT - Database Systems for Advanced Applications - 24th International Conference, DASFAA 2019, Proceedings
A2 - Li, Guoliang
A2 - Yang, Jun
A2 - Gama, Joao
A2 - Natwichai, Juggapong
A2 - Tong, Yongxin
PB - Springer-Verlag
T2 - 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019
Y2 - 22 April 2019 through 25 April 2019
ER -
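
Note: the snippet below is an illustrative sketch, not code from the cited paper. It shows one plausible reading of the abstract's core idea, a term-weighted average of word embeddings, using TF-IDF as the weighting scheme and bigrams standing in for n-grams. The function names, the `embeddings` and `idf` inputs, and the choice of TF-IDF are all assumptions for illustration; the paper defines WWEM's actual weighting schemes and n-gram handling.

from collections import Counter

import numpy as np

def bigrams(tokens):
    # Adjacent word pairs: a cheap stand-in for the n-grams the abstract
    # mentions for capturing word order in short contexts.
    return ["_".join(pair) for pair in zip(tokens, tokens[1:])]

def weighted_doc_embedding(tokens, embeddings, idf, dim=300):
    # TF-IDF-weighted average of word (and bigram) embeddings.
    # `embeddings` maps unit -> np.ndarray of length `dim`; `idf` maps
    # unit -> float. Both are hypothetical inputs, not from the paper.
    units = tokens + bigrams(tokens)
    tf = Counter(units)
    vec = np.zeros(dim)
    total = 0.0
    for unit, count in tf.items():
        if unit not in embeddings:
            continue  # skip out-of-vocabulary units
        w = (count / len(units)) * idf.get(unit, 0.0)
        vec += w * embeddings[unit]
        total += w
    # Normalize by total weight so documents of different lengths are comparable.
    return vec / total if total > 0.0 else vec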