TY - GEN
T1 - An empirical study on email classification using supervised machine learning in real environments
AU - Li, Wenjuan
AU - Meng, Weizhi
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/9/9
Y1 - 2015/9/9
N2 - Spam emails are considered one of the biggest challenges for the Internet. Email classification, which aims to correctly distinguish legitimate emails from spam, has thus become an important topic for both industry and academia. To achieve this goal, machine learning techniques, especially supervised machine learning algorithms, have been extensively applied in this field. In the literature, several studies reveal that supervised machine learning (SML) suffers from limitations such as performance fluctuation, and hence many works have started focusing on designing more complex algorithms. However, we identify that most existing research efforts are based on datasets, while more research should be conducted to investigate the performance of SML in real environments. In this paper, we thus perform an empirical study with three different environments and over 1,000 users regarding this issue. In the study, we find that SML classifiers such as decision trees and SVMs are acceptable to users in real email classification. In addition, we discuss promising directions and provide new insights in this area.
KW - Email Classification
KW - Empirical Study
KW - Spam Detection
KW - Supervised Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=84953741076&partnerID=8YFLogxK
U2 - 10.1109/ICC.2015.7249515
DO - 10.1109/ICC.2015.7249515
M3 - Conference article published in proceeding or book
AN - SCOPUS:84953741076
T3 - IEEE International Conference on Communications
SP - 7438
EP - 7443
BT - 2015 IEEE International Conference on Communications, ICC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Conference on Communications, ICC 2015
Y2 - 8 June 2015 through 12 June 2015
ER -