TY - GEN
T1 - Can We Learn What People Are Doing from Raw DNS Queries?
AU - Li, Jianfeng
AU - Ma, Xiaobo
AU - Li, Guodong
AU - Luo, Xiapu
AU - Zhang, Junjie
AU - Li, Wei
AU - Guan, Xiaohong
PY - 2018/10/8
Y1 - 2018/10/8
AB - The Domain Name System (DNS) is one of the pillars of today's Internet. Owing to appealing properties such as low data volume, wide-ranging applications, and the absence of encryption, DNS traffic has been extensively used for network monitoring. Most existing studies of DNS traffic, however, focus on domain name reputation. Little attention has been paid to understanding and profiling what people are doing from DNS traffic, a fundamental problem in areas including Internet demographics and network behavior analysis. Consequently, simple questions like 'How can we determine whether a DNS query for www.google.com indicates searching or some other behavior?' cannot be answered by existing studies. In this paper, we take the first step toward identifying user activities from raw DNS queries. We advance a multi-scale hierarchical framework to tackle two practical challenges, namely behavior ambiguity and behavior polymorphism. Under this framework, a series of novel methods, such as pattern upward mapping and a multi-scale random forest classifier, are proposed to characterize and identify user activities of interest. Evaluation using both synthetic and real-world DNS traces demonstrates the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=85056181392&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2018.8486210
DO - 10.1109/INFOCOM.2018.8486210
M3 - Conference article published in proceeding or book
AN - SCOPUS:85056181392
T3 - Proceedings - IEEE INFOCOM
SP - 2240
EP - 2248
BT - INFOCOM 2018 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Conference on Computer Communications, INFOCOM 2018
Y2 - 15 April 2018 through 19 April 2018
ER -