Use of subword tokenization for domain generation algorithm classification

Sea Ran Cleon Liew, Ngai Fong Law

Research output: Journal article publication > Journal article > Academic research > peer-review

1 Citation (Scopus)

Abstract

Domain name generation algorithm (DGA) classification is an essential but challenging problem. Both feature-extracting machine learning (ML) methods and deep learning (DL) models, such as convolutional neural networks and long short-term memory (LSTM) networks, have been developed for it. However, the performance of these approaches varies with the type of DGA. Most features used in the ML methods characterize random-looking DGAs better than word-looking DGAs. To improve classification performance on word-looking DGAs, subword tokenization is employed for the DL models. Our experimental results show that subword tokenization provides excellent classification performance on word-looking DGAs. We then propose an integrated scheme that chooses an appropriate method for DGA classification depending on the nature of the DGAs. Results show that the integrated scheme outperforms existing ML and DL methods, as well as the subword DL methods.
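
The abstract does not say which subword tokenizer the authors use, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a greedy longest-match split (in the spirit of WordPiece) over a toy vocabulary; the function name `subword_tokenize`, the `TOY_VOCAB` entries, and the example domain labels are all hypothetical. It illustrates why subwords may help on word-looking DGAs: a word-looking label collapses into a few meaningful tokens, while a random-looking label falls back to single characters.

```python
import string


def subword_tokenize(label, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a domain label into subwords."""
    tokens = []
    i = 0
    while i < len(label):
        # Try the longest candidate starting at position i first.
        for j in range(len(label), i, -1):
            piece = label[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matches here: emit an unknown token
            # for this single character and move on.
            tokens.append(unk)
            i += 1
    return tokens


# Toy vocabulary, invented for illustration only: a few dictionary words
# plus every lowercase letter and digit as single-character fallbacks.
TOY_VOCAB = {"cloud", "market", "news", "net"} | set(
    string.ascii_lowercase + string.digits
)

# Hypothetical labels: one word-looking, one random-looking.
word_like = subword_tokenize("cloudmarket", TOY_VOCAB)
random_like = subword_tokenize("xkqzvtr", TOY_VOCAB)

print(word_like)    # two word-level tokens
print(random_like)  # seven single-character tokens
```

In a DL pipeline, token sequences like these would typically be mapped to integer IDs and fed to an embedding layer in place of raw characters; how the authors do this is not stated in the abstract.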

Original language: English
Article number: 49
Journal: Cybersecurity
Volume: 6
Issue number: 1
Publication status: Published - Dec 2023

Keywords

  • Botnet detection
  • Domain names
  • Machine learning-based botnet detection
  • Network security

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Networks and Communications
  • Artificial Intelligence
