TY - GEN
T1 - A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings
AU - Liu, Wenyang
AU - Wang, Yi
AU - Wu, Kejun
AU - Yap, Kim Hui
AU - Chau, Lap Pui
N1 - Funding Information:
This research/project is supported by the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity R&D Programme (NRF2018NCR-NCR009-0001). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore and Cyber Security Agency of Singapore.
Publisher Copyright:
© 2023 IEEE.
PY - 2023/6
Y1 - 2023/6
AB - File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length coding files, whose symbols are represented by a variable number of bits. In contrast, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at https://github.com/wenyang001/Byte2Image.
KW - byte2image
KW - CNN
KW - file fragment classification
KW - memory forensics
UR - http://www.scopus.com/inward/record.url?scp=85166368115&partnerID=8YFLogxK
U2 - 10.1109/AICAS57966.2023.10168636
DO - 10.1109/AICAS57966.2023.10168636
M3 - Conference article published in proceeding or book
AN - SCOPUS:85166368115
T3 - AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceeding
BT - AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceeding
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2023
Y2 - 11 June 2023 through 13 June 2023
ER -