A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Wenyang Liu, Yi Wang, Kejun Wu, Kim Hui Yap, Lap Pui Chau

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

2 Citations (Scopus)

Abstract

File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1D byte signals and classify them using the captured inter-byte features, while the bit-level information within bytes, i.e., intra-byte information, is seldom considered. This is inherently ill-suited to classifying files that use variable-length coding, whose symbols are represented by a variable number of bits. In contrast, we propose Byte2Image, a novel data augmentation technique that introduces the neglected intra-byte information into file fragments and treats them as 2D grayscale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously with powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2D images, we employ a sliding byte window to expose the neglected intra-byte information and stack the n-gram features row by row. We further propose a byte sequence and image fusion network as a classifier, which jointly models the raw 1D byte sequence and the converted 2D image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method achieves notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at https://github.com/wenyang001/Byte2Image.
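
As a rough illustration of the bit-shift idea described in the abstract, the following Python sketch re-reads a fragment at each of the eight possible bit offsets and stacks the resulting byte rows into a 2D array that could be viewed as a grayscale image. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `bit_shifted_rows`, the big-endian bit order, and the choice to truncate every row to `len(fragment) - 1` bytes are all hypothetical, and the n-gram embedding step is omitted.

```python
from typing import List


def bit_shifted_rows(fragment: bytes, shifts: int = 8) -> List[List[int]]:
    """Hypothetical helper: build one row of byte values per bit offset.

    Row s re-reads the fragment as consecutive 8-bit windows that start
    s bits after the beginning, so bit patterns that straddle byte
    boundaries become visible as ordinary byte values.
    """
    n_bits = len(fragment) * 8
    # Treat the whole fragment as one big-endian integer (an assumption).
    value = int.from_bytes(fragment, "big")
    # Every offset from 0 to 7 yields at least len(fragment) - 1 full bytes,
    # so all rows are truncated to that common length.
    row_len = len(fragment) - 1
    rows = []
    for s in range(shifts):
        row = []
        for i in range(row_len):
            start = s + 8 * i                    # first bit of this window
            shift_right = n_bits - (start + 8)   # bits to the right of it
            row.append((value >> shift_right) & 0xFF)
        rows.append(row)
    return rows


if __name__ == "__main__":
    image = bit_shifted_rows(bytes([0x81, 0xC0, 0x00]))
    for offset, row in enumerate(image):
        print(offset, row)
```

Each row has the same length, so the list of rows can be handed to an image-style model as a single-channel 2D input; how the rows are actually combined with n-gram features in Byte2Image is not specified here.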

Original language: English
Title of host publication: AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350332674
Publication status: Published - Jun 2023
Event: 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2023 - Hangzhou, China
Duration: 11 Jun 2023 → 13 Jun 2023

Publication series

Name: AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceeding

Conference

Conference: 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2023
Country/Territory: China
City: Hangzhou
Period: 11/06/23 → 13/06/23

Keywords

  • byte2image
  • CNN
  • file fragment classification
  • memory forensics

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Information Systems
  • Electrical and Electronic Engineering
