TY - GEN
T1 - A Byte-based GPT-2 Model for Bit-flip JPEG Bitstream Restoration
AU - Qin, Hao
AU - Sun, Haoran
AU - Wang, Yi
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/12
Y1 - 2024/12
AB - In this paper, we investigate the application of large language models (LLMs) to the recovery of corrupted bitstreams, focusing specifically on JPEG image data. We propose a byte-based GPT-2 model that directly processes byte sequences and predicts the next byte, enabling its application to JPEG bitstream recovery. This architecture allows the model to capture the relationships between consecutive bytes within a JPEG bitstream, so that it can correct bit-flip errors caused by damaged storage or malicious attacks. We evaluate the model's performance on bit-flip JPEG datasets with varying bit error rates (BERs). The experimental results demonstrate the model's ability to implicitly learn patterns in the bitstream and correct erroneous bytes, showcasing the potential of LLMs in binary processing tasks. Our findings highlight the promise of byte-based LLMs for addressing data corruption and open up new avenues for research in this domain.
UR - http://www.scopus.com/inward/record.url?scp=85218203908&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC63619.2025.10849152
DO - 10.1109/APSIPAASC63619.2025.10849152
M3 - Conference article published in proceeding or book
AN - SCOPUS:85218203908
T3 - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
BT - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Y2 - 3 December 2024 through 6 December 2024
ER -