Interpretation-Empowered Neural Cleanse for Backdoor Attacks

Liang Bo Ning, Zeyu Dai, Jingran Su, Chao Pan, Luning Wang, Wenqi Fan, Qing Li

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

Abstract

Backdoor attacks have posed a significant threat to deep neural networks, highlighting the need for robust defense strategies. Previous research has demonstrated that attribution maps change substantially when exposed to attacks, suggesting the potential of interpreters in detecting adversarial examples. However, most existing defense methods against backdoor attacks overlook the untapped capabilities of interpreters, failing to fully leverage their potential. In this paper, we propose a novel approach called interpretation-empowered neural cleanse (IENC) for defending backdoor attacks. Specifically, integrated gradient (IG) is adopted to bridge the interpreters and classifiers to reverse and reconstruct the high-quality backdoor trigger. Then, an interpretation-empowered adaptative pruning strategy (IEAPS) is proposed to cleanse the backdoor-related neurons without the pre-defined threshold. Additionally, a hybrid model patching approach is employed to integrate the IEAPS and preprocessing techniques to enhance the defense performance. Comprehensive experiments are constructed on various datasets, demonstrating the potential of interpretations in defending backdoor attacks and the superiority of the proposed method.

Original languageEnglish
Title of host publicationWWW 2024 Companion - Companion Proceedings of the ACM Web Conference
PublisherAssociation for Computing Machinery, Inc
Pages951-954
Number of pages4
ISBN (Electronic)9798400701726
DOIs
Publication statusPublished - 13 May 2024
Event33rd ACM Web Conference, WWW 2024 - Singapore, Singapore
Duration: 13 May 202417 May 2024

Publication series

NameWWW 2024 Companion - Companion Proceedings of the ACM Web Conference

Conference

Conference33rd ACM Web Conference, WWW 2024
Country/TerritorySingapore
CitySingapore
Period13/05/2417/05/24

Keywords

  • Backdoor Attacks
  • Defense Mechanism
  • Interpretability

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Interpretation-Empowered Neural Cleanse for Backdoor Attacks'. Together they form a unique fingerprint.

Cite this