NTD: Non-Transferability Enabled Deep Learning Backdoor Detection

Yinshan Li, Hua Ma, Zhi Zhang, Yansong Gao, Alsharif Abuadbba, Minhui Xue, Anmin Fu, Yifeng Zheng, Said F. Al-Sarawi, Derek Abbott

Research output: Journal article publicationJournal articleAcademic researchpeer-review

7 Citations (Scopus)

Abstract

To mitigate recent insidious backdoor attacks on deep learning models, advances have been made by the research community. Nonetheless, state-of-the-art defenses are either limited to specific backdoor attacks (i.e., source-agnostic attacks) or non-user-friendly in that machine learning expertise and/or expensive computing resources are required. This work observes that all existing backdoor attacks have an inadvertent and inevitable intrinsic weakness, termed as non-transferability - that is, a trigger input hijacks a backdoored model but is not effective in another model that has not been implanted with the same backdoor. With this key observation, we propose non-transferability enabled backdoor detection to identify trigger inputs for a model-under-test during run-time. Specifically, our detection allows a potentially backdoored model-under-test to predict a label for an input. Moreover, our detection leverages a feature extractor to extract feature vectors for the input and a group of samples randomly picked from its predicted class label, and then compares the similarity between the input and the samples in the feature extractor's latent space to determine whether the input is a trigger input or a benign one. The feature extractor can be provided by a reputable party or is a free pre-trained model privately reserved from any open platform (e.g., ModelZoo, GitHub, Kaggle) by a user and thus our detection does not require the user to have any machine learning expertise or perform costly computations. Extensive experimental evaluations on four common tasks affirm that our detection scheme has high effectiveness (low false acceptance rate) and usability (low false rejection rate) with low detection latency against different types of backdoor attacks.

Original languageEnglish
Pages (from-to)104-119
Number of pages16
JournalIEEE Transactions on Information Forensics and Security
Volume19
DOIs
Publication statusPublished - Sept 2023

Keywords

  • Backdoor countermeasure
  • deep learning
  • non-transferability
  • NTD

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'NTD: Non-Transferability Enabled Deep Learning Backdoor Detection'. Together they form a unique fingerprint.

Cite this