Abstract
Query-efficient adversarial example attacks against black-box natural language models have recently attracted widespread attention from researchers. The task is considered difficult due to the discrete nature of text, limited knowledge of the target model, and the strict query limits imposed by real-world systems. Existing attacks often require a large number of queries or achieve low attack success rates, falling short of practical requirements. To address this, we propose FastTextDodger, a simple and compact decision-based black-box textual adversarial attack that generates grammatically correct adversarial texts with high attack success rates and few queries. Experimental results show that FastTextDodger achieves an impressive 97.4% attack success rate on benchmark datasets and models while needing only about 200 queries. Compared to state-of-the-art attacks, FastTextDodger requires only one-tenth of the queries on text classification and textual entailment tasks while maintaining comparable attack success rates and perturbed word rates.
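To make the decision-based (hard-label) setting concrete: the attacker can only observe the predicted label for each query, and every query counts against a budget. The sketch below illustrates that query interface with a toy budget-tracking wrapper and a generic synonym-substitution baseline; it is not FastTextDodger's algorithm, and the victim model, synonym table, and function names are illustrative assumptions.

```python
# Minimal sketch of the hard-label (decision-based) query setting, NOT
# FastTextDodger's actual method. All names below are hypothetical.

SYNONYMS = {  # toy synonym table; a real attack would use embeddings or a thesaurus
    "good": ["fine", "decent"],
    "movie": ["film", "picture"],
    "great": ["solid", "nice"],
}

class QueryCounter:
    """Wraps a hard-label classifier and counts queries against a budget."""
    def __init__(self, predict_label, budget=200):
        self.predict_label = predict_label  # returns only a class label, no scores
        self.budget = budget
        self.queries = 0

    def __call__(self, text):
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        return self.predict_label(text)

def word_substitution_attack(text, oracle, original_label):
    """Generic decision-based baseline: try synonym swaps word by word and
    return as soon as one swap flips the hard label. Purely illustrative."""
    words = text.split()
    for i, word in enumerate(words):
        for candidate in SYNONYMS.get(word.lower(), []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            if oracle(trial) != original_label:
                return trial, oracle.queries  # adversarial text found
    return None, oracle.queries  # failed within the query budget

if __name__ == "__main__":
    # Toy victim model: labels a review "positive" iff it contains "great".
    victim = lambda t: "positive" if "great" in t.lower() else "negative"
    oracle = QueryCounter(victim, budget=200)
    adv, used = word_substitution_attack("a great movie", oracle, "positive")
    print(adv, used)
```

This baseline is far less query-efficient than the method described in the abstract; it only serves to show how attack success and query count are measured under hard-label access.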
Original language | English
---|---
Pages (from-to) | 2398-2411
Number of pages | 14
Journal | IEEE Transactions on Information Forensics and Security
Volume | 19
DOIs |
Publication status | Published - 5 Jan 2024
Externally published | Yes
Keywords
- Adversarial attacks
- black-box attacks
- natural language processing
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Computer Networks and Communications