Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection

Tsun Hin Cheung, Ka Chun Fung, Songjiang Lai, Kwan Ho Lin, Vincent Ng, Kin Man Lam

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

Abstract

Identifying defects and anomalies in industrial products is a critical quality control task. Traditional manual inspection methods are slow, subjective, and error-prone. In this work, we propose a novel zero-shot training-free approach for automated industrial image anomaly detection using a multimodal machine learning pipeline, consisting of three foundation models. Our method first uses a large language model, i.e., GPT-3. generate text prompts describing the expected appearances of normal and abnormal products. We then use a grounding object detection model, called Grounding DINO, to locate the product in the image. Finally, we compare the cropped product image patches to the generated prompts using a zero-shot image-text matching model, called CLIP, to identify any anomalies. Our experiments on two datasets of industrial product images, namely MVTec-AD and VisA, demonstrate the effectiveness of this method, achieving high accuracy in detecting various types of defects and anomalies without the need for model training. Our proposed model enables efficient, scalable, and objective quality control in industrial manufacturing settings.

Original languageEnglish
Title of host publicationAPSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9798350367331
DOIs
Publication statusPublished - Dec 2024
Event2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 3 Dec 20246 Dec 2024

Publication series

NameAPSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/TerritoryChina
CityMacau
Period3/12/246/12/24

Keywords

  • Image Anomaly Detection
  • Multimodal Model
  • Object Localization
  • Prompt Generation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Hardware and Architecture
  • Signal Processing

Fingerprint

Dive into the research topics of 'Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection'. Together they form a unique fingerprint.

Cite this