Contrastive Pre-training and Representation Distillation for Medical Visual Question Answering Based on Radiology Images

Bo Liu, Li-Ming Zhan, Xiao-Ming Wu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

34 Citations (Scopus)

Abstract

One of the primary challenges facing medical visual question answering (Med-VQA) is the lack of large-scale, well-annotated datasets for training. To overcome this challenge, this paper proposes a two-stage pre-training framework that learns transferable feature representations of radiology images and distills a lightweight visual feature extractor for Med-VQA. Specifically, we leverage large amounts of unlabeled radiology images to train three teacher models, one each for the brain, chest, and abdomen body regions, via contrastive learning. We then distill the teacher models into a lightweight student model that can serve as a universal visual feature extractor for any Med-VQA system. The lightweight feature extractor can be readily fine-tuned on the training radiology images of any Med-VQA dataset, saving annotation effort while preventing overfitting to small-scale training data. The effectiveness and advantages of the pre-trained model are demonstrated by extensive experiments with state-of-the-art Med-VQA methods on existing benchmarks. The source code and the pre-training dataset can be downloaded from https://github.com/awenbocc/cprd.
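The two stages described above can be sketched as two losses: a contrastive (InfoNCE-style) objective for pre-training each region-specific teacher on unlabeled images, and a representation-matching objective for distilling the teachers into one student. This is a minimal illustrative sketch, not the authors' released code; the function names, the cosine-matching form of the distillation loss, and the region-keyed teacher dictionary are assumptions made here for clarity (the exact objectives are in the CPRD repository).

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss between two augmented views of the same batch.

    z1, z2: (N, D) embeddings; positives are the pairs (z1[i], z2[i]),
    all other rows in the batch act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the positive pair on the diagonal
    return -float(np.mean(np.diag(log_probs)))

def distillation_loss(student_feats, teacher_feats, region_ids):
    """Representation distillation: pull each student embedding toward the
    embedding produced by the matching region-specific teacher.

    teacher_feats: dict mapping a region id ('brain'/'chest'/'abdomen')
    to that teacher's precomputed (N, D) embeddings (assumed layout).
    """
    loss = 0.0
    for i, region in enumerate(region_ids):
        t = teacher_feats[region][i]
        t = t / np.linalg.norm(t)
        s = student_feats[i] / np.linalg.norm(student_feats[i])
        loss += 1.0 - float(s @ t)                # cosine distance in [0, 2]
    return loss / len(region_ids)
```

In this sketch the teachers are trained independently with `info_nce_loss` on their own region's images, and the student is then trained with `distillation_loss` so that a single compact network mimics all three teachers at once.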

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 - 24th International Conference, Proceedings
Editors: Marleen de Bruijne, Philippe C. Cattin, Stéphane Cotin, Nicolas Padoy, Stefanie Speidel, Yefeng Zheng, Caroline Essert
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 210-220
Number of pages: 11
ISBN (Print): 9783030871956
DOIs
Publication status: Published - Sept 2021
Event: 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021 - Virtual, Online
Duration: 27 Sept 2021 – 1 Oct 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12902 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021
City: Virtual, Online
Period: 27/09/21 – 1/10/21

Keywords

  • Contrastive learning
  • Medical visual question answering
  • Model compression
  • Representation distillation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
