Anatomical Structure-Guided Medical Vision-Language Pre-training

  • Qingqiu Li
  • Xiaohan Yan
  • Jilan Xu
  • Runtian Yuan
  • Yuejie Zhang
  • Rui Feng
  • Quanli Shen
  • Xiaobo Zhang
  • Shujun Wang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

10 Citations (Scopus)

Abstract

Learning medical visual representations through vision-language pre-training has made remarkable progress. Despite this promising performance, challenges remain: local alignment lacks interpretability and clinical relevance, and the internal and external representations of image-report pairs are insufficiently learned. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into (anatomical region, finding, existence) triplets and fully utilize each element as supervision to enhance representation learning. For the anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, treating them as the minimum semantic units to explore fine-grained local alignment. For finding and existence, we regard them as image tags: an image-tag recognition decoder associates image features with their respective tags within each sample, and soft labels are constructed for contrastive learning to improve the semantic association between different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks across five public benchmarks. Experimental results demonstrate that our method outperforms state-of-the-art methods. Our code is available at https://asgmvlp.github.io.
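The abstract does not specify how the soft labels for contrastive learning are built, so the following is only a minimal sketch of the general idea under stated assumptions: soft targets are derived from the Jaccard overlap of each sample's (finding, existence) tags, and the loss is a cross-entropy between those targets and the softmax of image-report similarity logits. The function names, the Jaccard choice, the example tags, and the logits are all hypothetical and are not taken from the authors' code.

```python
import math

def jaccard(a, b):
    """Jaccard overlap between two tag sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def soft_labels(tag_sets):
    """Row-normalised tag-overlap matrix used as soft contrastive targets.

    Assumption: tag overlap stands in for semantic similarity between pairs.
    """
    n = len(tag_sets)
    rows = []
    for i in range(n):
        row = [jaccard(tag_sets[i], tag_sets[j]) for j in range(n)]
        total = sum(row)  # at least 1, since jaccard(x, x) == 1
        rows.append([v / total for v in row])
    return rows

def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def soft_contrastive_loss(logits, targets):
    """Mean cross-entropy between soft targets and softmax(logits), per row."""
    loss = 0.0
    for logit_row, target_row in zip(logits, targets):
        logp = log_softmax(logit_row)
        loss -= sum(t * lp for t, lp in zip(target_row, logp))
    return loss / len(logits)

# Hypothetical (finding, existence) tags for three image-report pairs.
tags = [
    {("effusion", "present"), ("pneumothorax", "absent")},
    {("effusion", "present")},
    {("cardiomegaly", "present")},
]
targets = soft_labels(tags)

# Hypothetical image-to-report similarity logits (already temperature-scaled).
logits = [[5.0, 2.0, 0.1], [2.0, 5.0, 0.2], [0.1, 0.3, 5.0]]
loss = soft_contrastive_loss(logits, targets)
```

Compared with one-hot targets, which treat every other report in the batch as an equally wrong match, soft targets of this kind give partial credit to pairs that share tags, which is one plausible reading of "improving the semantic association of different image-report pairs".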

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention - MICCAI 2024 - 27th International Conference, Proceedings
Editors: Marius George Linguraru, Aasa Feragen, Ben Glocker, Stamatia Giannarou, Julia A. Schnabel, Qi Dou, Karim Lekadir
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 80-90
Number of pages: 11
ISBN (Print): 9783031721199
DOIs
Publication status: Published - 14 Mar 2024
Event: 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco
Duration: 6 Oct 2024 → 10 Oct 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15011 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
Country/Territory: Morocco
City: Marrakesh
Period: 6/10/24 → 10/10/24

Keywords

  • Anatomical Structure
  • Contrastive Learning
  • Medical Vision-Language Pre-training
  • Representation Learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
