Medical Visual Question Answering via Conditional Reasoning

Li Ming Zhan, Bo Liu, Lu Fan, Jiaxin Chen, Xiao Ming Wu

Research output: Chapter in book / Conference proceeding > Conference article published in proceeding or book > Academic research > peer-review

76 Citations (Scopus)


Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in the healthcare industry and services, the technology is still in its infancy and is far from practical use. Med-VQA tasks are highly challenging because clinical questions are massively diverse and different types of questions require different visual reasoning skills. In this paper, we propose a novel conditional reasoning framework for Med-VQA that aims to automatically learn effective reasoning skills for various Med-VQA tasks. In particular, we develop a question-conditioned reasoning module to guide the importance selection over multimodal fusion features. Considering the different nature of closed-ended and open-ended Med-VQA tasks, we further propose a type-conditioned reasoning module that learns a separate set of reasoning skills for each of the two task types. Our conditional reasoning framework can be easily applied to existing Med-VQA systems to bring performance gains. In the experiments, we build our system on top of a recent state-of-the-art Med-VQA model and evaluate it on the VQA-RAD benchmark [23]. Remarkably, our system achieves significantly higher accuracy in predicting answers to both closed-ended and open-ended questions, especially open-ended questions, where absolute accuracy increases by 10.8%. The source code can be downloaded from
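As a rough illustration of the two modules named in the abstract, the sketch below gates a fused multimodal feature vector with weights computed from a question embedding, and keeps a separate gate for closed-ended and open-ended questions. It is not the authors' implementation: the class names, the sigmoid gating form, the random initialization, and the assumption that the question type is supplied as a label are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class QuestionConditionedGate:
    """Hypothetical module: turns a question embedding into per-feature
    importance weights in (0, 1) and applies them to the fused features."""

    def __init__(self, question_dim, feature_dim, rng):
        # Untrained random weights, for illustration only.
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(question_dim)]
            for _ in range(feature_dim)
        ]

    def __call__(self, question_emb, fused):
        gate = [sigmoid(z) for z in matvec(self.weights, question_emb)]
        return [g * f for g, f in zip(gate, fused)]


class TypeConditionedReasoner:
    """Hypothetical module: keeps one gate per task type so that closed-ended
    and open-ended questions use separate parameters."""

    def __init__(self, question_dim, feature_dim, seed=0):
        rng = random.Random(seed)
        self.gates = {
            "closed": QuestionConditionedGate(question_dim, feature_dim, rng),
            "open": QuestionConditionedGate(question_dim, feature_dim, rng),
        }

    def __call__(self, question_emb, fused, question_type):
        # Assumption: the question type is given; a real system would
        # have to predict it from the question.
        return self.gates[question_type](question_emb, fused)


reasoner = TypeConditionedReasoner(question_dim=3, feature_dim=4)
question_emb = [0.2, -0.1, 0.7]   # made-up question embedding
fused = [1.0, -2.0, 0.5, 3.0]     # made-up multimodal fusion features
closed_out = reasoner(question_emb, fused, "closed")
open_out = reasoner(question_emb, fused, "open")
```

The same question and fused features yield different reweighted features depending on the task type, which is the behaviour the abstract describes at a high level; in a trained system the gate weights would be learned jointly with the rest of the model.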

Original language: English
Title of host publication: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Number of pages: 10
ISBN (Electronic): 9781450379885
Publication status: Published - 12 Oct 2020
Event: 28th ACM International Conference on Multimedia, MM 2020 - Virtual, Online, United States
Duration: 12 Oct 2020 → 16 Oct 2020

Publication series

Name: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia


Conference: 28th ACM International Conference on Multimedia, MM 2020
Country/Territory: United States
City: Virtual, Online


Keywords

  • attention mechanism
  • conditional reasoning
  • medical visual question answering

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction

