TY - JOUR
T1 - ChatFFA: An ophthalmic chat system for unified vision-language understanding and question answering for fundus fluorescein angiography
AU - Chen, Xiaolan
AU - Xu, Pusheng
AU - Li, Yao
AU - Zhang, Weiyi
AU - Song, Fan
AU - He, Mingguang
AU - Shi, Danli
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2024/5/17
Y1 - 2024/5/17
N2 - Existing automatic analysis of fundus fluorescein angiography (FFA) images faces limitations, including a predetermined set of possible image classifications and being confined to text-based question-answering (QA) approaches. This study aims to address these limitations by developing an end-to-end unified model that utilizes synthetic data to train a visual question-answering model for FFA images. To achieve this, we employed ChatGPT to generate 4,110,581 QA pairs for a large FFA dataset, which encompassed a total of 654,343 FFA images from 9,392 participants. We then fine-tuned the Bootstrapping Language-Image Pre-training (BLIP) framework to enable simultaneous handling of vision and language. The performance of the fine-tuned model (ChatFFA) was thoroughly evaluated through automated and manual assessments, as well as case studies based on an external validation set, demonstrating satisfactory results. In conclusion, our ChatFFA system paves the way for improved efficiency and feasibility in medical imaging analysis by leveraging generative large language models.
KW - Artificial intelligence
KW - Ophthalmology
UR - http://www.scopus.com/inward/record.url?scp=85194249743&partnerID=8YFLogxK
U2 - 10.1016/j.isci.2024.110021
DO - 10.1016/j.isci.2024.110021
M3 - Journal article
AN - SCOPUS:85194249743
SN - 2589-0042
VL - 27
SP - 1
EP - 11
JO - iScience
JF - iScience
IS - 7
M1 - 110021
ER -