Generation of Fundus Fluorescein Angiography Videos for Health Care Data Sharing

  • Xinyuan Wu
  • Lili Wang
  • Ruoyu Chen
  • Bowen Liu
  • Weiyi Zhang
  • Xi Yang
  • Yifan Feng
  • Mingguang He (Corresponding Author)
  • Danli Shi (Corresponding Author)

Research output: Journal article publication, Journal article, Academic research, peer-review

3 Citations (Scopus)

Abstract

Importance: Medical data sharing faces strict restrictions. Text-to-video generation shows potential for creating realistic medical data while preserving privacy, offering a solution for cross-center data sharing and medical education.

Objective: To develop and evaluate a text-to-video generative artificial intelligence (AI)-driven model that converts the text of reports into dynamic fundus fluorescein angiography (FFA) videos, enabling visualization of retinal vascular and structural abnormalities.

Design, setting, and participants: This study retrospectively collected anonymized FFA data from a tertiary hospital in China. The dataset included both the medical records and FFA examinations of patients assessed between November 2016 and December 2019. A text-to-video model was developed and evaluated. The AI-driven model integrated the wavelet-flow variational autoencoder and the diffusion transformer.

Main outcomes and measures: The AI-driven model's performance was assessed through objective metrics (Fréchet video distance, learned perceptual image patch similarity score, and visual question answering score [VQAScore]). The domain-specific evaluation for the generated FFA videos was measured by the bidirectional encoder representations from transformers score (BERTScore). Image retrieval was evaluated using a Recall@K score. Each video was rated for quality by 3 ophthalmologists on a scale of 1 (excellent) to 5 (very poor).
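
The Recall@K retrieval metric named above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how videos were embedded or matched, so the cosine-similarity ranking, the pairing of query i with gallery item i, and the function names `cosine` and `recall_at_k` are all hypothetical and not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose paired gallery item ranks in the top k.

    Assumption: queries[i] (e.g. a generated-video embedding) is paired
    with gallery[i] (e.g. the corresponding real-video embedding).
    """
    hits = 0
    for i, query in enumerate(queries):
        sims = [cosine(query, item) for item in gallery]
        # Zero-based rank = number of gallery items scoring strictly higher
        # than the true match.
        rank = sum(1 for s in sims if s > sims[i])
        if rank < k:
            hits += 1
    return hits / len(queries)
```

Under this reading, a low Recall@K means a generated video rarely retrieves its real counterpart, which is how the abstract interprets the metric as a privacy signal.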

Results: A total of 3625 FFA videos were included (2851 videos [78.6%] for training, 387 videos [10.7%] for validation, and 387 videos [10.7%] for testing). The AI-generated FFA videos demonstrated retinal abnormalities from the input text (Fréchet video distance of 2273, a mean learned perceptual image patch similarity score of 0.48 [SD, 0.04], and a mean VQAScore of 0.61 [SD, 0.08]). The domain-specific evaluations showed alignment between the generated videos and textual prompts (mean BERTScore, 0.35 [SD, 0.09]). The Recall@K scores were 0.02 for K = 5, 0.04 for K = 10, and 0.16 for K = 50, yielding a mean score of 0.073, reflecting disparities between AI-generated and real clinical videos and demonstrating privacy-preserving effectiveness. For assessment of visual quality of the FFA videos by the 3 ophthalmologists, the mean score was 1.57 (SD, 0.44).
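
The split percentages and the mean Recall@K reported above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the variable names are illustrative.

```python
# Dataset split sizes as reported in the abstract.
splits = {"training": 2851, "validation": 387, "testing": 387}
total = sum(splits.values())

# Percentage of the full dataset in each split, rounded to one decimal place.
percentages = {name: round(100 * n / total, 1) for name, n in splits.items()}

# Recall@K values as reported, keyed by K.
recall = {5: 0.02, 10: 0.04, 50: 0.16}

# Unweighted mean across the three K values.
mean_recall = sum(recall.values()) / len(recall)

print(total, percentages, round(mean_recall, 3))
```

The mean here is the unweighted average of the three reported Recall@K values, which matches the 0.073 quoted in the Results.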

Conclusions and relevance: This study demonstrated that an AI-driven text-to-video model generated FFA videos from textual descriptions, potentially improving visualization for clinical and educational purposes. The privacy-preserving nature of the model may address key challenges in data sharing while trying to ensure compliance with confidentiality standards.
Original language: English
Article number: e251419
Pages (from-to): 623-632
Number of pages: 10
Journal: JAMA Ophthalmology
Volume: 143
Issue number: 8
Early online date: 26 Jun 2025
Publication status: Published - 21 Aug 2025
