Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiao Yong Wei, Chang Wen Chen, Qing Li

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

6 Citations (Scopus)

Abstract

In this paper, we explore the use of large language models (LLMs) to enhance video moment retrieval (VMR) by integrating general knowledge and pseudo-events as priors. LLMs are limited in generating continuous outputs, such as salience scores and inter-frame embeddings, which are critical for capturing inter-frame relations. To overcome this limitation, we propose using LLM encoders, which effectively refine inter-concept relations in multimodal embeddings, even without textual training. Our feasibility study shows that this capability extends to other embeddings, such as BLIP and T5, when they exhibit patterns similar to those of CLIP embeddings. We present a general framework for integrating LLM encoders into existing VMR architectures, specifically within the fusion module. The LLM encoder's ability to refine concept relations helps the model achieve a balanced understanding of foreground concepts (e.g., persons, faces) and background concepts (e.g., street, mountains), rather than focusing only on the visually dominant foreground concepts. Additionally, we utilize pseudo-events, identified via event detection, to guide accurate moment prediction within event boundaries, reducing distractions from adjacent moments. Our plug-in approach for semantic refinement and pseudo-event regulation demonstrates state-of-the-art VMR performance through experimental validation. The source code can be accessed at https://github.com/fletcherjiang/LLMEPET.

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 7249-7258
Number of pages: 10
ISBN (Electronic): 9798400706868
Publication status: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 to 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 to 1/11/24

Keywords

  • highlight detection
  • LLMs
  • video moment retrieval

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
