
Gated Probabilistic Diffusion for Temporal Action Segmentation

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Temporal action segmentation is a fundamental task in video understanding, involving the identification and classification of human actions in long, untrimmed videos. Existing methods often suffer from over-segmentation errors and struggle to model complex temporal dependencies. Inspired by denoising diffusion models, we propose Gated Probabilistic Diffusion Action Segmentation (GPDAS), a novel framework that formulates action segmentation as a conditional sequence generation task. GPDAS iteratively refines frame-wise action labels through a denoising process conditioned on video features, implicitly modeling action priors and domain-specific behavioral knowledge. Our approach includes (1) a gated probabilistic decoder with adaptive temporal convolutions to enhance boundary accuracy and action continuity, (2) dual boundary-aware and action-dependent loss functions to capture chronological dependencies and improve temporal localization, and (3) masked conditioning strategies to improve robustness. Evaluated on the GTEA, 50Salads, and Breakfast benchmarks, GPDAS achieves state-of-the-art performance, outperforming existing methods in edit score and segmental F1 scores while effectively mitigating over-segmentation. The gated decoder demonstrates strong performance in modeling long-range, complex action dynamics.
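The abstract reports gains in edit score and ties them to reduced over-segmentation. As a rough illustration of why that metric is sensitive to over-segmentation, the sketch below follows the commonly used definition of the segmental edit score: collapse frame-wise labels into an ordered list of segments, then take one minus the normalized Levenshtein distance between the predicted and ground-truth segment lists. The function names (`to_segments`, `levenshtein`, `edit_score`) and the toy labels are illustrative assumptions; this is not the evaluation code used for GPDAS.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse frame-wise labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_score(predicted, ground_truth):
    """Segmental edit score in [0, 100]; higher is better."""
    p, g = to_segments(predicted), to_segments(ground_truth)
    longest = max(len(p), len(g))
    if longest == 0:
        return 100.0  # assumption: two empty sequences count as a perfect match
    return 100.0 * (1.0 - levenshtein(p, g) / longest)


# Toy example (hypothetical labels): one mislabeled frame splits a segment.
ground_truth = ["pour"] * 4 + ["stir"] * 4
over_segmented = ["pour", "pour", "stir", "pour", "stir", "stir", "stir", "stir"]

frame_accuracy = sum(p == g for p, g in zip(over_segmented, ground_truth)) / len(ground_truth)
print(frame_accuracy)                            # 0.875
print(edit_score(over_segmented, ground_truth))  # 50.0
```

In this toy case a single wrong frame leaves frame-wise accuracy at 87.5% but halves the edit score, because the prediction now contains four segments where the ground truth has two. That gap is what makes the edit score a useful indicator of over-segmentation.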

Original language: English
Title of host publication: 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1868-1873
Number of pages: 6
ISBN (Electronic): 9798331572068
Publication status: Published - Oct 2025
Event: 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025 - Singapore, Singapore
Duration: 22 Oct 2025 to 24 Oct 2025

Publication series

Name: 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025

Conference

Conference: 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
Country/Territory: Singapore
City: Singapore
Period: 22/10/25 to 24/10/25

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Hardware and Architecture
  • Signal Processing
