H2R Bridge: Transferring vision-language models to few-shot intention meta-perception in human robot collaboration

Duidi Wu, Qianyou Zhao, Junming Fan, Jin Qi (Corresponding Author), Pai Zheng (Corresponding Author), Jie Hu

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

Human–robot collaboration (HRC) enhances efficiency by enabling robots to work alongside human operators in shared tasks. Accurately understanding human intentions is critical for achieving a high level of collaboration. Existing methods rely heavily on case-specific data and struggle with new tasks and unseen categories, while only limited data is often available under real-world conditions. To bolster the proactive cognitive abilities of collaborative robots, this work introduces a Visual-Language-Temporal approach that frames intent recognition as a multimodal learning problem with HRC-oriented prompts. A large model with prior knowledge is fine-tuned to acquire industrial domain expertise, which then enables efficient, rapid transfer through few-shot learning in data-scarce scenarios. Comparisons with state-of-the-art methods across various datasets demonstrate that the proposed approach sets new benchmark results. Ablation studies confirm the efficacy of the multimodal framework, and few-shot experiments further underscore its meta-perceptual potential. This work addresses the challenges of perceptual data and training costs, building a human–robot bridge (H2R Bridge) for semantic communication, and is expected to facilitate proactive HRC and the further integration of large models in industrial applications.

Original language: English
Pages (from-to): 524-535
Number of pages: 12
Journal: Journal of Manufacturing Systems
Volume: 80
Publication status: Published - Jun 2025

Keywords

  • Few-shot learning
  • Human–robot collaboration
  • Intent recognition
  • Vision-language models

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Hardware and Architecture
  • Industrial and Manufacturing Engineering
