Empowering natural human–robot collaboration through multimodal language models and spatial intelligence: Pathways and perspectives

  • Duidi Wu
  • Pai Zheng
  • Qianyou Zhao
  • Shuo Zhang
  • Jin Qi
  • Jie Hu
  • Guo Niu Zhu
  • Lihui Wang

Research output: Journal article publication, Review article, Academic research, peer-review

5 Citations (Scopus)

Abstract

Industry 5.0 advocates human-centric smart manufacturing (HSM), with growing attention to proactive human–robot collaboration (HRC). Meanwhile, multimodal large language models (MLLMs) and embodied intelligence are developing rapidly. This work aims to leverage these opportunities to enhance robots’ learning and cognitive capabilities, enabling seamless and natural interaction. However, current research often overlooks human–robot symbiosis and pays little attention to specialized models and practical applications. This review adheres to a human-centric vision, taking language as the pivot that connects humans with large models. To the best of our knowledge, this is the first attempt to integrate HRC, MLLMs, and embodied intelligence into a holistic view. The review first introduces representative foundation models and summarizes state-of-the-art methods in the “Perception-Cognition-Actuation” loop. It then discusses pathways and platforms for efficient spatial skills learning, followed by an analysis of four key questions from the “Why, How, What, Where” perspectives. Finally, it highlights future challenges and potential research directions. It is hoped that this work can help fill the research gap between HRC and MLLMs, offering a systematic pathway for developing human-centered collaborative systems and promoting further exploration and innovation in this field. The resources are available at: https://github.com/WuDuidi/MLLM-HRC-Survey.

Original language: English
Article number: 103064
Number of pages: 24
Journal: Robotics and Computer-Integrated Manufacturing
Volume: 97
DOIs
Publication status: Published - Feb 2026

Keywords

  • Embodied intelligence
  • Human–robot collaboration
  • Large language model
  • Robot learning
  • Smart manufacturing

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • General Mathematics
  • Computer Science Applications
  • Industrial and Manufacturing Engineering
