Abstract
In this work, we propose an intelligent human–robot collaboration system designed to help embodied intelligence learn complex, long-horizon manufacturing assembly tasks. The system integrates multiple expert agents with augmented reality (AR) interaction interfaces, enabling robots to request planning and execution guidance from humans and to complete intricate tasks efficiently. Specifically, the expert agents, equipped with a high-level planner and a Vision-Language-Action (VLA) model, actively interact with users through text, vision, and action modalities to acquire critical information, learn task-specific skills, and develop sub-task planning strategies. A distributed data and model architecture supports real-time interaction between the models and facilitates seamless human–robot collaboration. We evaluate the system on two challenging long-horizon manufacturing assembly tasks, gear assembly and peg insertion, to demonstrate the effectiveness of the proposed approach. The system learns both assembly tasks within five trials and enables the embodied intelligence to complete them with progressively shorter execution times.
| Original language | English |
|---|---|
| Article number | 103268 |
| Number of pages | 13 |
| Journal | Robotics and Computer-Integrated Manufacturing |
| Volume | 100 |
| DOIs | |
| Publication status | Published - Aug 2026 |
Keywords
- Augmented reality
- Embodied intelligence
- Human–robot collaboration
- Multimodal large-language model
- Visual language action model
ASJC Scopus subject areas
- Control and Systems Engineering
- Software
- General Mathematics
- Computer Science Applications
- Industrial and Manufacturing Engineering
Fingerprint
Dive into the research topics of 'VLAbot: A human Vision–Language–Action models interaction framework for robotic assembly'. Together they form a unique fingerprint.