TY - GEN
T1 - Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope
AU - Li, Jixiu
AU - Huang, Yisen
AU - Ng, Wing Yin
AU - Cheng, Truman
AU - Wu, Xixin
AU - Dou, Qi
AU - Meng, Helen
AU - Heng, Pheng Ann
AU - Liu, Yunhui
AU - Chan, Shannon Melissa
AU - Navarro-Alarcon, David
AU - Hang Ng, Calvin Sze
AU - Yan Chiu, Philip Wai
AU - Li, Zheng
N1 - Funding Information:
This work was supported in part by the Research Grants Council General Research Fund under Projects 14203019 and 14202820, and in part by the Early Career Scheme under Project 24204818. (Jixiu Li and Yisen Huang contributed equally to this work.) (Corresponding author: Zheng Li.) Jixiu Li, Yisen Huang, Wing Yin Ng, Truman Cheng, Shannon Melissa Chan, and Calvin Sze Hang Ng are with the Department of Surgery, The Chinese University of Hong Kong, Hong Kong. Xixin Wu and Helen Meng are with the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong.
Publisher Copyright:
© 2022 IEEE.
PY - 2022/12
Y1 - 2022/12
N2 - In minimally invasive surgery (MIS), controlling the endoscope view is crucial to the operation. Many robotic endoscope holders have been developed to address this problem. These systems rely on joysticks, foot pedals, simple voice commands, etc., to control the robot. These methods require extra effort from surgeons and are not intuitive enough. In this paper, we propose a speech-vision based multi-modal AI approach, which integrates deep learning based instrument detection, automatic speech recognition, and robot visual servo control. Surgeons can communicate with the endoscope by speech to indicate their view preference, such as the instrument to be tracked. The instrument is detected by a deep learning neural network, and the endoscope then takes the detected instrument as the target and follows it with the visual servo controller. This method is applied to a magnetically anchored and guided endoscope and evaluated experimentally. Preliminary results demonstrate that this approach is effective and requires little effort from the surgeon to control the endoscope view intuitively.
AB - In minimally invasive surgery (MIS), controlling the endoscope view is crucial to the operation. Many robotic endoscope holders have been developed to address this problem. These systems rely on joysticks, foot pedals, simple voice commands, etc., to control the robot. These methods require extra effort from surgeons and are not intuitive enough. In this paper, we propose a speech-vision based multi-modal AI approach, which integrates deep learning based instrument detection, automatic speech recognition, and robot visual servo control. Surgeons can communicate with the endoscope by speech to indicate their view preference, such as the instrument to be tracked. The instrument is detected by a deep learning neural network, and the endoscope then takes the detected instrument as the target and follows it with the visual servo controller. This method is applied to a magnetically anchored and guided endoscope and evaluated experimentally. Preliminary results demonstrate that this approach is effective and requires little effort from the surgeon to control the endoscope view intuitively.
UR - http://www.scopus.com/inward/record.url?scp=85147329936&partnerID=8YFLogxK
U2 - 10.1109/ROBIO55434.2022.10011904
DO - 10.1109/ROBIO55434.2022.10011904
M3 - Conference article published in proceeding or book
AN - SCOPUS:85147329936
T3 - 2022 IEEE International Conference on Robotics and Biomimetics, ROBIO 2022
SP - 403
EP - 408
BT - 2022 IEEE International Conference on Robotics and Biomimetics, ROBIO 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Robotics and Biomimetics, ROBIO 2022
Y2 - 5 December 2022 through 9 December 2022
ER -