TY - JOUR
T1 - Multi-task recurrent convolutional network with correlation loss for surgical video analysis
AU - Jin, Yueming
AU - Li, Huaxia
AU - Dou, Qi
AU - Chen, Hao
AU - Qin, Jing
AU - Fu, Chi Wing
AU - Heng, Pheng Ann
N1 - Funding Information:
This work is supported by Hong Kong RGC TRS project T42-409/18-R, Hong Kong RGC GRF project 14225616, the National Natural Science Foundation of China (Project No. U1813204), the Hong Kong Innovation Fund (Project No. ITT/024/17GP), the Hong Kong Innovation and Technology Commission (Project No. ITS/319/17), the Shenzhen Science and Technology Program (Project No. JCYJ20170413162617606), and the CUHK T Stone Robotics Institute, CUHK. Yueming Jin is funded by the HK Ph.D. Fellowship.
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2020/1
Y1 - 2020/1
N2 - Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, as well as essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice, as the surgical process is typically well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method by developing a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, long short-term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlation, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence can bring about benefits to each other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin, e.g., 89.1% vs. 81.0% for the mAP in tool presence detection and 87.4% vs. 84.5% for the F1 score in phase recognition.
AB - Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, as well as essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice, as the surgical process is typically well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method by developing a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, long short-term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlation, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence can bring about benefits to each other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin, e.g., 89.1% vs. 81.0% for the mAP in tool presence detection and 87.4% vs. 84.5% for the F1 score in phase recognition.
KW - Correlation loss
KW - Deep learning
KW - Multi-task learning
KW - Surgical video analysis
UR - http://www.scopus.com/inward/record.url?scp=85073512736&partnerID=8YFLogxK
U2 - 10.1016/j.media.2019.101572
DO - 10.1016/j.media.2019.101572
M3 - Journal article
C2 - 31639622
AN - SCOPUS:85073512736
SN - 1361-8415
VL - 59
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 101572
ER -