RTFNet: RGB-Thermal Fusion Network for Semantic Segmentation of Urban Scenes

Yuxiang Sun, Weixun Zuo, Ming Liu

Research output: Journal article publication › Journal article › Academic research › peer-review

207 Citations (Scopus)


Semantic segmentation is a fundamental capability for autonomous vehicles. With advances in deep learning, many effective semantic segmentation networks have been proposed in recent years. However, most of them are designed for RGB images from visible-light cameras. The quality of RGB images tends to degrade under poor lighting conditions, such as darkness and glare from oncoming headlights, which poses critical challenges for networks that use only RGB images. Unlike visible-light cameras, thermal imaging cameras generate images from thermal radiation and can therefore see under various lighting conditions. To enable robust and accurate semantic segmentation for autonomous vehicles, we take advantage of thermal images and fuse RGB and thermal information in a novel deep neural network. The main innovation of this letter is the architecture of the proposed network. We adopt the encoder-decoder design concept: ResNet is employed for feature extraction, and a new decoder is developed to restore the feature map resolution. The experimental results show that our network outperforms state-of-the-art methods.
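
The data flow described in the abstract (two input modalities, an encoder that reduces resolution, fusion of the two streams, and a decoder that restores resolution) can be illustrated with a toy sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: 2x2 average pooling stands in for a ResNet stage, element-wise addition is used as the fusion rule, nearest-neighbour repetition stands in for the learned decoder, and the inputs are single-channel maps stored as nested lists. All function names are hypothetical.

```python
# Toy sketch of a two-stream encoder-decoder with feature fusion.
# Assumptions (not stated in the abstract): 2x2 average pooling stands in for
# a ResNet stage, element-wise addition is the fusion rule, and
# nearest-neighbour upsampling stands in for the learned decoder.


def downsample(fmap):
    """Halve height and width by 2x2 average pooling."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def fuse(rgb_feat, thermal_feat):
    """Fuse two same-sized feature maps by element-wise addition."""
    return [[a + b for a, b in zip(r, t)] for r, t in zip(rgb_feat, thermal_feat)]


def upsample(fmap):
    """Double height and width by nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fusion_network(rgb, thermal, stages=2):
    """Encode both inputs, fusing thermal features into the RGB stream after
    each encoder stage, then decode back to the input resolution."""
    for _ in range(stages):
        rgb, thermal = downsample(rgb), downsample(thermal)
        rgb = fuse(rgb, thermal)
    for _ in range(stages):
        rgb = upsample(rgb)
    return rgb
```

For inputs whose height and width are divisible by 2 to the power of `stages`, the output has the same spatial size as the input, which is the property a segmentation decoder needs in order to produce one label per pixel.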

Original language: English
Article number: 8666745
Pages (from-to): 2576-2583
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Issue number: 3
Publication status: Published - Jul 2019


  • Deep Neural Network
  • Information Fusion
  • Semantic Segmentation
  • Thermal Images
  • Urban Scenes

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence

