Abstract
Semantic scene understanding is a fundamental capability for autonomous vehicles. Under challenging lighting conditions, such as nighttime and oncoming headlights, semantic scene understanding performance using only RGB images is usually degraded. Thermal images provide information complementary to RGB images, so many recent semantic segmentation networks have been proposed that use RGB-Thermal (RGB-T) images. However, most existing networks focus only on improving segmentation accuracy for single image frames, ignoring the consistency of information between consecutive frames. To address this issue, we propose a temporally consistent framework for RGB-T semantic segmentation, which introduces a virtual view image generation module to synthesize a virtual image for the next moment and a consistency loss function to enforce segmentation consistency. We also propose an evaluation metric that measures both the accuracy and the consistency of semantic segmentation. Experimental results show that our framework outperforms state-of-the-art methods.
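The abstract does not specify the form of the consistency loss. As a rough illustration only, a temporal-consistency loss of this kind is commonly implemented as a per-pixel divergence between the segmentation predicted for the synthesized (virtual) next-frame image and the segmentation predicted for the real next frame. The sketch below is a minimal NumPy version under that assumption; the function names and the choice of KL divergence are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(logits_virtual, logits_next, eps=1e-8):
    """Mean per-pixel KL divergence KL(p_next || p_virtual).

    logits_virtual: (H, W, C) network output for the synthesized next-frame image
    logits_next:    (H, W, C) network output for the real next frame
    (Hypothetical formulation; the paper's exact loss may differ.)
    """
    p = softmax(logits_next)
    q = softmax(logits_virtual)
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)
    return kl.mean()

# Identical predictions incur zero loss; diverging predictions are penalized.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4, 3))
b = rng.normal(size=(4, 4, 3))
print(consistency_loss(a, a), consistency_loss(a, b))
```

Minimizing such a term encourages the network to label corresponding pixels consistently across consecutive frames, which is the behavior the proposed evaluation metric is designed to measure.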
| Original language | English |
|---|---|
| Pages (from-to) | 9757-9764 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 9 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - Nov 2024 |
Keywords
- Autonomous vehicles
- multi-modal fusion
- RGB-Thermal
- semantic segmentation
- temporal consistency
ASJC Scopus subject areas
- Control and Systems Engineering
- Biomedical Engineering
- Human-Computer Interaction
- Mechanical Engineering
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Control and Optimization
- Artificial Intelligence