Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling

Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap Pui Chau, Gang Wang

Research output: Journal article › Academic research › peer-review

19 Citations (Scopus)


This paper proposes a new method called multimodal recurrent neural networks (RNNs) for RGB-D scene semantic segmentation. It is optimized to classify image pixels given two input sources: RGB color channels and depth maps. It simultaneously trains two RNNs that are cross-connected through information transfer layers, which are learned to adaptively extract relevant cross-modality features. Each RNN model learns its representations from its own previous hidden states and from patterns transferred from the other RNN's previous hidden states; thus, both model-specific and cross-modality features are retained. We exploit the structure of quad-directional 2D-RNNs to model the short- and long-range contextual information in the 2D input image. We carefully design various baselines to thoroughly examine our proposed model structure. We test our multimodal RNNs method on popular RGB-D benchmarks and show that it significantly outperforms previous methods and achieves results competitive with other state-of-the-art works.
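The core idea in the abstract — two modality-specific recurrent streams whose hidden states are exchanged through learned transfer layers — can be illustrated with a minimal sketch. This is not the authors' implementation: the hidden/input sizes, the ReLU activation on the transfer layers, and all variable names are illustrative assumptions, and only one of the four scan directions of a quad-directional 2D-RNN is shown as a 1-D sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

H = 8  # hidden size (illustrative assumption)
D = 4  # per-modality input feature size (illustrative assumption)

# Model-specific recurrent weights for each stream (RGB and depth).
W_rgb, U_rgb = rng.normal(size=(H, D)), rng.normal(size=(H, H))
W_dep, U_dep = rng.normal(size=(H, D)), rng.normal(size=(H, H))

# Information transfer layers: map the *other* stream's previous
# hidden state into features usable by this stream.
T_dep2rgb = rng.normal(size=(H, H))
T_rgb2dep = rng.normal(size=(H, H))

def step(x_rgb, x_dep, h_rgb, h_dep):
    """One recurrent step along a single scan direction.

    Each new hidden state combines the stream's own input and previous
    hidden state with cross-modality features transferred from the
    other stream, so both model-specific and shared patterns persist.
    """
    t_rgb = relu(T_dep2rgb @ h_dep)  # depth -> RGB transferred features
    t_dep = relu(T_rgb2dep @ h_rgb)  # RGB -> depth transferred features
    h_rgb_new = np.tanh(W_rgb @ x_rgb + U_rgb @ h_rgb + t_rgb)
    h_dep_new = np.tanh(W_dep @ x_dep + U_dep @ h_dep + t_dep)
    return h_rgb_new, h_dep_new

# Run over a short 1-D sequence of per-pixel features, standing in for
# one of the four scan directions of the quad-directional 2D-RNN.
h_rgb = np.zeros(H)
h_dep = np.zeros(H)
for _ in range(5):
    x_rgb = rng.normal(size=D)  # stand-in RGB feature for one pixel
    x_dep = rng.normal(size=D)  # stand-in depth feature for one pixel
    h_rgb, h_dep = step(x_rgb, x_dep, h_rgb, h_dep)
```

In a full model these directional hidden states would be aggregated per pixel and fed to a classifier; the sketch only shows how the two streams exchange information at each step.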

Original language: English
Pages (from-to): 1656-1671
Number of pages: 16
Journal: IEEE Transactions on Multimedia
Issue number: 7
Publication status: Published - Jul 2018
Externally published: Yes


Keywords

  • CNNs
  • Multimodal learning
  • RGB-D scene labeling
  • RNNs

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering


