
Residual Attention Single-Head Vision Transformer Network for Rolling Bearing Fault Diagnosis in Noisy Environments

  • Songjiang Lai
  • Tsun Hin Cheung
  • Jiayi Zhao
  • Kaiwen Xue
  • Ka Chun Fung
  • Kin Man Lam

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Rolling bearings are critical components in modern industrial machinery, significantly impacting the performance, longevity, and safety of equipment. Due to harsh operating conditions, such as high speeds and temperatures, rolling bearings are prone to malfunctions, leading to equipment downtime, economic losses, and safety risks. In this paper, the Residual Attention Single-Head Vision Transformer Network (RA-SHViT-Net) is proposed for fault diagnosis in rolling bearings. The vibration signal collected from rolling bearings is first transformed from the time domain to the frequency domain using the Fast Fourier Transform (FFT). The RA-SHViT-Net model then leverages the Single-Head Vision Transformer (SHViT), which is adept at capturing local and global features from time-series signals. SHViT also offers a state-of-the-art balance between computational complexity and prediction accuracy, and it has been demonstrated to achieve promising results in the field of computer vision. To enhance feature extraction, we introduce an Adaptive Hybrid Attention Block (AHAB) that combines channel and spatial attention mechanisms. The core building block of the RA-SHViT-Net is the Residual Attention Single-Head Vision Transformer Block, which consists of a Depthwise Convolution (DWConv) layer, a Single-Head Self-Attention (SHSA) layer, a Residual Feed-Forward Network (Res-FFN), and an AHAB. This architecture is designed to comprehensively extract vibration signal features by considering the interdependencies among feature channels and spatial information, building on the excellent feature extraction capabilities of SHViT. Additionally, each Single-Head Vision Transformer Block incorporates a Res-FFN module, which uses residual connections to mitigate the vanishing gradient problem, enabling stable and efficient training of deep models.
This design enhances the model's ability to learn complex representations and improves its generalization capabilities. The proposed RA-SHViT-Net was evaluated using the Case Western Reserve University (CWRU) dataset and the Paderborn University dataset. The results demonstrate that the RA-SHViT-Net outperforms state-of-the-art methods in terms of accuracy and robustness, particularly in scenarios involving complex and noisy environments. In addition, we designed multiple ablation studies to investigate the impact of different modules on the network's prediction performance. Overall, the RA-SHViT-Net provides a powerful tool for the early detection and classification of bearing faults, contributing to more reliable and efficient maintenance strategies in industrial applications.
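The FFT preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the segment length, the sampling rate, and the helper names `fft` and `magnitude_spectrum` are assumptions chosen for illustration, and the input is a synthetic tone rather than real bearing data. A small radix-2 FFT is written out using only the standard library; in practice a library routine such as `numpy.fft.rfft` would normally be used instead.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(signal):
    """Magnitudes of the non-negative frequency bins, scaled by 1/n."""
    n = len(signal)
    spectrum = fft(signal)
    return [abs(c) / n for c in spectrum[: n // 2 + 1]]


# Hypothetical segment: 1024 samples at an assumed 12 kHz sampling rate.
fs = 12_000
n = 1024
f0 = 16 * fs / n  # synthetic tone placed exactly on frequency bin 16
segment = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]

spec = magnitude_spectrum(segment)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
peak_hz = peak_bin * fs / n  # convert the bin index back to hertz
```

Here `spec` plays the role of the frequency-domain representation that a network would consume in place of the raw time-domain segment; how RA-SHViT-Net arranges or normalizes that representation is not stated in the abstract.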

Original language: English
Title of host publication: Proceedings of the 2024 6th International Conference on Video, Signal and Image Processing, VSIP 2024
Publisher: Association for Computing Machinery, Inc
Pages: 136-150
Number of pages: 15
ISBN (Electronic): 9798400709647
DOIs
Publication status: Published - 27 Feb 2025
Event: 6th International Conference on Video, Signal and Image Processing, VSIP 2024 - Ningbo, China
Duration: 22 Nov 2024 → 24 Nov 2024

Publication series

Name: Proceedings of the 2024 6th International Conference on Video, Signal and Image Processing, VSIP 2024

Conference

Conference: 6th International Conference on Video, Signal and Image Processing, VSIP 2024
Country/Territory: China
City: Ningbo
Period: 22/11/24 → 24/11/24

Keywords

  • attention mechanism
  • Fast Fourier Transform (FFT)
  • fault diagnosis
  • noisy environments
  • rolling bearings
  • Vision Transformer

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
  • Signal Processing
