TGLC: Visual object tracking by fusion of global-local information and channel information

Research output: Journal article publication > Journal article > Academic research > peer-review

Abstract

Visual object tracking aims to locate a target in every frame of a video, given its location in the initial frame, and is an important yet challenging task in computer vision. Recent approaches fuse global information from the template and the search region and achieve promising tracking performance. However, fusing global information loses some local details, which are essential for distinguishing the target from background regions. To address this problem, this work presents TGLC, a novel tracking algorithm that integrates a channel-aware convolution block with Transformer attention to aggregate global and local representations and to model channel information. The method accurately estimates the bounding box of the target. Extensive experiments are conducted on five widely recognized datasets: GOT-10k, TrackingNet, LaSOT, OTB100 and UAV123. The results show that the proposed tracker achieves competitive performance compared with state-of-the-art trackers while still running in real time. Visualization of the tracking results on LaSOT further demonstrates that the proposed method copes with tracking challenges such as illumination variation, target deformation and background clutter.

Original language: English
Journal: Multimedia Tools and Applications
Publication status: E-pub ahead of print - Mar 2024

Keywords

  • Channel information
  • Convolution
  • Global-local representation aggregation
  • Transformer attention
  • Visual object tracking

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications
