Deep Learning Approach for No-Reference Screen Content Video Quality Assessment

Ngai Wing Kwong, Yui Lam Chan, Sik Ho Tsang, Ziyin Huang, Kin Man Lam

Research output: Journal article publication › Journal article › Academic research › peer-review


Screen content video (SCV) drew unprecedented attention during the COVID-19 period and has moved from a niche to the mainstream with the recent proliferation of remote offices, online meetings, shared-screen collaboration, and live game streaming. Quality assessment for screen content media is therefore in high demand to maintain service quality. Although many practical natural scene video quality assessment methods have been proposed and have achieved promising results, they cannot be applied directly to the screen content video quality assessment (SCVQA) task because the content characteristics of SCV differ substantially from those of natural scene video. Moreover, only one no-reference SCVQA (NR-SCVQA) method has been proposed in the literature, and it relies on handcrafted features. We therefore propose the first deep learning approach explicitly designed for NR-SCVQA. First, a multi-channel convolutional neural network (CNN) model extracts spatial quality features of pictorial and textual regions separately. Since no human-annotated quality score is available for each screen content frame (SCF), the CNN model is pre-trained in a multi-task self-supervised fashion to extract a spatial quality feature representation of each SCF. Second, we propose a time-distributed CNN transformer model (TCNNT) that further processes the spatial quality feature representations of all SCFs in an SCV and learns spatial and temporal features simultaneously, so that high-level spatiotemporal features of the SCV can be extracted and used to assess the quality of the whole SCV. Experimental results demonstrate the robustness and validity of our model, whose predictions are closely aligned with human perception.
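
The two-stage data flow described in the abstract (per-frame spatial features for pictorial and textual regions, followed by temporal modelling over all frames) can be illustrated with a minimal pure-Python sketch. This is not the authors' model: `frame_features` is a hypothetical stand-in for the multi-channel CNN (it only computes mean and variance per region), the attention step uses identity projections with no positional encoding or feed-forward layers, and the regression weights are arbitrary and untrained. All function names and the toy video are assumptions made for illustration.

```python
import math
import random


def frame_features(frame):
    """Hypothetical stand-in for the per-frame spatial extractor.

    A frame is a dict with 'pictorial' and 'textual' lists of pixel values.
    Each region is summarised separately (mean, variance) and the two
    summaries are concatenated into one feature vector of length 4.
    """
    feats = []
    for region in ("pictorial", "textual"):
        px = frame[region]
        mean = sum(px) / len(px)
        var = sum((p - mean) ** 2 for p in px) / len(px)
        feats += [mean, var]
    return feats


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(w[t] * seq[t][j] for t in range(len(seq)))
                    for j in range(d)])
    return out


def predict_quality(video, head_weights, bias=0.0):
    """Map a list of frames to a single video-level score."""
    per_frame = [frame_features(f) for f in video]  # same extractor on every frame
    mixed = self_attention(per_frame)               # mix information across time
    pooled = [sum(col) / len(mixed) for col in zip(*mixed)]  # average over frames
    return sum(w * x for w, x in zip(head_weights, pooled)) + bias


# Toy 8-frame video with random pixel values in [0, 1); purely illustrative.
random.seed(0)
video = [{"pictorial": [random.random() for _ in range(16)],
          "textual": [random.random() for _ in range(16)]}
         for _ in range(8)]
score = predict_quality(video, head_weights=[0.25, 0.25, 0.25, 0.25])
print(score)
```

In a real system the placeholder extractor would be replaced by learned CNN features and the attention step by a trained transformer; the sketch only shows how per-frame representations are produced independently and then combined across time into one score.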

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Broadcasting
Publication status: Published - Jun 2024


Keywords

  • Distortion
  • Feature extraction
  • Human visual experience
  • multi-channel convolutional neural network
  • multi-task learning
  • Multitasking
  • no reference video quality assessment
  • Quality assessment
  • screen content video quality assessment
  • self-supervised learning
  • spatiotemporal features
  • Spatiotemporal phenomena
  • Task analysis
  • Video recording

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering

