HEVC Based Screen Content Coding and Transcoding Using Machine Learning Techniques

    Student thesis: PhD

    Abstract

    Screen content video is an emerging video type that usually contains mixed content with both natural image blocks (NIBs) and computer-generated screen content blocks (SCBs). Since High Efficiency Video Coding (HEVC) is optimized only for NIBs while SCBs exhibit very different characteristics, new techniques are necessary for SCBs. Therefore, the Joint Collaborative Team on Video Coding (JCT-VC) developed the Screen Content Coding (SCC) extension on top of HEVC to explore new encoding tools for screen content videos. SCC employs two additional coding modes for intra prediction, the intra block copy (IBC) mode and the palette (PLT) mode. Although SCC provides high coding efficiency for screen content videos, the exhaustive mode search dramatically increases its computational complexity. Therefore, in this thesis, several novel machine learning based techniques are proposed to simplify the encoding and transcoding of SCC.
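    As a rough illustration of the complexity problem described above, the following Python sketch shows an exhaustive mode search in which every candidate intra mode is rate-distortion checked for every coding unit. The function name and the cost callables are hypothetical stand-ins for illustration, not the SCC reference software.

        # Minimal sketch (assumed names) of the exhaustive intra mode search in SCC:
        # every candidate mode is RD-checked for every coding unit, which is what the
        # fast algorithms in this thesis try to avoid.

        def exhaustive_mode_search(cu, rd_cost):
            """rd_cost maps a mode name to a callable returning the RD cost of `cu`."""
            costs = {mode: cost_fn(cu) for mode, cost_fn in rd_cost.items()}
            return min(costs, key=costs.get)   # best mode = lowest RD cost

        # Usage with dummy cost functions standing in for the real encoder checks.
        best = exhaustive_mode_search(
            cu="64x64 block",
            rd_cost={"INTRA": lambda cu: 120.0, "IBC": lambda cu: 95.0, "PLT": lambda cu: 110.0},
        )
        print(best)   # -> "IBC"
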
    A video frame usually has characteristics similar to those of the previous frames. Therefore, based on the characteristics of previously coded frames, content-dependent rules can be derived to predict the optimal modes and coding unit (CU) partitions of the current frame. In this thesis, a fast intra prediction algorithm for SCC based on content analysis and dynamic thresholding is first proposed. A scene change detection method is adopted to obtain a learning frame in each scene, and the learning frame is encoded by the original SCC encoder to collect learning statistics. Prediction models are then tailor-made for the following frames in the same scene according to the video content and quantization parameter (QP) of the learning frame. Simulation results show that the proposed scheme achieves remarkable complexity reduction while preserving the coded video quality.
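    The following Python sketch illustrates one way such content-dependent rules could look: statistics from the learning frame are turned into thresholds that later frames in the same scene use to skip unlikely modes. The feature and the percentile rule are illustrative assumptions, not the exact models derived in the thesis.

        # A minimal sketch of dynamic thresholding: statistics gathered from a fully
        # encoded "learning frame" yield content-dependent thresholds for the
        # following frames. Feature choice and percentile rule are assumptions.

        import numpy as np

        def derive_thresholds(learning_stats, percentile=95):
            """learning_stats: list of (feature_value, chosen_mode) from the learning frame."""
            scb_features = [f for f, mode in learning_stats if mode in ("IBC", "PLT")]
            nib_features = [f for f, mode in learning_stats if mode == "INTRA"]
            return {
                "scb_low": np.percentile(nib_features, percentile),        # above this -> likely SCB
                "nib_high": np.percentile(scb_features, 100 - percentile), # below this -> likely NIB
            }

        def fast_mode_set(feature_value, thr):
            if feature_value >= thr["scb_low"]:
                return ["IBC", "PLT"]        # skip conventional intra for likely SCBs
            if feature_value <= thr["nib_high"]:
                return ["INTRA"]             # skip IBC/PLT for likely NIBs
            return ["INTRA", "IBC", "PLT"]   # ambiguous: fall back to the full search
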
    Afterwards, we propose a decision tree (DT) based framework for fast intra mode decision by investigating various features in training sets. To avoid the exhaustive mode search process, a sequential arrangement of DTs is proposed that checks each mode separately by inserting a classifier before the mode is checked. Compared with previous approaches, in which both IBC and PLT modes are checked for SCBs, the proposed coding framework is more flexible: it allows either the IBC or the PLT mode to be checked for an SCB, so that the computational complexity is further reduced. To enhance the accuracy of the DTs, dynamic features are introduced, which reveal the unique intermediate coding information of a coding unit. Simulation results show that the proposed scheme provides significant complexity savings with negligible loss of coded video quality.
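    A minimal sketch of the sequential arrangement is given below, assuming scikit-learn decision trees and an illustrative feature vector; the trees, their depth, and the fallback rule are assumptions for illustration rather than the thesis's trained classifiers.

        # One binary classifier is inserted before each mode check, so a mode is only
        # rate-distortion tested when its classifier predicts it is worth checking.

        from sklearn.tree import DecisionTreeClassifier

        class SequentialModeDecision:
            def __init__(self):
                # one shallow tree per mode, trained offline on (features, best_mode) pairs
                self.trees = {m: DecisionTreeClassifier(max_depth=5) for m in ("INTRA", "IBC", "PLT")}

            def fit(self, features, best_modes):
                for mode, tree in self.trees.items():
                    tree.fit(features, [m == mode for m in best_modes])

            def candidate_modes(self, cu_features):
                # check each mode separately; skip its RD test when the tree says "unlikely"
                modes = [m for m, tree in self.trees.items() if tree.predict([cu_features])[0]]
                return modes or ["INTRA"]    # always keep at least one fallback mode
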
    Since traditional machine learning based approaches rely heavily on manually selected features, the performance of an encoder suffers if important features are wrongly omitted. To avoid the need for manually selected features, a deep learning based fast prediction network, DeepSCC, is then proposed, which contains two parts, DeepSCC-I and DeepSCC-II. Before being fed to DeepSCC, incoming coding tree units (CTUs) are divided into two categories: dynamic CTUs and stationary CTUs. For dynamic CTUs, whose content differs from that of their collocated CTUs, DeepSCC-I takes raw sample values as the input to make fast predictions. For stationary CTUs, whose content is the same as that of their collocated CTUs, DeepSCC-II additionally utilizes the optimal mode maps of the collocated CTUs to further reduce the computational complexity. Simulation results show that the proposed scheme further improves the complexity reduction.
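    The routing of CTUs between the two networks could be sketched as follows; the two predict_* callables stand in for the trained CNNs and are assumed names, not the thesis's implementation.

        # A CTU whose samples equal its collocated CTU in the previous frame is
        # "stationary" and goes to DeepSCC-II together with the collocated CTU's
        # optimal mode map; otherwise it is "dynamic" and DeepSCC-I predicts from
        # raw sample values alone.

        import numpy as np

        def predict_ctu_modes(ctu, collocated_ctu, collocated_mode_map,
                              predict_deepscc_i, predict_deepscc_ii):
            if np.array_equal(ctu, collocated_ctu):
                # stationary CTU: reuse the collocated optimal mode map as extra input
                return predict_deepscc_ii(ctu, collocated_mode_map)
            # dynamic CTU: content changed, predict from raw sample values only
            return predict_deepscc_i(ctu)
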
    Finally, we propose a fast HEVC to SCC transcoder. Although SCC was developed to enhance HEVC for encoding screen content videos, HEVC has dominated the market for many years, which leaves many legacy screen content videos encoded by HEVC. To migrate these legacy screen content videos from HEVC to SCC and thereby improve the coding efficiency, a fast transcoding framework is proposed by analyzing various features from four categories: features from the HEVC decoder, static features, dynamic features, and spatial features. First, the CU depth level collected from the HEVC decoder is utilized to terminate the CU partition in SCC early. Second, a flexible encoding structure is proposed to make early mode decisions with the help of the various features. On the one hand, high decision accuracy is achieved because each mode decision is considered from different aspects by utilizing features from more than one category. On the other hand, the computational complexity is greatly reduced because the flexible structure considers the decision for each mode separately. Simulation results show that the proposed scheme dramatically shortens the transcoding time.
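    A minimal sketch of how the decoder-side CU depth and per-mode classifiers might steer the SCC re-encode is given below; all interfaces (hevc_depth_map, rd_search, ctu.split) are illustrative assumptions rather than the thesis's exact procedure.

        # The CU depth recovered while decoding the HEVC bitstream caps the partition
        # depth tried in SCC, and per-mode classifiers prune the mode checks.

        def transcode_ctu(ctu, hevc_depth_map, mode_classifiers, rd_search, depth=0, max_depth=3):
            hevc_depth = hevc_depth_map(ctu)           # depth chosen by the HEVC encoder
            modes = [m for m, clf in mode_classifiers.items() if clf(ctu)] or ["INTRA"]
            cost, best = rd_search(ctu, modes)         # RD-check only the surviving modes

            # early termination: do not split deeper than the HEVC partition suggests
            if depth >= min(hevc_depth, max_depth):
                return cost, best

            split_cost, split_best = 0.0, []
            for sub in ctu.split():                    # assumed 4-way quadtree split
                c, b = transcode_ctu(sub, hevc_depth_map, mode_classifiers, rd_search,
                                     depth + 1, max_depth)
                split_cost += c
                split_best.append(b)
            return (cost, best) if cost <= split_cost else (split_cost, split_best)
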
    Date of Award: 7 Aug 2019
    Original language: English
