A COMPLEXITY-UTILITY FRAMEWORK FOR OPTIMIZING QUALITY OF EXPERIENCE FOR VISUAL CONTENT IN MOBILE DEVICES

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY BY ÖZGÜR DENİZ ÖNÜR IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ELECTRICAL AND ELECTRONICS ENGINEERING

FEBRUARY 2012

Approval of the thesis:

A COMPLEXITY-UTILITY FRAMEWORK FOR OPTIMIZING QUALITY OF EXPERIENCE FOR VISUAL CONTENT IN MOBILE DEVICES

submitted by ÖZGÜR DENİZ ÖNÜR in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Electronics Engineering Department, Middle East Technical University by,

Prof. Dr. Canan Özgen, Dean, Graduate School of Natural and Applied Sciences
Prof. Dr. İsmet Erkmen, Head of Department, Electrical and Electronics Engineering
Prof. Dr. A. Aydın Alatan, Supervisor, Electrical and Electronics Engineering Dept., METU

Examining Committee Members:

Prof. Dr. Gözde Bozdağı Akar, Electrical and Electronics Engineering Dept., METU
Prof. Dr. A. Aydın Alatan, Electrical and Electronics Engineering Dept., METU
Prof. Dr. Levent Onural, Electrical and Electronics Engineering Dept., Bilkent University
Prof. Dr. Tolga Çiloğlu, Electrical and Electronics Engineering Dept., METU
Assoc. Prof. Dr. Çağatay Candan, Electrical and Electronics Engineering Dept., METU

Date: 08/02/2012

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last name:
Signature:

ABSTRACT

A COMPLEXITY-UTILITY FRAMEWORK FOR OPTIMIZING QUALITY OF EXPERIENCE FOR VISUAL CONTENT IN MOBILE DEVICES

Önür, Özgür Deniz
Ph.D., Department of Electrical and Electronics Engineering
Supervisor: Prof. Dr. A. Aydın Alatan

February 2012, 121 pages

Subjective video quality and video decoding complexity are jointly optimized in order to determine the video encoding parameters that result in the best Quality of Experience (QoE) for an end user watching a video clip on a mobile device. Subjective video quality is estimated by an objective criterion, the Video Quality Metric (VQM), and a method is presented for predicting the video quality of a test sequence from available training sequences with similar content characteristics. Standardized spatial index and temporal index metrics are utilized to measure content similarity. A statistical approach is presented for modeling decoding complexity on a hardware platform using content features extracted from video clips. The overall decoding complexity is modeled as the sum of component complexities associated with the computation-intensive code blocks present in state-of-the-art hybrid video decoders. The content features and decoding complexities are modeled as random variables, and their joint probability density function is approximated with Gaussian Mixture Models (GMM). These GMMs are obtained off-line using a large training set of video clips. Subsequently, the decoding complexity of a new video clip is estimated using the available GMM and the content features extracted in real time.
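As a purely illustrative aside, the complexity-prediction step summarized above can be sketched in a few lines of Python: a joint GMM is fitted over (content feature, decoding complexity) samples off-line, and the complexity of a new clip is then taken as the conditional mean given its features. The library choice, feature dimensionality and stand-in data below are assumptions made for this sketch, not the models or code used in the thesis.

```python
# Hypothetical sketch of GMM-based decoding-complexity prediction:
# fit a joint GMM over (features, complexity) off-line, then estimate the
# complexity of a new clip as the conditional mean given its features.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(features, complexities, n_components=3):
    """Fit a GMM to joint (feature, complexity) samples from a training set."""
    data = np.column_stack([features, complexities])   # shape (N, d + 1)
    return GaussianMixture(n_components=n_components,
                           covariance_type="full", random_state=0).fit(data)

def predict_complexity(gmm, f):
    """Conditional mean E[complexity | features = f] under the joint GMM."""
    d = len(f)
    num = den = 0.0
    for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_f, mu_c = mu[:d], mu[d]
        s_ff, s_cf = cov[:d, :d], cov[d, :d]
        # Responsibility of this component for the observed features.
        r = w * multivariate_normal.pdf(f, mean=mu_f, cov=s_ff)
        # Component-wise conditional mean of complexity given the features.
        num += r * (mu_c + s_cf @ np.linalg.solve(s_ff, f - mu_f))
        den += r
    return num / den

# Stand-in training data: two content features and one complexity value per clip.
rng = np.random.default_rng(0)
feats = rng.random((200, 2))
cplx = 2.0 * feats[:, 0] + 0.5 * feats[:, 1] + 0.1 * rng.standard_normal(200)
gmm = fit_joint_gmm(feats, cplx)
print(predict_complexity(gmm, np.array([0.4, 0.7])))
```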
A novel method is also proposed to determine the video decoding capacity of mobile terminals through a set of subjective decodability experiments that are performed once for each device. Finally, the estimated video quality of the content and the decoding capacity of the device are combined in a utility-complexity framework that optimizes the complexity-quality tradeoff to determine the video coding parameters resulting in the highest video quality without exceeding the hardware capabilities of the client device. The simulation results indicate that this approach is capable of predicting user viewing satisfaction on a mobile device.

Keywords: Video Adaptation, Decoding Complexity, Video Content Characteristics, Quality of Experience.

ÖZ

A COMPLEXITY- AND UTILITY-BASED APPROACH FOR OPTIMIZING QUALITY OF EXPERIENCE FOR VISUAL CONTENT ON MOBILE DEVICES

Önür, Özgür Deniz
Ph.D., Department of Electrical and Electronics Engineering
Supervisor: Prof. Dr. A. Aydın Alatan

February 2012, 121 pages

In order to determine the video coding parameters that provide the highest quality of experience when video is watched on mobile devices, subjective video quality and video decoding complexity are jointly optimized. Subjective video quality is modeled using an objective criterion, the Video Quality Metric (VQM), and a method is presented for predicting the quality of a video from a training set of videos with similar content characteristics whose quality values have been measured in advance. Standardized spatial and temporal index metrics are used to measure content similarity. A statistical method is presented that models the decoding complexity of a video on a given hardware platform using content features extracted from the videos. The total decoding complexity is modeled as the sum of the complexities of the code blocks in modern video decoders that require intensive processing. The content features and decoding complexities are modeled as random variables, and their joint probability density functions are obtained using Gaussian Mixture Models (GMM). The GMMs are determined using a training set consisting of a large number of videos. The decoding complexity of a new video is estimated using the previously computed GMMs and content features extracted from the video in real time. In addition, a novel method is developed that determines the video decoding capacity of mobile devices through video decoding experiments performed once for each device. Finally, a utility-complexity based method is proposed for determining the coding parameters that yield the maximum attainable video quality, by optimizing the complexity-quality tradeoff so that the decoding complexity does not exceed the hardware capacity of the target device. The simulation results show that this approach can be used to predict the satisfaction users obtain when watching video on mobile devices.

Keywords: Video Adaptation, Video Decoder Complexity, Video Content Characteristics, Quality of Experience.

To my wife, my parents, and my sister

ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my supervisor, Prof. Dr. A. Aydın Alatan, for his guidance, advice, criticism, encouragement and insight throughout the research. I am also indebted to the members of my thesis committee, Prof. Dr. Levent Onural, Prof. Dr. Gözde Bozdağı Akar, Prof. Dr. Tolga Çiloğlu and Assist. Prof. Dr. Çağatay Candan, for their support and suggestions, which improved the quality of the thesis. I would also like to thank my partners at Mobilus Ltd., who patiently waited for me to finish my studies. Last but not least, I would like to thank my family for loving and supporting me in all my endeavors.

TABLE OF CONTENTS

ABSTRACT
ÖZ
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTERS
1. INTRODUCTION
   Motivation and Problem Definition
   Joint Optimization of Complexity and Utility
   Main Contributions of the Thesis
   Thesis Outline
2. VIDEO QUALITY
   Measuring Video Quality: Subjective vs. Objective Methods
   Subjective Video Quality
   Subjective Video Quality Testing Methods
   Objective Video Quality
   Structural Similarity (SSIM) Index
   Video Quality Metric (VQM)
   Modeling Video Quality
   Video Content Characteristics and Video Quality
   Predicting Objective Video Quality Using Training Data
   Summary and Discussions
3. VIDEO DECODING COMPLEXITY
   Decoding Complexity in Hybrid Video Decoders
   Relating Decoding Complexity and Content Features
   Complexity Modeling with GMM
   Complexity Prediction Tests
   Summary and Discussions
4. UTILITY-COMPLEXITY FRAMEWORK
   Rate Distortion Optimization
   Complexity-Distortion Theory
   Proposed Complexity Constrained Utility Optimization
   Video Quality
   Decoding Complexity
   Video Clip Decodability
   Modeling Subjective Quality
   System Architecture
   Quality Complexity Joint Optimization
   Predicting Decodability
   Subjective Tests for Measuring Decodability
   Statistical Analysis of Subjective Test Results for Decodability
   Predicting Decodability Utilizing Decoding Complexity Statistics
   Subjective Quality Prediction
   Subjective Video Quality Evaluation Tests
   Predicting Subjective Quality Utilizing Decodability and Complexity
   Determining Optimal Adaptation Operation
   Summary and Discussions
5. SUMMARY, CONCLUSIONS AND FUTURE DIRECTIONS
   Summary
   Conclusions
   Future Directions

APPENDICES
A. THE H.264 STANDARD
B. OPERATIONAL RATE-DISTORTION FUNCTION
   B.1 R-D for Standards Based Video Coding
   B.2 Lagrangian Optimization
   B.3 Lagrangian R-D Optimization for Encoding Decisions

REFERENCES
VITA

LIST OF TABLES

Table 1: Manually Changing Frame Rate - Frame Orderings For Different Frame Rates
Table 2: VQM Prediction Errors
Table 3: Prediction Errors for 17 Sequence Training Set
Table 4: Correlation Coefficient Between VQM and PSNR Values of the Training Data
Table 5: Correlation Coefficients Between Decoding Complexities And Content Features
Table 6: Prediction Error for Inverse Transform Complexity for Varying Number of GMM Components
Table 7: Average percentage of motion compensation complexity prediction error for varying number of GMM components
Table 8: Average entropy decoding complexity prediction error for varying number of GMM components
Table 9: Average deblocking complexity prediction error in different content classes for varying number of GMM components
Table 10: Decodability Prediction Error
Table 11: a, b, c Values for each Sequence with Sequence Removed from Training Set
Table 12: Subjective Quality Prediction Error - VQM
Table 13: Subjective Quality Prediction Error - PSNR
Table 14: Optimal Coding Parameters Using Subjective Tests vs Proposed Algorithm

LIST OF FIGURES

Figure 1: FD-CD adaptation [29]
Figure 2: Thesis Organization
Figure 3: Performance vs Complexity of objective Video Quality Assessment algorithms [5]
Figure 4: Subjective Quality vs VQM [5]
Figure 5: SI and TI values for sequences Akiyo, Bus, Coast, Flower, Foreman, Mobile, Mother, Soccer and Waterfall
Figure 6: SI and TI values for the Extended Training Set
Figure 7: GMM Components vs BIC for Inverse Transform Complexity
Figure 8: GMM Components vs NLogL for Inverse Transform Complexity
Figure 9: Proposed System for Determining Optimal Adaptation Operation
Figure 10: Decodability Scores vs Total Decoding Complexity for Nokia N
Figure 11: Histogram of Decodability Scores for 5 bins
Figure 12: Decodability vs Complexity Kernel Estimate
Figure 13: Histogram of MOS Subjective Video Quality
Figure 14: Total Complexity vs Predicted Subjective Video Quality
Figure 15: Total Complexity vs Predicted Quality for Low End Device
Figure 16: NAL Access Unit [54]
Figure 17: Subdivision of a picture into slices without using FMO [54]
Figure 18: Subdivision of a frame into slices with FMO [54]
Figure 19: The operational R-D curve [49]
Figure 20: For each coding unit, minimizing D + λR for a given λ is equivalent to finding the point in the R-D characteristic that is hit first by a plane wave of slope λ
Figure 21: PSNR vs Bit Rate spent on motion vectors for the 3 different macroblock size modes [50]

CHAPTER 1

INTRODUCTION

The processing capabilities of mobile terminals, such as Personal Digital Assistants (PDAs), tablet computers and cellular phones, have increased at an unprecedented rate during the previous decade. Accompanied by the much-anticipated spread of broadband wireless access, this has brought about a wealth of new possibilities for novel consumer services. Among the most exciting killer applications of this era is pervasive access to rich multimedia content on mobile terminals.

1.1 Motivation and Problem Definition

Delivering multimedia content to terminals with diverse processing capabilities through heterogeneous networks is challenging.
The problem is intensified by the fact that end users have unique preferences, and a representation of the content that is desirable for one user might be unsatisfactory for another. It is apparent that a particular representation of content would be satisfactory for only a very limited number of use cases. Consequently, it is mandatory to be able to adapt the multimedia content depending on the requirements of the consumption scenario. The factors that need to be considered while determining the best representation of the content include network characteristics (maximum bandwidth or bit error rate of the transmission channel), terminal characteristics (Central Processing Unit (CPU) capacity, available video codecs, color capability, display resolution), the natural environment (ambient noise, illumination conditions), video content characteristics (amount of motion, amount of spatial detail; the standardized SI/TI descriptors that quantify these are sketched below) and user preferences. The research challenges outlined above are described in detail in [1].

In most modern video distribution systems, high-quality versions of video clips are stored on a media server. When a video clip is requested by a particular client, the video bit stream is modified so that it is suitable for the current consumption scenario (network conditions, client capacity, etc.). Generally, the resource requirements of the high-quality clip need to be decreased by video adaptation algorithms in order to ensure that the content can be successfully delivered to the client. The process of modifying a given representation of a video into another representation, in order to change the amount of resources required for transmitting, decoding and displaying the video, is defined as video adaptation [2][3].

One of the most important factors that determine the success of a video adaptation system is its ability to retain an acceptable level of video quality while reducing the resource requirements. Video quality can be measured with a plethora of tools and methods [4][5][6]. Since the ultimate consumers of video content are humans, the ultimate judge of video quality is human subjective opinion. However, it is difficult to find human subjects to participate in video quality tests, adhering to strict testing standards is tedious and rarely done, and the results usually cannot be generalized to different terminals and testing environments. In practice, objective measures are commonly utilized for video quality measurement due to these difficulties of subjective testing. The validity of objective methods is directly related to their correlation with human opinion. It is well established that conventional objective metrics fail to measure human satisfaction accurately. However, recently developed objective metrics, such as the Structural Similarity Index Metric (SSIM) [4] and the Video Quality Metric (VQM) [5], have shown significant correlation with subjective data [6]. In this dissertation, the VQM metric is used for modeling subjective video quality; the justification for using VQM is presented in Chapter 2.

Regardless of the method utilized to measure video quality, the subjective user satisfaction pertaining to video content is referred to as the utility of the video. The first reference to utility in the context of video adaptation appears in [7]. In a more theoretical approach, a conceptual framework that models adaptation, resources, utility and the relationships between them is presented in [8].
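The amount of motion and amount of spatial detail mentioned among the content characteristics above are commonly quantified with the spatial index (SI) and temporal index (TI) descriptors defined in ITU-T Recommendation P.910; these are also the standardized content-similarity features referred to in the abstract. The following is a minimal, hedged sketch of these descriptors, assuming the luminance planes of a clip are available as NumPy arrays; it is an illustration, not the implementation used in the experiments.

```python
# Minimal sketch of the ITU-T P.910 spatial index (SI) and temporal index (TI)
# content descriptors; the frame format and the use of NumPy/SciPy are
# assumptions made for this illustration.
import numpy as np
from scipy import ndimage

def spatial_temporal_index(luma_frames):
    """luma_frames: iterable of 2-D NumPy arrays (luminance planes of a clip)."""
    si_values, ti_values = [], []
    prev = None
    for frame in luma_frames:
        frame = frame.astype(np.float64)
        # SI: spatial standard deviation of the Sobel-filtered frame.
        gx = ndimage.sobel(frame, axis=1)
        gy = ndimage.sobel(frame, axis=0)
        si_values.append(np.hypot(gx, gy).std())
        # TI: spatial standard deviation of the difference between successive frames.
        if prev is not None:
            ti_values.append((frame - prev).std())
        prev = frame
    # P.910 reports the maximum of the per-frame values over time.
    si = max(si_values)
    ti = max(ti_values) if ti_values else 0.0
    return si, ti
```

Clips whose (SI, TI) pairs lie close together are treated as having similar content characteristics when quality is predicted from the training sequences.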
A content-based utility function predictor, in which the system extracts compressed-domain features in real time and uses content-based pattern classification and regression to obtain a prediction of the utility function, is first proposed in [9]. In [10], a novel method is presented to determine an optimal video adaptation scheme, given the properties of the end terminal on which the video is to be displayed. In this approach, Utility Theory [11] is utilized to model a strictly subjective quantity: the satisfaction a user gets from watching a certain video clip. In [12], the multidimensional adaptation problem is considered. The utility of video clips is determined using subjective video evaluation experiments, and the results are tested using a scalable video codec (MC-3DSBC [13]). However, the processing capabilities of user terminals are not taken into consideration, which limits the usefulness of the results. In addition, most of the video content is evaluated by only five assessors, and thus the results cannot be used to make statistical generalizations.

In [14], a system that aims to deliver multi-view video over peer-to-peer (P2P) networks is presented. The scalable video coding (SVC) extension of the H.264 standard is utilized. Each view is coded with two signal-to-noise ratio (SNR) scalability layers, i.e., a base layer and an enhancement layer. A video adaptation decision engine is capable of adapting the bit stream depending on the amount of network resources available. If the network resources are not sufficient, the adaptation engine adjusts the stream bandwidth either by selectively discarding the enhancement layer of some views or, if the resources are even scarcer, by completely discarding some of the views. However, this approach only takes network characteristics into account and ignores device capabilities while adapting content. In addition, the use of SVC limits its applicability, since mobile devices are not able to decode SVC content and an extra step of converting SVC to baseline H.264 is required.

In [15], an end-to-end video adaptation architecture is presented that enables on-the-fly content adaptation and enriched Perceived Quality of Service (PQoS) by dynamically combining different content layers, views and representations of the same video stream transmitted from multiple sources (different servers, or peers in P2P) and received over multiple diverse paths and networks. The MPEG-21 framework is utilized for cross-layer metadata exchange, while the Session Description Protocol (SDP) is preferred for low-end terminals. The framework performs video adaptation based on network characteristics, terminal requirements and user preferences. However, this approach does not take into account the terminal's processing capabilities; it only considers requirements such as available codecs and screen resolution, and thus it is not capable of adapting the video resource requirements to the device's decoding capacity.

In addition to video quality, another important factor that determines the success of a video adaptation system is the resource requirements of the adapted video. After all, the need for adaptation arises when the original video is too complex, i.e., when it requires more resources to transmit and decode than are available. If the resource requirements of the adapted video still exceed the resources available in a particular usage environment, the performed adaptation becomes useless regardless of the resulting video quality.
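To make the interplay between predicted utility and resource constraints concrete, the toy sketch below selects, among a few candidate adaptation operations, the highest-utility one whose bitrate and predicted decoding complexity fit the available channel capacity and device decoding capacity. All names and numbers are assumptions made for this illustration; it is not the optimization framework developed in Chapter 4.

```python
# Toy illustration of choosing an adaptation operation under resource
# constraints; the labels, numbers and greedy selection rule are assumptions
# made for this sketch.
from dataclasses import dataclass

@dataclass
class AdaptationOperation:
    label: str                  # e.g. a (quantizer, resolution, frame rate) combination
    predicted_utility: float    # predicted subjective quality (higher is better)
    bitrate_kbps: float         # transmission resource requirement
    decoding_complexity: float  # predicted decoding cost on the target device

def select_operation(candidates, max_bitrate_kbps, device_capacity):
    """Return the highest-utility operation whose requirements fit the resources."""
    feasible = [op for op in candidates
                if op.bitrate_kbps <= max_bitrate_kbps
                and op.decoding_complexity <= device_capacity]
    if not feasible:
        return None  # every candidate exceeds the available resources
    return max(feasible, key=lambda op: op.predicted_utility)

candidates = [
    AdaptationOperation("QP26_VGA_30fps", 0.90, 1200.0, 9.0),
    AdaptationOperation("QP32_QVGA_15fps", 0.70, 400.0, 4.5),
    AdaptationOperation("QP38_QVGA_10fps", 0.55, 250.0, 3.0),
]
best = select_operation(candidates, max_bitrate_kbps=500.0, device_capacity=5.0)
print(best.label if best else "no feasible operation")
```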
The most resource-critical component of a video delivery system is the transmission channel. With the growth of ubiquitous access to multimedia, wireless networks have become the main medium for video transmission. Wireless channels are highly error-prone, and their characteristics change rapidly, making capacity prediction a challenging task.