Video transition identification based on 2D image analysis. Silvio Jamil Ferzoli Guimarães

Video transition identification based on 2D image analysis

Silvio Jamil Ferzoli Guimarães

14 March 2003

This text corresponds to the thesis presented by Silvio Jamil Ferzoli Guimarães and evaluated by the jury. Belo Horizonte, Brazil, 14 March 2003.

Thesis presented to the Departamento de Ciência da Computação, UFMG, as a partial requirement to obtain the title of PhD in Computer Science.

Departamento de Ciência da Computação
Universidade Federal de Minas Gerais

Jury:
Prof. Dr. Arnaldo de Albuquerque Araújo (Advisor), DCC-UFMG
Prof. Dr. Michel Couprie (Co-advisor), ESIEE-France
Prof. Dr. Neucimar Jerônimo Leite (Co-advisor), IC-UNICAMP
Prof. Dr. Roberto de Alencar Lotufo, FEEC-UNICAMP
Prof. Dr. Jacques Facon, PUC-PR
Prof. Dr. Mário Fernando Montenegro Campos, DCC-UFMG
Prof. Dr. Rodrigo Lima Carceroni, DCC-UFMG

Université de Marne-la-Vallée

Identification of transitions in video image sequences based on 2D image analysis

Silvio Jamil Ferzoli Guimarães

14 March 2003

Jury:
Prof. Dr. Neucimar Jerônimo Leite (President), IC-UNICAMP, Brazil
Prof. Dr. Arnaldo de Albuquerque Araújo (Advisor), DCC-UFMG, Brazil
Prof. Dr. Michel Couprie (Co-advisor), ESIEE, France
Prof. Dr. Roberto de Alencar Lotufo (Reviewer), FEEC-UNICAMP, Brazil
Prof. Dr. Sylvie Philipp-Foliguet (Reviewer), ENSEA, France

© Silvio Jamil Ferzoli Guimarães. All rights reserved.

To my parents, Luiz and Saada, and to my beloved wife, Tatiane.

Acknowledgments

To God, for giving me the health and courage to face this challenge.

To my parents, Luiz and Saada, for the countless hours of comfort and wisdom they devoted to me, which gave me the strength to carry on.

To Prof. Arnaldo and Prof. Neucimar, for their help, their companionship and the brilliant ideas that were decisive for completing this work. Nor can I forget their help with the endless corrections of papers.

I thank Michel Couprie for helping me so much during my stay in France, with great patience and understanding, since I am no expert in the French language. I also thank him for his ideas and for the always interesting discussions.

To the NPDI working group at DCC-UFMG, especially Paulo and Camillo. And also to the LA2SI-ESIEE working group, especially Yukiko, Nivando, Christophe and Marco, for the many discussions that led only to personal and professional growth.

To my friends and colleagues at DCC, especially my classmates Lucila, Mark, Fátima, Maria de Lourdes and, once again, Paulo.

To my parents, CNPq and CAPES, for the indispensable financial support throughout my journey.

I am not ashamed to correct and change my opinions, because I am not ashamed to reason and learn. (Alexandre Herculano)

Abstract

The video segmentation problem consists of identifying the boundaries between consecutive shots in a video sequence. The common approach to this problem is based on computing dissimilarity measures between frames.
In this work, the video segmentation problem is transformed into a pattern detection problem, in which each video event is represented by a different pattern on a 2D spatio-temporal image called the visual rhythm. To address this problem, we mainly use morphological and topological tools to identify the specific patterns associated with video events such as cuts, fades, dissolves and flashes. To compare different methods, we define two new measures, the robustness and the gamma measures. In general, the proposed methods achieve better quality measures than the other methods used for comparison.

Résumé

The problem of segmenting video image sequences is mainly associated with shot changes. The usual approach to this problem is based on computing dissimilarity measures between images. In this work, the video sequence segmentation problem is transformed into a pattern detection problem, in which each video event is represented by a different pattern on a 2D image called the visual rhythm. This image is obtained by a specific transformation of the video. To address this problem, we mainly consider morphological and topological tools. We show how these tools can identify the specific patterns associated with cuts, fades and dissolves, as well as flashes. Overall, the methods proposed in this thesis achieve better quality indices than the other methods to which we compared them.

Resumo

The video segmentation problem consists of identifying the boundaries between shots in a video. The classical approach to this problem is based on computing dissimilarity measures between frames.
In this work, the video segmentation problem is transformed into a pattern detection problem, in which each video event is turned into a different pattern on a 2D spatio-temporal image called the visual rhythm. To address this problem, we mainly use morphological and topological tools to identify the specific patterns associated with video events such as cuts, fades, dissolves and flashes. To compare the different methods, we define two new measures, the robustness and the gamma measure, which relate the basic quality measures to a family of thresholds. The results obtained with the proposed methods, expressed in terms of quality measures, are better than those of the other methods used for comparison.

Contents

Acknowledgments
Abstract
Résumé
Resumo
List of Definitions
List of Figures

1 Introduction
    Our contribution
    Organization of the text

Part I: Theoretical background

2 Video model
    Basic definitions
    Types of transitions
    Cut
    Fade
    Dissolve
    Wipe
    Transition classification
    Camera work
    Conclusions
3 Video analysis
    Video segmentation
    Approaches based on dissimilarity measures
    Image-based
    Camera work analysis
    Quality measures for video analysis
    Quantitative analysis
    Threshold sensitivity
    Conclusions
4 Video transformation from 2D + t to 1D + t
    Visual rhythm by sub-sampling
    Pattern analysis
    Visual rhythm by histogram
    Pattern analysis
    Conclusions

Part II: Video transition identification by 2D analysis

5 Introduction for video transition identification
    Description of the corpora
6 Image analysis operators
    Mathematical morphology
    Basic operators
    Morphological residues
    Multi-scale gradient
    Thinning
    Max-tree
    Conclusions
7 Cut detection
    Introduction
    Our method
    Experiments
    Conclusions and discussions
8 Flash detection
    Introduction
    Based on top-hat filtering
    Based on max-tree filtering
    Experiments
    Conclusions and discussions
9 Gradual transition detection
    Introduction
    Method based on multi-scale gradient
    Transition detection
    Experiments
    Analysis of results and parameters
    Sharpening by flat zone enlargement
    Transition detection
    Experimental analysis
    Analysis of the results
    Conclusions and discussions
10 Specific fade detection
    Introduction
    Video transformation
    Analysis based on discrete line identification
    Experiments
    Conclusions and discussions
11 Conclusions and future work

Bibliography

List of Definitions

2.1 Frame
Video
Shot [4]
Scene [4]
Key-frame
Transition
Cut (sharp transition)
Fade-out
Fade-in
Dissolve
Horizontal wipe
Vertical wipe
Recall, error and precision rates
Robustness
Missless error
Falseless recall
Gamma measure
Visual rhythm [17] or spatio-temporal slice [57]
Visual rhythm by histogram (VRH)
Morphological gradient [68, 63]
White top-hat [68, 63]
Inf top-hat [68, 63]
Ultimate erosion [68, 63]
Morphological residues [54]
6.6 Residue mapping [47]
Soille's morphological gradient [68]
Gradient based on ultimate erosion
Gradient based on thinning
Flat zone, k-flat zone and k+-flat zone
Transition
Constructible transition point
Destructible transition point
Histogram width

List of Figures

1.1 Video transformation: (a) simplification of the video content by transformation of each frame into a column on VR; (b) a real VR, obtained by the principal diagonal sub-sampling (Chapter 4)
General framework for our approach to the video segmentation problem
Video hierarchical representation [61]
Cut example
Example of fade-out
Example of dissolve
Example of a horizontal wipe (right to left)
Example of a vertical wipe (bottom to top)
Basic camera operations
Example of zoom-in
Two different approaches for video segmentation
Video transformation: (a) simplification of the video content by transformation of each frame into a column on VR; (b) a real example of the principal diagonal sub-sampling
Robustness (µ) measure
Example of pixel samplings: D1 is the principal diagonal, D2 is the secondary diagonal, V is the central vertical line and H is the central horizontal line
Examples of visual rhythm by principal diagonal sub-sampling: (a) video 0132.mpg; (b) video 0599.mpg and (c) video 0117.mpg
4.3 Visual rhythm obtained from a real video using different pixel sub-samplings: principal diagonal (top) and central vertical line (bottom). The temporal positions of the cuts are indicated in the middle image
Block diagram for video segmentation using visual rhythm by sub-sampling
Examples of sharp transitions in the visual rhythm by sub-sampling: (a-c) vertical sharp transitions, and (d) inclined sharp transition
Examples of gradual transitions present in the visual rhythm
Examples of light and thin vertical regions present in the visual rhythm
Example of deformed regions present in the visual rhythm: (a) shifted region (pan); (b) expanded region (zoom-in); and (c) funneled region (zoom-out)
Real examples of visual rhythm by histogram: (a) video 0094.mpg; (b) video 0136.mpg; (c) video 0132.mpg and (d) video 0131.mpg
Block diagram for video segmentation using visual rhythm by histogram
Real examples of sharp transitions present in the visual rhythm by histogram
Examples of orthogonal discontinuities present in visual rhythm by histogram. Both contain flashes
Examples of deformed regions present in the visual rhythm by histogram: (a) corresponds to a fade and (b) to a dissolve
Block diagram for event detection from (a) visual rhythm by sub-sampling and (b) by histogram in which we are interested
Soille's gradient
Morphological multi-scale gradient: (a) original 1D image; (b,d,f) correspond to the gradient values at a specific level n = 4 and (c,e,g) correspond to the supremum of the gradient values at different levels (n = [1, 5])
Examples of multi-scale gradients where the SE size is in the range [1, 7]. These results correspond to the supremum of the gradient values of all levels
1D image thinning example. The dotted line in (b) represents the original image (a)
6.5 Process of max-tree creation [62]: (a) original image; (b) first step of the process considering the levels 0 and 1; (c) second step considering the levels 0, 1 and 2; and (d) the final tree
Cut example
Cut detection block diagram
Visual rhythm by principal diagonal sub-sampling computed from the video 132.mpg
Filtered visual rhythm in which the small components are eliminated: (a) the original visual rhythm and (b) the result of the morphological filtering (reconstructive opening followed by a reconstructive closing). The radius size of the horizontal structuring element is
Horizontal gradient image. To facilitate the visualization, we apply an operator to equalize the image histogram
Thinning operation: (a) the equalized horizontal gradient image and (b) the result of the thinning. To facilitate the visualization, we apply an operator to equalize the histogram
Detection of maximum points: (a) the equalized thinning image and (b) the maximum points
Filtering of the maximum points
Cut detection from a visual rhythm by sub-sampling: (a) visual rhythm; (b) thinning of the horizontal gradient (equalized); (c) maximum points; (d) maxima filtering; (e) normalized number of maximum points in the range [0, 255]; (f) detected cuts (white bars) superimposed on the visual rhythm
Cut detection from a visual rhythm by histogram: (a) visual rhythm; (b) thinning of the horizontal gradient; (c) maximum points; (d) maxima filtering; (e) normalized number of maximum points in the range [0, 255]; (f) detected cuts (white bars) superimposed on the visual rhythm
Experimental results
8.1 Flash model: (a) flash occurrence in the middle of the shot and (b) flash occurrence at the boundary of the shot
Flash video detection: (a) some frames of a sequence containing a flash; (b) visual rhythm by sub-sampling; (c) detected flash
Flash detection block diagram using top-hat filtering
Visual rhythm by principal diagonal sub-sampling computed for video 0600.mpg
White top-hat filtering: (a) result of the white top-hat and (b) result of the histogram equalization of (a)
Thinning: (a) result of the thinning and (b) result of the histogram equalization of (a)
Detection of maximum points
Maxima image filtering
Detection of flashes, in which the flashes are represented by white vertical column bars: (a) original image with 4 flashes and (b) result of the flash detection
Flash detection block diagram using max-tree filtering
Visual rhythm by principal diagonal sub-sampling computed from video 0600.mpg
Average computation: result of the average of the elements for each column
Result of the max-tree filtering
Detection of flashes, in which the flashes are represented by white vertical column bars: (a) original image with 4 flashes and (b) result of the flash detection
Experimental results for flash detection
Example of cut and gradual transitions
Block diagrams for detection of transitions considering multi-scale gradient
Opening operation: (a) thresholded Soille's gradient in which the SE is in the range [1, 4] and (b) result of the opening operation using a vertical SE with size equals
Number of non-zero gradient values
9.5 Closing operation using a SE of size
White top operation using SE of size
Thinning operation
Detection of some types of transitions, like cuts, fades and dissolves
Example of multi-scale gradient analysis for transition detection, where n = 7 for Soille's multi-scale gradient and the gradient based on ultimate erosion, and n = [1, 7] for the gradient based on thinning: the gradient computation (a,c,e) and the line profile of the projected image (b,d,f)
Graphics of the quality measures: (a) recall; (b) error and (c) precision
Transition points (constructible and destructible)
Example of enlargement of flat zones: an artificial example
An example of visual rhythm with some events (a), and the image obtained by the sharpening process (b). (c) and (d) illustrate their respective line profiles along the center horizontal line of the image
Main steps of the proposed gradual transition detection algorithm for video images
Example of enlargement of flat zones
Gradual transition detection
An example of fade-in: ((a)-left) illustrates the frames 0, 14, 24 and 29 of a transition with 30 frames and their respective histograms ((a)-right); (b) visual rhythm by histogram of the fade transition; (c) result of the image segmentation (thresholding) of (b)
Visual rhythm by histogram
Block diagram for the fade detection method
Visual rhythm by histogram computed from the video voyage.mpg
Image segmentation using thresholding (threshold = 1)
Filtered visual rhythm: (a) the thresholded visual rhythm and (b) the result of the morphological filtering (opening followed by a closing). The vertical size of the structuring element is
10.7 Morphological gradient and thinning: (a) result of the morphological gradient and (b) result of the thinning applied to the image illustrated in (a)
Example of a curve approximation: (a) edge image and (b) segmentation of the edge in sub-segments
Fade detection: (a) result of the fade-in detection superimposed on the visual rhythm by histogram and (b) result of the fade-out detection superimposed on the visual rhythm by histogram
Fade detection process: (a) visual rhythm by histogram (VRH) computed from the video voyage.mpg; (b) thresholding; (c) gradient; (d) line filtering result; and (e) result superimposed on the VRH
Example of orthogonal discontinuities present in visual rhythm by histogram. Both represent flashes

Chapter 1

Introduction

Traditionally, visual information has been stored in analog form and indexed manually. Nowadays, thanks to improvements in digitization and compression technologies, database systems are used to store images and videos together with their meta-data and associated taxonomy. Unfortunately, these systems are still very costly.
Meta-data include bibliographic information, capture conditions, compression parameters, etc. The taxonomy is a hierarchy of subjective classes (people, nature, news) used to organize images and videos by subject, such as humor, politics, people, etc. A good selection of meta-data and taxonomy must incorporate the special features of the application and, in general, represents the first step in creating and using a large image/video database. Obviously, there are many constraints on the use of these indexes: manual annotation is a major problem in large databases because it is time-consuming; the application domain and personal knowledge bias the choice of indexes, etc. Unfortunately, the existing indexes are always limited in their ability to capture the salient content of an image ("a picture is worth a thousand words") [11, 14, 44, 3, 7, 20].

Multimedia content analysis and content-based indexing together represent a promising direction for the above methodology. Many systems that support content-based image/video queries have been developed [29, 43, 15, 70, 8, 48, 6]. In the last few years, progress in image/video search tools for large database systems has been steady. To build a query in these tools, one can use sketches, a selection of visual features (color, texture, shape and motion), examples and/or temporal/spatial features.

Concerning video, the indexing problem becomes much more complex, because it involves the identification and understanding of fundamental units, such as scenes and shots. Fundamental units can be sub-divided semantically and physically. To decrease the number of items to index, we may consider the semantic units (scenes) instead of the physical units (shots). However, due to the unstructured format, the size and the general content of a video, this indexing is non-trivial [58, 15, 70].
Therefore, automatic methods for video indexing are extremely relevant in modern applications that require fast and precise queries. One use of such an index is video browsing, where a video must be segmented into fundamental units without knowing its nature and type [60]. Another video problem is the detection of specific events, such as the identification of the instan