r58 - 27 Apr 2009 - 14:37:23 - GiorgosEvangelopoulos

Movie Summarization Project

As the amount of video data (movies, TV programs, clips) available on personal recorders and computers grows increasingly large, intelligent algorithms for efficiently representing video data and presenting it to the user are becoming important. Video summarization, movie summarization and movie skimming are increasingly popular research areas with immediate applications.

Overview

MovieSum is a project/collaboration for the development of efficient and generic schemes for saliency-based video skimming with application to movie summarization. The research directions of the group include:

  1. Development of features to measure saliency from various information modalities, e.g. aural, visual, textual, movie structure etc.
  2. Development of efficient schemes for multimodal saliency integration to derive a single measure of importance.
  3. Modeling viewer attention by means of multimodal saliency.
  4. Capturing essential multimodal events that attract attention.
  5. Building bottom-up, genre-independent summarization algorithms (for static storyboards and dynamic skimming).
  6. Building annotated movie databases and forming efficient qualitative and quantitative evaluation plans.


[Figure: moviesum_system.jpg]

Details

I. Saliency modelling and feature extraction

Features from the various modules are mapped to one-dimensional, time-varying unimodal saliency curves, from which statistics of salient segments can be extracted.
  • Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking.
  • Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion.
  • Text saliency is extracted through part-of-speech tagging of the subtitle information that is available, for example, with most movie distributions.
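
As a rough illustration of how an audio cue built on a nonlinear energy operator can be turned into a saliency curve, the Python sketch below assumes an operator of the Teager-Kaiser type; the text above does not name the operator or the frame-level statistic, so this choice, the function names `teager_energy` and `frame_saliency`, and the `frame_len` parameter are all hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    # The two endpoints have no two-sided neighbours and are left at 0.0.
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi

def frame_saliency(x, frame_len):
    """Mean energy per non-overlapping frame, clipped at 0 and scaled by the peak."""
    psi = teager_energy(x)
    means = [max(0.0, sum(psi[i:i + frame_len]) / frame_len)
             for i in range(0, len(psi) - frame_len + 1, frame_len)]
    peak = max(means, default=0.0)
    return [m / peak if peak > 0 else 0.0 for m in means]

# Example (assumed test signal): a quiet tone followed by a louder one.
signal = ([0.1 * math.sin(0.3 * n) for n in range(200)]
          + [math.sin(0.3 * n) for n in range(200, 400)])
curve = frame_saliency(signal, frame_len=100)
```

For a pure sinusoid of amplitude A and digital frequency w, this operator returns A^2 sin^2(w) at every interior sample, so the louder half of the example signal dominates the resulting curve.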

II. Multimodal integration

The various modality curves are integrated into a single saliency curve, where stream importance is signified and quantified in more than a single domain. This multimodal saliency curve (MSC) is currently formulated through a linear scheme where the weights can be constant, adaptive or stream variance-based. Other schemes include nonlinear inter- and intra-modality fusion schemes. Two kinds of multimodal saliency curves have been tested so far: AudioVisual (AV) and AudioVisualText (AVT) measures.
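
A linear fusion of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's implementation: the min-max normalization, the function names (`normalize`, `fuse_linear`, `variance_weights`) and, in particular, the choice of making each weight proportional to the variance of its stream are hypothetical, since the text does not specify how variance enters the weights.

```python
from statistics import pvariance

def normalize(curve):
    """Min-max scale a curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]

def fuse_linear(curves, weights=None):
    """Weighted sum of equal-length unimodal curves (dict: name -> list).

    Weights are rescaled to sum to 1; equal weights are used by default.
    """
    names = list(curves)
    length = len(curves[names[0]])
    if any(len(curves[k]) != length for k in names):
        raise ValueError("all curves must have the same length")
    if weights is None:
        weights = {k: 1.0 for k in names}
    total = sum(weights[k] for k in names)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    scaled = {k: normalize(curves[k]) for k in names}
    return [sum(weights[k] / total * scaled[k][t] for k in names)
            for t in range(length)]

def variance_weights(curves):
    """Assumed rule: weight each stream by the variance of its scaled curve."""
    return {k: pvariance(normalize(c)) for k, c in curves.items()}

# Example with two toy streams (an AV-style fusion).
streams = {"audio": [0, 1, 0, 1], "visual": [0, 0, 1, 1]}
msc_equal = fuse_linear(streams)
msc_var = fuse_linear(streams, variance_weights(streams))
```

Adding a third entry such as `"text"` to the dictionary gives an AVT-style curve without changing the code.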

III. From saliency to attention

Although human perception appears to be automatic and unconscious, complex sensory mechanisms exist that form the preattentive component of understanding and lead to awareness. Considerable research has been carried out into these preattentive mechanisms, and computational models have been developed for similar problems in the fields of computer vision and speech analysis. Our focus is to explore aural, visual and possibly other streams of information present in video distributions for modeling attention and detecting salient events. The relation of saliency to attention models is still an open research challenge. Based on recent studies on perceptual and computational attention modeling, we formulate measures of attention using features of saliency for the audiovisual or a more generic multimodal stream.

IV. Event detection

Detection of perceptually important video events is formulated on the basis of the saliency models for the audio, visual and textual information conveyed in a video stream. The separate information modules may convey explicit, complementary or mutually exclusive information around the audiovisual or multimodal events that are contained in the captured sequence. Psychophysical and cognitive observations are taken into account, according to which an event is described by an onset, a transition and an offset period. Thus, salient events are detected from the multimodal saliency curve through geometrical features such as local extrema, sharp transitions, level sets, etc. This way the presence of an event may be signified in one or multiple domains.
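
One simple reading of the level-set idea can be sketched as follows, in Python. It assumes an event is a maximal run of frames whose saliency stays at or above a fixed level, with the onset and offset at the ends of the run and the peak at its largest value; the function names and the `level` parameter are hypothetical, and the project's actual detector may combine several geometrical features.

```python
def _event(curve, onset, offset):
    """Build an (onset, peak, offset) triple; ties pick the earliest peak."""
    peak = max(range(onset, offset + 1), key=lambda t: curve[t])
    return (onset, peak, offset)

def detect_events(curve, level):
    """Return one (onset, peak, offset) triple per run with curve[t] >= level."""
    events = []
    start = None
    for t, value in enumerate(curve):
        if value >= level and start is None:
            start = t                      # run begins: candidate onset
        elif value < level and start is not None:
            events.append(_event(curve, start, t - 1))
            start = None                   # run ended at the previous frame
    if start is not None:                  # run still open at the last frame
        events.append(_event(curve, start, len(curve) - 1))
    return events

# Example on a toy multimodal saliency curve.
toy_curve = [0.1, 0.2, 0.7, 0.9, 0.6, 0.2, 0.1, 0.8, 0.8, 0.3]
events = detect_events(toy_curve, level=0.5)
```

Here `events` holds two triples, one for frames 2 to 4 with its peak at frame 3 and one for frames 7 to 8 with its peak at frame 7.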

V. Video summarization

The integrated multimodal saliency curve is the core of a video summarization algorithm, in terms of both static storyboards (key-frame selection) and dynamic video skims. In effect, given an initial video sequence we are interested in extracting:
  • A set of key-frames representing a static storyboard, i.e. the audiovisually important frames in the video sequence.
  • A shorter video in duration based on user-defined skimming percentages. The shorter video is descriptive and informative, with smooth salient segment transitions.
Summarization is based on the audio, visual and integrated AudioVisual (AV) and AudioVisualText (AVT) saliency curves extracted from the corresponding information streams. The algorithm is generic and bottom-up: it relies on low-level processing and is independent of the video semantics, syntax, structure or genre.
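
The two outputs listed above can be illustrated with a small Python sketch. It assumes the skim keeps the most salient frames up to the user-defined percentage, merges nearby selected frames into segments, and takes the most salient frame of each segment as a key-frame; these rules, the function names and the `max_gap` parameter are hypothetical simplifications, and the smoothing of segment transitions is not modelled.

```python
def select_skim_frames(curve, fraction):
    """Indices of the top `fraction` of frames by saliency, in temporal order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, round(fraction * len(curve)))
    # Rank by descending saliency; earlier frames win ties.
    ranked = sorted(range(len(curve)), key=lambda t: (-curve[t], t))
    return sorted(ranked[:keep])

def to_segments(frames, max_gap=1):
    """Merge sorted frame indices into (start, end) segments.

    Frames whose indices differ by at most `max_gap` share a segment,
    so max_gap=1 joins only consecutive frames.
    """
    segments = []
    for t in frames:
        if segments and t - segments[-1][1] <= max_gap:
            segments[-1][1] = t
        else:
            segments.append([t, t])
    return [tuple(s) for s in segments]

def key_frames(curve, segments):
    """Most salient frame index inside each segment (static storyboard)."""
    return [max(range(a, b + 1), key=lambda t: curve[t]) for a, b in segments]

# Example: a 50% skim of a toy saliency curve.
saliency = [0.1, 0.2, 0.7, 0.9, 0.6, 0.2, 0.1, 0.8, 0.8, 0.3]
frames = select_skim_frames(saliency, fraction=0.5)
segments = to_segments(frames)
storyboard = key_frames(saliency, segments)
```

Raising `max_gap` bridges short low-saliency gaps between selected frames, which is one crude way to avoid very short, choppy segments in the skim.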

VI. Movie databases and evaluation

The application at hand is an automatic summarizer for movie content. Our efforts include forming saliency-wise movie databases, with content annotated with respect to the saliency of the various information streams and the perceptual saliency of the video sequence as a whole. Subjective evaluations are based on user ratings of the formed skims, regarding their information content and their aesthetics. So far, evaluations have shown that informative and pleasing video summaries can be obtained using multimodal saliency indicator functions that refine the results of unimodal saliency-based skimming. Given that no high-level features, e.g. plot, are used by the summarizer, the performance of the algorithm is impressive in terms of summary informativeness.

Ongoing work

Our current research involves extensions regarding
  • sophisticated fusion algorithms, both inside and among the various modalities:
    • learning schemes
    • non-linear feature correlations
    • variance-adaptive stream weights
    • cognitive models
  • incorporation of additional higher-level features, e.g. from movie transcript information
  • systematic frameworks for thorough evaluations of the summarization algorithm

Sample video skims and ongoing evaluations can be found in the demos section. More details on ongoing work can be found in WorkLog.

Miscellaneous

Movie Summarization is coordinated by Prof. Petros Maragos of the National Technical University of Athens (NTUA) and Prof. Alex Potamianos of the Technical University of Crete (TUC).

