Movie Summarization Project
As the amount of video data (movies, TV programs, clips) stored on personal recorders and computers grows, intelligent algorithms for efficiently representing video data and presenting it to the user are becoming important. Video summarization, movie summarization and movie skimming are increasingly popular research areas with immediate applications.
Overview
MovieSum is a project/collaboration for the development of efficient and generic schemes for saliency-based video skimming with application to movie summarization. The research directions of the group include:
- Development of features to measure saliency from various information modalities, e.g. aural, visual, textual, movie structure etc.
- Development of efficient schemes for multimodal saliency integration to derive a single measure of importance.
- Modeling viewer attention by means of multimodal saliency.
- Capturing essential multimodal events that attract attention.
- Building bottom-up, genre-independent summarization algorithms (for static storyboards and dynamic skimming).
- Building annotated movie databases and forming efficient qualitative and quantitative evaluation plans.
Details
I. Saliency modelling and feature extraction
Features from the various modalities are mapped to one-dimensional, time-varying unimodal saliency curves, from which statistics of salient segments can be extracted.
- Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking.
- Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion.
- Text saliency is extracted by part-of-speech tagging of the subtitle information that is available, for example, with most movie distributions.
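As an illustration of the audio cue above, the following is a minimal sketch of how a nonlinear energy operator can turn a waveform into a one-dimensional saliency curve. It assumes the discrete Teager-Kaiser energy operator as the nonlinear operator and a simple per-frame average as the energy tracking step; the function names, the frame length and the min-max scaling are illustrative choices, not details taken from the project.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_saliency(x, frame_len):
    """Mean Teager energy per non-overlapping frame, min-max scaled to [0, 1]."""
    psi = teager_energy(x)
    means = [
        sum(psi[i:i + frame_len]) / frame_len
        for i in range(0, len(psi) - frame_len + 1, frame_len)
    ]
    if not means:  # signal shorter than one frame
        return []
    lo, hi = min(means), max(means)
    if hi == lo:
        return [0.0] * len(means)
    return [(m - lo) / (hi - lo) for m in means]


if __name__ == "__main__":
    # Synthetic signal (an assumption for the demo): a quiet tone, then a louder one.
    signal = [0.1 * math.cos(0.3 * n) for n in range(100)]
    signal += [1.0 * math.cos(0.3 * n) for n in range(100, 200)]
    print(frame_saliency(signal, 20))
```

For a pure tone A cos(wn) the operator returns the constant A^2 sin^2(w), so louder or higher-frequency segments yield larger values, which is why the second half of the demo signal scores as more salient.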
II. Multimodal integration
The various modality curves are integrated in a single saliency curve, where stream importance is signified and quantified in more than a single domain.
This multimodal saliency curve (MSC) is currently formulated through a linear scheme where the weights can be constant, adaptive or stream variance-based.
Other schemes include nonlinear inter- and intra-modality fusion. Two kinds of multimodal saliency curves have been tested so far: AudioVisual (AV) and AudioVisualText (AVT) measures.
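The linear scheme can be sketched as a weighted sum of the unimodal curves. The snippet below is only an illustration under stated assumptions: the curves are equal-length lists already scaled to a common range, `fuse_linear` and `variance_weights` are hypothetical names, and weighting each stream by its share of the total variance is one possible reading of "variance-based", not necessarily the rule used in the project.

```python
from statistics import pvariance


def fuse_linear(curves, weights=None):
    """Weighted sum of equal-length saliency curves; weights are scaled to sum to 1."""
    if weights is None:
        weights = [1.0] * len(curves)  # constant, equal weights
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * c[t] for w, c in zip(weights, curves))
        for t in range(len(curves[0]))
    ]


def variance_weights(curves):
    """Assumed rule: weight each stream by its share of the total variance."""
    variances = [pvariance(c) for c in curves]
    total = sum(variances)
    if total == 0:
        return [1.0 / len(curves)] * len(curves)
    return [v / total for v in variances]


if __name__ == "__main__":
    audio = [0.1, 0.8, 0.3, 0.2]
    visual = [0.4, 0.5, 0.9, 0.1]
    text = [0.0, 0.0, 1.0, 0.0]
    av = fuse_linear([audio, visual])                 # AV curve, constant weights
    avt = fuse_linear([audio, visual, text],
                      variance_weights([audio, visual, text]))  # AVT curve
    print(av, avt)
```

Passing two streams gives an AV curve and passing three gives an AVT curve; an adaptive scheme would recompute the weights over a sliding window instead of once per video.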
III. From saliency to attention
Although human perception appears to be automatic and unconscious, complex sensory mechanisms exist that form the preattentive component of understanding and lead to awareness. Considerable research has been carried out into these preattentive mechanisms, and computational models have been developed for similar problems in the fields of computer vision and speech analysis. Our focus is to explore aural, visual and possibly other streams of information present in video distributions for modeling attention and detecting salient events. The relation of saliency to attention models is still an open research challenge. Based on recent studies on perceptual and computational attention modeling, we formulate measures of attention using features of saliency for the audiovisual or a more generic multimodal stream.
IV. Event detection
Detection of perceptually important video events is formulated on the basis of the saliency models for the audio, visual and textual information conveyed in a video stream. The separate information modalities may convey explicit, complementary or mutually exclusive information around the audiovisual or multimodal events that are contained in the captured sequence. Psychophysical and cognitive observations are taken into account, according to which an event is described by an onset, a transition and an offset period. Thus, salient events are detected from the multimodal saliency curve through geometrical features such as local extrema, sharp transitions, level sets, etc. In this way, the presence of an event may be signified in one or multiple domains.
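A minimal sketch of this idea, assuming a single fixed level set: every run of samples above the level is treated as one event, with the first and last samples of the run as its onset and offset and the local maximum inside it as its peak. The function name `detect_events` and the thresholding rule are illustrative assumptions, not the project's detector.

```python
def detect_events(curve, level):
    """Return (onset, peak, offset) index triples for each run of samples above `level`."""
    events = []
    start = None
    for i, value in enumerate(curve):
        if value > level and start is None:
            start = i  # curve crosses the level upwards: event onset
        elif value <= level and start is not None:
            peak = max(range(start, i), key=lambda k: curve[k])
            events.append((start, peak, i - 1))  # offset is the last sample above the level
            start = None
    if start is not None:  # event still open at the end of the curve
        peak = max(range(start, len(curve)), key=lambda k: curve[k])
        events.append((start, peak, len(curve) - 1))
    return events


if __name__ == "__main__":
    msc = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.95, 0.2]
    print(detect_events(msc, 0.5))
```

Sharp transitions could be handled in the same spirit by thresholding the first difference of the curve instead of the curve itself.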
V. Video summarization
The integrated multimodal saliency curve is the core of a video summarization algorithm, in terms of both static storyboards (key-frame selection) and dynamic video skims. In effect, given an initial video sequence we are interested in extracting:
- A set of key-frames representing a static storyboard, i.e. the audiovisually important frames in the video sequence.
- A video of shorter duration, based on user-defined skimming percentages. The shorter video is descriptive and informative, with smooth transitions between salient segments.
Summarization is based on the audio, visual and integrated AudioVisual (AV) and AudioVisualText (AVT) saliency curves extracted from the corresponding information streams. The algorithm is generic and bottom-up: it relies on low-level processing and is independent of the video semantics, syntax, structure or genre.
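The selection step can be sketched as follows, under simplifying assumptions: the saliency curve has one value per frame, the skim keeps the top-ranked frames up to the requested percentage, consecutive kept frames form segments, and each segment contributes its most salient frame to the storyboard. The function names are hypothetical, and the smoothing of segment transitions mentioned above is not modeled.

```python
def select_frames(curve, fraction):
    """Indices of the most salient frames, keeping about `fraction` of the video."""
    n_keep = max(1, round(fraction * len(curve)))
    ranked = sorted(range(len(curve)), key=lambda i: curve[i], reverse=True)
    return sorted(ranked[:n_keep])  # restore temporal order


def to_segments(frames):
    """Group sorted frame indices into inclusive (start, end) runs."""
    segments = []
    for f in frames:
        if segments and f == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], f)  # extend the current run
        else:
            segments.append((f, f))  # start a new run
    return segments


def key_frames(curve, segments):
    """One key-frame per segment: the index of its saliency maximum."""
    return [max(range(s, e + 1), key=lambda i: curve[i]) for s, e in segments]


if __name__ == "__main__":
    curve = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.0, 0.2, 0.1]
    skim = to_segments(select_frames(curve, 0.3))  # 30% skim
    print(skim, key_frames(curve, skim))
```

In practice very short segments would typically be merged or extended before rendering, so that the skim does not cut mid-shot; that post-processing is left out here.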
VI. Movie databases and evaluation
The application at hand is an automatic summarizer for movie content. Our efforts include forming saliency-wise movie databases, with content annotated with respect to the saliency of the various information streams and the perceptual saliency of the video sequence as a whole. Subjective evaluations are based on user ratings of the formed skims, regarding their information content and their aesthetics. So far, evaluations have shown that informative and pleasing video summaries can be obtained using multimodal saliency indicator functions that refine the results of unimodal saliency-based skimming. Given that no high-level features, e.g., plot, are used by the summarizer, the performance of the algorithm is impressive in terms of summary informativeness.
Ongoing work
Our current research involves extensions regarding:
- sophisticated fusion algorithms, both inside and among the various modalities:
  - learning schemes
  - non-linear feature correlations
  - variance-adaptive stream weights
  - cognitive models
- incorporation of extra higher-level features to movie transcript information
- systematic frameworks for thorough evaluations of the summarization algorithm
Sample video skims and ongoing evaluations can be found in the demos section. More details on ongoing work can be found in WorkLog.
Miscellaneous
Movie Summarization is coordinated by Prof. Petros Maragos of the National Technical University of Athens (NTUA) and Prof. Alex Potamianos of the Technical University of Crete (TUC).