Computer Vision, Speech Communication & Signal Processing Group

NTUA | ECE

Research

Recent Research Highlights
Audiovisual Speech Recognition

Audiovisual speech recognition refers to the problem of recognizing speech by combining the acoustic signal with visual information from the speaker’s face, such as lip movements. We have developed highly adaptive multimodal fusion rules based on uncertainty compensation, which are compatible with synchronous and asynchronous multimodal interaction architectures. Further, our work on AAM-based face representations leads to highly informative visual speech features which can be extracted in real time.
[Research page]

Audiovisual Speech Inversion

We focus on recovering aspects of the vocal tract’s geometry and dynamics from speech, a problem referred to as speech inversion. To alleviate the ill-posedness of the audio-only inversion process, we propose an inversion scheme which also exploits visual information from the speaker’s face.
[Research page]

Multigrid Geometric Active Contours

We investigate multigrid techniques for the solution of the time-dependent PDEs of geometric active contour models in Computer Vision. The method allows interactive solution of models whose numerical implementation with conventional techniques has been prohibitively slow.
[Research page]

Image Texture Analysis

We pursue a concise texture modeling, analysis and segmentation system for generic natural images. Our research directions are in the areas of texture feature extraction through multicomponent AM-FM models, feature interpretation through generative models and probabilistic discrimination, and texture segmentation using an unsupervised variational scheme.
[Research page]

Movie Summarization

Detection of perceptually important video events is formulated on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. The various modality curves are integrated into a single attention curve. This multimodal saliency curve (MSC) is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming.
[Research page]
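
As a rough illustration of how per-modality saliency curves might be combined into one attention curve and used for skimming, here is a minimal Python sketch. The min-max normalization, the weighted-average fusion, and the top-ratio frame selection are assumptions made for this example, as are the names `normalize`, `fuse_saliency`, `select_frames`, `weights` and `keep_ratio`; the text above does not specify the actual fusion rule.

```python
def normalize(curve):
    """Min-max scale a saliency curve to [0, 1]; a flat curve maps to zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def fuse_saliency(audio, visual, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three per-frame saliency curves into one curve.

    Assumption: a weighted average of normalized curves. Other fusion
    rules (max, product, learned weights) would fit the same interface.
    """
    if not (len(audio) == len(visual) == len(text)):
        raise ValueError("all modality curves must have the same length")
    curves = [normalize(c) for c in (audio, visual, text)]
    return [
        sum(w * c[i] for w, c in zip(weights, curves))
        for i in range(len(audio))
    ]


def select_frames(msc, keep_ratio=0.2):
    """Return the indices of the most salient frames, in temporal order."""
    n_keep = max(1, round(len(msc) * keep_ratio))
    ranked = sorted(range(len(msc)), key=lambda i: msc[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy curves: the text modality peaks at frame 2.
audio = [0, 1, 2, 3, 4]
visual = [4, 3, 2, 1, 0]
text = [0, 0, 10, 0, 0]
msc = fuse_saliency(audio, visual, text)
summary = select_frames(msc, keep_ratio=0.2)
```

In this toy case the audio and visual curves cancel each other out, so the text peak decides which frame is kept.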

Digital Restoration of Missing Parts in the Theran Wall Paintings

We have been working on PDE and wavelet-based techniques for the digital restoration of missing parts in paintings. This is part of an ongoing project on the virtual restoration of the 3,600-year-old wall paintings excavated in the prehistoric Aegean settlement in Akrotiri, Thera, Greece.
[Research page] [Project page...]

Image Saliency through Spatial Surprise

Using an information-theoretic approach to study bottom-up spatial saliency, we show how Bayesian surprise can be interpreted to explain spatial saliency. Applications include attention modeling and fixation prediction, image region detection, and image quality assessment.
[Research page]
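
For readers unfamiliar with the term, Bayesian surprise is commonly defined as the Kullback-Leibler divergence between the posterior and the prior belief after an observation. The Python sketch below illustrates that definition on a small discrete hypothesis space. It is a generic illustration, not the spatial saliency model described above; the function names and the toy numbers are assumptions.

```python
import math


def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under the prior")
    return [u / z for u in unnorm]


def bayesian_surprise(prior, likelihood):
    """KL(posterior || prior) in nats; zero when the data change nothing."""
    post = posterior(prior, likelihood)
    return sum(q * math.log(q / p) for q, p in zip(post, prior) if q > 0)


# Two equally likely hypotheses.
prior = [0.5, 0.5]
uninformative = bayesian_surprise(prior, [1.0, 1.0])  # data fit both equally
decisive = bayesian_surprise(prior, [1.0, 0.0])       # data rule one out
```

An observation that leaves the belief unchanged yields zero surprise, while one that rules out a hypothesis yields a positive value. This is the sense in which surprising image regions can be treated as salient.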

Sign Language Recognition

Sign Languages (SL) and Gestures manifest themselves via the visual modality in 4D space-time. We have developed a visual processing framework for hand and head tracking and feature extraction from SL/gesture videos. In addition, an unsupervised HMM-based framework was developed based on statistical subunits, i.e., intra-sign primitives, incorporating prior phonetic-level linguistic knowledge for automatic sign/gesture recognition.
[Research page]

Research Areas

Multiscale image analysis, enhancement, feature extraction and object detection with algebraic, geometric and statistical methods. Analysis and modeling of shape, texture, color, and motion. Visual Attention. Multiple-cue Segmentation. Reconstruction of 3D structure. Visual object recognition. Cognitive vision. Advanced methodologies: nonlinear scale-spaces, pyramids and wavelets, geometric partial differential equations, active contours, curve/surface evolution via level-set methods, variational calculus, graph-based models, multiple-view geometry, random fields and statistical inference. Applications in artificial intelligence, biomedicine, human-computer/robot interaction, recognition of sign language, action and gestures, video technology, digital arts and cultural heritage, robotics, environment, geosciences, satellite remote sensing.

Modeling of speech production and hearing systems. Digital processing of speech and other sounds, e.g. music, feature extraction and acoustic event detection, filterbanks. Nonlinear and Multiresolution methodologies: modulations, fractals, chaos, wavelets. Automatic speech recognition and synthesis. Music analysis and recognition. Language models and NLP. Analysis of 3D acoustic scenes and auditory attention. Applications in speech and audio technology, biomedicine, communications, human-computer/robot interaction (HCI/HRI), digital music, multimedia.

Multimodal information processing at various levels of integration of multiple sensory streams. Applications to problems of analysis, enhancement, detection, estimation, segmentation, recognition and synthesis, and more generally to cognition-related technologies (e.g. HCI, HRI, multimodal video analysis, multimedia internet) that deal with multimodal (audio, video, text, graphics, tactile) data. Cross-modal Integration and Fusion for improved performance in multimedia. Multimodal saliency and video summarization. Cognitive systems modeling.

Mathematical Morphology, Fractals, Chaos, Dynamical Systems theory, Discrete Events control systems. Applications in signals and systems analysis, automation, biomedicine, environment, communications, informatics, and econometrics.

 
Research Projects
  • "MOBOT: Intelligent Active MObility Assistance RoBOT integrating Multimodal Sensory Processing, Proactive Autonomy and Adaptive Interaction"
    Sponsor: European Union FP7: Specific Targeted Research-Innovation Project (STREP),
    2013 – 2016, ICCS – NTUA. ICCS team: Scientific Director and WP leader: P. Maragos; Co-PIs: P. Maragos and K. Tzafestas.

  • "COGNIMUSE: Multimodal Signal and Event Processing In Perception and Cognition"
    Sponsor: Greek Ministry of Education and Ministry of Development (Basic Research Program “ARISTEIA”),
    2012-2015, NTUA. Scientific Director and PI: P. Maragos.

  • "TIMELY: Time In MEntaL activitY: theoretical, behavioral, bioimaging and clinical perspectives"
    Sponsor: ESF-COST Action (European Network),
    2010-present, NTUA. PI of NTUA’s team: P. Maragos.

  • "Music Signal Processing and Applications in Recognition"
    Sponsor: Greek Ministry of Education (Basic Research Program "HRAKLEITOS II"),
    2010-2013, NTUA. Scientific Director and PI: P. Maragos.

  • "Dicta-Sign: Sign Language Recognition, Generation and Modeling with Application in Deaf Communication"
    Sponsor: European Union: FP7, Specific Targeted Research-Innovation Project,
    2009 – 2012, NTUA. Scientific Director and PI of NTUA’s team: P. Maragos; also WP Leader.

  • "ASPI: Audiovisual to Articulatory Speech Inversion"
    Sponsor: European Union: Future and Emerging Technologies, 6th Framework Programme,
    2005 – 2008, ICCS – NTUA. Scientific Director and PI of ICCS-NTUA’s team: P. Maragos; also WP Leader.

  • "DIANOEMA: Visual Gesture Analysis and Recognition for Sign Language Modeling and Application to Robot Teleoperation"
    Sponsor: Greek General Secretariat for Research and Technology,
    2000-2001, ICCS - NTUA. Project PI and Coordinator, and Scientific Director of ICCS-NTUA’s team: P. Maragos.

  • "GridNews: Distributed GRID Platform for Advanced Content Management and Retrieval in Audiovisual Broadcast News"
    Sponsor: Greek General Secretariat for Research and Technology,
    2006-2008, ICCS-NTUA. Co-PI: P. Maragos.

  • "MUSCLE: Multimedia Understanding, Semantics, Computation and Learning"
    Sponsor: European Union: Network of Excellence, 6th Framework Programme,
    2004 – 2008, ICCS – NTUA. Scientific Director and PI of ICCS-NTUA’s team: P. Maragos; also WP Leader.

  • "HIWIRE: Human Input That Works In Real Environments"
    Sponsor: European Union: Specific Targeted Research or Innovation Project, 6th Framework Programme,
    2004 – 2007, ICCS – NTUA. Scientific Director and PI of ICCS-NTUA’s team: P. Maragos.

  • "PENED: Computer Vision and Virtual Reality for Three-Dimensional Reconstruction of Archeological Findings with Applications to Wall Paintings and Buildings of the Prehistoric Settlement of Akrotiri in Santorini"
    Sponsor: Greek Secretariat for Research & Technology (PENED-2003),
    2005-2009, NTUA. Project PI and Coordinator, and Scientific Director of NTUA’s team: P. Maragos.

  • "Aero-Dynamic Analysis of the Vocal Tract with Application to Speech Modeling, Synthesis and Recognition"
    Sponsor: Greek Secretariat for Research & Technology (PENED-2003),
    2005-2009, NTUA and Technical University of Crete. Co-PI and Scientific Director of NTUA’s team: P. Maragos.

  • "Statistical Signal and Image Processing Methods for Optimum Denoising and Detection with Applications to Satellite Data"
    Sponsor: Greek Secretariat for Research & Technology (PENED-2003),
    2005-2009, NTUA and National Observatory of Athens. Co-PI and Scientific Director of NTUA’s team: P. Maragos.

  • "Nonlinear Systems for Analysis of Microstructures in Speech and Image Signals"
    Sponsor: NTUA Basic Research Program "PROTAGORAS",
    2004-2007, NTUA. Scientific Director and PI: P. Maragos.

  • "Advanced Optical Imaging and Computer Vision Methods With Applications to Cancer Diagnosis"
    Sponsor: Greek Ministry of Education (Basic Research Program "PYTHAGORAS"),
    2004-2007, NTUA. Co-PI: P. Maragos.

  • "Synergy Between Image Segmentation and Object Recognition Using Geometrical and Statistical Computer Vision Techniques"
    Sponsor: Greek Ministry of Education (Basic Research Program "HRAKLEITOS"),
    2003-2006, NTUA. Scientific Director and PI: P. Maragos.

  • "Integrated System for Control of the Ecologic Quality of Soils"
    Sponsor: Greek Secretariat for Research & Technology (PENED-2001),
    2002-2006, ICCS-NTUA and Aristotle Univ. of Thessaloniki. Co-PI and Scientific Director of ICCS-NTUA’s team: P. Maragos.

  • "COST 277: Nonlinear Speech Processing"
    Sponsor: European Union,
    2002-2005, NTUA. PI of NTUA’s team: P. Maragos.

  • "Development of Novel Methods of Digital Processing of Nonstationary Signals for Improving the Accuracy and Resolution of Ultrasound Doppler Spectroscopy in Measuring Blood Flow"
    Sponsor: ICCS (Basic Research Program ARCHIMEDES),
    2000-2001, ICCS-NTUA. PI: P. Maragos.

  • "Aerodynamic Analysis of Human Vocal Tract with Navier-Stokes Equations and Relations to Speech Production"
    Sponsor: ICCS-Inst. of Computer and Communication Systems (Basic Research Program ARCHIMEDES),
    2000-2001, ICCS - NTUA. Co-PI: P. Maragos.

  • "Evaluation of the Biological Quality of Soils with New Computer Vision Techniques"
    Sponsor: Greek Secretariat for Research & Technology (PENED-1999),
    2000-2001, Institute of Communication & Computer Systems (ICCS) - NTUA. Project PI and Coordinator, and Scientific Director of ICCS-NTUA’s team: P. Maragos.

  • "AKODIFON: Improvement of Phonemic Recognition by Developing and Using Modulation and Fractal Acoustic Features"
    Sponsor: Greek Secretariat for Research & Technology (ΕΠΕΤ ΙΙ – 98),
    1999-2001, NTUA. Scientific Director and PI: P. Maragos.

Last modified: Tuesday, 23 April 2013 | Created by Nassos Katsamanis and George Papandreou