Computer Vision, Speech Communication & Signal Processing Group

Recent Research Highlights

STAViS: Spatio-Temporal AudioVisual Saliency Network

We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information in order to efficiently address the problem of saliency estimation in videos. Our approach employs a single network that combines visual saliency and auditory features, learns to localize sound sources, and fuses the two saliencies into a final saliency map. Evaluation results across six databases indicate that STAViS outperforms our visual-only variant as well as other state-of-the-art models. Moreover, its consistently good performance across all databases indicates that it is suitable for estimating saliency "in-the-wild".
[Research page]
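STAViS itself is a trained network; the sketch below only illustrates, under stated assumptions, the two steps the paragraph names: using an audio embedding to produce a sound-source localization map over spatial positions, and fusing that map with a visual saliency map. The dot-product localization, the fixed fusion weight `w_visual` and all function names are hypothetical simplifications, not the STAViS architecture.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sound_source_map(visual_feats, audio_feat):
    """Hypothetical localization step: score every spatial location by the dot
    product between its visual feature vector and a clip-level audio embedding,
    then softmax-normalize so the map sums to 1."""
    scores = [sum(v * a for v, a in zip(vec, audio_feat)) for vec in visual_feats]
    return softmax(scores)


def fuse(visual_sal, audio_sal, w_visual=0.5):
    """Convex combination of two saliency maps, rescaled so the peak is 1.
    A trained model would learn the fusion; here the weight is a fixed assumption."""
    fused = [w_visual * v + (1.0 - w_visual) * a for v, a in zip(visual_sal, audio_sal)]
    peak = max(fused)
    return [f / peak for f in fused] if peak > 0 else fused


# Four flattened spatial locations with 2-D visual features (toy numbers).
visual_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
audio_feat = [0.0, 2.0]
visual_sal = [0.9, 0.1, 0.1, 0.1]

audio_sal = sound_source_map(visual_feats, audio_feat)
final_sal = fuse(visual_sal, audio_sal)
print([round(s, 3) for s in final_sal])
```

Here the audio map highlights the two locations whose features align with the audio embedding, while the strong visual peak still dominates the fused map.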

Multimodal Visual Concept Learning with Weakly Supervised Techniques

Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, we use textual cues as a means of supervision and introduce two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: Fuzzy Sets Multiple Instance Learning (FSMIL) and Probabilistic Labels Multiple Instance Learning (PLMIL).
[Research page]
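The paragraph does not spell out FSMIL or PLMIL, so the sketch below shows only the standard MIL setting they extend: labels are attached to bags of instances rather than to individual instances, and a bag is positive if at least one of its instances is. Max pooling and noisy-OR pooling are common textbook choices, assumed here for illustration; they are not the fuzzy-set or probabilistic-label formulations of FSMIL and PLMIL, and the function names are hypothetical.

```python
def bag_score_max(instance_scores):
    """Standard MIL assumption: a bag is positive if at least one instance is,
    so the bag score is the maximum instance score."""
    return max(instance_scores)


def bag_score_noisy_or(instance_probs):
    """Soft variant: probability that at least one instance is positive,
    assuming instances are independent (noisy-OR pooling)."""
    p_all_negative = 1.0
    for p in instance_probs:
        p_all_negative *= 1.0 - p
    return 1.0 - p_all_negative


def label_bags(bags, threshold=0.5, pool=bag_score_max):
    """Turn per-instance classifier outputs into bag-level decisions."""
    return [pool(bag) >= threshold for bag in bags]


# Hypothetical example: each bag holds classifier scores for the face tracks in
# a scene whose transcript mentions a given character name.
bags = [[0.1, 0.2, 0.9], [0.1, 0.2, 0.3]]
print(label_bags(bags))
```

Weak supervision enters through the bag: text says the concept is present somewhere in the scene, without saying which instance carries it.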

Graph Clustering for Image Segmentation

We propose a graph clustering approach for image segmentation by developing diffusion processes defined on arbitrary graphs. We formulate a solution to the image segmentation problem as the result of infectious wavefronts propagating on an image-driven graph via a contact infection mechanism. Because our clustering scheme works on arbitrary graphs, it allows us to consider both pixel-level and node-level approaches.
[Research page]
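As a rough illustration of seeded wavefronts competing on an image-driven graph, the sketch below lets each seed start a front and assigns every node the label of the first front to reach it, with travel cost given by edge weights such as intensity differences. This deterministic first-arrival rule (multi-source Dijkstra) is a stand-in chosen for brevity; it is not the group's contact-infection diffusion model, and the function name and graph encoding are hypothetical.

```python
import heapq


def propagate_labels(adjacency, seeds):
    """adjacency: dict node -> list of (neighbour, weight), weights >= 0
    (e.g. intensity dissimilarity between neighbouring pixels).
    seeds: dict node -> label. Each seed launches a wavefront; a node takes
    the label of the first wavefront that reaches it."""
    arrival = {node: float("inf") for node in adjacency}
    label = {}
    heap = []
    for node, lab in seeds.items():
        arrival[node] = 0.0
        label[node] = lab
        heapq.heappush(heap, (0.0, node))
    while heap:
        t, node = heapq.heappop(heap)
        if t > arrival[node]:
            continue  # stale heap entry: node was reached earlier
        for nbr, w in adjacency[node]:
            t_new = t + w
            if t_new < arrival[nbr]:
                arrival[nbr] = t_new
                label[nbr] = label[node]  # the infection spreads its label
                heapq.heappush(heap, (t_new, nbr))
    return label, arrival


# A 1-D "image" of six pixels with a strong edge between pixels 2 and 3.
intensities = [10, 11, 12, 50, 51, 52]
adj = {i: [] for i in range(len(intensities))}
for i in range(len(intensities) - 1):
    w = abs(intensities[i] - intensities[i + 1])
    adj[i].append((i + 1, w))
    adj[i + 1].append((i, w))

labels, arrival = propagate_labels(adj, {0: "A", 5: "B"})
print([labels[i] for i in range(len(intensities))])
```

The large edge weight across the intensity jump slows each front, so the two regions meet at the image edge.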

Microphone array speech processing

We are working on microphone array processing and distant speech recognition, aiming to create hands-free, voice-enabled interfaces for home automation control. Users will be able to control appliances and perform actions by voice, without having to move from their place. For this purpose, we employ microphone array processing, with microphones placed on the walls and ceiling. Our research focuses on acoustic speaker localization, voice activity detection, acoustic event detection, speech enhancement/beamforming, activation keyword spotting and distant speech recognition. We have also collected a publicly available distant speech database in Greek, the Athena database.
[Research page]
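Acoustic speaker localization with a microphone array often starts from the time difference of arrival (TDOA) between microphone pairs. The sketch below, an assumption rather than the group's pipeline, estimates an integer-sample TDOA by plain cross-correlation and converts it to a far-field angle for a two-microphone pair, assuming a speed of sound of 343 m/s. Practical systems usually prefer a weighted variant such as GCC-PHAT for robustness to reverberation.

```python
import math


def estimate_delay(x, y, max_lag):
    """Return the integer lag d in [-max_lag, max_lag] that maximizes
    sum_n x[n] * y[n + d], i.e. how many samples y lags behind x."""
    best_lag, best_val = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(len(x)):
            j = i + d
            if 0 <= j < len(y):
                val += x[i] * y[j]
        if val > best_val:
            best_lag, best_val = d, val
    return best_lag


def delay_to_angle(delay_samples, fs, mic_distance, c=343.0):
    """Far-field direction of arrival in degrees from broadside for one
    microphone pair; c is the assumed speed of sound in m/s."""
    s = delay_samples * c / (fs * mic_distance)
    s = max(-1.0, min(1.0, s))  # clip: noisy delays can exceed the physical limit
    return math.degrees(math.asin(s))


# Toy signals: y is x delayed by 3 samples.
x = [0.0, 0.0, 1.0, -2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [0.0] * 3 + x[:-3]
lag = estimate_delay(x, y, max_lag=5)
print(lag, delay_to_angle(lag, fs=16000, mic_distance=0.1))
```

Limiting `max_lag` to the largest physically possible delay (mic spacing divided by the speed of sound, in samples) keeps the search small and rejects spurious peaks.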

Cognimuse

Motivated by the grand challenge to endow computers with human-like abilities for multimodal sensory information processing, perception and cognitive attention, COGNIMUSE will undertake fundamental research in modeling multisensory and sensory-semantic integration via a synergy between system theory, computational algorithms and human cognition. It focuses on integrating three modalities (audio, vision and text) toward detecting salient perceptual events and combining them with semantics to build higher-level stable events through controlled attention mechanisms.
[Official Research page]

Sign Language Recognition

Sign languages (SL) and gestures manifest themselves via the visual modality in 4D space-time. We have developed a visual processing framework for hand and head tracking and for feature extraction from SL/gesture videos. In addition, we developed an unsupervised HMM-based framework built on statistical subunits, i.e., intra-sign primitives, that incorporates prior phonetic-level linguistic knowledge for automatic sign/gesture recognition.
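Recognition with HMMs typically scores an observation sequence under each candidate model and picks the most likely one. The forward algorithm below is a generic discrete-observation sketch of that scoring step; the two toy models, their parameters and the symbol alphabet are hypothetical, and the group's subunit-based framework with phonetic-level knowledge is not reproduced.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Forward algorithm for a discrete-observation HMM.
    start[i] = P(q_0 = i), trans[i][j] = P(q_t = j | q_{t-1} = i),
    emit[i][o] = P(o | q = i). Returns log P(obs | model). Alpha is
    renormalized at each step and the log scale factors are summed,
    which avoids numerical underflow on long sequences."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Two hypothetical left-to-right "sign" models over a 2-symbol feature alphabet.
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
model_x = (start, trans, [[0.9, 0.1], [0.1, 0.9]])  # symbol 0, then symbol 1
model_y = (start, trans, [[0.1, 0.9], [0.9, 0.1]])  # symbol 1, then symbol 0

obs = [0, 0, 1, 1]
scores = {name: forward_log_likelihood(obs, *m) for name, m in [("X", model_x), ("Y", model_y)]}
print(max(scores, key=scores.get))
```

Real systems would use continuous (e.g. Gaussian mixture) emissions over hand and head features rather than a two-symbol alphabet.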

Audiovisual Speech Recognition

Audiovisual speech recognition refers to the problem of recognizing speech by combining the acoustic signal with visual information from the speaker's mouth (lipreading). We have developed highly adaptive multimodal fusion rules based on uncertainty compensation, which are compatible with both synchronous and asynchronous multimodal interaction architectures. Further, our work on face representations based on Active Appearance Models (AAMs) yields highly informative visual speech features that can be extracted in real time.
[Research page]
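One way to read uncertainty compensation, assuming Gaussian state models and a known noise variance for each feature, is that a noisy measurement is scored with the model variance inflated by the measurement-noise variance, so a corrupted stream produces flatter likelihoods and automatically carries less weight in the fused score. The scalar sketch below illustrates that effect only; the function names and numbers are hypothetical, and the group's fusion rules for multivariate models and asynchronous architectures are not shown.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def compensated_loglik(x, mean, model_var, noise_var):
    """Score a noisy feature by inflating the model variance with the
    estimated feature-noise variance."""
    return gaussian_loglik(x, mean, model_var + noise_var)


def fused_score(streams):
    """streams: (x, mean, model_var, noise_var) per modality, treated as
    conditionally independent given the state, so log-likelihoods add."""
    return sum(compensated_loglik(*s) for s in streams)


def state_margin(x, noise_var):
    """Log-likelihood advantage of a state with mean 1 over a state with mean 0."""
    return compensated_loglik(x, 1.0, 1.0, noise_var) - compensated_loglik(x, 0.0, 1.0, noise_var)


# The same audio observation separates the two states far less once flagged as noisy.
print(state_margin(1.0, noise_var=0.0), state_margin(1.0, noise_var=10.0))

# Fused audio + visual score for one state: noisy audio, fairly clean video.
print(fused_score([(1.0, 1.0, 1.0, 10.0), (0.2, 1.0, 1.0, 0.1)]))
```

No explicit stream weights are tuned here: the reliability weighting falls out of the variance inflation.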

Audiovisual Speech Inversion

We focus on recovering aspects of the vocal tract’s geometry and dynamics from speech, a problem referred to as speech inversion. To alleviate the ill-posedness of audio-only inversion, we propose an inversion scheme that also exploits visual information from the speaker’s face.
[Research page]

Multigrid Geometric Active Contours

We investigate multigrid techniques for the solution of the time-dependent PDEs of geometric active contour models in Computer Vision. The method allows interactive solution of models whose numerical implementation with conventional techniques has been prohibitively slow.
[Research page]
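Multigrid methods accelerate PDE solvers by smoothing the error on a fine grid, solving for the remaining smooth error on coarser grids, and interpolating the correction back. The sketch below shows a recursive V-cycle on the 1-D Poisson model problem -u'' = f with zero boundary values, not on the level-set equations of geometric active contours; the weighted-Jacobi smoother, full-weighting restriction, linear interpolation and grid sizes of the form 2^k - 1 are standard textbook choices assumed here.

```python
import math


def residual(u, f, h):
    """r = f - A u for (A u)_i = (2u_i - u_{i-1} - u_{i+1}) / h^2,
    with zero Dirichlet boundary values."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps: damp the oscillatory part of the error."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jacobi
        u = new
    return u


def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]


def prolong(e, n_fine):
    """Linear interpolation from m coarse points to 2m+1 fine points."""
    m = len(e)
    out = [0.0] * n_fine
    for j in range(m):
        out[2 * j + 1] = e[j]  # coincident points copy directly
    for k in range(m + 1):
        left = e[k - 1] if k > 0 else 0.0
        right = e[k] if k < m else 0.0
        out[2 * k] = 0.5 * (left + right)  # in-between points average neighbours
    return out


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle for -u'' = f; len(u) must be 2**k - 1."""
    if len(u) == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid: 2u / h^2 = f
    u = smooth(u, f, h, pre)
    r_coarse = restrict(residual(u, f, h))
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2.0 * h, pre, post)
    u = [ui + ei for ui, ei in zip(u, prolong(e_coarse, len(u)))]
    return smooth(u, f, h, post)


n = 63
h = 1.0 / (n + 1)
xs = [(i + 1) * h for i in range(n)]
f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]  # exact solution: sin(pi x)
u = [0.0] * n
for _ in range(10):
    u = v_cycle(u, f, h)
err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
print(err)
```

The remaining error is dominated by the O(h^2) discretization, not by the solver; each V-cycle costs only a small multiple of one fine-grid sweep.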

Image Texture Analysis

We pursue a concise texture modeling, analysis and segmentation system for generic natural images. Our research directions are in the areas of texture feature extraction through multicomponent AM-FM models, feature interpretation through generative models and probabilistic discrimination and texture segmentation using an unsupervised variational scheme.
[Research page]
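AM-FM texture features rest on demodulating each component into an instantaneous amplitude and frequency. A classic way to do this is the Teager-Kaiser energy operator combined with an energy separation algorithm (DESA). The sketch below is a 1-D, single-component illustration of DESA-2, assuming a clean sinusoid A cos(Ωn + φ) with 0 < Ω < π/2; image texture analysis would typically apply 2-D versions to the outputs of a bandpass filterbank, which is not reproduced here.

```python
import math


def teager(x):
    """Discrete Teager-Kaiser energy Psi[x](n) = x(n)^2 - x(n-1) x(n+1),
    returned for n = 1 .. len(x) - 2. For A cos(Omega n + phi) it equals A^2 sin^2(Omega)."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """DESA-2 estimates of instantaneous frequency (rad/sample) and amplitude."""
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]  # y[k] is at n = k + 1
    psi_x = teager(x)  # psi_x[k] is at n = k + 1
    psi_y = teager(y)  # psi_y[k] is at n = k + 2
    freqs, amps = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align both energies to n = k + 2
        ratio = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))  # equals cos(2 Omega)
        freqs.append(0.5 * math.acos(ratio))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps


x = [2.0 * math.cos(0.3 * n + 0.4) for n in range(50)]
freqs, amps = desa2(x)
print(round(freqs[0], 6), round(amps[0], 6))
```

Because y(n) = x(n+1) - x(n-1) is a sinusoid of amplitude 2A sin Ω, the two energies Ψ[x] = A² sin²Ω and Ψ[y] = 4A² sin⁴Ω can be solved jointly for Ω and A.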

Movie Summarization

Detection of perceptually important video events is formulated on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. The individual modality curves are integrated into a single attention curve. This multimodal saliency curve (MSC) is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming.
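A minimal sketch of the fusion-and-selection idea, assuming per-frame saliency curves are already available for each modality: each curve is min-max normalized, the curves are combined by a weighted sum into one attention curve, and the most salient fraction of frames is kept in temporal order. The equal weights, the normalization and the top-fraction rule are illustrative assumptions; the group's MSC construction and summarization algorithm may differ, for example in smoothing or segment selection.

```python
def minmax(curve):
    """Rescale a curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(c - lo) / (hi - lo) for c in curve]


def multimodal_saliency(audio, visual, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of normalized per-frame modality curves."""
    curves = [minmax(audio), minmax(visual), minmax(text)]
    return [sum(w * c[t] for w, c in zip(weights, curves)) for t in range(len(audio))]


def select_frames(msc, fraction):
    """Indices of the most salient `fraction` of frames, in temporal order."""
    k = max(1, round(fraction * len(msc)))
    top = sorted(range(len(msc)), key=lambda t: msc[t], reverse=True)[:k]
    return sorted(top)


# Toy per-frame saliency values for a six-frame clip.
audio = [0, 1, 5, 1, 0, 0]
visual = [0, 0, 4, 4, 0, 1]
text = [0, 0, 1, 1, 0, 0]
msc = multimodal_saliency(audio, visual, text)
print(select_frames(msc, 0.34))
```

Normalizing before summing keeps one modality's raw scale from swamping the others.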

Digital Restoration of Missing Parts in the Theran Wall Paintings

We have been working on PDE- and wavelet-based techniques for the digital restoration of missing parts in paintings. This is part of an ongoing project on the virtual restoration of the 3,600-year-old wall paintings excavated at the prehistoric Aegean settlement of Akrotiri, Thera, Greece.
[Research page]
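The simplest PDE approach to filling a missing region treats the unknown pixels as the solution of Laplace's equation, with the surrounding known pixels as boundary values. The sketch below solves that by Jacobi iteration on a tiny grayscale image stored as nested lists; it is an assumed baseline for illustration, and the PDE and wavelet-based methods used for the Theran wall paintings are not reproduced.

```python
def inpaint_laplace(image, mask, iterations=500):
    """image: list of rows of floats; mask[r][c] is True where a pixel is missing.
    Missing pixels are repeatedly replaced by the mean of their 4-neighbours
    (Jacobi iteration for Laplace's equation); known pixels stay fixed and act
    as boundary conditions. Missing pixels must not lie on the image border."""
    rows, cols = len(image), len(image[0])
    u = [row[:] for row in image]
    for _ in range(iterations):
        new = [row[:] for row in u]
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    new[r][c] = 0.25 * (u[r - 1][c] + u[r + 1][c] + u[r][c - 1] + u[r][c + 1])
        u = new
    return u


# A smooth (linear) 7x7 test image with a 3x3 hole in the middle.
rows, cols = 7, 7
truth = [[10.0 * c + 3.0 * r for c in range(cols)] for r in range(rows)]
mask = [[2 <= r <= 4 and 2 <= c <= 4 for c in range(cols)] for r in range(rows)]
damaged = [[0.0 if mask[r][c] else truth[r][c] for c in range(cols)] for r in range(rows)]
restored = inpaint_laplace(damaged, mask)
max_err = max(abs(restored[r][c] - truth[r][c]) for r in range(rows) for c in range(cols))
print(max_err)
```

A linear ramp is recovered exactly because it already satisfies the discrete Laplace equation; textured or edge-rich regions are where harmonic filling blurs and higher-order PDE or wavelet methods become necessary.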

Image Saliency through Spatial Surprise

Using an information-theoretic approach to study bottom-up spatial saliency, we show how Bayesian surprise can be interpreted to explain spatial saliency. Applications include attention modeling and fixation prediction, image region detection, and image quality assessment.
[Research page]
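Bayesian surprise, as defined by Itti and Baldi, measures how far an observation moves a belief: the Kullback-Leibler divergence of the posterior from the prior, KL(posterior || prior). The sketch below computes it for a Gaussian belief about a local mean feature value, assuming a conjugate Gaussian update with known noise variance. Reading the prior as coming from a pixel's surround and the observation from its center is a hypothetical spatial interpretation, not necessarily the group's model.

```python
import math


def gaussian_posterior(mu0, var0, x, noise_var):
    """Conjugate update of a Gaussian prior N(mu0, var0) on a mean after one
    observation x with known noise variance."""
    var1 = 1.0 / (1.0 / var0 + 1.0 / noise_var)
    mu1 = var1 * (mu0 / var0 + x / noise_var)
    return mu1, var1


def kl_gaussian(mu1, var1, mu0, var0):
    """KL( N(mu1, var1) || N(mu0, var0) ) for scalar Gaussians."""
    return 0.5 * (math.log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0)


def surprise(mu0, var0, x, noise_var):
    """Bayesian surprise of observation x: KL(posterior || prior)."""
    mu1, var1 = gaussian_posterior(mu0, var0, x, noise_var)
    return kl_gaussian(mu1, var1, mu0, var0)


# Hypothetical: surround statistics give the prior; the center value is the observation.
s_near = surprise(0.0, 1.0, 0.0, 1.0)
s_far = surprise(0.0, 1.0, 5.0, 1.0)
print(s_near, s_far)
```

Even an unsurprising value yields a small positive surprise, because it still narrows the belief; values far from the surround's statistics dominate.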

Research Areas

Computer Vision and Image Processing

Multiscale image analysis, enhancement, feature extraction and object detection with algebraic, geometric and statistical methods. Analysis and modeling of shape, texture, color, and motion. Visual Attention. Multiple-cue Segmentation. Reconstruction of 3D structure. Visual object recognition. Cognitive vision. Advanced methodologies: nonlinear scale-spaces, pyramids and wavelets, geometric partial differential equations, active contours, curve/surface evolution via level set methods, variational calculus, graph-based models, multiple-view geometry, random fields and statistical inference. Applications in artificial intelligence, biomedicine, human computer/robot interaction, recognition of sign language, action and gestures, video technology, digital arts and cultural heritage, robotics, environment, geosciences, satellite remote sensing.

Audio Processing and Speech Communication

Modeling of speech production and hearing systems. Digital processing of speech and other sounds, e.g. music, feature extraction and acoustic event detection, filterbanks. Nonlinear and Multiresolution methodologies: modulations, fractals, chaos, wavelets. Automatic speech recognition and synthesis. Music analysis and recognition. Language models and NLP. Analysis of 3D acoustic scenes and auditory attention. Applications in speech and audio technology, biomedicine, communications, human computer/robot interaction (HCI/HRI), digital music, multimedia.

Multimodal Signal Processing, Cognitive Systems, and Machine Learning

Multimodal information processing at various levels of integration of multiple sensory streams. Applications to problems of analysis, enhancement, detection, estimation, segmentation, recognition and synthesis, and more generally to cognition-related technologies (e.g. HCI, HRI, multimodal video analysis, multimedia internet) that deal with multimodal (audio, video, text, graphics, tactile) data. Cross-modal Integration and Fusion for improving performance in multimedia. Multimodal saliency and video summarization. Cognitive systems modeling.

Nonlinear Systems

Mathematical Morphology, Fractals, Chaos, Dynamical Systems theory, Discrete Event Control Systems. Applications in signals and systems analysis, automation, biomedicine, environment, communications, informatics, and econometrics.

Research Projects

European and Greek Grants

"SoftGrip: Functionalised Soft robotic gripper for delicate produce harvesting powered by imitation learning based control"
Sponsor: European Union Horizon 2020,
Time/Place: 2021 – 2023, ICCS - NTUA. P. Maragos' participation: Co-PI of ICCS-NTUA’s Team.
"TeachBot: Intelligent Child-Robot Interaction System for designing and implementing edutainment scenarios with emphasis on visual information"
Sponsor: European Union (European Social Fund, ESF) and Greek national funds through the Operational Programme «Human Resources Development, Education and Lifelong Learning 2014–2020»,
Time/Place: 2020 – 2021, NTUA. P. Maragos' participation: Scientific Director and Principal Investigator.
"e-Prevention: Prevention in Patients with Psychotic Disorders using Long Term Recording and Analysis of Biometric Indexes"
Sponsor: Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation under the call RESEARCH – CREATE – INNOVATE,
Time/Place: 2018 – 2021, NTUA. P. Maragos' participation: Scientific Director and Principal Investigator.
"i-Walk: Intelligent Robotic Walker for mobility and cognitive assistance of elderly and motor-impaired people"
Sponsor: Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation under the call RESEARCH – CREATE – INNOVATE,
Time/Place: 2018 – 2021, ICCS - NTUA. P. Maragos' participation: Scientific Director and Principal Investigator.
"BabyRobot: Child-Robot Communication and Collaboration"
Sponsor: European Union Horizon 2020 (Research and Innovation Action),
Time/Place: 2016 – 2018, ICCS-NTUA. Scientific Director: K. Tzafestas, PI: A. Potamianos.
"I-SUPPORT" (http://www.i-support-project.eu/)
Sponsor: European Union Horizon 2020,
2015 – 2018, ICCS – NTUA. Co-PI of ICCS-NTUA's Team: P. Maragos.
"MOBOT: Intelligent Active MObility Assistance RoBOT integrating Multimodal Sensory Processing, Proactive Autonomy and Adaptive Interaction"
Sponsor: European Union FP7: Specific Targeted Research-Innovation Project (STREP),
2013 – 2016, ICCS – NTUA. ICCS team: Scientific Director and WP leader: P. Maragos; Co-PIs: P. Maragos and K. Tzafestas.
"COGNIMUSE: Multimodal Signal and Event Processing In Perception and Cognition"
Sponsor: Greek Ministry of Education and Ministry of Development (Basic Research Program “ARISTEIA”),
2012-2015, NTUA. Scientific Director and PI: P. Maragos.
"BabyAffect: Affective and behavioral modeling of early lexicalizations of ASD and TD children"
Sponsor: Greek Ministry of Education and Ministry of Development (Basic Research Program “ARISTEIA”),
2014-2015, NTUA. PI: A. Potamianos, Senior Researcher: P. Maragos.
"TIMELY: Time In MEntaL activitY: theoretical, behavioral, bioimaging and clinical perspectives"
Sponsor: ESF-Cost Action (European Network),
2010-present, NTUA. PI of NTUA’s team: P. Maragos.
"Music Signal Processing and Applications in Recognition"
Sponsor: Greek Ministry of Education (Basic Research Program "HRAKLEITOS II"),
2010-2013, NTUA. Scientific Director and PI: P. Maragos.
"Dicta-Sign: Sign Language Recognition, Generation and Modeling with Application in Deaf Communication"
Sponsor: European Union: FP7, Specific Targeted Research-Innovation Project,
2009 – 2012, NTUA. Scientific Director and PI of NTUA’s team: P. Maragos; also WP Leader.
"ASPI: Audiovisual to Articulatory Speech Inversion"
Sponsor: European Union: Future and Emergent Technologies, 6th Framework Programme,
2005 – 2008, ICCS – NTUA. Scientific Director and PI of ICCS-NTUA’s team: P. Maragos; also WP Leader.
"DIANOEMA: Visual Gesture Analysis and Recognition for Sign Language Modeling and Application to Robot Teleoperation"
Sponsor: Greek General Secretariat for Research and Technology,
2000-2001, ICCS - NTUA. Project PI and Coordinator, and Scientific Director of ICCS-NTUA’s team: P. Maragos.
"GridNews: Distributed GRID Platform for Advanced Content Management and Retrieval in Audiovisual Broadcast News"
Sponsor: Greek General Secretariat for Research and Technology,
2006-2008, ICCS-NTUA. Co-PI: P. Maragos.
"MUSCLE: Multimedia Understanding, Semantics, Computation and Learning"
Sponsor: European Union: Network of Excellence, 6th Framework Programme,
2004 – 2008, ICCS – NTUA. Scientific Director and PI of ICCS-NTUA’s team: P. Maragos; also WP Leader.
"HIWIRE: Human Input That Works In Real Environments"
Sponsor: European Union: Specific Targeted Research or Innovation Project, 6th Framework Programme,
2004 – 2007, ICCS – NTUA. Scientific Director and PI of ICCS-NTUA’s team: P. Maragos.
"PENED: Computer Vision and Virtual Reality for Three-Dimensional Reconstruction of Archeological Findings with Applications to Wall Paintings and Buildings of the Prehistoric Settlement of Akrotiri in Santorini"
Sponsor: Greek Secretariat for Research & Technology (PENED-2003),
2005-2009, NTUA. Project PI and Coordinator, and Scientific Director of NTUA’s team: P. Maragos.
"Aerodynamic Analysis of the Vocal Tract with Application to Speech Modeling, Synthesis and Recognition"
Sponsor: Greek Secretariat for Research & Technology (PENED-2003),
2005-2009, NTUA and Technical University of Crete. Co-PI and Scientific Director of NTUA’s team: P. Maragos.
"Statistical Signal and Image Processing Methods for Optimum Denoising and Detection with Applications to Satellite Data"
Sponsor: Greek Secretariat for Research & Technology (PENED-2003),
2005-2009, NTUA and National Observatory of Athens. Co-PI and Scientific Director of NTUA’s team: P. Maragos.
"Nonlinear Systems for Analysis of Microstructures in Speech and Image Signals"
Sponsor: NTUA Basic Research Program "PROTAGORAS",
2004-2007, NTUA. Scientific Director and PI: P. Maragos.
"Advanced Optical Imaging and Computer Vision Methods With Applications to Cancer Diagnosis"
Sponsor: Greek Ministry of Education (Basic Research Program "PYTHAGORAS"),
2004-2007, NTUA. Co-PI: P. Maragos.
"Synergy Between Image Segmentation and Object Recognition Using Geometrical and Statistical Computer Vision Techniques"
Sponsor: Greek Ministry of Education (Basic Research Program "HRAKLEITOS"),
2003-2006, NTUA. Scientific Director and PI: P. Maragos.
"Integrated System for Control of the Ecologic Quality of Soils"
Sponsor: Greek Secretariat for Research & Technology (PENED-2001),
2002-2006, ICCS-NTUA and Aristotle Univ. of Thessaloniki. Co-PI and Scientific Director of ICCS-NTUA’s team: P. Maragos.
"COST 277: Nonlinear Speech Processing"
Sponsor: European Union,
2002-2005, NTUA. PI of NTUA’s team: P. Maragos.
"Development of Novel Methods of Digital Processing of Nonstationary Signals for Improving the Accuracy and Resolution of Ultrasound Doppler Spectroscopy in Measuring Blood Flow"
Sponsor: ICCS (Basic Research Program ARCHIMEDES),
2000-2001, ICCS-NTUA. PI: P. Maragos.
"Aerodynamic Analysis of Human Vocal Tract with Navier-Stokes Equations and Relations to Speech Production"
Sponsor: ICCS, Institute of Communication and Computer Systems (Basic Research Program ARCHIMEDES),
2000-2001, ICCS - NTUA. Co-PI: P. Maragos.
"Evaluation of the Biological Quality of Soils with New Computer Vision Techniques"
Sponsor: Greek Secretariat for Research & Technology (PENED-1999),
2000-2001, Institute of Communication & Computer Systems (ICCS) - NTUA. Project PI and Coordinator, and Scientific Director of ICCS-NTUA’s team: P. Maragos.
"AKODIFON: Improvement of Phonemic Recognition by Developing and Using Modulation and Fractal Acoustic Features"
Sponsor: Greek Secretariat for Research & Technology (EPET II-98),
1999-2001, NTUA. Scientific Director and PI: P. Maragos.