Computer Vision, Speech Communication & Signal Processing Group


Interspeech 2018


Multimodal Speech and Audio Processing
in Audio-Visual Human-Robot Interaction

Tutorial Title: Multimodal Speech and Audio Processing in Audio-Visual Human-Robot Interaction

Abstract: The goal of this tutorial is to provide a concise overview of ideas, methods and research results in multimodal speech and audio processing, spatio-temporal sensory processing, perception and fusion, with applications in Human-Robot Interaction (HRI). Since most data today are multimodal, there is an emerging need for multimodal methodologies that also take the visual modality into account, so that it can enhance and assist the audio/speech modality. The tutorial will present state-of-the-art work in its main application area, HRI for social, edutainment and healthcare applications, including audio-gestural recognition for natural communication with the robotic agent and audio-visual speech synthesis that supports the interaction and maximizes its naturalness. We will also discuss established results and recent advances from our research in several EU projects on these topics, as well as on distant-speech interaction for robust home applications. Finally, the tutorial will present a secondary application area that also relies on audio-visual processing: methods for saliency detection and automatic summarization of monomodal or multimodal data (audio and/or video), and the development of virtual interactive environments in which human body motion or hand gestures drive audio-gestural music synthesis.

Related papers and current results can be found in and

Date/Time: September 2, 2018; 2:00 PM to 5:30 PM


Petros Maragos
Athanasia Zlatintsi

Primary Contact: Petros Maragos
IRAL-CVSP, National Technical Univ. of Athens,
Zografou campus, Athens 15773
Phone: +30 210772-2360, Fax: +30 210772-3397

Tutorial Slides

Part I: Multimodal Signal Processing, A-V Perception and Fusion
Part II: Audio-Visual HRI: Methodology and Applications in Assistive Robotics
Part III: Audio-Visual Child-Robot Interaction
Part IV: Multimodal Saliency and Video Summarization
Part V: Audio-Gestural Music Synthesis
List of References

Satellite Event:
Summer School on Speech Signal Processing (S4P) 2018

Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Gandhinagar, India, 9-11 Sep. 2018


Petros Maragos

S4P Lecture Slides

Lecture I: Nonlinear Aspects of Speech Production: Modulations and Energy Operators
Lecture II: Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics
Last modified: Thursday, 19 September 2019 | Created by Nassos Katsamanis and George Papandreou