Department Member, Sound Analysis and Synthesis
Thesis Title: MeLos: Analysis and Modelling of Speech Prosody and Speaking Style
About
R&D in Speech and Music Technologies
• My main interests cover speech and music technologies, with a focus on Voice Conversion and Speech Synthesis. My research topics include statistical modeling of speech & music signals, speech & music processing, linguistics, and musicology.
• I hold a PhD in computer science on the modeling of speech prosody and speaking style for speech synthesis (2011), and a master's degree in musicology on a comparative study of contemporary music and literature (2007).
• My research began with F0 estimation of monophonic musical instruments @ CNMAT, University of California, Berkeley (2005).
• My research then moved to speech processing @ IRCAM for Voice Conversion and Speech Synthesis, starting with the modeling of F0 and spectral envelope correlation for voice conversion (2006). My PhD focused on the modeling of symbolic/acoustic characteristics of speech prosody and speaking style for Speech Synthesis (2011), and covered discrete/continuous HMMs, segmental HMMs, information fusion, transcription and stylization of speech prosody, linguistics, joint short/long-term modeling, and speaker-independent modeling.
• I am also active in Arts & Technologies production. This includes participation in the PHASE project (Haptic Platform of Sound Application for Musical Education for Imaginative Play - Centre Pompidou, 2004); serving as co-founder and scientific committee member of EMUS (International Conferences on Speech and Music, 2008) and the AGORA music festival ("La Voix, L'icône" - IRCAM, 2008); and R&D for the productions of "HyperMusic: Prologue" (supervised/unsupervised speech segmentation - Hector Parra, IRCAM, 2009) and "Luna Park" (speech synthesis, real-time control of speech prosody - Georges Aperghis, IRCAM, 2011).
• I am currently a researcher and manager of the VOICE4GAMES project @ IRCAM, working on speech level normalization, unsupervised speech segmentation, speech mode classification, voice casting, and speaker similarity for Video Games and Audio Gaming.
Specialties
• Speech Synthesis (Unit Selection, HMM-based)
• Speech Recognition (Speech Segmentation, Voice Type, Speaker Similarity)
• Voice Conversion
• Statistical Modeling of Speech & Music Signals (GMM, HMM, UBM, SVM)
• Spoken/Natural Language Processing
• Speech Prosody & Speaking Style
• Linguistics
• Real-Time Signal Processing (Max/MSP)
• Musicology
Contact Information
| Homepage: | |
| Address: | IRCAM |






