Research

Topics of research

Starting at birth, the way we perceive the world is shaped by the sounds, images, odors, tastes and touches reaching our senses. Fundamental aspects of how we will perceive things for our whole life are learned very early on. Studying these learning processes is challenging because they take place on a time scale of years, in individuals whose behavior and cerebral activity are very difficult to study (namely, infants), and because they are computationally intensive. On top of this, we often don’t really know what is supposed to be learned in the first place. This last difficulty, at least, can be partially alleviated when one considers the case of early language acquisition. Indeed, while different people might see very different things in the same image or smell different things in the same odor, the communicative function of speech strongly constrains the way it is perceived. For example, any proficient user of a given language needs to consistently identify different occurrences of the same word and discriminate them from occurrences of other words. This needs to be done in spite of the numerous sources of variability in the speech signal, such as the preceding and following words in the sentence, the presence of background noise, and the speaker’s identity, spatial location or emotional state.

My main motivation is to uncover more about these learning processes. I pursue this objective by studying a particular phenomenon observed in the course of early language acquisition: the formation of language-specific phonetic categories during the first year of life. This topic lies at the intersection of three distinct domains of research: developmental psychology, auditory neuroscience and speech sciences (speech technology, speech perception, phonetics and speech production). I study it through a computational modeling approach, using tools from statistics, machine learning and computational neuroscience.

Publications

Martin, A., Guevara-Rukoz, A., Schatz, T., Peperkamp, S. Phonetic naturalness and the shaping of sound patterns: the role of learning bias and its transmission across generations. Submitted to Phonology.

Schatz, T., Turnbull, R., Bach, F., Dupoux, E. (2017). A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability. Submitted to INTERSPEECH, 2017. [pdf]

Schatz, T., Bach, F., Dupoux, E. (2017). ASR Systems as Models of Phonetic Category Perception in Adults. Submitted to CogSci, 2017. [pdf]

Schatz, T. (2016). ABX-Discriminability Measures and Applications. PhD Thesis. [pdf]

Versteegh, M.*, Thiollière, R.*, Schatz, T.*, Cao, X-N., Anguera, X., Jansen, A., Dupoux, E. (2015). The Zero Resource Speech Challenge 2015. Proceedings of INTERSPEECH, 2015. [pdf]
*These authors contributed equally to this work.

Martin, A., Schatz, T., Versteegh, M., Dupoux, E., Mazuka, R., Miyazawa, K., Cristia, A. (2015). Mothers Speak Less Clearly to Infants: A Comprehensive Test of the Hyperarticulation Hypothesis. Psychological Science 26(3), pp. 341-347. [pdf]

Synnaeve, G., Schatz, T., Dupoux, E. (2014). Phonetics Embedding Learning with Side Information. Proceedings of SLT, 2014. [pdf]

Schatz, T., Peddinti, V., Cao, X-N., Bach, F., Hermansky, H. & Dupoux, E. (2014). Evaluating Speech Features with the Minimal-Pair ABX Task (II): Resistance to Noise. Proceedings of INTERSPEECH, 2014. [pdf]

Fourtassi, A., Schatz, T., Varadarajan, B. & Dupoux, E. (2014). Exploring the Relative Role of Bottom-up and Top-down Information in Phoneme Learning. Proceedings of ACL, 2014. [pdf]

Schatz, T., Peddinti, V., Bach, F., Jansen, A., Hermansky, H. & Dupoux, E. (2013). Evaluating Speech Features with the Minimal-Pair ABX Task: Analysis of the Classical MFC/PLP Pipeline. Proceedings of INTERSPEECH, 2013. [pdf]

Jansen, A., Dupoux, E., Goldwater, S., Johnson, M., Khudanpur, S., Church, K., Feldman, N., Hermansky, H., Metze, F., Rose, R., Seltzer, M., Clark, P., McGraw, I., Varadarajan, B., Bennett, E., Borschinger, B., Chiu, J., Dunbar, E., Fourtassi, A., Harwath, D., Lee, C-y., Levin, K., Norouzian, A., Peddinti, V., Richardson, R., Schatz, T. & Thomas, S. (2013). A Summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition. Proceedings of ICASSP, 2013. [pdf]

Schatz, T. and Oudeyer, P-Y. (2009). Learning Motor Dependent Crutchfield’s Information Distance to Anticipate Changes in the Topology of Sensory Body Maps. Proceedings of ICDL, 2009. [pdf]
(Best student paper award)

Software

ABXpy

A Python package for performing ABX experiments on large corpora.
GIT repository.

Abkhazia

A standardized format for corpora of speech recordings with tools for performing experiments combining ASR and ABX-Discriminability Measures.
GIT repository.

FLoOD

A utility for efficiently managing the complex FLows Of Data in modern machine learning and signal processing pipelines.
Not yet released.

Datasets

Schatz, T., Cao, X-N., Kolesnikova, A., Bergvelt, T., Wright, J., Dupoux, E. (2015) Articulation Index LSCP LDC2015S12. Web Download. Philadelphia: Linguistic Data Consortium, 2015. [Available for free here]