Projects

Phase 2 Projects

Bayesian learning of a hierarchical representation of language from raw speech


Project leaders: Reinhold Häb-Umbach (Paderborn)

Researchers: Oliver Walter (Paderborn)

Administration: Ursula Stiebritz (Paderborn)

Associates: Bhiksha Raj (Pittsburgh)

Summary:

Conventional automatic speech recognition (ASR) systems rely on supervised learning: the acoustic model is trained from transcribed speech, and the language model, i.e., the a priori probabilities of the words, from large text corpora. Both the vocabulary V and the inventory P of phonemes are fixed and known. Furthermore, a lexicon is given which contains, for each word, its pronunciation as a phoneme sequence. While this approach has led to many speech recognition applications with impressive performance, the insight it provides into other learning problems with less supervision is rather limited.
In the unsupervised setting considered here, neither the pronunciation lexicon nor the vocabulary and the phoneme inventory are known in advance, and the acoustic training data come without labels. The goal is to discover the words and their pronunciations solely from the raw speech input and, by doing so, to train acoustic models of the words as well as language model probabilities, so as to be able to recognize speech. The issues to be faced are both challenging and generic; among them are:
• the segmentation of a streaming input into a sequence of meaningful entities, where the entities themselves have to be learnt from the very same input,
• the learning of a hierarchical representation (words and sub-word units) from speech,
• the learning of a representation whose complexity grows with the amount of input data,
• the ability to cope with the extreme variability of the spoken input.
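The first of these issues can be made concrete with a toy experiment. The following is a minimal sketch only, not the project's actual method: a Gibbs sampler over boundary variables with a Dirichlet-process unigram word model (in the spirit of Bayesian word segmentation), applied to a character stream formed by concatenating words from a small hidden vocabulary. All hyperparameters and helper names are illustrative assumptions.

```python
# Minimal sketch of unsupervised word segmentation, assuming a
# Dirichlet-process unigram word model sampled with Gibbs boundary moves;
# names and hyperparameters are illustrative, not the project's implementation.
import random
from bisect import insort
from collections import Counter

random.seed(0)

# Toy "speech": utterances built from a hidden 3-word vocabulary,
# concatenated without any word boundaries.
VOCAB = ["ba", "du", "kiti"]
utterances = ["".join(random.choice(VOCAB) for _ in range(6)) for _ in range(40)]

ALPHA = 1.0        # DP concentration: willingness to posit new lexicon entries
P_CHAR = 1.0 / 26  # base measure: characters drawn uniformly
P_END = 0.5        # geometric prior over word length

def p0(w):
    """Base probability of a word: geometric length, uniform characters."""
    return P_END * (1 - P_END) ** (len(w) - 1) * P_CHAR ** len(w)

def words_of(u, b):
    """Cut utterance u at the boundary positions in the sorted list b."""
    cuts = [0] + b + [len(u)]
    return [u[i:j] for i, j in zip(cuts, cuts[1:])]

# Random initial segmentation; global word-token counts.
bounds = [sorted(random.sample(range(1, len(u)), len(u) // 3)) for u in utterances]
counts, total = Counter(), 0
for u, b in zip(utterances, bounds):
    for w in words_of(u, b):
        counts[w] += 1
        total += 1

for sweep in range(200):
    for u, b in zip(utterances, bounds):
        for pos in range(1, len(u)):
            left = max([0] + [c for c in b if c < pos])
            right = min([len(u)] + [c for c in b if c > pos])
            w1, w2, w12 = u[left:pos], u[pos:right], u[left:right]
            # Remove the word(s) currently spanning this position.
            if pos in b:
                counts[w1] -= 1; counts[w2] -= 1; total -= 2
            else:
                counts[w12] -= 1; total -= 1
            # Posterior odds of a boundary vs. no boundary under the CRP.
            p_b = ((counts[w1] + ALPHA * p0(w1)) / (total + ALPHA)
                   * (counts[w2] + (w1 == w2) + ALPHA * p0(w2)) / (total + 1 + ALPHA))
            p_n = (counts[w12] + ALPHA * p0(w12)) / (total + ALPHA)
            if random.random() < p_b / (p_b + p_n):
                if pos not in b:
                    insort(b, pos)
                counts[w1] += 1; counts[w2] += 1; total += 2
            else:
                if pos in b:
                    b.remove(pos)
                counts[w12] += 1; total += 1

lexicon = [w for w, c in counts.most_common() if c > 0][:5]
print(lexicon)  # typically dominated by the hidden words "ba", "du", "kiti"
```

The sampler never sees word boundaries; the lexicon emerges only from the statistics of the character stream, which is the point of the first issue above.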

In fact, it has been argued that spoken language is the most sophisticated behaviour of the most complex organism in the known universe [Moo05]. Still, infants excel at learning it seemingly without effort. Developing algorithms for unsupervised language acquisition may therefore be helpful to gain more insight into the nature of learning, and its findings may be transferable to other learning problems on sequential data.
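The third issue listed above — a representation whose complexity grows with the amount of input data — is typically handled with Bayesian nonparametric priors. As an illustration only (not the project's actual model), a Chinese Restaurant Process shows how the number of discovered units can grow slowly but without bound as observations accumulate; the parameter values below are arbitrary choices for the example.

```python
# Illustrative sketch of a Chinese Restaurant Process (CRP), a
# nonparametric prior under which the number of learned units
# (e.g. phonemes or words) grows with the data instead of being fixed;
# the parameter values are arbitrary choices for this example.
import random

random.seed(1)

def crp_seat(n_customers, alpha):
    """Seat customers sequentially; return the table occupancy counts."""
    tables = []  # tables[k] = customers at table k (tokens of unit k)
    for n in range(n_customers):
        r = random.random() * (n + alpha)
        if r < alpha:
            tables.append(1)      # open a new table: a new unit is posited
        else:
            r -= alpha
            for k, size in enumerate(tables):
                if r < size:      # join an existing table, proportionally to size
                    tables[k] += 1
                    break
                r -= size
    return tables

for n in (100, 1000, 10000):
    k = len(crp_seat(n, alpha=2.0))
    print(n, k)  # the unit count grows roughly like alpha * ln(n)
```

Because new tables keep opening at a decaying but never-vanishing rate, the model's inventory is not fixed in advance, in contrast to the fixed vocabulary V and phoneme inventory P of the supervised setting.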

Phase 1 Projects

Bayesian learning of a hierarchical representation of language from raw speech


Project leaders: Reinhold Häb-Umbach (Paderborn)

Researchers: Oliver Walter (Paderborn)

Administration: Ursula Stiebritz (Paderborn)

Associates: Bhiksha Raj (Pittsburgh)
