Sophia: the star is speech recognition

A scientific meeting in late August at Eurécom will review adaptation methods for speech recognition. Explanations from a specialist, Professor Christian J. Wellekens.

Although Sophia Antipolis hosts many technical and business meetings, scientific colloquia are rare. We should therefore welcome the initiative of Professor Christian J. Wellekens of the Eurécom Institute, who is organizing the ISCA (International Speech Communication Association) workshop on "adaptation methods for speech recognition", to be held at the Eurécom Institute on August 29th and 30th.

This two-day meeting, whose final program is still being drawn up, will bring together scientists from all over the world. Around fifty contributions are expected from university researchers and from laboratory members of organizations such as Nokia, Panasonic, Intel, Apple, Sony, France Telecom, INRIA, Lucent, Nuance, Compaq and Swisscom. Christian J. Wellekens, a speech recognition specialist and professor at the Eurécom Institute, explains how speech recognition works and what adaptation methods bring.

The basics of speech recognition

Modern speech recognition is based on building statistical models of phonemes from large databases of read or spontaneous multi-speaker speech. These databases are labelled by language specialists, which means that the exact content of each sentence is known.
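To make the idea of a statistical phoneme model concrete, here is a minimal Python sketch. It assumes, for simplicity, that each phoneme is modelled by a single Gaussian with diagonal covariance, estimated from feature frames whose phoneme labels are known from the labelled database. Real recognisers typically use hidden Markov models with Gaussian mixtures per state; the function names and the variance floor are illustrative assumptions.

```python
import math
from collections import defaultdict


def train_phoneme_models(labelled_frames, var_floor=1e-3):
    """Estimate a diagonal Gaussian (mean, variance) for each phoneme label.

    labelled_frames: iterable of (phoneme_label, feature_vector) pairs,
    all feature vectors having the same length (a simplifying assumption).
    """
    frames_by_phoneme = defaultdict(list)
    for label, vector in labelled_frames:
        frames_by_phoneme[label].append(vector)

    models = {}
    for label, vectors in frames_by_phoneme.items():
        n = len(vectors)
        dim = len(vectors[0])
        mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
        # Maximum-likelihood variance, floored so that a rarely seen
        # phoneme cannot end up with zero variance.
        var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
               for d in range(dim)]
        models[label] = (mean, var)
    return models


def log_likelihood(model, vector):
    """Log density of `vector` under a diagonal Gaussian (mean, variance)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(vector, mean, var))


if __name__ == "__main__":
    # Hypothetical one-dimensional frames labelled with phonemes "a" and "b".
    data = [("a", [1.0]), ("a", [3.0]), ("b", [10.0]), ("b", [12.0])]
    models = train_phoneme_models(data)
    print(models["a"])  # -> ([2.0], [1.0])
    best = max(models, key=lambda p: log_likelihood(models[p], [2.5]))
    print(best)  # -> a
```

Because the labels tell us which frames belong to which phoneme, estimation reduces to simple averages here; without labels, an iterative procedure would be needed.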

Training the models consists of estimating their parameters from this data. This phase is very slow and can require more than 30 hours of computation on a very powerful computer. Once the models are known, a phonetic dictionary can be used to build a model of any word from its phonetic transcription.
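The following sketch shows, under simplifying assumptions, how a word model can be assembled from a phonetic dictionary. The lexicon, the one-dimensional phoneme "means" and the squared-distance score are hypothetical stand-ins for trained models. The phoneme models listed in a word's transcription are chained left to right, and a Viterbi-style dynamic programme finds the best alignment of the speech frames to that chain.

```python
import math

# Hypothetical toy inputs: one 1-D "mean" per phoneme and a two-word lexicon.
PHONEME_MEANS = {"y": 1.0, "eh": 5.0, "s": 9.0, "n": 2.0, "ow": 6.0}
LEXICON = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}


def build_word_model(word, lexicon=LEXICON, phoneme_means=PHONEME_MEANS):
    """Concatenate the phoneme models listed in the word's transcription."""
    return [phoneme_means[p] for p in lexicon[word]]


def frame_score(state_mean, x):
    # Stand-in for a Gaussian log-likelihood: higher when x is near the mean.
    return -(x - state_mean) ** 2


def align_score(frames, word_model):
    """Best left-to-right alignment score (Viterbi).

    Frames are assigned to states in order; each state covers at least one
    frame, the first frame goes to the first state and the last to the last.
    """
    n_states = len(word_model)
    if len(frames) < n_states:
        return -math.inf
    # best[s]: best score of an alignment of the frames seen so far ending in state s.
    best = [-math.inf] * n_states
    best[0] = frame_score(word_model[0], frames[0])
    for x in frames[1:]:
        new = [-math.inf] * n_states
        for s in range(n_states):
            stay = best[s]                                 # self-loop
            advance = best[s - 1] if s > 0 else -math.inf  # move to next phoneme
            prev = max(stay, advance)
            if prev > -math.inf:
                new[s] = prev + frame_score(word_model[s], x)
        best = new
    return best[-1]


if __name__ == "__main__":
    frames = [1.0, 1.2, 5.0, 9.1]  # hypothetical 1-D feature frames
    for word in LEXICON:
        print(word, align_score(frames, build_word_model(word)))
```

Adding a word to the vocabulary then only requires a new lexicon entry, not new training data, which is the point made above.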

During recognition, the system searches for the sequence of words that best represents the spoken signal, that is, the sequence that is most probable given the speech received. Sophisticated programming methods are required to do this in real time. Recognition rates are improved by grammars, which rule out certain word sequences, and they degrade when the utterances to be recognized are noisy.
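A minimal sketch of the search step, assuming the utterance has already been cut into segments with an acoustic log-score for each candidate word (a real decoder also searches over the segmentation). The grammar is a hypothetical table of which words may follow which; a dynamic-programming (Viterbi) search keeps only the best hypothesis ending in each word, so prohibited sequences are simply never extended.

```python
def decode(segment_scores, grammar, start="<s>"):
    """Viterbi search for the most probable word sequence allowed by a grammar.

    segment_scores: list of dicts {word: acoustic log-likelihood}, one per segment.
    grammar: dict mapping a word (or `start`) to the set of words allowed to follow it.
    Returns the best word list, or None if the grammar allows no path.
    """
    # best[word] = (total log score, word sequence) for hypotheses ending in `word`.
    best = {start: (0.0, [])}
    for scores in segment_scores:
        new_best = {}
        for prev, (total, seq) in best.items():
            for word in grammar.get(prev, ()):
                if word not in scores:
                    continue
                candidate = total + scores[word]
                # Keep only the best-scoring hypothesis that ends in `word`.
                if word not in new_best or candidate > new_best[word][0]:
                    new_best[word] = (candidate, seq + [word])
        best = new_best
        if not best:
            return None
    return max(best.values(), key=lambda entry: entry[0])[1]


if __name__ == "__main__":
    # Hypothetical scores for a two-segment utterance.
    segments = [{"call": -1.0, "cold": -0.9}, {"home": -1.0, "hope": -0.8}]
    free = {"<s>": {"call", "cold"}, "call": {"home", "hope"}, "cold": {"home", "hope"}}
    strict = {"<s>": {"call"}, "call": {"home"}}
    print(decode(segments, free))    # -> ['cold', 'hope']
    print(decode(segments, strict))  # -> ['call', 'home']
```

With every sequence allowed, the acoustically best reading wins; the restrictive grammar forbids it and forces the only permitted sentence, which is how a grammar can raise recognition rates when the application's vocabulary is constrained.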

To improve recognition rates

However, language, accents and transmission channels (telephone line or GSM) can seriously affect the results. To improve them, the phoneme models could be retrained under the conditions of the application. But this would require the user to pronounce numerous training sentences, a tedious task, and the retraining itself would take a long time.

Adaptation consists of modifying the model parameters using only a small number of sentences or words pronounced by the user, so that the recognition rate becomes higher than the one obtained with a speaker-independent recogniser.
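One widely used way to adapt a model from a few user sentences is maximum a posteriori (MAP) adaptation; it is shown here purely as an illustration, not as the method presented at the workshop. The sketch assumes Gaussian phoneme models and updates only their means: the speaker-independent mean acts as a prior worth `tau` frames (a hypothetical default), giving the adapted mean (tau * mu_0 + sum of the user's frames) / (tau + n). A handful of user frames therefore moves the mean only part of the way, while more data pulls it closer to the user's own estimate.

```python
def map_adapt_mean(prior_mean, user_frames, tau=10.0):
    """MAP estimate of a Gaussian mean from a few adaptation frames.

    prior_mean: speaker-independent mean vector (the prior).
    user_frames: feature vectors from the user's adaptation sentences that
        were aligned to this Gaussian (alignment is assumed, not shown).
    tau: hypothetical prior weight, counted in frames; must be positive.
    """
    n = len(user_frames)
    dim = len(prior_mean)
    frame_sums = [sum(frame[d] for frame in user_frames) for d in range(dim)]
    # Weighted average of prior and data: (tau * mu_0 + sum of x) / (tau + n).
    # With no user frames this returns the prior mean unchanged.
    return [(tau * prior_mean[d] + frame_sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    prior = [0.0]
    print(map_adapt_mean(prior, [[4.0]] * 10))    # -> [2.0]
    print(map_adapt_mean(prior, [[4.0]] * 1000))  # moves close to 4.0
```

Other families of methods exist, for example linear transforms of the means shared across many Gaussians, which can adapt models even for phonemes the user has not yet pronounced.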

Information and contacts
- Program on the website of Eurécom
- Professor Christian J. Wellekens, Tel: +33 (0) 4 93 00 26 33; Fax: +33 (0) 4 93 00 26 28; e-mail: Christian.Wellekens@eurecom.fr