One speech recognition task, namely the recognition of single words spoken by arbitrary persons, has been solved very successfully by means of statistical models (mostly hidden Markov models) of the words to be recognized. In order to determine the model parameters for a certain word, that word has to be recorded from many speakers. For a good speaker-independent word model, speech signals from at least a few thousand speakers must be available.
The disadvantage of such a recognizer based on whole-word models is therefore that its vocabulary is not flexible. Extending the vocabulary with new words requires collecting the corresponding speech signals, which is a very time-consuming and thus expensive process.
The goal of the WOROV project was to develop a speaker-independent word recognizer with a flexible vocabulary, one that can be configured for an arbitrary set of words without collecting training data for new words. The basic idea was to define and train appropriate subword unit models, e.g. phone models, instead of whole word models. With the resulting set of subword unit models, a word model for any desired word can be generated by concatenating the appropriate subword unit models.
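The concatenation idea can be illustrated with a minimal sketch. It is not the WOROV implementation: the phone symbols, the three-state left-to-right topology, the self-loop probabilities, and the names `PhoneModel`, `PHONES`, `LEXICON` and `build_word_model` are all assumptions made for illustration, and emission distributions are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneModel:
    """Hypothetical left-to-right phone HMM, reduced to its self-loop probabilities."""
    name: str
    self_loops: tuple  # one self-loop probability per state


# Illustrative phone inventory; the values are made up, not trained.
PHONES = {
    "n": PhoneModel("n", (0.6, 0.7, 0.6)),
    "s": PhoneModel("s", (0.7, 0.8, 0.7)),
    "ow": PhoneModel("ow", (0.8, 0.8, 0.7)),
}

# Illustrative pronunciation lexicon: word -> phone sequence.
LEXICON = {
    "no": ["n", "ow"],
    "so": ["s", "ow"],
}


def build_word_model(word, lexicon=LEXICON, phones=PHONES):
    """Concatenate phone models into one left-to-right word model.

    Returns (state_names, transitions). transitions has one row per state
    and one extra last column that represents leaving the word model.
    """
    chain = []
    for phone in lexicon[word]:
        model = phones[phone]
        for index, loop in enumerate(model.self_loops):
            chain.append((f"{phone}_{index}", loop))

    n = len(chain)
    transitions = [[0.0] * (n + 1) for _ in range(n)]
    for i, (_, loop) in enumerate(chain):
        transitions[i][i] = loop            # stay in the current state
        transitions[i][i + 1] = 1.0 - loop  # advance to the next state or exit
    return [name for name, _ in chain], transitions


state_names, transitions = build_word_model("no")
print(state_names)
```

In this sketch, adding a word to the vocabulary only requires a new lexicon entry, for example `LEXICON["son"] = ["s", "ow", "n"]`; no recordings of the new word are needed because the phone models are shared across words.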
Various problems were addressed in this project such as:
Supported by: This project was mainly supported by Swisscom.