Continuous speech recognition typically produces a large number of competing hypotheses. To disambiguate them, one usually resorts to statistical language models, largely disregarding the rule-based nature of language. We argue that recognition performance can be enhanced by incorporating a rule-based language model, as such a model can directly account for the various constraints that syntax imposes on natural language.
The aim of this project is to develop a rule-based language model that can decide with high accuracy whether a word sequence is grammatical. The question of how such a rule-based model can be integrated into the speech recognition process was investigated in the completed project ISRL.
The current project has led to the development of a high-precision, broad-coverage grammar for German and an efficient parser. As the underlying grammar formalism we chose HPSG (head-driven phrase structure grammar), as it is expressive enough to model complex linguistic theories and can still be processed efficiently (cf. report [Kau05], the grammar test site and paper [KP07]). By incorporating our grammar into a large-vocabulary continuous speech recognizer for a broadcast news transcription application, we succeeded in significantly decreasing the word error rate (see papers [KP08], [KEP09] and [KP12] or the dissertation [Kau09]).
Supported by: Swiss National Science Foundation