Continuous speech recognition must take into account that, in general, signal processing methods alone cannot transform a speech signal into the correct sequence of uttered words. It is widely accepted that knowledge about the language is crucial to complete the task. The ARCOS-G project aimed at a speech recognition architecture that closely integrates the signal processing component with components that apply linguistic knowledge. The system therefore consists of two major parts: a speech element recognizer and a linguistic processing stage.
The speech element recognizer is an HMM recognizer (based on the COST 249 reference recognizer; see [JWLL00] and [LJWL00]) in which a recognition network, representing an appropriate part of the overall language model, constrains the generation of basic elements. Other speech element recognizers were examined as well.
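To illustrate how a recognition network can constrain the generation of basic elements, here is a minimal Viterbi sketch. It is not the COST 249 reference recognizer: the function name `viterbi_constrained`, the toy per-frame element scores, the two-state left-to-right HMM per element and the fixed transition probability of 0.5 are all hypothetical choices made for illustration. The only point it demonstrates is that a decoder may enter an element only where the network permits it, so an acoustically attractive but disallowed sequence is never produced.

```python
import math

NEG_INF = float("-inf")
LOG_HALF = math.log(0.5)  # hypothetical: fixed probability for both self-loop and advance


def viterbi_constrained(frames, network, n_states=2):
    """Decode the best sequence of basic elements permitted by `network`.

    frames:  one dict per frame mapping element -> acoustic log-likelihood (toy scores;
             every HMM state of an element shares its element's score)
    network: element or "<s>" -> set of allowed successors; "</s>" marks a legal end
    Returns (element list, log score), or (None, -inf) if no path is allowed.
    """
    assert n_states >= 2  # lets a self-loop on state 0 be told apart from a re-entry
    elements = sorted(e for e in network if e != "<s>")
    states = [(e, i) for e in elements for i in range(n_states)]

    def predecessors(state):
        e, i = state
        preds = [(e, i)]                      # self-loop
        if i > 0:
            preds.append((e, i - 1))          # advance inside the element's HMM
        else:                                 # enter e only where the network allows it
            preds += [(p, n_states - 1) for p in elements if e in network[p]]
        return preds

    # Frame 0: only first states of elements that may start an utterance.
    score = {s: NEG_INF for s in states}
    for e in network["<s>"]:
        score[(e, 0)] = frames[0].get(e, NEG_INF)

    backptrs = []
    for obs in frames[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(predecessors(s), key=lambda p: score[p])
            new_score[s] = score[best_prev] + LOG_HALF + obs.get(s[0], NEG_INF)
            ptr[s] = best_prev
        score = new_score
        backptrs.append(ptr)

    # The utterance may only end in the last state of an element followed by "</s>".
    finals = [(e, n_states - 1) for e in elements if "</s>" in network[e]]
    if not finals:
        return None, NEG_INF
    state = max(finals, key=lambda s: score[s])
    best = score[state]
    if best == NEG_INF:
        return None, NEG_INF

    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # An element starts wherever its state 0 is entered from a different state.
    words = [e for t, (e, i) in enumerate(path)
             if i == 0 and (t == 0 or path[t - 1] != (e, 0))]
    return words, best


network = {"<s>": {"yes", "no"}, "yes": {"please", "</s>"},
           "no": {"</s>"}, "please": {"</s>"}}
frames = [{"yes": math.log(0.4), "no": math.log(0.6)}] * 3 + \
         [{"please": math.log(0.9), "yes": math.log(0.05), "no": math.log(0.05)}] * 3
words, score = viterbi_constrained(frames, network)
print(words)  # ['yes', 'please']
```

Acoustically, "no please" would score higher here, but the network does not allow "please" after "no", so the decoder returns the best sequence the network admits.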
The linguistic processing stage searches for the most likely sequence of basic elements and the corresponding text, using linguistic knowledge and context information to reduce the search space. It consists of a chart parser that has been extended to handle several kinds of language models and to operate on the large number of uncertain, competing basic-element hypotheses that the speech element recognizer delivers.
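A chart parser can work on competing hypotheses because it indexes partial analyses by their start and end points in the signal rather than by word positions. The sketch below uses a simple bottom-up, CKY-style probabilistic chart restricted to binary rules; it is a stand-in for illustration, not the extended ARCOS-G parser. The hypothesis format `(start, end, word, log score)`, the toy lexicon and grammar, and the function name `best_parse` are all hypothetical.

```python
import math


def best_parse(hypotheses, lexicon, rules, n_frames, goal="S"):
    """Best-scoring analysis covering frames 0..n_frames.

    hypotheses: (start, end, word, acoustic log score) tuples; they may overlap and compete
    lexicon:    word -> list of (category, log prob)
    rules:      (parent, left, right, log prob) binary rules
    Returns (log score, tree) or None if no analysis spans the utterance.
    """
    chart = {}  # (start, end, category) -> (best log score, tree); keeps only the best

    def add(i, j, cat, score, tree):
        key = (i, j, cat)
        if key not in chart or score > chart[key][0]:
            chart[key] = (score, tree)

    # Seed the chart with every hypothesis the recognizer delivered.
    for i, j, word, acoustic in hypotheses:
        for cat, lex_lp in lexicon.get(word, []):
            add(i, j, cat, acoustic + lex_lp, (cat, word))

    # Combine adjacent edges bottom-up, shortest spans first, so both halves are final.
    points = sorted({p for h in hypotheses for p in h[:2]})
    spans = sorted(((i, j) for i in points for j in points if i < j),
                   key=lambda span: span[1] - span[0])
    for i, j in spans:
        for k in points:
            if not i < k < j:
                continue
            for parent, left, right, rule_lp in rules:
                l, r = chart.get((i, k, left)), chart.get((k, j, right))
                if l and r:
                    add(i, j, parent, l[0] + r[0] + rule_lp, (parent, l[1], r[1]))
    return chart.get((0, n_frames, goal))


hyps = [(0, 3, "show", math.log(0.6)), (0, 3, "so", math.log(0.7)),
        (3, 4, "the", math.log(0.8)), (3, 4, "a", math.log(0.3)),
        (4, 8, "lights", math.log(0.6)), (4, 8, "flights", math.log(0.5))]
lexicon = {"show": [("V", 0.0)], "so": [("ADV", 0.0)], "the": [("D", 0.0)],
           "a": [("D", 0.0)], "lights": [("N", 0.0)], "flights": [("N", 0.0)]}
rules = [("S", "V", "NP", 0.0), ("NP", "D", "N", 0.0)]
score, tree = best_parse(hyps, lexicon, rules, n_frames=8)
print(tree)  # ('S', ('V', 'show'), ('NP', ('D', 'the'), ('N', 'lights')))
```

The acoustically stronger hypothesis "so" is discarded because no rule can build a complete sentence from it, which is the sense in which linguistic knowledge reduces the search space.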
A main research focus was the development of a strategy component that controls the search process so that high recognition accuracy is reached within reasonable time (further information in [Saf94], [SP94], [Saf95], [Saf96], [Saf97] and [Saf98]).
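The strategy component itself is described in the cited reports; the sketch below only illustrates the general trade-off between accuracy and time that such a component manages. It uses a generic agenda-based, uniform-cost search over a hypothesis lattice with a bigram language model, beam pruning and a hard limit on agenda pops. The names `agenda_search`, `beam` and `max_pops`, the lattice format and the bigram table are hypothetical, and the search assumes all scores are log probabilities (at most 0) so that costs never decrease.

```python
import heapq
import math


def agenda_search(edges, bigram, end_time, beam=5.0, max_pops=10_000):
    """Best word sequence through a lattice, or None if pruned or out of budget.

    edges:  start time -> list of (end time, word, acoustic log prob)
    bigram: (previous word, word) -> log prob; missing pairs are disallowed
    """
    agenda = [(0.0, 0, "<s>", ())]  # (cost = -log score, time, last word, words so far)
    expanded = set()                # (time, last word): recombine identical LM states
    best_at = {}                    # time -> cost of the cheapest item popped there
    pops = 0
    while agenda and pops < max_pops:
        cost, t, prev, words = heapq.heappop(agenda)
        pops += 1
        if prev == "</s>":
            return -cost, list(words)  # first complete item popped is the cheapest
        if (t, prev) in expanded:
            continue
        best_at.setdefault(t, cost)
        if cost > best_at[t] + beam:
            continue  # beam pruning: too far behind the best item at this time
        expanded.add((t, prev))
        if t == end_time:
            lm = bigram.get((prev, "</s>"))
            if lm is not None:
                heapq.heappush(agenda, (cost - lm, t, "</s>", words))
            continue
        for end, word, acoustic in edges.get(t, []):
            lm = bigram.get((prev, word))
            if lm is not None:  # the language model forbids unseen successors
                heapq.heappush(agenda, (cost - acoustic - lm, end, word, words + (word,)))
    return None


edges = {0: [(2, "show", math.log(0.6)), (2, "so", math.log(0.7))],
         2: [(3, "the", math.log(0.8)), (3, "a", math.log(0.3))],
         3: [(6, "flights", math.log(0.5)), (6, "lights", math.log(0.6))]}
bigram = {("<s>", "show"): math.log(0.5), ("<s>", "so"): math.log(0.2),
          ("show", "the"): math.log(0.6), ("show", "a"): math.log(0.2),
          ("so", "the"): math.log(0.3), ("the", "flights"): math.log(0.5),
          ("the", "lights"): math.log(0.1), ("a", "flights"): math.log(0.5),
          ("flights", "</s>"): math.log(0.9), ("lights", "</s>"): math.log(0.9)}
result = agenda_search(edges, bigram, end_time=6)
print(result[1])  # ['show', 'the', 'flights']
```

Narrowing `beam` or lowering `max_pops` makes the search faster but risks losing the best hypothesis; with `max_pops=1` this example gives up and returns `None`.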
Supported by: Bundesamt für Bildung und Wissenschaft
In collaboration with: COST Action 249; the project was carried out within the framework of this Action (see also report [PL01]).