A MATLAB implementation of the F0 detection algorithm presented in the paper "Nearly Perfect Detection of Continuous F0 Contour and Frame Classification for TTS Synthesis" is now available for teaching and academic research. To obtain a copy of the code, send an email to the ETH Speech Processing Group (pfister@tik.ee.ethz.ch).
Some application examples of the algorithm are presented below. In each cepstrogram, the optimal path (shown in red) indicates the T0 contour of the speech signal, where T0 is the fundamental period and F0 = 1/T0.

High-resolution cepstrogram on a logarithmic scale showing the optimal T0 path for a short recording of a German female voice.

High-resolution cepstrogram on a logarithmic scale showing the optimal T0 path for a Spanish male voice with a low F0.

High-resolution cepstrogram on a logarithmic scale showing the optimal T0 path for a Turkish female voice.

High-resolution cepstrogram on a logarithmic scale showing the optimal T0 path for a Mandarin female voice. The speech signal features a strong creaky effect at 2.6 s, which is accurately tracked by the algorithm.

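
The relation F0 = 1/T0 used to read the cepstrograms can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code and says nothing about how the optimal path itself is found. The function names `t0_to_f0` and `quefrency_index_to_f0`, and the use of `None` to mark frames without a pitch period, are assumptions made purely for illustration.

```python
def t0_to_f0(t0_contour):
    """Convert a contour of pitch periods T0 (seconds) to F0 values (Hz).

    Frames without a usable period (None or non-positive) are kept as None.
    Marking such frames with None is an assumption of this sketch.
    """
    f0_contour = []
    for t0 in t0_contour:
        if t0 is None or t0 <= 0:
            f0_contour.append(None)
        else:
            f0_contour.append(1.0 / t0)  # F0 = 1 / T0
    return f0_contour


def quefrency_index_to_f0(index, sample_rate):
    """Map a cepstral peak position (in samples) to F0 in Hz.

    A peak at quefrency index n corresponds to T0 = n / sample_rate,
    so F0 = sample_rate / n.
    """
    if index <= 0:
        raise ValueError("quefrency index must be positive")
    return sample_rate / index


if __name__ == "__main__":
    # Hypothetical T0 values in seconds; None marks a frame without a period.
    example_t0 = [0.005, 0.004, None, 0.0125]
    print(t0_to_f0(example_t0))
    # Cepstral peak at sample 80 for a signal sampled at 16 kHz.
    print(quefrency_index_to_f0(80, 16000))
```

Because T0 and F0 are reciprocals, a low-pitched voice appears at long periods (high quefrency) in a cepstrogram, while a high-pitched voice appears at short periods.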
Thomas Ewender, Sarah Hoffmann and Beat Pfister: Nearly Perfect Detection of Continuous F0 Contour and Frame Classification for TTS Synthesis. Proceedings of Interspeech 2009, Brighton (UK), September 2009.