Contributions on Context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis

Speech synthesis vs. speech recognition

As mentioned in the original paper, speech synthesis requires a much larger and more complex set of contexts than speech recognition in order to achieve high-quality synthesised speech. Examples of such contexts include the following (a sketch of how such a context set might be represented appears after this list):

  • Identity of the phones neighbouring the centre phone; typically the two phones to the left and the two to the right of the centre phone are taken as the phonetic context
  • Position of phones, syllables, words and phrases w.r.t. higher-level units
  • Number of phones, syllables, words and phrases within higher-level units
  • Syllable stress and accent status
  • Linguistic role, e.g. part-of-speech tag
  • Emotion and emphasis
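To make this context set concrete, below is a minimal Python sketch of one plausible way to bundle these features into a full-context label for a single phone. The class and field names are purely illustrative assumptions and do not correspond to the label format of any particular toolkit (e.g. HTS):

<pre>
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FullContextLabel:
    """Hypothetical container for the rich context set used in
    HMM-based speech synthesis (field names are illustrative)."""
    # Phonetic context: quinphone window [ll, l, centre, r, rr];
    # None marks a sentence boundary or unknown neighbour
    phones: List[Optional[str]]
    # Positional contexts w.r.t. higher-level units
    phone_pos_in_syllable: int
    syllable_pos_in_word: int
    word_pos_in_phrase: int
    # Counts of lower-level units within higher-level units
    num_phones_in_syllable: int
    num_syllables_in_word: int
    num_words_in_phrase: int
    # Prosodic and linguistic contexts
    syllable_stressed: bool
    syllable_accented: bool
    pos_tag: str          # part-of-speech tag, e.g. "NN"
    emphasized: bool      # emphasis marker for the current word

# Example: the centre phone "ae" in the stressed syllable of "cat"
label = FullContextLabel(
    phones=[None, "k", "ae", "t", None],
    phone_pos_in_syllable=2,
    syllable_pos_in_word=1,
    word_pos_in_phrase=3,
    num_phones_in_syllable=3,
    num_syllables_in_word=1,
    num_words_in_phrase=5,
    syllable_stressed=True,
    syllable_accented=True,
    pos_tag="NN",
    emphasized=False,
)
</pre>

In practice each such label indexes the model parameters used for that phone, which is why the context set in synthesis grows so much larger than the triphone contexts typical of speech recognition.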