Contributions on Context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis
Speech synthesis vs. speech recognition
As mentioned in the original paper, speech synthesis requires a much larger and more complex set of contexts than speech recognition in order to produce high-quality synthesised speech. Examples of such contexts include:
- Identity of the phones neighbouring the centre phone. The two phones to the left and the two to the right of the centre phone are usually taken as the phonetic context
- Position of phones, syllables, words and phrases within higher-level units
- Number of phones, syllables, words and phrases within higher-level units
- Syllable stress and accent status
- Linguistic role, e.g. part-of-speech tag
- Emotion and emphasis
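
To make the list above more concrete, the following is a minimal sketch of how a few of these contexts (neighbouring phones, position and count within the syllable, and stress) might be collected for each phone. It is an illustration only and is not taken from the paper: the names `PhoneContext`, `build_contexts` and the padding symbol `PAD` are hypothetical, and the remaining contexts (word and phrase level, part-of-speech, emotion, emphasis) are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAD = "sil"  # assumed padding symbol for utterance boundaries (hypothetical)


@dataclass(frozen=True)
class PhoneContext:
    """A small subset of the contexts attached to one phone (illustrative)."""
    left2: str            # second phone to the left
    left1: str            # first phone to the left
    centre: str           # the centre phone itself
    right1: str           # first phone to the right
    right2: str           # second phone to the right
    pos_in_syllable: int  # 1-based position of the phone in its syllable
    syllable_len: int     # number of phones in the syllable
    stressed: bool        # stress status of the syllable


def build_contexts(syllables: List[Tuple[List[str], bool]]) -> List[PhoneContext]:
    """Build one PhoneContext per phone.

    `syllables` is a list of (phones, stressed) pairs in utterance order.
    """
    flat = [p for phones, _ in syllables for p in phones]
    # Pad both ends so every phone has two neighbours on each side.
    padded = [PAD, PAD] + flat + [PAD, PAD]

    contexts = []
    i = 0  # index of the current phone in `flat`
    for phones, stressed in syllables:
        for pos, _ in enumerate(phones, start=1):
            # padded[i + 2] is the centre phone, so this window is
            # two left neighbours, the centre, and two right neighbours.
            window = padded[i:i + 5]
            contexts.append(PhoneContext(*window, pos, len(phones), stressed))
            i += 1
    return contexts


# Example: two syllables, the second one stressed.
example = build_contexts([(["h", "ax"], False), (["l", "ow"], True)])
```

Each additional context in the list would add further fields to such a record, which illustrates how quickly the number of distinct context combinations grows.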