Time Domain Synthesis

This synthesis technique uses snippets of actual recorded speech. These snippets can range in size from a single phoneme up to a complete word. The unit usually chosen is the "diphone", which extends from the center of one phoneme to the center of the next phoneme in the word. The diphones are spliced or blended together to form continuous speech. The drawbacks to this technique are:

Only one voice per diphone database. Each new voice requires an entirely new database, and each database is in the megabyte range.
Programmers or users cannot create new voices.
Limited pitch range, which makes it poorly suited to singing.
Limited speaking rate range.
Not all diphones in the database will splice together nicely, resulting in gurgles, false consonants and other strange discontinuities.
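
To make the splicing step concrete, here is a minimal Python sketch. It is an illustration only, not the method of any particular synthesizer. The function names, the "_" silence marker, the "a-b" naming of diphones, and the choice of a plain linear crossfade are all assumptions made for this example. It shows how a phoneme sequence maps to diphone units, and how recorded units might be blended at each splice.

```python
def diphone_names(phonemes):
    """Pair each phoneme with its successor, padding both ends with silence.

    The "_" silence marker and the "a-b" naming are assumptions for this sketch.
    """
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(units, overlap):
    """Join lists of samples, linearly blending `overlap` samples at each splice."""
    out = list(units[0])
    for unit in units[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit, rising toward 1
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        # Append whatever part of the incoming unit was not blended.
        out.extend(unit[n:])
    return out


if __name__ == "__main__":
    # Hypothetical three-phoneme word; a real database would map each name
    # to a recorded snippet of samples.
    print(diphone_names(["k", "ae", "t"]))
    print(crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2))
```

A simple blend like this smooths amplitude jumps but does nothing about mismatched pitch or spectral shape at the joint, which is one reason some diphone pairs still splice badly.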
