Jerry Dimitriou, Singular Logic eSpeak TTS Engine: Language Enhancement 28 November 2011 ÆGIS Conference, Brussels, Belgium



Singular Logic ÆGIS Conference, Brussels, Belgium – 28 November 2011 2

What is eSpeak

• Open Source, Free to Use TTS Engine
  – Formant based
  – Minimal need for resources
  – More than 20 languages already available
    • Not all of them are in a good state.
• Advantages
  – Intelligible at high speeds
  – Easier to enhance languages (rule based)
  – Easier to create new sounds (phonemes)
• Disadvantages
  – Sound is not natural (robotic)


How eSpeak works: Text-To-Phoneme

• Step 1: Text to Phoneme Translation
  – Rule based, with rules contained in a <lang>_rules file
  – Exceptions to the rules in a <lang>_list file
  – Rules translate normal text to a stream of characters called phonemes
  – Each phoneme represents a standard sound, which is generated:
    • either through formants (vowels and voiced consonants)
    • or by playing samples (unvoiced and fricative consonants)
  – Examples (Normal Text → eSpeak Phonemes → IPA Alphabet):
    • Amazing → a#m'eIzIN → ɐmˈeɪzɪŋ
    • Brussels → br'Vs@Lz → bɹˈʌsəlz
    • Disability → d,Is@b'IlI2ti → dˌɪsəbˈɪlɪtɪ
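The translations in the examples above can be reproduced without espeakedit. The following is a minimal sketch, assuming the `espeak` command-line program is installed and on the PATH; it relies on the `-q` (no audio), `-v` (voice), `-x` (print phoneme mnemonics) and `--ipa` (print IPA) options. The function names are hypothetical and only serve the illustration.

```python
import shutil
import subprocess


def phoneme_command(text, voice="en", ipa=False):
    """Build an espeak command line that prints phonemes instead of speaking."""
    output_flag = "--ipa" if ipa else "-x"
    return ["espeak", "-q", "-v", voice, output_flag, text]


def phonemes(text, voice="en", ipa=False):
    """Return the phoneme string for `text`, or None if espeak is not installed."""
    if shutil.which("espeak") is None:
        return None
    result = subprocess.run(
        phoneme_command(text, voice, ipa),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `phonemes("Brussels")` and `phonemes("Brussels", ipa=True)` should give the two right-hand columns of the examples; the exact strings may differ between eSpeak versions.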


How eSpeak works: Rules

• Rules Format
  – <prefix>) <group of letters> (<suffix> phonemes
  – a (Cable 'eI
  – a (tion 'eI
  – _r) a (tion a
  – Prefix and Suffix
    • Non-capital letters represent themselves
    • Capital letters represent sets of letters
      – C → Any Consonant
      – A → Any Vowel
      – _ → Start of word at prefix, end of word at suffix
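To make the rule format concrete, here is a toy matcher for the three rules shown on this slide. It is a sketch under stated assumptions, not eSpeak's actual algorithm: it only knows the `C`, `A` and `_` symbols described above, treats a, e, i, o, u as the vowels, and scores a rule by the number of characters it matches so that the rule with the most context wins. All names (`Rule`, `parse_rule`, `best_rule`) are hypothetical.

```python
from dataclasses import dataclass

VOWELS = set("aeiou")  # assumption: a simplified English vowel set


@dataclass(frozen=True)
class Rule:
    pre: str       # required context before the letter group ("" if none)
    group: str     # letters consumed by the rule
    post: str      # required context after the letter group ("" if none)
    phonemes: str  # phoneme string produced by the rule


def parse_rule(line):
    """Parse '<prefix>) <group> (<suffix> <phonemes>' into a Rule."""
    pre = ""
    if ")" in line:
        pre, line = line.split(")", 1)
    fields = line.split()
    group, rest = fields[0], fields[1:]
    post = ""
    if rest and rest[0].startswith("("):
        post, rest = rest[0][1:], rest[1:]
    return Rule(pre.strip(), group, post, " ".join(rest))


def char_matches(symbol, actual):
    if symbol == "C":
        return actual.isalpha() and actual not in VOWELS
    if symbol == "A":
        return actual in VOWELS
    return symbol == actual  # lower-case letters and "_" stand for themselves


def context_matches(pattern, text):
    return len(text) >= len(pattern) and all(
        char_matches(s, t) for s, t in zip(pattern, text)
    )


def best_rule(rules, word, index):
    """Return the highest-scoring rule matching `word` at `index`, or None."""
    padded = f"_{word.lower()}_"  # "_" marks the word boundaries
    pos = index + 1
    best, best_score = None, -1
    for rule in rules:
        if not padded.startswith(rule.group, pos):
            continue
        before = padded[:pos][::-1]  # reversed, so the prefix is checked right to left
        after = padded[pos + len(rule.group):]
        if context_matches(rule.pre[::-1], before) and context_matches(rule.post, after):
            score = len(rule.pre) + len(rule.group) + len(rule.post)
            if score > best_score:
                best, best_score = rule, score
    return best


RULES = [parse_rule(r) for r in ("a (Cable 'eI", "a (tion 'eI", "_r) a (tion a")]
```

With these rules, the "a" in "nation" matches only `a (tion`, while the "a" in "ration" also matches the more specific `_r) a (tion`, which wins because it matches more context.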


How eSpeak works: Exceptions

• Exception Format
  – <group of letters or word> phonemes and/or flags
  – _" kwoUts
  – _% p3s'Ent
  – _0 z'i@roU
  – _1 w'0n
  – eg fO@Egz'aamp@L
  – ibm $abbrev
  – Ambidextrous $3
  – from fr0m $u
  – Flags
    • $u, $abbrev, $only, $dot, $pause, etc.
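The exception entries above all follow the same shape: a word, then optional phonemes, then optional flags starting with `$`. A minimal sketch of splitting such a line, assuming only the single-word forms shown on this slide (`ListEntry` and `parse_list_line` are hypothetical names, not part of eSpeak):

```python
from typing import NamedTuple


class ListEntry(NamedTuple):
    word: str       # letter group or word the exception applies to
    phonemes: str   # replacement phonemes, "" if only flags are given
    flags: tuple    # flags such as "$u" or "$abbrev"


def parse_list_line(line):
    """Split one exception line into word, phonemes and flags."""
    fields = line.split()
    if not fields:
        raise ValueError("empty exception line")
    word, rest = fields[0], fields[1:]
    flags = tuple(f for f in rest if f.startswith("$"))
    phonemes = " ".join(f for f in rest if not f.startswith("$"))
    return ListEntry(word, phonemes, flags)
```

For example, `parse_list_line("from fr0m $u")` yields the word `from`, the phonemes `fr0m` and the flag `$u`, while `parse_list_line("ibm $abbrev")` yields no phonemes and only the `$abbrev` flag.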


How eSpeak works: Phoneme-To-Sound

• Step 2: Phoneme to Sound
  – Given the list of phonemes, eSpeak generates a sound for each phoneme
  – The previous or next phoneme may alter a phoneme's sound
  – A phoneme's sound may be generated from a sample file or from formant data
  – Phoneme data are defined in ph_<language> files
    • E.g. ph_english
  – Example of an entry in ph_english (Phoneme Definition):
      phoneme I
        vowel starttype #i endtype #i
        length 130
        IfNextVowelAppend(;)
        FMT(vowel/ii_2)
      endphoneme
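To show how the pieces of such a phoneme definition fit together, the sketch below pulls the name, the length value and the formant data reference out of the entry above. It is only an illustration using regular expressions, not the parser eSpeak uses, and `summarize_phoneme` is a hypothetical helper.

```python
import re

# The ph_english entry from the slide, one statement per line.
PHONEME_SRC = """\
phoneme I
  vowel starttype #i endtype #i
  length 130
  IfNextVowelAppend(;)
  FMT(vowel/ii_2)
endphoneme
"""


def summarize_phoneme(src):
    """Extract the phoneme name, its length value and its FMT(...) reference."""
    name = re.search(r"^phoneme\s+(\S+)", src, re.MULTILINE)
    length = re.search(r"^\s*length\s+(\d+)", src, re.MULTILINE)
    formant = re.search(r"FMT\(([^)]*)\)", src)
    return {
        "name": name.group(1) if name else None,
        "length": int(length.group(1)) if length else None,
        "formant_data": formant.group(1) if formant else None,
    }
```

Here `summarize_phoneme(PHONEME_SRC)` reports the phoneme `I`, a length of 130 and the formant data `vowel/ii_2`, which is the data espeakedit lets you visualize and edit.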


Editing eSpeak files

• eSpeakEdit Program
  – Used to edit, visualize and compile eSpeak data
    • Formant Phoneme Data
• Workflow for text-to-phoneme
  – Find an error in pronunciation, intonation, etc.
  – Check which rule (or exception) generates the error
  – Edit the rules or the dict file
  – Compile the data
  – Retry
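The slides perform this workflow in espeakedit. As an alternative sketch, the same check-and-compile loop can be driven from the `espeak` command-line program, assuming it is installed: `-X` prints the phonemes together with a trace of the rules that matched, and `--compile=<voice>` compiles the rules and list files found in the current directory. The helper names and the `source_dir` parameter are hypothetical.

```python
import shutil
import subprocess


def trace_command(word, voice="en"):
    """Command that shows which rules or exceptions produced the phonemes."""
    return ["espeak", "-q", "-v", voice, "-X", word]


def compile_command(voice="en"):
    """Command that compiles <lang>_rules and <lang>_list for the voice."""
    return ["espeak", f"--compile={voice}"]


def run(command, source_dir=None):
    """Run an espeak command, optionally inside the dictionary source directory."""
    if shutil.which("espeak") is None:
        return None  # espeak is not installed; nothing to run
    result = subprocess.run(command, cwd=source_dir, capture_output=True, text=True)
    return result.stdout + result.stderr
```

A typical iteration would call `run(trace_command("ration"))` to find the offending rule, edit the source file, call `run(compile_command("en"), source_dir=...)` from the directory holding the source files, and then trace the word again.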


Editing eSpeak files (2)

• Workflow for phoneme-to-sound
  – There may be cases where no proper sound exists for a specific phoneme (a common problem is the R sound)
    • E.g. it should be shorter or longer, when stressed or unstressed
  – Check all the available sounds that seem similar to the sound you need, using espeakedit
  – If something closer to what you need is found, change or add its definition in the ph_<language> file
  – If not, create a new phoneme using espeakedit, or record a new sound for unvoiced consonants
  – Retry


Editing Demonstration

• Demo of language edit, using espeakedit


Native speakers: How to contribute

• The biggest problem in language editing in eSpeak is ... native speakers.
• One must be a native speaker to be able to fix language problems
• How to contribute
  – Find errors in eSpeak for a certain language and report them
  – Try to fix pronunciation rules by editing rules and exceptions
  – Try to fix phoneme sounds by editing phoneme data
  – Send the changes back to the eSpeak community


eSpeak Language Enhancement

Thank you!