80
Speech and Language Technology For Dialog-based CALL Gary Geunbae Lee, POSTECH

Speech and Language Technology For Dialog-based CALL

  • Upload
    maxime

  • View
    45

  • Download
    0

Embed Size (px)

DESCRIPTION

Speech and Language Technology For Dialog-based CALL. Gary Geunbae Lee, POSTECH. Outline. 1. Introduction. DBCALL: Educational Error Handling. 2. 3. Spoken Dialog Systems. 4. 5. PESAA: Postech English Speaking Assessment and Assistant. Field Study. CHAPTER 1. iNTRODUCTION. - PowerPoint PPT Presentation

Citation preview

Page 1: Speech and Language Technology For Dialog-based CALL

Speech and Language TechnologyFor Dialog-based CALL

Gary Geunbae Lee, POSTECH

Page 2: Speech and Language Technology For Dialog-based CALL

Outline

Introduction1

Spoken Dialog Systems2

4 PESAA: Postech English Speaking As-sessment and Assistant

5 Field Study

3 DBCALL: Educational Error Han-dling

Page 3: Speech and Language Technology For Dialog-based CALL

INTRODUC-TION

CHAPTER 1

Page 4: Speech and Language Technology For Dialog-based CALL

English Tu-toring Meth-ods Tranditional Approches

CALL Approches

<CMC> <ICALL>

<Classroom> <Textbook> <Multimedia>

Page 5: Speech and Language Technology For Dialog-based CALL

Socio-Economic Ef -fects

• Changing our current foreign language educa-tion system in public schools From vocabulary and grammar methodology To speaking ability

• Significant effect of decreasing private English education fee private English education fee in Korea, reaching up

to 16 trillion won annually

• Expect the effect of the overseas export Japan, China, etc.

Page 6: Speech and Language Technology For Dialog-based CALL

Interdiciplinary Research

NLP

• Dialog Management• Error Detection• Corrective Feedback

• Comprehensible Input and Output• Corrective Feedback• Attitude & Motivation

SLA

Evaluation

• Cognitive Effect• Affective Effect

Page 7: Speech and Language Technology For Dialog-based CALL

Second Language Acquisition Theory

Second Lan-guage Ac-quisition

• Input Enhancement• Comprehensible input• Provision of inputs with high

frequency

• Immersion• Authentic environment• Direct form-meaning map-

ping

• Noticing & Attention• Output hypothesis test • Corrective feedback• Affective factors

• Motivation• Goal achievement & rewards• Interest• Importance of L2

Page 8: Speech and Language Technology For Dialog-based CALL

Dialog-Based CALL (DB-CALL)

<Educational Robot>

<3D Educational Game>

Spoken Dialog System DB-CALL System

Page 9: Speech and Language Technology For Dialog-based CALL

Existing DB-CALL Systems

Alelo Tactical language & culture training system Learn Iraqi Arabic by playing a fun video game Dedicated to serving langauge and culture

learning needs of military

SPELL Learning English in functional situations such

as going to a restaurant, expressing (dis-)likes, etc.

The speech recogniser is programmed to recognise grammatical and some ungrammatical utter-ances

DEAL Learning Dutch in a flea market situation The model can also convey extra linguis-

tic signs such as lip-synching, frowning, nodding, and eyebrow movements

Page 10: Speech and Language Technology For Dialog-based CALL

Video Demo

Page 11: Speech and Language Technology For Dialog-based CALL

SPOKEN DIALOG SYS-TEMS

CHAPTER 2

Page 12: Speech and Language Technology For Dialog-based CALL

SPOKEN DIALOG SYSTEM (SDS)

Page 13: Speech and Language Technology For Dialog-based CALL

Tele-service

Car-navigation Home networking

Robot interface

SDS APPLICATIONS

Page 14: Speech and Language Technology For Dialog-based CALL

Automatic Speech Recognition (ASR)

FeatureExtraction Decoding

AcousticModel

PronunciationModel

LanguageModel

버스 정류장이어디에 있나요 ?

Speech Signals Word Sequence

버스 정류장이어디에 있나요 ?

NetworkConstruction

SpeechDB

TextCorpora

HMMEstimation

G2P

LMEstimation

WO

)()|(maxargˆ WPWOPWLW

Page 15: Speech and Language Technology For Dialog-based CALL

15

Spoken Language Understanding (SLU)

Dialog ActIdentification

Frame-SlotExtraction

RelationExtraction

Unification

Feature Extraction / Selection

Info.Source

+

+

+

+ +

Overall architecture for semantic analyzer

I like DisneyWorld.

Domain: ChatDialog Act: StatementMain Action: LikeObject.Location=DisneyWorld

Examples of semantic frame structure

Semantic Frame Extraction (~ Information Extrac-tion Approach)1) Dialog act / Main action Identification ~ Classification2) Frame-Slot Object Extraction ~ Named Entity Recognition3) Object-Attribute Attachment ~ Relation Extraction

How to get to DisneyWorld?Domain: NavigationDialog Act: WH-questionMain Action: SearchObject.Location.Destination=DisneyWorld

Page 16: Speech and Language Technology For Dialog-based CALL

Named Entity ↔ Dialog ActJOINT APPROACH

Joint Inference

Classification(Dialog Act / Intent)

Sequential Labeling

(Named Entity / Frame Slot)

Automatic Speech

Recognition

Dialog Management

Joint Model(e.g. TriCRFs)

x x,y,z

[Jeong and Lee, SLT2006][Jeong and Lee, IEEE TASLP2008]

Page 17: Speech and Language Technology For Dialog-based CALL

HDP-HMM for Unsupervised Dialog Acts

β ~ GEM(α), ω ~ Dir(ω0)for each hidden state k [∈ 1,2,…] πk ~ DP(α',β) ϕk ~ Dir(ϕ0), θk ~ Dir(θ0)for each dialog d λd ~ Beta(λ0) for time stamp t zt ~ Multi(πzt-) for each entity e ei ~ Multi(θzt)

for each word w xi ~ Bern(λd) [select word type] if xi = 0: wi ~ Multi(ϕzt) else wi ~ Multi(ω) [background LM]

zt

wt,i

zt+1

et,iN

V

ϕk∞

πk∞

ϕ0

α'

βα

θk∞

θ0

zt-1

ωω0xt,i

Dλ0λd

Generative Story

Page 18: Speech and Language Technology For Dialog-based CALL

CRF with Posterior Regularization for unsu-pervised NER Constraints for NER

Constraints Learning

Welcome to the New York City Bus Tour Center .I want to buy tickets for me and my child .What kind of tour would you like to take ?We would like to go on a tour dur-ing the day .We have two daytime tours: the Downtown Tour and the All Around Town Tour .Which tour goes to the Statue of Liberty ?…

BOARD_TYPE:Hop-onBOARD_TYPE:Hop-offPLACE:Times SquarePLACE:Empire State BuildingPLACE:ChinatownPLACE:Site of the World Trade CenterPLACE:Statue of LibertyPLACE:Rockefeller CenterPLACE:Central Park…

HeuristicMatch-

ing

DICT/DB/Web

UNLABELDCORPUS

# We would like to go on a tour during the day . # -> null0:1.000:We would like to go on a tour during the day . # We have two daytime tours # -> the Downtown Tour and the All Around Town Tour .0:1.000:We have two daytime tours # Which tour goes to the Statue of Liberty ? # -> null0:1.000:Which tour goes to the <PLACE>Statue of Liberty</PLACE> ? # You can visit the Statue of Lib-erty on either tour . # -> null0:1.000:You can visit the <PLACE>Statue of Liberty</PLACE> on either tour .…

HYPOTHE-SIS

Welcome O:1.000 W1=<s> O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001 W2=<s>,Welcome O:1.000 W3=_ O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001 W4=_ O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001 W5=_ O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001 W6=to O:1.000 W7=Welcome,to O:1.000 W8=the O:0.924 PLACE-b:0.005 PLACE-i:0.006 TOURS-b:0.001 TOURS-i:0.064 W9=Welcome,the O:1.000 …

LABELEDFEATURES

ExtractFeatures

CRFModel with PR

Page 19: Speech and Language Technology For Dialog-based CALL

Vanilla EXAMPLE-BASED DM (EBDM) Example-based approaches

Dialog State Space

Domain = Building_GuidanceDialog Act = WH-QUESTIONMain Goal = SEARCH-LOCROOM-TYPE=1 (filled), ROOM-NAME=0 (unfilled)LOC-FLOOR=0, PER-NAME=0, PER-TITLE=0Previous Dialog Act = <s>, Previous Main Goal = <s> Discourse History Vector = [1,0,0,0,0]Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ?System Action = inform(Floor)

Dialog CorpusUSER: 회의 실 이 어디 지 ?[Dialog Act = WH-QUESTION][Main Goal = SEARCH-LOC][ROOM-TYPE = 회의실 ]SYSTEM: 3 층에 교수회의실 , 2 층에 대회의실 , 소회의실이 있습니다 . [System Action = inform(Floor)]

Turn #1 (Domain=Building_Guidance)

Dialog Example

Indexed by using semantic & discourse features

Having the simi-lar state

),(argmax* heSe iEei

[Lee et al., SPECOM2009]

Page 20: Speech and Language Technology For Dialog-based CALL

Error handling and N-best support

To increase the robustness of EBDM with prior knowledge

1) Error HandlingIf the system knows what the user will do next

Dynamic Help Generation

LOCATION

OFFICE PHONE NUMBER

ROOM ROLE

GUIDE

FOCUS NODE

NEXT_TASK

AgendaHelpS: Next, you can do the subtask 1) Asking the room's role, or 2)Asking the office phone num-ber, or 3) Selecting the desired room for navi-gation.

UtterHelpS: Next, you can say 1) “What is it?”, or 2) “What’s the phone number of [ROOM_NAME]?”, or 3) “ Let’s go there.

[Lee et al CSL2010]

Page 21: Speech and Language Technology For Dialog-based CALL

Error handling and N-best support

To increase the robustness of EBDM with prior knowledge

2) N-best supportIf the system knows which subtask will be more probable next

Rescoring N-best hypotheses (h1~hn)

LOCATION

OFFICE PHONE NUMBER

FLOOR

ROOMNAME

h2

h1

h3

h4

Subtask System Utterance System Action

LOCATION The director’s room is Room No. 201.

Inform(RoomNumber)

N-best User Utterances Subtask P(hi|S)

U1 (h1) What are office rooms in this building?

ROOM NAME 0.2

U2 (h2) What is the floor? FLOOR 0.4

U3 (h3) Where is it? LOCATION 0.3

U4 (h4) What is the phone num-ber?

OFFICEPHONE NUMBER

0.5(More proba-

ble)

Page 22: Speech and Language Technology For Dialog-based CALL

Misunderstanding handling by Confirma-tion

Dialog statehypotheses

ConfirmationAgent

(misunderstandingHandler)

EBDM

Multiple Dialog States

Representation

User Simulator

DEDB

ConfirmationStrategy

Confirmation

Task related system action

User

ASR

SLU

User’sActions

Executing Learning

[Kim et al SLT 2010]

Page 23: Speech and Language Technology For Dialog-based CALL

The Framework of ranking-based EBDM

DiscourseSimilarity

Relative Position

Scoring Mod-uleDialog

Examples

Dialog ActFeatures

Entity Con-

straint

User Intention(system intention)

RankSVM

CalculatedScores

system Intention(user intention)

EBDM

[Noh et al IWSDS2011]

Page 24: Speech and Language Technology For Dialog-based CALL

Dialog Simulation User Simulation for spoken dialog systems in-

volves four essential problems

User Intention Simulation

User Utterance Simulation

ASR Channel SimulationSpoken Dialog System Simulated Users

[Jung et al., CSL 2009]

Page 25: Speech and Language Technology For Dialog-based CALL

Design Step

Annotation Step

LanguageSynchronization Step

Training Step

Running Step

Semantic Structure

Dialog Structure

KnowledgeStructure

ModelSLUModel

DialogModel

Knowledge

Model

ASRModel

CorpusSLU

CorpusDialogCorpus

Knowledge

Source

SemanticAnnotato

r

DialogAnnotato

rKnowledgeAnnotator

DialogUtterance

Pool

KnowledgeImporter

KnowledgeBuilder

DMTrainer

SLUTrainer

ASRTrainer

SLU DMASR

ExternalComponen

tDialog Studio

Component

File

DIALOG STUDIO ARCHITECTURE

[Jung et al., SPECOM 2008]

Page 26: Speech and Language Technology For Dialog-based CALL

humansubject Wizard

User speech

mic speaker

TTS Text input

Wizard speech (Network RPC)

Architecture of WOZ

User Screen Wizard ScreenNPCs

ControlUser Character

Control

[Lee et al SLATE2011]

Page 27: Speech and Language Technology For Dialog-based CALL

User Screen (Mission)

Page 28: Speech and Language Technology For Dialog-based CALL

DBCALL: EDUCATIONAL ERROR HANDLING

CHAPTER 3

Page 29: Speech and Language Technology For Dialog-based CALL

Global Errors• Global errors are errors that affect overall sen-

tence organization. They are likely to have a marked effect on comprehension. [1] 

What is the purpose of your trip?

It’s ... I ... purpose business

Sorry, I didn’t under-stand. What did you say?You can say “I am here on busi-ness”I am here on business

Intention: inform(trip-purpose)

Page 30: Speech and Language Technology For Dialog-based CALL

Lee, S., Lee, C., Lee, J., Noh, H., & Lee, G. G. (2010). Intention-based Corrective Feedback Generation using Context-aware Model. Proceedings of International Conference on Computer Supported Education.

Hybrid Model

Level 1Data

Learner’s Utterance

Dialog ContextModel

Level 2Utterance Model

Level NUtterance Model

Level 2Data

Level NData

Dialog State

Learner‘s Intention

Level 1Utterance Model

Dialog Manager

• Robust to learners’ errors– Hybrid model combining utterance-based model and dialog

context-based model

Page 31: Speech and Language Technology For Dialog-based CALL

Formulating the prediction as probabilistic inference:

Chain ruleBayes’ ruleIgnore invariants

Dialog-Context ModelUtterance ModelMaximum EntropyFeatures: • Word• Part of speech

Enhanced K-Nearest NeighborsFeatures: • Previous system intention• Previous user intention• Current system intention• A list of exchanged information• Number of database query results

Page 32: Speech and Language Technology For Dialog-based CALL

Dialog State Space

Domain = Fruit_StorePrevious System Intention = Ask(Select_Item)Previous User Intention = Inform(Order_Fruit) System Intention = Ask(Order_Quantity)Exchanged Information State = [ITEM_NAME = ‘orange’ (C), ITEM_QUANTITY = 3 (U)]Number of DB query results = 0

Dialog Corpus

SYSTEM: Namsu, what would you like to buy today?[Intention = Ask(Select_Item)]USER: I’d like to buy some oranges[Intention = Inform(Order_Fruit), ITEM_NAME = orange]SYSTEM: How many oranges do you need?[Intention = Ask(Order_Quantity)]USER: I need three oranges[Intention = Inform(Order_Quantity), NUM = three]

Segment #2 (Domain = Fruit Store)

Dialog State

Indexed by using semantic & discourse features

User Intention = Inform(Order_Quantity)User Intention

Dialog-Context Model

Page 33: Speech and Language Technology For Dialog-based CALL

Recast Feedback Generation

ExampleExpresssion DB

Example Search

Example Ex-pressions

Pattern Matching

Feedback

IntentionRecognition

User’sUtterance

> θ No FeedbackY

N

Page 34: Speech and Language Technology For Dialog-based CALL

What is the purpose of your trip?

I am here at business

On business

I am here on business

ErrorInfo: prep_sub(at/on)

Local Er-rors

• Local errors are errors that affect single elements in a sentence. [1]

[1] Ellis., R.  (2008). The Study of Second Language Acquisition. 2nd ed. Oxford: OUP

Page 35: Speech and Language Technology For Dialog-based CALL

Local Error Detecter Archi-tecture

TextErroneous

TextGrammatical Error

Simulation

ASR ASR’

N-gram LM

Merged Hy-potheses

Error-typeClassifier

GrammaticalityChecker

N-gram LM

Feed-back

Error PatternsError Frequency

Lee, S., Noh, H., Lee, K., & Lee, G. G., (2011) Grammatical Error Detection for Corrective Feedback Provision in Oral Conversations, Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco.

Page 36: Speech and Language Technology For Dialog-based CALL

Two-Step Approach

• Data Imbalance Problem– Simply produce majority class– Or, High false positive rate

• Large number of error types – Makes model learning and selection procedure vastly compli-

cated• Grammaticality checking itself can be useful for some Ap-

plications– Categorizing learners’ proficiency level – Generating implicit corrective feedback such as repetition,

elicitation, and recast feedbackI am here at business

0 0 0 1 0

None None None PRP_LXC None

Grammaticality CheckingError Type Classification

Grammatical Error Detection

1)

2)

Page 37: Speech and Language Technology For Dialog-based CALL

Grammaticality Checker

- Feature Extraction

Page 38: Speech and Language Technology For Dialog-based CALL

Grammaticality Checker

- Model Learning• Binary Classification

– Support Vector Machine• Model Selection

– Radial Basis Kernel– Search for C, γ which optimize:

• Maximize F-scoreSubject to Precision > 0.90, False positive rate < 0.01

– 5-fold cross-validation

Page 39: Speech and Language Technology For Dialog-based CALL

Error Type Classi-fication

• Error type information is useful for– Meta-linguistic feedback– Sophisticated learner model

• Simplest way– Choose the error type associated with the top ranked er-

ror pattern– Two flaws:

• does not have a principled way to break tied error patterns• does not consider the error frequency

• Weighting according to error frequency– Score(e) = TS(e) + α * EF(e)

Page 40: Speech and Language Technology For Dialog-based CALL

GES: Grammar Error Sim-ulator

Automatic Speech Recog-

nizer

Grammatical Er-ror Simulator

Incorrect Sen-tences

Correct Sen-

tences

Error Types

<LM Adaptation & Grammatical Error Detection>

Page 41: Speech and Language Technology For Dialog-based CALL

GES Applica-tion

<Grammar Quiz Generation>

Page 42: Speech and Language Technology For Dialog-based CALL

Markov Logic Net-work

• subject-verb agreement errors• omission errors of prepositions• omission errors of articles

He want go to movie theater

Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using markov logic. Proceedings of the ACL 2009, Singapore, August 2009.Sungjin Lee, Jonghoon Lee, Hyungjong Noh, Kyusong Lee, Gary Geunbae Lee. (2011) Grammatical Error Simu-lation for Computer-Assisted Language Learning, Knowledge-Based Systems

Page 43: Speech and Language Technology For Dialog-based CALL

Grammar Error Simulation

• Realistic errors– Encoding characteristics of learners’ errors using the Markov

logic• Over-generalization of some rules of the L2

• Lack of knowledge of some rules of the L2

• Applying rules and forms of the first language into the L2

Page 44: Speech and Language Technology For Dialog-based CALL

Overall Process

Page 45: Speech and Language Technology For Dialog-based CALL

NICT JLE Corpus

• Number of interviews– 167

• Number of sentences of intervie-wees– 8,316

• Average length of sentences– 15.59

• Nubmer of total errors– 15,954

<n_num crr=“x”>...</n_num>

POS(i.e. n=noun)

Grammatical system(i.e. num=number)

Corrected form

Erroneous part

Example) I belong to two baseball <n_num crr=“teams”>team</n_num>

Page 46: Speech and Language Technology For Dialog-based CALL

PESAA: POSTECH ENGLISH SPEAKING ASSESSMENT & ASSISTANT

CHAPTER 4

Page 47: Speech and Language Technology For Dialog-based CALL

English oral proficiency assessment:International test

Reading aloudDescribing a picture

Answering to questionsProposing a solution or

opinion

InterviewTalking on a topic

Discussion

Giving an opinionTalking on a subject

Answering to questions

Page 48: Speech and Language Technology For Dialog-based CALL

English oral proficiency assessment:Korean national test

• National English Ability Test (NEAT)

• Tasks– Answering short questions (communication)– Describing pictures (story telling)– Presentation

• Describing figures, tables, and graphs• Introducing products or events

– Giving an opinion (discussion)

Page 49: Speech and Language Technology For Dialog-based CALL

English oral proficiency assessment:General common tasks

• Giving an opinion / discussion

• Rubrics– Delivery

• Pronunciation• Fluency (Prosody)

– Language use• Grammar• Word choice

– Topic development• Organization• Discourse• Contents

Page 50: Speech and Language Technology For Dialog-based CALL

Requirements:Real environment

Reading aloudDescribing a

pictureAnswering to

questionsProposing a solution or

opinion

InterviewTalking on a

topicDiscussion

Giving an opin-ion

Talking on a subject

Answering to questions

Existing systems for read speech

Spontaneous speechText-independent input

NEAT

Page 51: Speech and Language Technology For Dialog-based CALL

51

Training data collection

• SNU pronunciation/prosody

Speech waveform

Spectrogram/ pitch contour

WordPLU

Sentence stress

Page 52: Speech and Language Technology For Dialog-based CALL

52

For Public Use• Boston University radio news corpus

– Speech from FM radio news announcers– 424 paragraphs (30,821 words)– ToBI labels (pitch accent stress)– 0.48 marked stress per word– PLU set: TIMIT phonetic labeling system

Page 53: Speech and Language Technology For Dialog-based CALL

53

• Aix-Marsec database

Speech waveform

Spectrogram/ pitch contour

Multi-level annota-tion

Page 54: Speech and Language Technology For Dialog-based CALL

Collecting Grammar Error Data:Picture description task

• From English learners of Korean• Story Telling based on pictures• 80 Students (5 tasks for each student)

Page 55: Speech and Language Technology For Dialog-based CALL

Collecting Grammar Error Data: Error tagsets

• JLE Tagset– Consisting of 46 tags– Systematic tag structure– Some ambiguity caused by POS specific error tag structure

• CLC Tagset– World-widely used tagset including 76 tags– Systematic & Taxonomic tag structure– JLE issue is figured out by taxonomic tag structure

• NUCLE Tagset– 27 error tags– Quiet arbitrary tag structure

• UIUC Tagset– Only for articles and prepositions

Page 56: Speech and Language Technology For Dialog-based CALL

PESAA: Pronuciation Feedback

EPD

Error information

User

Forced Alignment

Comparison

Feedback Generation

Actual pronunciation

Speech input

Material

Error Detec-tion

Error candidates

Pronouncing Simulation

ASR

Word-level transcription

Orthographic pronunciation

simulation part

recognition part

error detection & feedback part

Page 57: Speech and Language Technology For Dialog-based CALL

Pronunciation Error simulation:Pronunciation Variants

Canonical pronuncia-tionNative speaker’s pronuncia-tionNon-native speaker’s pronuncia-tion[straik]

[sɨtɨraikɨ]

Strike

Page 58: Speech and Language Technology For Dialog-based CALL

Pronunciation Error simulation:Learning context rules using Generalized TBL

nth initial ma-chine annota-

tion

Collect transfor-mations

Best transforma-tion

List of trans-formations

Machine anno-tated data

Training in-put

Left-right ngram context

Iterative initialization

n := n + 1

Merge transforma-tions

Trainingreference

Majority choice/ Context

n := 0

nth order initial-ization rules

Apply

n

Page 59: Speech and Language Technology For Dialog-based CALL

Pronunciation Error simulation:Multi-tag Result

• Example Input– Input

• Let’s go shopping• # L EH T S # G OW # SH AH P EH NG #

• Example Output– #/# L/L EH/EH T/T S/S #/# G/G OW/OW|AO #/# SH/SH AA/AH|AA P/P IH/IH NG/NG

#/#• #/# L/L EH/EH T/T S/S #/# G/G OW/AO #/# SH/SH AA/AA P/P IH/EH NG/NG #/#• #/# L/L EH/EH T/T S/S #/# G/G OW/OW #/# SH/SH AA/AA P/P IH/EH NG/NG #/#• #/# L/L EH/EH T/T S/S #/# G/G OW/AO #/# SH/SH AA/AH P/P IH/EH NG/NG #/#• #/# L/L EH/EH T/T S/S #/# G/G OW/OW #/# SH/SH AA/AH P/P IH/EH NG/NG #/#

Page 60: Speech and Language Technology For Dialog-based CALL

Pronunciation Error detection/feedback

Error candi-date infor-

mation

Feedback pref-erence

Error confi-dence

Word ASR con-fidence

Phoneme ASR confidence

Feedback deci-sion Feedback

Feedback DB

Page 61: Speech and Language Technology For Dialog-based CALL

Pronunciation Error detection/Feedback:Components

Feedback preference

Error confi-dence

Phoneme ASR confidence

Word ASR con-fidence

)|Pr( xr

),,|Pr( 11 rhef ),,|Pr( 1 rhxe

)|Pr( xh

Page 62: Speech and Language Technology For Dialog-based CALL

62

PESAA: Prosody Feedback

• Stress & Prosodic phrasing & boundary tone

Stress

Prosodic phrasingBoundary

tone

* Existence of word/sentence stress for each syllable/word

* Location of phrase breaks

* Type of boundary tone for each phrasal boundary

Page 63: Speech and Language Technology For Dialog-based CALL

63

Sentence Stress Feedback:Architecture

Alignment

TextText

Analysis

Speech Analysis

Sentence Stress

Prediction

Model

Rule ApplicationRules

PredictedSentence

Stress

ModelTraining Model

Sentence Stress

Detection

DetectedSentence

Stress

FeedbackDiff.

TextAnalysisText

Speech Signal

ModelTraining

Page 64: Speech and Language Technology For Dialog-based CALL

64

Sentence Stress Prediction

• Feature used– Position info: the number of

phonemes in word, the number of syllables in word, …

– Stress info: word stress, sentence stress (rule-based prediction), …

– Lexical info: identity of word, identity of vowel

– Part-of-speech info

Name Description

S-basic Content words

U-basic Functional words

U-adhoc Unclassified FW EX LS POS

U-aux MD special cases

U-adv RP special cases

S-frgn FW foreign words

S-vb Last VB in multi-ple verbs

Page 65: Speech and Language Technology For Dialog-based CALL

65

Sentence Stress Detection

• Feature used– Duration info: duration of vowel, duration of

syllable, normalized duration of word accord-ing to the number of syllables, …

– Intensity info: energy of vowel (+delta)– F0 info: f0 of vowel (+delta)– MFCC info: mfcc of vowel (+delta, +delta-

delta)– Lexical info: identity of vowel

Page 66: Speech and Language Technology For Dialog-based CALL

66

Sentence Stress Feedback

• Adopting output probability– Feedback candidates: syllables in “predicted

stress” with low or high output probability

Pre-dicted stress

It may

be

the

most

im por tan

t ap point

ment

De-tected stress

It may

be

the

most

im por tan

t ap point

ment

Not stressed

Stressed

Page 67: Speech and Language Technology For Dialog-based CALL

67

Sentence Stress Feedback:Snapshot

Page 68: Speech and Language Technology For Dialog-based CALL

PESAA: Grammar Feedback

Spoken English

Written English

User Input

GE Pat-terns

Spoken GE Simu-

lator

GE tagged Texts/

SpeechTrainingSoft Constraint

Correct Sentences

Spoken GE Detec-

tor

SVMTraining

ASR/CNSPEECH

Written GE Detec-

torGE tagged

Texts

Written GE

SimulatorTraining

Soft Constraint

Correct Sentences

GE Pat-terns

SVMTraining

TEXTGE Feed-

back

Page 69: Speech and Language Technology For Dialog-based CALL

Grammar Error detection:Snapshot – written input

Page 70: Speech and Language Technology For Dialog-based CALL

Grammar Error detection:Snapshot – spoken input

Page 71: Speech and Language Technology For Dialog-based CALL

FIELD STUDY

CHAPTER 5

Page 72: Speech and Language Technology For Dialog-based CALL

Field Study: Robot-Assisted Language Learning

Experimental Design1

2 Cognitive Effects

Affective Effects3

Sungjin Lee, Hyungjong Noh, Jonghoon Lee, Kyusong Lee, Gary Geunbae Lee, Seongdae Sagong, Moon -sang Kim. (2011) On the Effectiveness of Robot-Assisted Language Learning, ReCALL Journal, Vol.23(1), SSCI.Sungjin Lee, Changgu Kim, Jonghoon Lee, Hyungjong Noh, Kyusong Lee, Gary Geunbae Lee.Affective Ef -fects of Speech-enabled Robots for Language Learning. Proceedings of the 2010 IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley, December 2010Sungjin Lee, Hyungjong Noh, Jonghoon Lee, Kyusong Lee, Gary Geunbae Lee. Cognitive Effects of Robot-Assisted Language Learning on Oral Skills. Proceedings of Interspeech Second Language Studies Workshop, Tokyo, Sep 2010.

Page 73: Speech and Language Technology For Dialog-based CALL

HRI Technol-ogy

Page 74: Speech and Language Technology For Dialog-based CALL

HRI Experimental Design

• Setting and participants– 24 elementary students– Ranging in age over 9-13– Divided into two groups (beginner, intermedi-

ate)

• Material and treatment– 68 lessons

• 17 lessons for each level and theme– Simple to complex task– 2 hours a week extended over 8 weeks

Page 75: Speech and Language Technology For Dialog-based CALL

HRI Experimental Design

1) PC room

2) Pronunciation training room

3) Fruit and Vegetablestore

4) Stationerystore

Page 76: Speech and Language Technology For Dialog-based CALL

Evaluation of Cognitive Effects

• Data collection and analysis

– Evaluation method• Pre-test/Post-test

– For the listening skills• 15 items for multiple choice question• Cronbach’s alpha

– pre-test: 0.87, post-test: 0.66

– For the speaking skills• 10 items for 1-on-1 interview• Cronbach’s alpha

– pre-test: 0.93, post-test: 0.99

Page 77: Speech and Language Technology For Dialog-based CALL

<Cognitive effects on oral skills for overall students>

Experiment Result

*p < .05

Page 78: Speech and Language Technology For Dialog-based CALL

Evaluation of Affective Factors

• Data collection• Questionnaire (4 point scale without a neutral option)

• Data analysis– For satisfaction in using robots

• Descriptive statistics– For interest in learning English, Confidence with English,

Motivation for learning English• Pre-/Post-test

Affective Factor N Ɨ R ƗƗ

Satisfaction in using robots 10 0.73Interest in learning English 16 0.93(0.9

6)Confidence with English 12 0.91(0.9

0)Motivation for learning English 14 0.91(0.8

3)N Ɨ = Number of questions, R ƗƗ = Cronbach’s alpha in the form of pre-test(post-test)

Page 79: Speech and Language Technology For Dialog-based CALL

Effects on Affective Factors

Satis

factio

n in u

sing r

obots

Intere

st in

learni

ng En

glish

Confid

ence

with En

glish

Motiva

tion f

or lea

rning

Engli

sh01234

Pre-testPost-test

Page 80: Speech and Language Technology For Dialog-based CALL

Thank you