
Page 1

Conversational Informatics, October 17, 2018

3. Methodologies for Conversational System Development

Toyoaki Nishida, Kyoto University

Copyright © 2018, Toyoaki Nishida, Atsushi Nakazawa, Yoshimasa Ohmoto, Yasser Mohammad, At ,Inc. All Rights Reserved.

Page 2

- Architecture

- Scripts and Markup Languages

- Corpus-based approach

- Evaluation

Technical basis for conversational informatics

[Nishida 2014]

Page 3

Story understanding and generation

Figure: components include a story understander, a story generator, episodic memory, semantic memory, dynamic memory, a learner, goals, and stories. [Nishida 2014]

Page 4

Dialogue management

Figure: utterances pass through syntax and semantic analysis and discourse analysis to recover the intended meaning, discourse, and task; discourse planning and sentence generation produce the system's utterances.

[Nishida 2014]

Page 5

Cognitive computing

[Georgeff 1990]

Figure: the BDI architecture — a reasoner mediates among beliefs, desires, intentions, and plans; sensors deliver incoming signals and effectors produce outgoing effects.
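To make the deliberation cycle concrete, here is a minimal sense-deliberate-act sketch in Python. The data structures and helper names are illustrative assumptions for this note, not the formulation in [Georgeff 1990].

# Illustrative BDI-style cycle (assumed structures, not the original formulation).
def bdi_step(beliefs, desires, plan_library, percept):
    """One cycle: revise beliefs, commit to intentions, return actions for the effectors."""
    beliefs.update(percept)                                            # belief revision from incoming signals
    options = [d for d in desires if d["precondition"](beliefs)]       # desires applicable now
    intentions = sorted(options, key=lambda d: d["priority"], reverse=True)[:1]  # simple commitment strategy
    actions = []
    for intention in intentions:
        actions.extend(plan_library.get(intention["name"], []))        # instantiate a plan
    return actions                                                     # outgoing effects

# Toy usage:
beliefs = {"user_present": False}
desires = [{"name": "greet", "priority": 1,
            "precondition": lambda b: b.get("user_present", False)}]
plans = {"greet": ["gaze_at_user", "say_hello"]}
print(bdi_step(beliefs, desires, plans, {"user_present": True}))       # ['gaze_at_user', 'say_hello']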

Page 6

Affective computing

[Picard 1997]

Figure: affective processing spans a high level (representations; inference and decision making) and a low level (signals; pattern recognition and synthesis), with emotional states coupled to cognitive processing.

Page 7

Cognitive architecture

[Becker-Asano 2008]

Figure: the architecture couples an emotion module and a cognition module. The emotion module represents emotions in PAD space (valence, arousal, dominance) with emotion dynamics/mood, a mood awareness likelihood, and social context, distinguishing primary emotions, secondary emotions, and aware emotions (emotions × awareness). The cognition module has a reactive layer (non-conscious appraisal, perception-action) and a reasoning layer (conscious appraisal, reappraisal/coping); elicitation and mapping link the two modules.

Page 8

Cognitive architecture

[Lim 2010]

Figure: a PSI-based architecture (between sensors/perception and effectors/action) with a reactive layer (appraisal, action tendencies, thresholds) and a deliberative layer (appraisal, focus and coping, intention generation and intention selection over goals and intentions). It draws on the drives energy, integrity, affiliation, certainty, and competence, on personality thresholds and modulating parameters, on the emotional state, on autobiographic memory, and on a knowledge base of relations and properties.

Page 9

Cognitive architecture

Figure: a two-layer architecture — a reactive layer (appraisal, action tendencies) and a deliberative layer (appraisal, interpretation, focus and coping, intention generation and intention selection over goals and intentions) operating on a mental model, memory, and knowledge — connected to the real world through sensors and effectors.

[Nishida 2014]

Page 10

- The system should be able to run multiple processes simultaneously to handle multimodal signals.

- The system should allow the programmer to write a complex control structure without sacrificing real-time response.

- The system should allow the programmer to write code across different levels of abstraction, ranging from signals to semantic representations.

- The system should allow the programmer to use high-level languages to express her/his ideas at multiple levels of abstraction.

- The system architecture should allow the system to be easily scaled up.

Requirements at the platform level

[Nishida 2014]
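As a toy illustration of the first two requirements above (assumed names; not a real platform), two simulated sensor threads can feed a shared queue that a single control loop consumes without blocking:

import queue, threading, time

events = queue.Queue()

def sensor(name, period):
    """Simulated sensor process: posts a reading periodically."""
    for i in range(3):
        time.sleep(period)
        events.put((name, i))

threading.Thread(target=sensor, args=("speech", 0.2), daemon=True).start()
threading.Thread(target=sensor, args=("gesture", 0.3), daemon=True).start()

deadline = time.time() + 1.5
while time.time() < deadline:                 # control loop keeps running in (near) real time
    try:
        source, value = events.get(timeout=0.1)
        print("handle", source, value)        # dispatch to the appropriate level of processing
    except queue.Empty:
        pass                                  # keep responsive even when no signal arrived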

Page 11

Hierarchical processing

Figure: processing moves between three levels — signal, parametric representation, and symbolic representation. On the input side, parametric representations from sensors are abstracted into symbolic representations; on the output side, symbolic representations are turned into parametric representations for motors.

Page 12

Blackboard architecture

Figure: the GECA platform connects GECA-compliant components over the GECA protocol through a central naming service, a subscription service, the GECA server, and blackboard managers with their databases (Blackboard Manager 1/Database 1, Blackboard Manager 2/Database 2). Individual components, built with the .Net, Java, and C++ GECA plugs and often wrapping legacy software or sensor devices, include data acquisition for an acceleration sensor, a data glove, and motion capture; Japanese ASR (MS SAPI), English and Croatian ASR (JSAPI); the CAST scenario component; LTM (Juman + KNP); camera-based head tracking; and a CG animation player with an animation database, Croatian voice tracks, and English/Japanese TTSs.

[Huang 2008]
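The blackboard style itself is easy to sketch. The following Python toy (class and method names are assumptions for illustration, not the actual GECA server API or protocol) shows the publish/subscribe pattern that lets loosely coupled components exchange messages through a shared hub.

from collections import defaultdict

class Blackboard:
    """Toy message hub: components publish typed messages, subscribers get called back."""
    def __init__(self):
        self.subscribers = defaultdict(list)      # message type -> list of callbacks

    def subscribe(self, msg_type, callback):
        self.subscribers[msg_type].append(callback)

    def publish(self, msg_type, payload):
        for callback in self.subscribers[msg_type]:
            callback(payload)

# Toy usage: an ASR component posts a result; an animation component reacts to it.
bb = Blackboard()
bb.subscribe("input.speech", lambda text: print("animate a reply to:", text))
bb.publish("input.speech", "hello")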

Page 13

- Synchronizing speech, eye gaze, and gesture

- Representing personality or emotional states by gestural behaviors, facial expression, utterances, etc.

- Coordinating body motions of multiple characters (in conversations)

- Communication between users and other agents

Script languages / markup languages for building believable synthetic characters

Authoring embodied conversational agents

[Prendinger 2004]

Page 14

Script vs mark-up language

Figure: a script is executed directly by an interpreter to produce animation; a markup language is processed by an action planner, which consults a knowledge base and constraints, and an action realizer then produces the animation.

[Nishida 2014]

Page 15

- Convenience: Hides geometric difficulties so that even authors who have limited knowledge of computer graphics can use it in a natural way.

- Compositional semantics: Specification of composite actions based on existing components.

- Redefinability: Allows the authors to explicitly script actions in terms of other actions.

- Parameterization: Allows the authors to specify actions in terms of how they cause changes over time to each individual degree of freedom.

- Interaction: Scripting actions should be able to interact with the world, including objects and other agents.

[Huang 2004]

Specifying behaviors of ECAs, in particular communicative acts such as gestures and postures.

Scripts

Principles

Page 16

1. Symbolic Reduction: Reduce complex grammatical forms to simpler ones.

2. Divide and Conquer: Split an input into two or more subparts, and combine the responses to each.

3. Synonyms: Map different ways of saying the same thing to the same reply.

4. Spelling or grammar corrections.

5. Detecting keywords anywhere in the input.

6. Conditionals: Certain forms of branching may be implemented with <srai>.

7. Any combination of 1-6.

<srai>: stimulus-response, AI

[Wallace 2003]

AIML, developed by the Alicebot free software community during 1995-2000, enables people to input knowledge into chat-bots. The basic minimalist approach comes from ELIZA.

AIML (Artificial Intelligence Mark-up Language)

Principles

Page 17

[Wallace 2003]

<category><pattern>KNOCK KNOCK</pattern><template>Who is there?</template></category>

<category><pattern>*</pattern><that>WHO IS THERE</that><template><person/> who?</template></category>

<category><pattern>*</pattern><that>* WHO</that><template>Ha ha very funny, <get name="name"/>.</template></category>

AIML (Artificial Intelligence Mark-up Language)

C: Knock knock.
R: Who's there?
C: Banana.
R: Banana who?
C: Knock knock.
R: Who's there?
C: Banana.
R: Banana who?
C: Knock knock.
R: Who's there?
C: Orange.
R: Orange who?
C: Orange you glad I didn't say banana.
R: Ha ha very funny, Nancy.

In AIML the syntax <that>...</that> encloses a pattern that matches the robot’s previous utterance.
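The reduction idea behind <srai> can also be sketched outside AIML. The toy Python rules below (illustrative assumptions, not part of the AIML standard) show how a synonymous or more complex input is recursively rewritten until it reaches a category with a direct reply:

# Toy pattern-template rules; ("srai", X) means "reduce this input to X and try again".
RULES = {
    "HELLO": "Hi there!",
    "HI": ("srai", "HELLO"),                          # synonym mapped to the same reply
    "COULD YOU PLEASE SAY HELLO": ("srai", "HELLO"),  # symbolic reduction of a complex form
}

def respond(utterance):
    template = RULES.get(utterance.upper().strip("!. "), "I do not understand.")
    if isinstance(template, tuple) and template[0] == "srai":
        return respond(template[1])                   # recursive reduction, as <srai> does
    return template

print(respond("Hi!"))                                 # -> Hi there!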

Page 18

- Conversational interface: Leverages natural aspects of human social communication; to be blended with conventional interface components such as windows, menus, etc. Optional support for speech recognition to realize voice commands. Characters can respond with synthetic speech, recorded audio, or text.

- Animated characters: Appear in their own window; embedded in software with Visual Basic or in a web page with VBScript.

- Social interaction: Based on the Media Equation. Uses nonverbal social cues that convey attitudes, identities, and emotions; personality, etiquette, praise, team, and gender effects in the user interface.

[Clark 1998]

Set of programmable software services that supports the presentation of interactive animated characters.

Microsoft Agent

Features

Page 19

Microsoft Agent

[Microsoft 1999]

...
Agent1.Characters.Load "Genie", "http: ..."
Set Genie = Agent1.Characters("Genie")
Set Robby = Agent1.Characters("Robby")
Genie.Show
Genie.Play "Greet"
Genie.Speak "Hello!"
Robby.Show
Robby.MoveTo 100,200
Robby.Play "GestureRight"
Robby.Hide
Genie.Hide
...

(Characters shown in the figure: Genie and Robby.)

Page 20

- Easy to use: Does not assume programming skills.

- Intelligibility: Provides names and abbreviations that clearly indicate their meaning.

- Believability: Facilitates scripting consistent emergent behavior, making the agent move and speak according to its intended personality.

- Extensibility: Features not supported by MPML can be added at the cost of scripting in JavaScript or Java.

- Easy distribution: A converter transforms the MPML script into JavaScript so that it can run on any browser that supports JavaScript.

[Prendinger 2004]

XML-based markup language for scripting character-based multimodal presentations. Provides an appropriate tagging structure that enables authors to specify the features of presentations by human presenters.

MPML

Principles

Page 21

- Convenience: Hides geometric difficulties so that even authors who have limited knowledge of computer graphics can use it in a natural way.

- Compositional semantics: Specification of composite actions based on existing components.

- Redefinability: Allows the authors to explicitly script actions in terms of other actions.

- Parameterization: Allows the authors to specify actions in terms of how they cause changes over time to each individual degree of freedom.

- Interaction: Scripting actions should be able to interact with the world, including objects and other agents.

[Huang 2004]

Specifying behaviors of ECAs, in particular communicative acts such as gestures and postures.

Scripts

Principles

Page 22

[Huang 2004]

(a) Direction reference for humanoid

(b) Combination of the directions for left arm

(c) Typical joints for humanoid

(d) Elbow joint in different situations

Parameterization

Figure: (a) the direction reference frame for the humanoid (+X, +Y, +Z axes with the direction terms front, back, up, down, left, right); (b) the combined directions for the left arm (left_front, left_front_up, left_up, left_back_up, left_back, left_back_down, left_down, left_front_down, left_front2, left); (c) typical joints for the humanoid (HumanoidRoot, sacroiliac, skullbase, and left/right shoulder, elbow, wrist, hip, knee, ankle); (d) the elbow joint in different situations (e.g., shoulder front, elbow front).

Page 23

- XML-based embodied-agent, character-attribute, definition, and animation scripting language designed to aid in the rapid incorporation of life-like agents into online applications or virtual reality worlds.

- Description
  - Classification of motion: Movement, Pointing, Grasping, Gaze, Gesture.
  - Head gestures taxonomy: Symbolic gesture, Iconic gesture, Deictic gesture.
  - Hand gestures taxonomy: Posture, Motion, Orientation, Gestlets (a set of high-level tags constituted from the lower-level tags to make up gestures like Point, Wave, etc.), Fingers.
  - Body gestures taxonomy: Natural, Relax, Tense, Iconic, Incline.
  - Emotions: Class, Valence (positive/neutral/negative), Subject, Target, Intensity, Time-stamp, Origin.

[Arafa 2004]

CML (Character Markup Language)

Page 24

SAIBA framework for multimodal generation

Intent Planning → (FML) → Behavior Planning → (BML) → Behavior Realization

FML: Function Markup Language [Heylen 2008]

BML: Behavior Markup Language [Kopp 2006]

Page 25

BML: Behavior Markup Language

[Kopp 2002]

The Goals

- A powerful, unifying model of representations for multimodal generation, based on BEAT, MURML, APML, and RRL.

- Application independent, graphics-model independent, and presenting a clear-cut separation between information types (functions vs. behaviors).

- Working together for better multimodal behaviors.

Page 26

BML: Behavior Markup Language

[Kopp 2002]

Figure: timeline of a gesture unit containing the phrases "Calm", "Cup", and "Wipe"; the three descriptive levels are g-units, g-phrases, and g-phases (prep, stroke, hold, stroke, prep, stroke, retract).

Page 27

Synchronization

The seven synch points

BML: Behavior Markup Language

[Kopp 2006]

Figure: the seven synch points along a behavior — start, ready, stroke-start, stroke, stroke-end, relax, end — with optional pre-stroke-hold and post-stroke-hold phases between them.

Page 28

BML: Behavior Markup Language

[Kopp 2002]

The BML behavior elements

BML Element Description

<head> Movement of the head independent of the eyes. Types include nodding, shaking, tossing and orienting to a given angle.

<torso> Movement of the orientation and shape of the spine and shoulder.

<face> Movement of facial muscles to form certain expressions. Types include eyebrow, eyelid and larger expressive mouth movements.

<gaze> Coordinated movement of the eyes, neck and head direction, indicating where the character is looking.

<body> Full body movement, generally independent of the other behaviors. Types include overall orientation, position and posture.

<legs> Movements of the body elements downward from the hip: pelvis, hip, legs including knee, toes and ankle.

<gesture> Coordinated movement with arms and hands, including pointing, reaching, emphasizing (beating), depicting and signaling.

<speech> Verbal and paraverbal behavior, including the words to be spoken (for example by a speech synthesizer), prosody information and special paralinguistic behaviors (for example filled pauses).

<lips> This element is used for controlling lip shapes including the visualization of phonemes.

Page 29

<wait> can align a behavior with a condition or an event

BML: Behavior Markup Language

[Kopp 2006]

Example 1

<bml>
  <gesture id="g1" type="point" target="object1"/>
  <body id="b1" posture="sit"/>
  <wait id="w1" condition="g1:end AND b1:end"/>
  <gaze target="object2" start="w1:end"/>
</bml>

Timeline: g1 (point) and b1 (sit) start together; w1 waits until both have ended; the gaze toward object2 starts at w1:end.

Page 30

<wait> can align a behavior with a condition or an event

BML: Behavior Markup Language

[Kopp 2006]

Example 2

Waits for the "ACT1_COMPLETE" event for at most 5.0 seconds and then speaks the second sentence.

<bml>
  <speech id="s1" type="text/plain">
    First sentence
  </speech>
  <event start="s1:end" emit="ACT1_COMPLETE" />
</bml>

<bml>
  <wait id="w1" event="ACT1_COMPLETE" duration="5.0"/>
  <speech type="text/plain" start="w1:end">
    Second sentence
  </speech>
</bml>

Timeline: after s1 ("First sentence") ends, the ACT1_COMPLETE event is emitted; w1 waits for it (for at most 5 seconds), and the second sentence starts at w1:end.

Page 31

Timing holds

BML: Behavior Markup Language

[Kopp 2006]

Example 3

<speech id="s1"> <sync id="1"/>This or <sync id="2"/>that.</speech>
...
<gesture id="d2" type="DEICTIC" ... stroke="s1:2" />
...
<gesture id="d1" type="DEICTIC" ... stroke="s1:1" relax="d2:ready" />

Timeline: the stroke of deictic gesture d1 aligns with sync point 1 ("This") and its relax phase holds until d2's ready phase; the stroke of d2 aligns with sync point 2 ("that").

Page 32

Offline phase workflow

[Kipp 2007]

Generating natural gestures

Figure: a video corpus is manually annotated (speech, gesture); the annotation files feed a modeling step that produces gesture profiles, a gesture lexicon, and an animation lexicon.

Page 33

NOVACO, a coding scheme for speech and gesture, specially designed for the purpose of gesture generation.

Coding scheme

Speech transcription:

1. Words and segments
2. Parts of speech
3. Theme/Rheme and focus
4. Discourse relations

Gesture transcription:

1. Gesture structure
   a. Movement phases
   b. Movement phrases
2. Gesture classes: Adaptors, Emblems, Deictics, Iconics, Metaphorics, Beats
3. Gesture lexicon and lemmas
4. Gesture properties

[Ch7-8, Kipp 2003]

Page 34

(1) Capturing temporal structure

Annotation scheme

[Kipp 2007 JLRE]

Figure: timeline of a gesture unit containing the phrases "Calm", "Cup", and "Wipe"; the three descriptive levels are g-units, g-phrases, and g-phases (prep, stroke, hold, stroke, prep, stroke, retract).

Page 35

Annotation scheme

[Kipp 2007 JLRE]

(2) Capturing spatial form

Height: {Above head, Head, Shoulder, Chest, Abdomen, Belt, Below belt}
Distance: {Far, Normal, Close, Touch}
Radial orientation: {Inward, Front side, Out, Far out}
Arm swivel: {Touch, Normal, Out, Orthogonal}

Page 36

Gesture notation conventions

[McNeill 1992, Hand and Mind]

Handedness: {BH, LH, RH} (both hands, left hand, right hand)
Hand position: {CC} plus {C, P, EP} combined with the sectors {UP, UR, UL, RT, LT, LR, LL, LW}
Hand orientation (palm P / extended fingers F): {TC, AC, UP, DN, TB, AB} (towards center, away from center, up, down, towards body, away from body)

Figure: McNeill's gesture space — center-center, center, periphery, and extreme periphery, divided into the sectors upper, upper-left, left, lower-left, lower, lower-right, right, upper-right.

Page 37

(3) Capturing membership to lexical category

Annotation scheme

[Kipp 2007 JLRE]

The imaginary "circle" may be traced multiple times while potentially moving and/or becoming bigger/smaller in the process. If performed with two hands, the hands can move in a symmetrical fashion or alternate (i.e. the hands revolve around each other) like in this sample.

Page 38

Online processing

[Kipp 2007]

Generating natural gestures

Figure: input text passes through preprocessing, gesture creation, g-unit creation, g-unit planning, and generation to yield a gesture script, which the animation engine renders as an animated character; the gesture profiles and animation lexicon built in the offline phase feed this process.

Page 39

[p. 102, Kipp 2003]

ANVIL

Requirements as a video annotation tool

- Integrated video player
- Intuitive visualization
- Multiple layers
- Genericness
- Temporal inclusion relationship between layers
- Efficient coding interface
- XML data format
- Import/export facilities
- Cross-level coding
- Online and offline access to coding scheme
- Management tool for multiple annotations
- Search facilities
- Non-temporal elements

Page 40

Gesture generation by imitation

Terms and levels of abstraction in annotation

[Kipp 2003]

Figure: terms and their relationships — the user (coder) adds annotations to video; each project must define a coding scheme; annotations are written into an annotation file, whose syntax is provided by the file format; the coding scheme provides the empty blueprint to be filled and is expressed in a meta-scheme, which the Anvil tool implements and which provides the components for defining coding schemes.

Page 41

[p. 160, Kipp 2003]

Coding reliability

Segmentation reliability

Classification reliability

Segmentation reliability compares two coders' segmentations in terms of hits (segments identified by both coders) and misses:

\( p = \dfrac{\text{hits}}{\text{hits} + \text{misses}} \)

Classification reliability corrects the observed agreement for agreement expected by chance. With a confusion matrix \( f_{ij} \) between Coder 1 and Coder 2 over \( N \) items, and class marginals \( n_i \):

\( p_0 = \dfrac{1}{N}\sum_i f_{ii}, \qquad p_e = \sum_i \left(\dfrac{n_i}{N}\right)^2, \qquad \kappa = \dfrac{p_0 - p_e}{1 - p_e} \)

Page 42

Coding reliability

Example 1 (coder-vs-coder confusion matrix, N = 30):

     C1  C2  C3
C1    8   1   1
C2    1   8   1
C3    1   1   8

\( p_0 = \dfrac{8+8+8}{30} = \dfrac{4}{5}, \qquad p_e = \dfrac{10^2+10^2+10^2}{30^2} = \dfrac{1}{3}, \qquad \kappa = \dfrac{4/5 - 1/3}{1 - 1/3} = \dfrac{7}{10} \)

Example 2 (N = 30):

     C1  C2  C3
C1    6   2   2
C2    2   6   2
C3    2   2   6

\( p_0 = \dfrac{6+6+6}{30} = \dfrac{3}{5}, \qquad p_e = \dfrac{10^2+10^2+10^2}{30^2} = \dfrac{1}{3}, \qquad \kappa = \dfrac{3/5 - 1/3}{1 - 1/3} = \dfrac{4}{10} \)
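A short function reproduces the chance-corrected agreement used in these two examples (the function name and interface are assumptions for illustration):

def corrected_agreement(confusion):
    """Chance-corrected agreement from a coder-vs-coder confusion matrix."""
    n = sum(sum(row) for row in confusion)                        # total number of items
    p0 = sum(confusion[i][i] for i in range(len(confusion))) / n  # observed agreement
    marginals = [sum(row) for row in confusion]                   # per-class totals (both coders coincide here)
    pe = sum((m / n) ** 2 for m in marginals)                     # agreement expected by chance
    return (p0 - pe) / (1 - pe)

print(corrected_agreement([[8, 1, 1], [1, 8, 1], [1, 1, 8]]))     # 0.7  (Example 1)
print(corrected_agreement([[6, 2, 2], [2, 6, 2], [2, 2, 6]]))     # 0.4  (Example 2)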

Page 43

[Kohavi 1995]

Adequacy of a model

(1) Holdout (test sample estimation): induce a classifier \( I \) from the training set \( D_h \) and measure accuracy on the held-out test set \( D_t \) of size \( h \):

\( \mathrm{acc}_h = \dfrac{1}{h} \sum_{\langle v_i, y_i \rangle \in D_t} \delta\big(I(D_h, v_i), y_i\big) \)

(2) k-fold cross validation: split the ground truth \( D \) (e.g., annotated data) of size \( n \) into \( k \) folds (1st ... j-th ... k-th), each test set chosen randomly from the whole ground truth; for each fold \( j \), train on the remaining folds and test on fold \( j \):

\( \mathrm{acc}_{cv} = \dfrac{1}{n} \sum_{\langle v_i, y_i \rangle \in D} \delta\big(I(D \setminus D_{(i)}, v_i), y_i\big) \)

where \( \delta(a, b) = 1 \) if \( a = b \) and 0 otherwise, and \( D_{(i)} \) is the fold containing instance \( i \).
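A minimal sketch of the k-fold estimate; the train/predict callables and the trivial majority-class model are placeholders assumed for illustration, not part of [Kohavi 1995]:

import random
from collections import Counter

def cross_validation_accuracy(data, k, train, predict):
    """data: list of (x, y) pairs; train(examples) -> model; predict(model, x) -> label."""
    data = list(data)
    random.shuffle(data)                               # random assignment to folds
    folds = [data[j::k] for j in range(k)]
    hits = 0
    for j in range(k):
        test = folds[j]
        training = [ex for i, f in enumerate(folds) if i != j for ex in f]
        model = train(training)
        hits += sum(1 for x, y in test if predict(model, x) == y)
    return hits / len(data)

# Toy usage with a majority-class "classifier":
train = lambda examples: Counter(y for _, y in examples).most_common(1)[0][0]
predict = lambda model, x: model
data = [(i, i % 2) for i in range(20)]
print(cross_validation_accuracy(data, 5, train, predict))   # around 0.5 for this toy data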

Page 44

[Kopp 2007]

Direction giving by speech and gesture

IDF (Image Description Feature): a qualitative, feature-based framework for describing gesture morphology in terms of features that encode meaningful geometric and spatial properties; each feature corresponds to a salient visual characteristic of a concrete referent. Hand orientation is defined in terms of extended finger direction and palm direction.

The underlying question: how can iconic gestures express meaning — that is, how images are mentally linked to referents and how gestures can depict images.

Page 45

[Kopp 2007]

Direction giving by speech and gesture

Coverbal gesture on “There’s a church”

Page 46

In conversation, the listeners’ attention to a shared reference serves as positive feedback to the speaker, and attentional behaviors are indispensable as communicative signals in face-to-face conversation.

[Nakano 2007]

Attentional behaviors

(A) Attention towards the physical world:(A1) From an ECA to a physical object(A2) From a user to a physical object

(B) Attention towards the virtual world:(B1) From an ECA to a virtual object(B2) From a user to a virtual object

Because ECAs and users inhabit different worlds, sharing a reference between them is difficult. To resolve this problem, it is indispensable to seamlessly integrate these two different worlds and broaden the communication environment in human–agent interaction.

Page 47

[Nakano 2007]

Attentional behaviors towards the physical world

Data Coding
As a unit of verbal behavior, we tokenized a turn into utterance units (UU) (Nakatani and Traum 1999), corresponding to a single intonational phrase (Pierrehumbert 1980). Each UU was categorized using the DAMSL coding scheme (Allen and Core 1997). In the statistical analysis, we concentrated on the following four categories with regular occurrence in our data: Acknowledgment, Answer, Information request (Info-req), and Assertion. The following four types of nonverbal behaviors were coded:
- Gaze at partner (gP): Looking at the partner's eyes, eye region, or face
- Gaze at map (gM): Looking at the map
- Gaze elsewhere (gE): Looking away elsewhere
- Head nod (Nod): Head moves up and down in a single continuous movement on a vertical axis, but the eyes do not go above the horizontal axis.

(Grounding states in the figure: Understood, Failure, Repair.)

Page 48

[Nakano 2007]

Attentional behaviors towards the physical world

Grounding judgment by user’s nonverbal signals

Step 1: Preparing for the next UU. By referring to the Agenda, the DM can identify the speech act type of the next UU, which is the key to specifying the positive/negative evidence in grounding judgment later.

Step 2: Monitoring. The GrM sends the next UU to GM, and the GM begins to process the UU. At this time, the GM logs the start time in the Discourse Model. When it finishes processing (as it sends the final command to the animation module), it logs the end time. The GrM waits for this speech and animation to end (by polling the Discourse Model until the end time is available), at which point it retrieves the timing data for the UU, in the form of time-stamps for the UU start and finish. This timing data is used in the following Judging step.

Step 3: Judging. When the GrM receives an end signal from the GM, it starts the judgment of grounding using nonverbal behaviors logged in the Discourse History. Looking up the Grounding Model by the type of the UU, the GrM identifies the positive/negative evidence in grounding judgment.

Page 49

CUBE-G Project

[Rehm 2009]

CUlture-adaptive BEhavior Generation (CUBE-G) for interactions with embodied conversational agents

We investigate whether and how the non-verbal behavior of agents can be generated from a parameterized computational model. Specifying a culture's position on the basic dimensions allows the system to generate appropriate non-verbal behaviors for the agents. The project combines a top-down, model-based approach with a bottom-up, corpus-based approach, which allows empirically grounding the model in the specific behavior of two cultures (Japanese and German), and pursues the following objectives:

1. To investigate how to extract culture-specific behaviors from corpora.

2. To develop an approach to multimodal behavior generation that is able to reflect culture specific aspects.

3. To demonstrate the model in suitable application scenarios.

Page 50

Dimensions of evaluating embodied conversational agents

- Usability
  - Learnability, memorizability, and ease of use
  - Efficiency
  - Errors

- User perception
  - Satisfaction
  - Engagement
  - Helpfulness
  - Naturalness and believability
  - Trust
  - Perceived task difficulty
  - Likeability

Evaluation

[Ruttkay 2004]

Page 51

Data Collection methods

- Qualitative methods
  - Interview and focus group
  - Informal or descriptive observation

- Quantitative methods
  - Questionnaires
  - Systematic observation
  - Log files
  - Heuristic evaluation
  - Biological measures

[Christoph 2004]

Data collection

Page 52

– The IAT procedure seeks to measure implicit attitudes by measuring their underlying automatic evaluation. The IAT is therefore similar in intent to cognitive priming procedures for measuring automatic affect or attitude.

– “In the IAT a subject responds to a series of items that are to be classified into four categories – typically, two representing a concept discrimination such as flowers versus insects and two representing an attribute discrimination such as pleasant versus unpleasant valence. Subjects are asked to respond rapidly with a right-hand key press to items representing one concept and one attribute (e.g., insects and pleasant), and with a left-hand key press to items from the remaining two categories (e.g., flowers and unpleasant). Subjects then perform a second task in which the key assignments for one of the pairs is switched (such that flowers and pleasant share a response, likewise insects and unpleasant). The IAT produces measures derived from latencies of responses to these two tasks. These measures are interpreted in terms of association strengths by assuming that subjects respond more rapidly when the concept and attribute mapped onto the same response are strongly associated (e.g., flowers and pleasant) than when they are weakly associated (e.g., insects and pleasant).”

Implicit Association Test

[Greenwald 1998]

Page 53

Figure: IAT trial screen — the left key is assigned to Word1 (e.g., Music) or Pleasant, the right key to Word2 (e.g., Sport) or Unpleasant; each stimulus shown on the screen is classified with the left or right key on the keyboard.

Implicit Association Test

[Greenwald 2003]

Page 54

Block | Number of trials | Practice/Test | Items assigned to left-key response | Items assigned to right-key response
1 | 20 | Practice | Music | Sport
2 | 20 | Practice | Pleasant | Unpleasant
3 | 20 | Practice | Music+Pleasant | Sport+Unpleasant
4 | 40 | Test | Music+Pleasant | Sport+Unpleasant
5 | 20 | Practice | Sport | Music
6 | 20 | Practice | Pleasant | Unpleasant
7 | 40 | Test | Sport+Pleasant | Music+Unpleasant

\( \text{Score} = \dfrac{1}{2}\left( \dfrac{\text{AverageLatencyInBlock6} - \text{AverageLatencyInBlock3}}{\text{SDforLatencyInBlocks3and6}} + \dfrac{\text{AverageLatencyInBlock7} - \text{AverageLatencyInBlock4}}{\text{SDforLatencyInBlocks4and7}} \right) \)

Implicit Association Test

[Greenwald 2003]
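Assuming the reconstruction of the Score formula above, a small function can compute it from per-block latency lists. Note that the full improved algorithm in [Greenwald 2003] includes additional screening and error-handling steps that this sketch (with assumed names) omits.

from statistics import mean, stdev

def iat_score(lat_block3, lat_block4, lat_block6, lat_block7):
    """Each argument is the list of response latencies (ms) recorded in that block."""
    d1 = (mean(lat_block6) - mean(lat_block3)) / stdev(lat_block3 + lat_block6)
    d2 = (mean(lat_block7) - mean(lat_block4)) / stdev(lat_block4 + lat_block7)
    return (d1 + d2) / 2    # positive values: responses were slower in blocks 6/7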

Page 55

Summary

1. Standard methodologies for conversational system development today include architecture, script and markup languages, corpus-based system development, and evaluation.

2. At the architecture level, the key issue is to identify the cognitive components and their interconnections.

3. Script languages allow one to code the behaviors of embodied conversational agents without worrying about implementation details.

4. Markup languages are generally less procedural: although more sophisticated machinery may be needed to interpret them, they allow behaviors to be specified simply by stating constraints.

5. Corpus-based approaches are needed to capture subtleties and individual personality.

6. Evaluation is necessary for turning hand-crafting practice into science and technology.

Page 56

References

[Arafa 2004] Yasmine Arafa, Kaveh Kamyab, and Ebrahim Mamdani. Toward a Unified Scripting Language: Lessons Learned from Developing CML and AML, in: Helmut Prendinger and Mitsuru Ishizuka (eds.) (2004) Life-Like Characters: Tools, Affective Functions, and Applications. Springer. pp. 39-63.

[Aylett 2005] Aylett, R., Louchart, S., Dias, J., Paiva, A., & Vala, M. (2005). Fearnot! – an experiment in emergent narrative. In T. Panayiotopoulos, J. Gratch, R. Aylett, D. Ballin, P. Olivier, & T. Rist (Eds.), Intelligent virtual agents (pp. 305–316). Springer Berlin Heidelberg.

[Aylett 2012] Aylett, R., & Paiva, A. (2012). Computational modelling of culture and affect. Emotion Review, 4(3), 253–263.

[Becker-Asano 2008] Christian Becker-Asano. WASABI: Affect Simulation for Agents with Believable Interactivity, IOS Press, 2008.

[Cassell 1999] Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjálmsson, H. and Yan, H. (1999). "Embodiment in Conversational Interfaces: Rea." Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 520-527. Pittsburgh, PA.

[Cassell 2001] Cassell, J., Vilhjálmsson, H., Bickmore, T. BEAT: the Behavior Expression Animation Toolkit. Proceedings of SIGGRAPH '01, pp. 477-486. August 12-17, Los Angeles, CA, 2001

[Cassell 2004] Justine Cassell, Hannes Hogni Villhjalmsson and Timothy Bickmore. BEAT: the Behavior Expression Animation Toolkit, in: Helmut Prendinger and Mitsuru Ishizuka (eds.) (2004) Life-Like Characters: Tools, Affective Functions, and Applications. Springer. pp. 163-185.

[Clark 1998] Clark, D. and Stuple, S. J. (eds.): Developing for Microsoft Agent, Microsoft Press, 1998.

[Christoph 2004] Noor Christoph. Empirical Evaluation Methodology for Embodied Conversational Agents, in: Zsófia Ruttkay and Catherine Pelachaud (eds.) From Brows to Trust -- Evaluating Embodied Conversational Agents, Kluwer Academic Press, pp. 67-99, 2004.

[Greenwald 1998] Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The Implicit AssociationTest. Journal of Personality and Social Psychology, 74, 1464–1480.

[Greenwald 2003] Anthony G. Greenwald, Brian Nosek and Mahzarin R. Banaji. Understanding and Using the Implicit Association Test: 1. An Improved Scoring Algorithm, Journal of Personality and Social Psychology, 2003, Vol. 85, No. 2, 197-216, 2003

[Huang 2004] Zhisheng Huang, Anton Eliens, and Cees Visser. STEP: a Scripting Language for Embodied Agents, in: Helmut Prendinger and Mitsuru Ishizuka (eds.) (2004) Life-Like Characters: Tools, Affective Functions, and Applications. Springer. pp. 87-109.

[Huang 2010] Huang, H.H., Furukawa, T., Ohashi, H., Cerekovic, A., Pandzic, I., Nakano, Y., Nishida, T., How Multiple Current Users React to a Quiz Agent Attentive to the Dynamics of Their Participation, in: Proceeding AAMAS '10 Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, pp. 1281-1288, 2010

[Kipp 2003] Michael Kipp. Gesture Generation by Imitation: from Human Behavior to Computer Character Animation, Boca Raton, Florida: Dissertation.com, December 2004.

[Kipp 2007] Michael Kipp, Michael Neff, Kerstin H. Kipp, and Irene Albrecht. Towards Natural Gesture Synthesis: Evaluating Gesture Units in a Data-Driven Approach to Gesture Synthesis, in: C. Pelachaud et al. (Eds.): IVA 2007, LNAI 4722, pp. 15–28, 2007.

Page 57

References

[Kipp 2007 JLRE] Kipp, M., Neff, M. and Albrecht, I. (2007) An Annotation Scheme for Conversational Gestures : How to economically capture timing and form. In: Journal on Language Resources and Evaluation - Special Issue on Multimodal Corpora, Vol. 41, No. 3-4, Springer, pp. 325-339.

[Kopp 2002] S. Kopp and I. Wachsmuth: Model-based Animation of Coverbal Gesture. Proceedings of Computer Animation 2002 (pp. 252-257), IEEE Press, Los Alamitos, CA, 2002

[Kopp 2006] Stefan Kopp, Brigitte Krenn, Stacy Marsella, Andrew N. Marshall, Catherine Pelachaud, Hannes Pirker, Kristinn R. Thórisson, and Hannes Vilhjálmsson. Towards a Common Framework for Multimodal Generation: The Behavior Markup Language, in: Proceedings Intelligent Virtual Agents 2006, pp. 205-217, 2006.

[Lim 2010] Lim, M. Y., Dias, J., Aylett, R., & Paiva, A. (2010). Creating adaptive affective autonomous NPCs. Autonomous Agents and Multi-Agent Systems, 24(2), 287–311. doi:10.1007/s10458-010-9161-2

[MacDorman 2008] Karl F. MacDorman, Sandosh K. Vasudevan, Chin-Chang Ho. Does Japan really have robot mania? Comparing attitudes by implicit and explicit measures, AI & Society, Volume 23 Issue 4, pp. 485-510, 2008.

[Nakano 2007] Nakano, Y. I., and Nishida, T.: Attentional Behaviors as Nonverbal Communicative Signals in Situated Interactions with Conversational Agents, in Nishida, T. (ed.). Engineering Approaches to Conversational Informatics, John Wiley & Sons, pp. 85-102, 2007.

[Nishida 2014] Nishida, T., Nakazawa, A., Ohmoto, Y., Mohammad, Y. Conversational Informatics: A Data-Intensive Approach with Emphasis on Nonverbal Communication, Springer, 2014.

[Prendinger 2004] Prendinger, H., Ishizuka, M. (eds.): Life-like Characters—tools, affective functions and applications. Springer, 2004.

[Rehm 2009] Matthias Rehm, Yukiko Nakano, Elisabeth André, Toyoaki Nishida, Nikolaus Bee, Birgit Endrass, Hung-Hsuan Huang, Afia Akhter Lipi, Michael Wissner. From observation to simulation: generating culture-specific behavior for interactive systems. AI & Society, Springer Press, Vol. 24, No. 3, 267-280, 2009.

[Ruttkay 2004] Zsófia Ruttkay and Catherine Pelachaud (eds.) From Brows to Trust -- Evaluating Embodied Conversational Agents, Kluwer Academic Press, 2004.

[Wallace 2003] Richard S. Wallace. The Elements of AIML Style, ALICE A. I. Foundation, 2003.