CONFUCIUS: An Intelligent MultiMedia Storytelling Interpretation and Presentation System
Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee
Faculty Research Student Conference
Jordanstown, 15 Jan 2004
Outline
Related research
Overview of CONFUCIUS
Automatic generation of 3D animation
Semantic representation
Natural language processing
Current state of implementation
Relation to other work
Conclusion & Future work
Related research
3D visualisation
  Virtual humans & embodied agents: Jack, Improv, BEAT
  MultiModal interactive storytelling: AesopWorld, KidsRoom, Larsen & Petersen’s Interactive Storytelling, computer games
  Automatic Text-to-Graphics Systems: WordsEye, CD-based language animation
Related research in NLP
  Lexical semantics
  Levin’s verb classes
  Jackendoff’s Lexical Conceptual Structure
  Schank’s scripts
Objectives of CONFUCIUS
To interpret natural language sentences/stories and to extract conceptual semantics from the natural language
To generate 3D animation and virtual worlds automatically from natural language
To integrate 3D animation with speech and non-speech audio, to form an intelligent multimedia storytelling system
[Diagram: the storywriter/playwright supplies input to CONFUCIUS as a story in natural language, a movie/drama script, or a tailored menu for script input; CONFUCIUS presents 3D animation, speech (dialogue) and non-speech audio to the user/story listener.]
Architecture of CONFUCIUS
[Diagram: architecture of CONFUCIUS. Component labels:]
3D authoring tools, existing 3D models & character models
Prefabricated objects (knowledge base): language knowledge (lexicon, grammar, LCS), visual knowledge (3D graphic library), semantic representations, mapping
Input: natural language stories, script writer
Processing: script parser, Natural Language Processing, Text To Speech, sound effects, animation generation, synchronizing & fusion
Output: 3D world with audio in VRML
Software & Standards
Java
  parsing semantic representation
  changing VRML code to add/modify animation
  integrating modules
Natural language processing tools
  Connexor Machinese FDG parser (morphological and syntactic parsing)
  WordNet (lexicon, semantic inference)
3D graphic modelling
  Existing 3D models (virtual human/object) on Internet
  Authoring tools
    Humanoid characters: Character Studio
    Props & stage: 3D Studio Max
    Narrator: Microsoft Agent
Modelling language & standard
  VRML 97 for modelling geometry of objects, props, environment
  H-Anim specifications for humanoid modelling
Agents and Avatars: How much autonomy?
[Diagram: virtual humans placed on a scale of autonomy & intelligence running from high to low. Labels: autonomous agents, interface agents, characters in non-interactive storytelling, avatars.]
Autonomous agents have higher requirements for sensing, memory, reasoning, planning, behaviour control & emotion (sense-emotion-control-action structure)
“User-controlled” avatars require fewer autonomous actions; basic naïve physics such as collision detection and reaction is still required
Virtual characters in non-interactive storytelling lie between agents and avatars: their behaviours, emotions and responses to a changing environment are described in the story input
Graphics library
[Diagram: the graphics library is organised into objects/props (simple geometry files), characters (geometry & joint hierarchy files, H-Anim) and motions (animation library, key frames), with an instantiation step.]
Level of Articulation (LOA) of H-Anim
CONFUCIUS adopts LOA1 in human animation
The animation engine adds ROUTEs dynamically, based on H-Anim’s joints & animation keyframes
CONFUCIUS’ human animation can be adapted for other LOAs
[Figure: joints and segments of LOA1]
[Figure: example Site nodes on hands, used for pushing objects and holding objects]
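The dynamic ROUTE insertion mentioned on this slide can be pictured with a minimal sketch. It assumes a VRML97 scene in which one TimeSensor drives an OrientationInterpolator per animated joint. The DEF names (`Timer`, `<joint>_interp`, `hanim_<joint>`) and both helper functions are hypothetical and are not taken from CONFUCIUS, whose engine is written in Java.

```python
def routes_for_joint(timer: str, joint: str) -> list[str]:
    """Return the two VRML97 ROUTE statements that animate one H-Anim joint.

    Assumption: the scene holds an OrientationInterpolator DEF'd as
    '<joint>_interp' and a Joint node DEF'd as 'hanim_<joint>'.
    """
    interp = f"{joint}_interp"
    return [
        # The timer's fraction output drives the keyframe interpolator.
        f"ROUTE {timer}.fraction_changed TO {interp}.set_fraction",
        # The interpolated rotation is applied to the joint.
        f"ROUTE {interp}.value_changed TO hanim_{joint}.set_rotation",
    ]


def routes_for_action(timer: str, joints: list[str]) -> str:
    """Join the ROUTEs for every joint involved in one action."""
    lines = []
    for joint in joints:
        lines.extend(routes_for_joint(timer, joint))
    return "\n".join(lines)


# Example: an arm action that rotates two LOA1 joints.
print(routes_for_action("Timer", ["r_shoulder", "r_elbow"]))
```

Generating the ROUTEs per joint keeps the keyframe data independent of the character model, which is one plausible reading of why the same animation could be reused at other LOAs.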
Semantic representations
Table columns: Categories | Knowledge representations | Decomposite | Typical applications
rule-based representation
expert systems
FOPC (First Order Predicate Calculus)
sentence representation, expert systems
semantic networks
lexical semantics
Schank’s scripts
story understanding
frame-based representations
general knowledge representation & reasoning
XML-based representations
multimodal semantics
Conceptual Dependency (CD)
event-logic truth conditions
x-schema and f-structure
Lexical-Conceptual Structure (LCS)
physical knowledge representation & reasoning (inc. spatial /temporal reasoning)
Lexical Visual Semantic Representation (LVSR)
dynamic vision (movement) recognition & generation
Lexical Visual Semantic Representation
Lexical Visual Semantic Representation (LVSR): semantic representation between language syntax and 3D models
LVSR is based on Jackendoff’s LCS, adapted to the task of language visualization (enhancement with Schank’s scripts)
Ontological categories: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, PROPERTY
  OBJ -- props/places (e.g. buildings)
  HUMAN -- human beings/other articulated animated characters (e.g. animals), as long as their skeleton hierarchy is defined
  EVENT -- actions, movements and manners
  STATE -- static existence
  PROPERTY -- attributes of OBJ/HUMAN
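As an illustration of how terms typed by these ontological categories might nest, here is a minimal sketch. The slides do not show LVSR's concrete notation, so the bracketed LCS-style rendering, the `Term` class and the example decomposition of "put" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The seven ontological categories listed on the slide.
CATEGORIES = {"OBJ", "HUMAN", "EVENT", "STATE", "PLACE", "PATH", "PROPERTY"}


@dataclass
class Term:
    """A hypothetical LVSR-like term: a category, a head and nested arguments."""
    category: str
    head: str
    args: list["Term"] = field(default_factory=list)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown ontological category: {self.category}")

    def __str__(self):
        inner = " ".join([self.head] + [str(a) for a in self.args])
        return f"[{self.category} {inner}]"


# Assumed decomposition of "John put a cup on the table."
put = Term("EVENT", "put", [
    Term("HUMAN", "john"),
    Term("OBJ", "cup"),
    Term("PATH", "to", [Term("PLACE", "on", [Term("OBJ", "table")])]),
])
print(put)
```

Validating the category at construction time is a design choice of this sketch: it rejects terms outside the seven categories early rather than at animation time.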
PATH & PLACE predicates

PATH predicates    Direction feature    Termination feature
to                 1                    1
from               0                    1
toward             1                    0
away_from          0                    0
via                n/a                  0
across             n/a                  n/a
along              n/a                  n/a

PLACE predicates   contact/attach feature
at                 unmarked
behind             <-contact>
end_of             n/a
in                 unmarked
in_front_of        <-contact>
near               <-contact>
on                 <+contact>
out                unmarked
over               <-contact>
top_of             n/a
under              unmarked

Interpret spatial movement of OBJ/HUMANs
62 common English prepositions
7 PATH predicates & 11 PLACE predicates
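The two feature tables can be encoded as plain lookup tables. The sketch below copies the values from the slide; the variable names, the use of `None` for "n/a" and the helper `places_with` are illustrative assumptions, and no interpretation of what each feature value means for the animation is implied.

```python
# PATH predicate -> (direction feature, termination feature); None stands for "n/a".
PATH_FEATURES = {
    "to": (1, 1),
    "from": (0, 1),
    "toward": (1, 0),
    "away_from": (0, 0),
    "via": (None, 0),
    "across": (None, None),
    "along": (None, None),
}

# PLACE predicate -> contact/attach feature; None stands for "n/a".
PLACE_CONTACT = {
    "at": "unmarked",
    "behind": "-contact",
    "end_of": None,
    "in": "unmarked",
    "in_front_of": "-contact",
    "near": "-contact",
    "on": "+contact",
    "out": "unmarked",
    "over": "-contact",
    "top_of": None,
    "under": "unmarked",
}


def places_with(feature):
    """Return, in alphabetical order, the PLACE predicates carrying a given feature."""
    return sorted(p for p, f in PLACE_CONTACT.items() if f == feature)


print(len(PATH_FEATURES), len(PLACE_CONTACT))
print(places_with("-contact"))
```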
NLP in CONFUCIUS
[Diagram: NLP modules in CONFUCIUS. Component labels:]
Pre-processing
Connexor FDG parser: part-of-speech tagger, syntactic parser, morphological parser
Semantic inference: WordNet, LCS database, FEATURES
Coreference resolution
Disambiguation
Temporal reasoning: lexical temporal relations, post-lexical temporal relations
Visual valency & verb ontology
2.2.1. Human action verbs
  2.2.1.1. One visual valency (the role is a human, (partial) movement)
    2.2.1.1.1. Biped kinematics: arm actions (wave, scratch), leg actions (walk, jump, kick), torso actions (bow), combined actions (climb)
    2.2.1.1.2. Facial expressions & lip movement, e.g. laugh, fear, say, sing, order
  2.2.1.2. Two visual valency (at least one role is human)
    2.2.1.2.1. One human and one object (vt. or vi.+instrument), e.g. throw, push, kick, open, eat, drink, bake, trolley
    2.2.1.2.2. Two humans, e.g. fight, chase, guide
  2.2.1.3. Visual valency ≥ 3 (at least one role is human)
    2.2.1.3.1. Two humans and one object (inc. ditransitive verbs), e.g. give, show
    2.2.1.3.2. One human and 2+ objects (vt. + object + implicit instr./goal/theme), e.g. cut, write, butter, pocket, dig, cook
  2.2.1.4. Verbs without distinct visualisation when out of context: verbs of trying, helping, letting, creating/destroying
  2.2.1.5. High level behaviours (routine events), political and social activities, e.g. interview, eat out (go to restaurant), go shopping
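One use of visual valency is to work out how many roles a sentence leaves implicit. The sketch below is a hypothetical illustration: the verb entries are taken from the examples in the ontology above, 3 stands in for the "≥ 3" class, and the function `implicit_role_count` is an assumption rather than part of CONFUCIUS.

```python
# Minimum visual valency for a few verbs from the ontology above.
# "kick" is left out because the slide lists it under both one and two valency.
VISUAL_VALENCY = {
    "wave": 1, "walk": 1, "bow": 1, "laugh": 1,
    "throw": 2, "push": 2, "fight": 2, "chase": 2,
    "give": 3, "show": 3, "cut": 3, "write": 3,
}


def implicit_role_count(verb: str, explicit_roles: int) -> int:
    """Return how many visual roles are not stated in the sentence.

    explicit_roles is the number of roles the parser found in the text;
    the remainder would have to come from a knowledge base.
    """
    return max(0, VISUAL_VALENCY[verb] - explicit_roles)


# "John cut the bread." states two roles; the instrument is implied.
print(implicit_role_count("cut", 2))
# "John pushed the door." states both of its roles.
print(implicit_role_count("push", 2))
```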
Level-of-Detail (LOD): basic-level verbs & troponyms
[Diagram: verb hierarchy rooted at EVENT, with three levels of detail]
event level verbs: go, cause, …
manner level verbs: walk, climb, jump, run
troponym level verbs: limp, stride, swagger, trot, skip, bounce, hop, jog, romp, …
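A level-of-detail hierarchy like this allows a fallback: if no animation exists for a troponym, a more general verb can be animated instead. The sketch below is an assumption-laden illustration. The parent links are guessed from ordinary troponymy (for example limp under walk, jog under run) and are not read off the slide, and `resolve` is a hypothetical helper.

```python
# Assumed parent links; the slide only gives the three levels.
PARENT = {
    "go": "EVENT", "cause": "EVENT",
    "walk": "go", "climb": "go", "jump": "go", "run": "go",
    "limp": "walk", "stride": "walk", "swagger": "walk",
    "trot": "run", "jog": "run", "romp": "run",
    "skip": "jump", "bounce": "jump", "hop": "jump",
}

LEVELS = {1: "event level", 2: "manner level", 3: "troponym level"}


def level(verb: str) -> str:
    """Name the level of a verb by counting the steps up to EVENT."""
    depth = 0
    while verb != "EVENT":
        verb = PARENT[verb]
        depth += 1
    return LEVELS[depth]


def resolve(verb: str, available: set[str]):
    """Return the most specific verb with an animation, or None if there is none."""
    current = verb
    while current is not None and current not in available:
        current = PARENT.get(current)
    return current


animations = {"walk", "run"}
print(level("swagger"), "->", resolve("swagger", animations))
print(level("hop"), "->", resolve("hop", animations))
```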
Current status of implementation
Collision detection example (contact verbs: hit, collide, scratch, touch)
  The car collided with a wall.
  using ParallelGraphics’ VRML extension: object-to-object collision
  non-speech sound effects
H-Anim examples:
  3 visual valency verbs
    John put a cup of coffee on the table.
    H-Anim Site node
    locative tags of object (on_table tag for table object)
  2 visual valency verbs
    John pushed the door.
    John ate the bread.
    Nancy sat on the chair.
  1 visual valency verbs
    The waiter came to me: “Can I help you? Sir.”
    speech modality & lip synchronization
    camera direction (avatar’s point-of-view)
Relation to other work
Domain-independent general purpose humanoid character animation
CONFUCIUS’ character animation focuses on language-to-humanoid animation process rather than considering human modelling & motion solely
Implementable semantic representation LVSR connecting linguistic semantics to visual semantics & suitable for action execution (animation)
Categorization and visualisation of eventive verbs based on visual valency
Reusable common sense knowledge base to elicit implied actions, instruments, goals, themes underspecified in language input
Conclusion & Future work
Humanoid animation explores problems in language visualization & automatic animation production
  Formalizes meaning of action verbs and spatial prepositions
  Maps language primitives to visual primitives
  Reusable common sense knowledge base for other systems
Prospective applications
  Children’s education
  Multimedia presentation
  Movie/drama production
  Computer games
  Virtual Reality
Further work
  Discourse level interpretation
  Action composition for simultaneous activities
  Verbs concerning multiple characters’ synchronization & coordination (e.g. introduce)