Master Thesis Vogt


Contents

1 Introduction
  1.1 Overview of this Thesis

2 Related Work
  2.1 Imitation Learning For Humanoid Robots
  2.2 Postural Expression of Emotions
  2.3 Conclusion

3 Mathematical Foundation
  3.1 Introduction
  3.2 Dimensionality Reduction
    3.2.1 Principal Component Analysis
    3.2.2 Locally Linear Embedding
    3.2.3 Isometric Feature Mapping
    3.2.4 Manifold Sculpting
  3.3 Dimension Expansion
  3.4 Summary

4 Learning Interaction Models
  4.1 Overview
  4.2 Interaction Data Acquisition
  4.3 Dimensionality Reduction
  4.4 Learning Interaction Models
    4.4.1 Linear Regression
    4.4.2 Artificial Neural Net
    4.4.3 Echo State Network
  4.5 Real-time Human Posture Approximation and Interaction
  4.6 Conclusion

5 Emotions and Behavior Modifications
  5.1 Walt Disney's Principles of Animation
    5.1.1 Squash and Stretch
    5.1.2 Timing
    5.1.3 Anticipation
    5.1.4 Exaggeration
    5.1.5 Arcs
    5.1.6 Secondary Action
  5.2 Expressing Basic Emotions
    5.2.1 Happiness
    5.2.2 Sadness
    5.2.3 Anger
  5.3 Summary

6 Software Architecture
  6.1 Overview
  6.2 Dimensionality Reduction
  6.3 Interaction Learning Algorithm
  6.4 Behavior Database
  6.5 Visualization
  6.6 Emotional and Behavioral Filters
  6.7 Additional Components

7 Evaluation
  7.1 Arm Mirroring
  7.2 Yoga
  7.3 Defending Oneself
  7.4 Conclusion

8 Conclusion
  8.1 Summary
  8.2 Future Work

A Appendix
  A.1 A more Technical Interaction Learner Example
  A.2 An Emotion Filter Example
  A.3 DVD Contents

Eidesstattliche Erklärung


1 Introduction

Current research in the field of robotics can be divided into two main directions: one is the development of task-oriented robots that work for humans in limited environments, and the other is the creation of collaborative robots that can coexist with humans in our open and ever-changing environment. Industrial robots rank among the former. In contrast, humanoids and especially pet robots are developed with the intention of placing them in human society and day-to-day life.

In order to do so, humanoids need to be programmed in an economic and time-saving way. Traditionally, each joint angle is manipulated one at a time, which is a tedious and complex task when the desired motion is supposed to be life-like. Additionally, existing software is often too complex for unskilled users, resulting in the need for an expert. The human learning process stands in contrast to this: humans learn new skills through imitation [DN02], and this could be of great value to the field of robotics as well, since imitation is a convenient and less complex learning technique that does not require special training. In a human-centered environment this capability is desirable for humanoid robots.

When observing human two-person interactions it becomes clear that people are accustomed to working with other people. Many types of communication rely on human form and behavior. Hence, a humanoid robot should take advantage of these communication channels and should have the ability to respond to interactions in order to be included in human society. This would be a great leap forward, because it would lead to systems that are open to multiple interaction partners as well as to systems that are not limited by their environment.

In this thesis a novel technique for learning such interactions is developed. In classic imitation learning methods one demonstrator is used to show a motion [SK08]. The behavior is then adopted, learned and played back by a humanoid robot. In doing so, no human-robot interaction capabilities are provided, since the learned behavior is simply played back. The focus of this thesis is on learning two-person interactions rather than the imitation of a single motion. In contrast, the proposed approach uses recorded motion data of two demonstrators to learn a generalized interaction model. This model calculates motor movements for human-robot interactions.

The learning method utilizes motion capture data that has been reduced in dimensionality. Well-known as well as state-of-the-art dimension reduction techniques are implemented to create low-dimensional behavior trajectories. The fundamental idea of the new learning technique is to create a continuous mapping of one low-dimensional behavior to another. In doing so, responsive and reactive motor skills can be calculated during a human-robot interaction. This allows the robot to change its posture depending on


the user's movement, creating a life-like and believable interaction. The underlying model is learned offline and prior to the interaction, so no additional calculations are necessary during an ongoing interaction.

1.1 Overview of this Thesis

After reviewing recent developments in the field of imitation and interaction learning in chapter 2, disadvantages of current approaches are pointed out. After that, fundamental mathematical concepts are introduced in chapter 3.

The novel interaction learning mechanism is presented in chapter 4. Starting with the acquisition of human motion data with a Microsoft Kinect camera, the data basis for the new learning approach is shown. Recorded human motion data can be very high-dimensional since each joint angle is recorded separately; hence, dimension reduction should be applied to reduce the computational complexity. Different dimensionality reduction techniques for projecting the data to fewer dimensions are presented in section 4.3. The learning of a single interaction is emphasized in section 4.4, where the underlying mapping algorithms, namely linear regression, artificial neural nets and echo state networks, are introduced. When a user interacts with the robot, it needs to adopt suitable postures depending on observed user poses. To calculate these, the interaction model is used; the underlying algorithm is explained in section 4.5.

Additional behavioral and emotional modifications that can be added to a learned interaction model are presented in chapter 5. For that, fundamental rules and techniques of classic hand-drawn cartoon animation are reviewed and their implementation for two-person interaction models is explained. After that, chapter 6 shows the basic software architecture and which libraries were used to implement all learning algorithms.

Within chapter 7 three recorded two-person interactions are evaluated. First, a simple arm mirroring example that is used to introduce interaction models is presented in more detail. After that, two experiments are conducted where complex motor skills are learned in a virtual environment. Additionally, all calculated interaction models are compared regarding the applied mapping algorithm.

Chapter 8 summarizes the proposed two-person interaction learning approach. The contributions to the field of humanoid robotics as well as computer animation are pointed out. Finally, future research directions are presented.


2 Related Work

In this chapter recent work on imitation learning in the field of humanoid robotics, as well as current research on the expression of basic emotions and attitude through body postures, is summarized.

    2.1 Imitation Learning For Humanoid Robots

The last ten years in the field of humanoid robotics have shown that the complexity of the tasks such robots have to perform has increased steadily, which makes it hard to program them manually. Since the fundamental idea of humanoid robotics is to place these robots in human social environments, new skills should be learned intuitively without the need of an expert.

Programming by Demonstration (PbD) has received increasing attention, since it involves one of the main principles of human learning: imitation. Imitation, as stated by Thorpe [Tho63], is the ability of humans and higher animals to reproduce a novel, unseen behavior. This skill can be of value for a humanoid robot as well, and this is one of the main reasons for the recent interest in PbD [SK08].

Chalodhorn et al. [CR10] developed a model-free approach to map low-dimensional data, acquired through a motion capture system, to a humanoid robot. The humanoid learns stable motions with 3D Eigenposes created from motion capture data combined with its own sensory feedback. It is generally known that human motion data cannot be transferred to a humanoid robot without further optimization. This is due to the variance in statures and is generally known as the correspondence problem [ANDA03]. The approach of Chalodhorn has been proposed as a solution for this issue.

The authors of [MGS10] present a technique that allows a robot to learn generic and invariant movements from a human tutor. They state that the underlying interaction is based on a kinematically controlled model of the demonstrator, which is then used as a model-based filter. In order to have their robot ASIMO play back a motion, an extensive generalization step is added to have it adapt the behavior to its body schema.

Another recent trend is to combine different learning methods with human tutor interactions, so that cognitive abilities can be added to a humanoid robot. An architecture for that can be found in [MGS05] and [BMS+05].

In [IAM+09] an interaction learning approach is presented that is based on haptic interactions. The focus is on giving a robot the ability to engage in direct physical contact with its interaction partner. In order to do so, the authors implemented several machine learning techniques for adapting the behavior of the robot during real-time interaction. The important feature is that the humanoid robot learns during an interaction with a


human. The major drawback of this approach is the need of a human in order to evaluate the interaction.

In contrast to that, a framework is presented in [GHW10] where a robot gets the ability to recognize people and remember details about past interactions. The authors describe that an interaction memory is built with data from already recognized people and their related interaction data. The implemented storage structure is XML-based and provides information about starting time, location and duration of an interaction. The approach is limited by the interaction medium, since speech is the only way to start an interaction. Additionally, the person starting the interaction has to face the robot.

Most recently, a Pleo robot has been used by [CSG+11] to implement a low-cost platform to treat physical and mental disorders. The robot is trained via demonstration and reinforcement. The dinosaur robot can then play back learned motions to the beat of music. A dance movement in this context is learned by tracking a yellow block held by the user; in doing so, each leg can be trained separately. The authors stated that due to the low-cost approach a relatively basic robot has been bought. Consequently, the robot's hardware lacks certain features, like a high-resolution camera for high frame rates, which are crucial for the implemented learning approach.

Recent work on imitation learning has shown great interest in learning a single behavior: one demonstrator is used to train a motion, and a robot plays it back after adapting it to its own body composition. In doing so, the motion is learned by imitating the shown movement. The learning of interactions through imitation has not been in the focus of recent research. In contrast to that, a method is developed in this thesis where two persons demonstrate an interaction and imitation learning is used to adapt the behavior of one demonstrator to a humanoid robot to allow human-robot interactions.

    2.2 Postural Expression of Emotions

Emotions can be conveyed through body postures. The authors of [BbK03] collected several postural expressions of emotions by utilizing a motion capture system with 32 markers. These postures were then presented to an audience, who had to classify them into emotion categories. Bianchi and colleagues clearly state that basic emotions can be conveyed by virtual body postures recorded with motion capture devices. This corresponds to previous works of Bull [Bul78], who conducted several studies in the field of postural expression of emotions in humans from a psychological point of view. For example, he documented that interest can be communicated by slightly leaning forward while having the legs drawn back. Additionally, Atkinson et al. [ADGY04] argued that five emotional states (anger, disgust, fear, happiness, and sadness) can clearly be expressed by body postures even with varying exaggeration.

Walt Disney stated in the early 1930s that emotion and attitude can be expressed in cartoons by using body features. One of his most famous examples is the half-filled flour sack. These drawings show that attitudes can be conveyed even with the most simplistic shapes [Joh95].

With regard to computer animation, emotional additions to virtual humans can be crucial


for their liveliness and believability, because they affect people's cognitive processes, perceptions, beliefs and the way they behave [MTKM08]. Recent development in neuroscience has shown that body postures are more important in conveying an emotional state in cases of incongruent affective displays [GSG+04].

The model-based approach of Garcia et al. [GRGT08] tries to create lifelike reactions and emotions in virtual humans. The underlying reaction model consists of a decision tree that is based on a statistical analysis of the reactions of people. An emotion update model is utilized to increase the vividness of a computer animation. The resulting action is calculated with a combination of key frame interpolation and inverse kinematics. In doing so, different levels of an emotion can be expressed, which is not backed by psychology theories.

The fundamental results in human psychology regarding the expression of postural emotions have proven to be applicable for humanoid robots as well. For the Nao robot, for example, Marzooqi et al. [MC11] have conducted a study whereby the software shipped with the robot has been used to express emotions successfully. The study summarizes that anger, happiness and sadness can be conveyed on the humanoid robot. Also [ET10] showed that these emotions can be expressed on the robot platform.

A more practical application for emotions in humanoid robots has been implemented by [KNY05]. The authors stated that emotions are crucial for autism therapy. Also [Adr07] reported that bodily postures can be used to emulate empathy in socially assistive robotics.

    2.3 Conclusion

The mentioned interaction learning approaches achieve remarkable results for specific robot platforms. Nevertheless, they are seldom transferable to other systems or robots. Also, the proposed algorithms and methods do not provide the functionality to manipulate learned behaviors with regard to their visual appeal; that is, a learned movement will be executed the same way it has been learned. Additionally, most of the existing techniques focus on learning a single behavior rather than an interaction involving two persons.

The aim of this thesis is to overcome these limitations. A novel approach is developed for learning two-person interactions by imitating one interaction partner. Additionally, an implementation of well-known character animation methods is presented. The basis for these are the fundamental rules of Walt Disney's Principles of Animation [Joh95]. In conjunction with that, emotional additions are presented to control a learned model even further.


3 Mathematical Foundation

Recorded human motion data can be high-dimensional since each joint angle is recorded separately. Hence, dimensionality reduction should be applied. In this chapter fundamental concepts for dimension reduction are introduced. The algorithms utilized here are explained on manifolds used in the literature. Later, a method is presented to project unseen points in low-dimensional embeddings back into the manifold's original dimension.

3.1 Introduction

The goal of dimension reduction is to decrease the size of a dataset while preserving the information within it. This is usually done by finding hidden structures and the least-dimensional embedding space [Ros08, Ben10].

To demonstrate each algorithm, synthetic, non-linear, two-dimensional datasets lying in a three-dimensional space, namely a Swiss roll, an S-shaped curve and a fishbowl, will be used (see figure 3.1). The two-dimensional shapes of each dataset are well known and often used in the literature [SMR06, SR04, SR00, M06, TSL00]; hence the embeddings obtained by different dimension reducers can be compared. For that, Pearson's correlation and Spearman's coefficient can be used [SMR06].

Figure 3.1: Two-dimensional non-linear manifolds lying in a three-dimensional space. From left to right: the Swiss roll, fishbowl and S-shaped surface.

    3.2 Dimensionality Reduction

The focus of this chapter is on four dimension reduction algorithms, namely PCA (section 3.2.1), LLE (section 3.2.2), IsoMap (section 3.2.3) and Manifold Sculpting (section 3.2.4). Within the succeeding sections the basic mathematical foundation of each will be discussed.


    3.2.1 Principal Component Analysis

Principal component analysis (PCA), also known as the Hotelling transform or empirical orthogonal function (EOF), tries to reduce the dimensionality of data based on the covariance matrix Σ of the variables (see equation 3.1). PCA seeks to reduce the dimensions by finding fewer orthogonal linear combinations (the principal components) of the original variables depending on the largest variance. The first principal component (PC) is the linear combination with the largest variance. The second PC is the linear combination with the second largest variance, orthogonal to the first PC, and so forth. It is customary to standardize the variables, because the variance depends on the scale of the variable [Hot33].

    \Sigma = \frac{1}{n} X X^T \qquad (3.1)

The off-diagonal terms of Σ quantify the covariance between the corresponding features and the diagonal terms capture the variance in the individual features. The idea of PCA is to transform the data so that the covariance terms are zero. Using the spectral decomposition theorem, it is possible to write Σ as

    \Sigma = U \Lambda U^T \qquad (3.2)

where U is an orthogonal matrix containing the eigenvectors and Λ is a diagonal matrix with ordered eigenvalues λ_i. The total variation equals the sum of the eigenvalues of the covariance matrix:

    \sum_{i=1}^{p} \mathrm{Var}(PC_i) = \sum_{i=1}^{p} \lambda_i \qquad (3.3)

Depending on the eigenvalues, the contribution of the first l principal components can be calculated with the following fraction:

    \frac{\sum_{i=1}^{l} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \qquad (3.4)

The computed eigenvectors hold the principal components of the data set.

Figure 3.2: The datasets introduced in section 3.2 reduced in dimensionality with principal component analysis. Color is used to indicate how points in the result correspond to points on the high-dimensional manifold.


Furthermore, the calculated PCs are the basis of the low-dimensional subspace. It is possible to project any point from the original space into this space and vice versa. Dimension reduction is achieved by subtracting the sample mean and then calculating the dot product of the result with each of the first l PCs.
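As an illustration, the following Python sketch computes the covariance matrix of equation 3.1, its spectral decomposition (equation 3.2) and the number of components l from the fraction in equation 3.4. It is a minimal numpy sketch; the function names and the 97% variance default are illustrative assumptions, not the implementation used in this thesis.

    import numpy as np

    def pca_reduce(X, var_fraction=0.97):
        """Reduce the rows of X (n_samples x n_features) with PCA, keeping
        the smallest l whose eigenvalue fraction (eq. 3.4) reaches
        var_fraction."""
        mean = X.mean(axis=0)
        Xc = X - mean                            # subtract the sample mean
        cov = (Xc.T @ Xc) / len(Xc)              # covariance matrix, eq. 3.1
        eigvals, eigvecs = np.linalg.eigh(cov)   # spectral decomposition, eq. 3.2
        order = np.argsort(eigvals)[::-1]        # sort eigenvalues descending
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        cumulative = np.cumsum(eigvals) / eigvals.sum()   # fraction of eq. 3.4
        l = int(np.searchsorted(cumulative, var_fraction)) + 1
        U = eigvecs[:, :l]                       # basis of the subspace
        return Xc @ U, U, mean                   # low-dim points, basis, mean

    def pca_expand(Y, U, mean):
        """Project low-dimensional points back into the original space."""
        return Y @ U.T + mean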

    3.2.2 Locally Linear Embedding

Locally linear embedding (LLE), as presented by [SR04], is an unsupervised learning algorithm that computes a low-dimensional embedding while preserving neighbor relationships. Essentially, the algorithm tries to calculate a low-dimensional embedding with the property that neighboring high-dimensional points remain neighbors in low-dimensional space.

LLE is based on the assumption that the data is well-sampled and that the underlying manifold is smooth. Within this context, smooth and well-sampled mean that the dataset's curvature is sufficiently sampled, in a manner that each high-dimensional point has at least 2d neighbors. Under this prerequisite a neighborhood on the manifold can be characterized as a linear patch.

The LLE algorithm calculates a low-dimensional dataset as follows:

1. Gathering of all neighboring points - Calculate all neighbors of every point x_i within the dataset (possibly with a k-dimensional tree).

2. Calculation of weights for patch creation - Compute the weights W_ij that approximate x_i as a linear combination of its neighbors while minimizing the reconstruction error in equation 3.5.

    E(W) = \sum_i \Big| x_i - \sum_j W_{ij} x_j \Big|^2 \qquad (3.5)

3. Mapping of embedded coordinates - Map the embedded coordinates by determining the vectors y with the weights W_ij. This is done by minimizing the quadratic form in equation 3.6 with its bottom non-zero eigenvectors.

    \Phi(y) = \sum_i \Big| y_i - \sum_j W_{ij} y_j \Big|^2 \qquad (3.6)

While minimizing the cost function in equation 3.5, two constraints need to be taken into account: firstly, each data point x_i is reconstructed only from its neighbors, resulting in W_ij = 0 if x_j is not part of the neighbor set; secondly, the rows of the weight matrix need to sum to one. The optimal weight matrix W is then found by solving a set of constrained least squares problems.

One drawback of LLE is its sensitivity to noise: even small noise can cause failure in obtaining low-dimensional coordinates [CL11]. Also, the algorithm is highly sensitive to its two main parameters, the number of neighbors and the regularization parameter.

    10

  • 7/27/2019 Master Thesis Vogt

    11/69

    3.2 Dimensionality Reduction

Figure 3.3: The datasets introduced in section 3.2 reduced in dimensionality with locally linear embedding. Color is used to indicate how points in the result correspond to points on the high-dimensional manifold.
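For illustration, such an embedding can be reproduced with an off-the-shelf implementation; the sketch below applies scikit-learn's LLE to the Swiss roll dataset. The neighbor count and regularization value are example settings chosen to expose the two sensitive parameters just mentioned, not values used in this thesis.

    import numpy as np
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    X, color = make_swiss_roll(n_samples=2000, noise=0.05)

    # n_neighbors and reg are the two sensitive parameters noted above
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, reg=1e-3)
    Y = lle.fit_transform(X)                 # 2-D embedding of the Swiss roll
    print(Y.shape, lle.reconstruction_error_)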

    3.2.3 Isometric Feature Mapping

The isometric feature mapping algorithm (IsoMap) is a multidimensional scaling approach generalized to non-linear manifolds. Within IsoMap the dimensionality reduction problem is viewed as a graph problem, which requires the distances between all pairs i and j of N data points in the high-dimensional space X as input. The output calculated by IsoMap are coordinate vectors in a lower, k-dimensional space Y that represents the intrinsic geometry of the underlying data [TSL00].

The IsoMap algorithm, as stated by Tenenbaum and colleagues [TSL00], is composed of the following three steps:

1. Estimation of the neighborhood graph - First of all, the algorithm determines the neighbors on the manifold based on their Euclidean distances d_X(i, j)¹. The neighbor relationships are then represented as a weighted graph with an edge weight of d_X(i, j).

2. Calculation of the shortest paths in the neighborhood graph - The geodesic distances d_M(i, j) between all points on the manifold are estimated by computing the shortest path lengths d_G(i, j) in the graph G. d_G(i, j) is initialized to d_X(i, j) if i and j are linked by an edge, and to ∞ otherwise. For each n = 1, 2, ..., N the entries d_G(i, j) are replaced by min{d_G(i, j), d_G(i, n) + d_G(n, j)}. Finally, the values of the shortest paths are stored in a matrix D_G = {d_G(i, j)}.

3. Construction of the lower-dimensional embedding - Multidimensional scaling is applied to D_G in order to achieve dimensionality reduction. The preserving embedding in the k-dimensional Euclidean space is created by minimizing the cost function for all coordinate vectors Y:

    E = \| \tau(D_G) - \tau(D_Y) \|_{L^2} \qquad (3.7)

D_Y denotes the matrix containing the low-dimensional Euclidean distances of two points {d_Y(i, j) = ||y_i - y_j||}. Within the context of equation 3.7, the operator τ


converts distances to inner products. More precisely, τ is defined by τ(D) = -HSH/2, where S is the matrix of squared distances (S_ij = D_ij²) and H is the centering matrix (H_ij = δ_ij - 1/N) [TSL00].

¹Within this thesis a k-dimensional tree is used to determine the neighbors of a given point.

Figure 3.4: The datasets introduced in section 3.2 reduced in dimensionality with isometric feature mapping. Color is used to indicate how points in the result correspond to points on the high-dimensional manifold.

The accuracy of IsoMap depends highly on the number of neighbors k_n used to create the weighted graph. This parameter has to be set independently for each problem, as it can have a considerable impact on the calculated results (see figure 3.5). For reasons of simplification an automatic regulator can be used [SMR06].

(a) k_n = 8    (b) k_n = 24    (c) k_n = 256

Figure 3.5: The accuracy of IsoMap depends highly on the neighborhood parameter k_n, which is shown with three different values for the Swiss roll dataset. Using a value too small, discontinuities of the graph can occur, causing the manifold to fragment into disconnected clusters. In contrast, values too large will include data points from other branches of the manifold, shortcutting them. This leads to errors in the final embedding [SMR06].
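The effect of the neighborhood parameter can be explored with an off-the-shelf implementation; the sketch below runs scikit-learn's IsoMap on the Swiss roll with the three values from figure 3.5. It is an illustration only, not the code used in this thesis.

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    X, _ = make_swiss_roll(n_samples=2000)

    # too small a k_n fragments the graph, too large a k_n shortcuts
    # the manifold (cf. figure 3.5)
    for k in (8, 24, 256):
        Y = Isomap(n_neighbors=k, n_components=2).fit_transform(X)
        print(k, Y.shape)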

    3.2.4 Manifold Sculpting

A novel approach presented by Gashler et al. [M06], referred to as Manifold Sculpting (MS), is an NLDR algorithm which iteratively transforms data by balancing two opposing heuristics: one that scales information out of unwanted dimensions and one that


preserves the local structure of the data.

First of all, the algorithm searches for the k-nearest neighbor points N_i of a given point p_i. During the second step the algorithm computes the Euclidean distances δ_ij between p_i and its k-nearest neighbors n_ij. Meanwhile, the angle θ_ij between the two line segments p_i n_ij and n_ij m_ij, where m_ij is the most collinear neighbor of n_ij with p_i, is calculated². The algorithm then tries to retain the values of δ_ij and θ_ij during the transformation and projection.

Figure 3.6: The datasets introduced in section 3.2 reduced in dimensionality with Manifold Sculpting. Color is used to indicate how points in the result correspond to points on the high-dimensional manifold.

Before transforming the data, a pre-processing step can be included in order to achieve faster convergence. [M06] describes that a principal component analysis can be applied to move the information in the data to fewer dimensions. The first d principal components are calculated, where d is the number of dimensions that will be preserved during the projection. The dimensional axes are then rotated to align with these PCs.

The next step iteratively transforms the data until the local changes fall below a threshold, which can be set depending on the desired output quality. The dimensions which will not be preserved during the transformation, D_scaled, are scaled down within each step, so their values slowly converge to zero. The preserved dimensions D_preserved are scaled up to keep the average neighbor distance. Then, the neighbor relationships are recovered with an error heuristic: all entries of D_preserved are adjusted with a simple hill-climbing technique in the direction that yields improvement. Once the transformation is done, D_scaled contains only values close to zero, which are dropped during projection in order to achieve dimension reduction.

Manifold Sculpting is robust to sampling holes while preserving high-quality results under the assumption that a high sample rate is used [M06]. However, its computational complexity and the required hardware resources need to be taken into account when analyzing larger datasets.

²The point with an angle closest to π is called the most collinear neighbor.
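The sketch below is a deliberately simplified variant of this loop: it preserves only neighbor distances with a fixed-step greedy search, whereas the original algorithm also preserves the angles θ_ij and adapts its step size. It is meant to illustrate the two opposing heuristics, not to reproduce [M06].

    import numpy as np

    def manifold_sculpting_sketch(X, d=2, k=10, sigma=0.99, iters=200, step=0.05):
        """Simplified Manifold Sculpting: scale information out of the
        unwanted dimensions while hill climbing restores the original
        neighbor distances in the preserved ones."""
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Z = Xc @ Vt.T                       # PCA pre-processing: align axes with PCs
        # k-nearest neighbors and their original distances
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
        nbr = np.argsort(D, axis=1)[:, 1:k + 1]
        d0 = np.take_along_axis(D, nbr, axis=1)

        def err(i):                         # squared neighbor-distance error of point i
            d_now = np.linalg.norm(Z[nbr[i]] - Z[i], axis=1)
            return float(((d_now - d0[i]) ** 2).sum())

        for _ in range(iters):
            Z[:, d:] *= sigma               # heuristic 1: scale out unwanted dims
            for i in range(len(Z)):         # heuristic 2: preserve local structure
                for dim in range(d):
                    for delta in (step, -step):
                        before = err(i)
                        Z[i, dim] += delta
                        if err(i) >= before:    # keep only improving moves
                            Z[i, dim] -= delta
        return Z[:, :d]                     # scaled dims are now ~0 and dropped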


    3.3 Dimension Expansion

When playing back an animation on a robot or avatar, one needs to control all joint angles. For that, a low-dimensional model, i.e. a dataset reduced in dimensionality, needs to be transformed continuously back into its original dimension.

The idea behind some dimension reduction approaches is that neighboring points in high-dimensional space remain neighbors in the low-dimensional space. Those relationships can also be exploited during dimension expansion as follows:

1. Search for low-dimensional neighbor points - Firstly, a search in the low-dimensional space for the k-nearest neighbors (P_l^1, ..., P_l^k) of a given low-dimensional point P_l is conducted.

2. Creation of a weight matrix - Then, a weight matrix W is created so that P_l is approximated by a linear combination of its k neighbors:

    P_l = \sum_{i=1}^{k} W_i P_l^i \qquad (3.8)

3. Restoration of high-dimensional neighbors utilizing W_i - The idea is that neighboring points within the low-dimensional space remain neighbors in high-dimensional space. Hence, every neighbor of P_l has an exact high-dimensional representation (P_h^1, ..., P_h^k) that can be identified by its index i (1 ≤ i ≤ k). The high-dimensional point P_h is then found by weighting the high-dimensional neighbors P_h^i with the weight matrix W:

    P_h = \sum_{i=1}^{k} W_i P_h^i \qquad (3.9)

This algorithm is applied to all low-dimensional points that do not equal any point within the given dataset. It is obvious that this dimension expansion technique needs high- and low-dimensional representations of a dataset in order to calculate a high-dimensional representation P_h of an unknown low-dimensional point P_l.
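The procedure can be sketched compactly, assuming paired low- and high-dimensional representations of the recorded dataset are available (Y_train and X_train are placeholder names, not identifiers from this thesis):

    import numpy as np
    from scipy.spatial import cKDTree

    def expand(p_low, Y_train, X_train, k=6):
        """Map an unseen low-dimensional point back into the original
        space, following steps 1-3 above."""
        tree = cKDTree(Y_train)
        _, idx = tree.query(p_low, k=k)       # step 1: low-dim neighbors
        N = Y_train[idx]                      # (k x d) neighbor matrix
        # step 2: least-squares weights so that p_low ~ sum_i W_i * N_i
        W, *_ = np.linalg.lstsq(N.T, p_low, rcond=None)
        # step 3: apply the same weights to the high-dim neighbors, eq. 3.9
        return W @ X_train[idx]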

    3.4 Summary

In this chapter important mathematical concepts have been introduced that are crucial for the interaction learning approach developed in this thesis. Several dimensionality reduction techniques utilizing well-known as well as current state-of-the-art algorithms were explained. It has been pointed out that some try to calculate low-dimensional embeddings with linear approximations, while others use approaches originating in graph theory.

Figure 3.7 shows the low-dimensional embeddings of the introduced dimension reduction techniques applied to three datasets. The overall precision of the produced results can vary, and each concept has its advantages as well as limitations. The analysis of each


algorithm was performed on simple datasets used in the literature, which are in general not adoptable to other applications or domains. Hence, a preference for a specific technique was not presented, and each algorithm will be analyzed with regard to its applicability for recorded interaction data in the following chapter.

Due to the nature of some dimension reduction techniques, a projection into the original dimension is not possible. A simple mathematical concept to transform unseen input data in low-dimensional embeddings to high-dimensional space has been presented.


Figure 3.7: The figure shows the original datasets and all introduced dimension reduction methods (PCA, LLE, IsoMap and Manifold Sculpting) applied to three simple datasets used in the literature. As can be seen, the results vary greatly regarding their shape and precision.


4 Learning Interaction Models

4.1 Overview

In this chapter a novel interaction learning method is introduced that allows humanoid robots as well as virtual humans to interact with people (see figure 4.1). The foundation of this approach is an interaction model which is created from demonstrated two-person interactions.

Figure 4.1: Human motion is recorded utilizing a depth camera. The acquired behaviors are then used to learn a low-dimensional mapping onto a virtual human's or humanoid robot's behavior. For that, a two-person interaction model is learned.

The learning technique is based on low-dimensional behavior embeddings that are calculated for both demonstrators. In doing so, a small and yet complete dataset is created from a shown two-person interaction. In order to animate the virtual human or manipulate a robot's joint angles, an algorithm will be introduced that learns how to map the two shown movements onto each other while preserving temporal features. This mapping is essential, since it will be used to predict a user's posture and calculate suitable robot or virtual human postures in order to have users interact with them.


Additionally, this model can generalize observed human motion so that various versions of a single action are possible. This allows synthetic humanoids and robots to adapt known behaviors to changing situations. This is especially useful considering that humans tend to have a low repetitive accuracy when performing tasks [TLKS08].

In the following it will be discussed how human motion data can be acquired using a Microsoft Kinect depth camera. Furthermore, the dimension reduction techniques introduced in section 3.2 are applied to recorded human behaviors. One dimension reducer will be emphasized, since it is ideal for the interaction learning approach developed in this thesis. Later, the algorithm for extracting a generalized model of two-person interactions will be introduced. The role of dimension reduction will be explained, since the interaction model is based on low-dimensional training data. In conjunction with that, the approximation of human poses during real-time interaction is addressed.

4.2 Interaction Data Acquisition

The interaction learning approach presented here is based on the fact that humans often learn from imitation: they observe others and adopt behaviors [Zen06]. This ability can be valuable for synthetic humanoids as well, giving them a tool to learn how to interact with people. The necessary motor movements can be calculated from observed human joint angles. This is possible due to the similarity in body compositions [Ben10].

Motion capture systems have become increasingly popular for life-like virtual character animation [PB02]. The motion data used in this thesis is recorded with a Microsoft Kinect depth camera utilizing the same-titled software development kit. There are several reasons for this choice; mainly, the consumer-market availability and the low-cost nature of this motion capture device determined the decision.

Figure 4.2: The figure illustrates the recognized joint rotations. Not all axes of rotation have been added, for reasons of clarity and comprehensibility. A complete list of all joint angles can be found in [Ber11].


The underlying framework which extracts human body postures was developed by Berger [Ber11] and supports continuous recording of up to two persons simultaneously. The recognized joint angles are displayed in illustration 4.2. The Microsoft Kinect camera is also used for joint angle extraction during real-time interaction. For that, joint angle values are transformed into the low-dimensional behavior space and used as input data for a learned two-person interaction model in order to react interactively.

A trivial imitation example is shown in figure 4.3. Two persons were instructed to mirror each other's arm movement. After the left person (A) received a secret sign, he started pulling up his right arm. Shortly after the start of this movement the right person (B) noticed the changing shoulder angle and corrected his own left arm pose. When the shoulder angle of the left person reached approximately 45 degrees, he was shown another secret sign to lower his arm again. During the next chapters this example will be used to teach a virtual human to mirror the pose of the right person.

Figure 4.3: The figure illustrates steps during a mirrored two-person arm movement. The recording of this behavior consists of 240 frames of 50 ms each.

The precision of the calculated joint angles depends on the frame rate used to record the human behavior. In general, low frame rates lead to small datasets with a lack of accuracy, whereas high frame rates result in larger datasets with increased precision. Furthermore, a higher sampling rate leads to redundant joint angle values for slow motions. In contrast, lower rates increase the risk of having too few measurement points, resulting in choppy animations. For the behaviors used in this thesis a frame rate between 16 fps and 25 fps has proven to be well-suited.

In order to map an animation onto a virtual human, the joint angle values need to be transformed from the recorded human space into the coordinate system of the virtual representative. Varying degrees of freedom complicate this calculation. This also applies to humanoid robots and is generally known as the correspondence problem [ANDA03]. To overcome this issue for humanoid robots, multiple genetic algorithms combined with trajectory adaption are applied [Ber11]. In doing so, acquired movements are transformed into the robot's space, allowing it to adopt human behaviors within a simulation. Due to the low accuracy of the simulation engine in use, not all behaviors can be played back on a robot.

In this section it has been briefly explained how a Microsoft Kinect depth camera can be used to record human motion. Acquired joint angles are stored in a high-dimensional matrix; each value is stored in one column, with one row per time step, which leads to large records for longer motions. In the following section the previously introduced dimensionality reduction algorithms are applied to decrease the size of each behavior.
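A sketch of this storage layout is given below; the joint count, frame count and the capture call are hypothetical placeholders, not part of the recording framework of [Ber11].

    import numpy as np

    n_frames, n_joints = 240, 20              # e.g. 240 frames at 50 ms each

    def read_kinect_joint_angles():
        """Stand-in for the Kinect capture call (hypothetical)."""
        return np.random.uniform(-np.pi, np.pi, n_joints)

    # one row per time step, one column per joint angle
    recording = np.zeros((n_frames, n_joints))
    for t in range(n_frames):
        recording[t] = read_kinect_joint_angles()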


    4.3 Dimensionality Reduction

The human body is an articulated object with a very high number of degrees of freedom (DOF). One problem that arises when recording human motion is that not all measured variables are important for understanding the underlying phenomena. A walking gait, for example, is a one-dimensional manifold embedded in a high-dimensional visual space [EL04]. From an observer's perspective the shape of a walking person deforms over time within its physical constraints. When we consider the silhouette as a point in high-dimensional space, it becomes obvious that it moves on a low-dimensional manifold over time. Dimensionality reduction should be applied to strip off the redundant information, producing a more economic representation of the recorded motion. Dimensionality reduction is not only beneficial for computational efficiency but can also improve the accuracy of data analysis [Fod02].

    (a) Person A pulling up its right arm

    (b) Person B pulling up its left arm

Figure 4.4: An arm movement dataset reduced to two dimensions with (from left to right) PCA, LLE, IsoMap and Manifold Sculpting. Color is used to display the temporal coherence within the dataset: during the execution of the movement a low-dimensional point would move along the curve, from a starting color of dark blue through to an ending color of light gray.

In the following sections the previously introduced arm mirroring dataset (see section 4.2) will be used to compare the dimension reduction techniques introduced in section 3.2 with regard to their applicability for the interaction learning approach presented in this thesis. The main focus is on the practicability and readability of the produced low-dimensional trajectory rather than its precision. A justification for that will be given in section 4.4.

The first diagram of figure 4.4 is a visualization of the embedding calculated with principal component analysis. Since the eigenvalues are sorted in descending order, the first few principal components encode most of the information. In the domain of robot


motion, a few PCs, e.g. five to eight, are sufficient to store up to 97% of the information [Ber09, Ben10]. For that, the first l principal components have to have a cumulative proportion greater than 0.97 in order to restrict the information loss to 3%.

    (a) Person A pulling up its right arm (b) Person B pulling up its left arm

Figure 4.5: Principal component analysis applied to an arm movement dataset. Gray regions mark key postures. Since the arm was pulled up and down again, the PCA creates an enclosed trajectory.

Each point within the calculated space has a corresponding posture [Ben10], hence the name posture space. Some of these are shown in figure 4.5. Because of the underlying linearity, postures can also be obtained from points that do not equal points on the recorded trajectory.

The second diagram in figure 4.4 is an illustration of the embedding space calculated by locally linear embedding. The measured joint angle values are not noiseless¹, since they were captured using an infrared camera. Hence, LLE failed to obtain low-dimensional coordinates because of its sensitivity to noise [CL11]. The low number of measurement frames exacerbates this problem even further.

A low-dimensional embedding calculated with isometric feature mapping can be seen in figure 4.4 (second graphic from the right). Similar to LLE, the IsoMap algorithm was not able to create a low-dimensional space. This is once again due to the low number of measurement points.

In contrast to that, the Manifold Sculpting algorithm is able to obtain a correct embedding (see figure 4.6). But MS lacks the ability to transform additional points from high-dimensional space into the low-dimensional embedding. This results in the



need for an external transformation tool for unseen points in order to obtain their low-dimensional coordinates.

¹An evaluation concerning noise when using a Microsoft Kinect camera can be found in [Ber11].

    (a) Person A pulling up its arm (b) Person B pulling up its arm

Figure 4.6: Manifold Sculpting used to reduce an arm movement dataset in dimensionality. Gray regions mark key postures.

The output of each algorithm varies greatly regarding its visual appearance and continuity. Within the context of this thesis, the smooth trajectory curvatures and the simple mathematical concept of PCA make it the ideal dimension reducer. This also corresponds to the arguments of Chalodhorn et al. [CR10] for using PCA in learning algorithms. Nevertheless, all techniques have their advantages as well as limitations. For the interaction learning approach presented here any dimensionality reduction technique can be used, as long as the embedding is sufficiently smooth in low-dimensional space.

    4.4 Learning Interaction Models

The low-dimensional representations introduced in section 4.3 are a compressed version of a single behavior and equivalent to an array of temporary postures adopted from the human teachers during an interaction. That is, these models are complete records of a shown two-person interaction packed into two low-dimensional spaces. Each point in this space corresponds to a posture, so it could be adopted by a humanoid robot or virtual human when transformed back into its original dimension.

In the example in section 4.3 a dataset has been recorded where one person mirrors the arm movement of another. Now this scenario is altered in a way that a virtual human replaces the second person (person B). Within a simulation an avatar is controlled by the first person in order to interact with the virtual human. The virtual human is then instructed to mirror the arm movement of the avatar.

When a person is instructed to repeat a shown behavior it is unlikely that the executed


    behavior equals the one in the recording [Ben10]. That is, postures and movement speedswill most likely vary. Since the virtual human does not know how fast the arm movementwill be, it needs to react interactively in order to mimic the behavior correctly. Becauseof the concurrent recording of both humans, the temporal coherence remains within thedatasets.

    The low-dimensional model of the second person is now assigned to a robot or virtualhuman (see figure 4.7). During a user interaction a suitable pose from the posture spacehas to be adopted with the least amount of delay, avoiding unnatural waiting periods.

    (a) Person A pulling up its arm (b) Behavior of person B assigned to a Nao robot

    Figure 4.7: The figure illustrates the arm mirroring data of person A and the assignedpostures for a virtual Nao robot. Once again, color is used to mark the temporalcoherence between both datasets.

The question is when and how the virtual human has to react to an observed human pose. This is done by searching the assigned low-dimensional model for a posture that is suitable for the observed low-dimensional posture of the first person. Thus, the virtual human needs a continuous mapping from one behavior to another. This becomes obvious when analyzing the low-dimensional behavior trajectory: every human pose has a corresponding point in the low-dimensional space, so when repeating an interaction this point moves along the behavior trajectory (see figure 4.8). Due to the low repetitive accuracy of human movements, the newly found point will most likely not lie exactly on the trajectory, but rather close to it.

Humans also vary in size and silhouette, which results in differing low-dimensional points when they adopt the same pose, making a continuous mapping of both recorded behaviors indispensable. An aggregation of both low-dimensional spaces into one is not possible due to the fact that some dimension reduction techniques use non-linear approaches.


4.4.1 Linear Regression

The Euclidean distance between a desired and a calculated point in low-dimensional space can be seen in figure 4.9 (right). This distance can be considered an error heuristic for the interaction model: a distance of zero between both points results in an equal posture; if the error is greater than zero, the two poses will merely appear similar, and with rising error values the similarity decreases. Two red arm poses in figure 4.9 are extracted from the regions with the largest error. The desired trajectory is highlighted green, whereas the learned curve is marked red. Resulting differences in arm poses are indicated for the regions with the largest values.

Figure 4.9: Left: The arm mirroring data of person A mapped with linear regression onto the behavior of person B. The desired values are marked green; calculated results are highlighted red. Right: The calculation error obtained during LR mapping. Colored circles are used to show how regions with the highest error correspond to the behavior trajectory.

The advantage of LR is its computational scalability and flexibility compared to classic or recurrent neural nets. However, the disadvantage of this approach lies in the accuracy of the produced results when dealing with complex non-linear datasets (see chapter 7).
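A minimal sketch of such a mapping, assuming Y_a and Y_b hold the concurrently recorded low-dimensional behaviors of both demonstrators, one row per time step (names are placeholders, not the thesis implementation):

    import numpy as np

    def fit_linear_map(Y_a, Y_b):
        """Least-squares affine map from person A's low-dimensional
        poses to person B's; concurrent rows keep the temporal
        coherence by construction."""
        A = np.hstack([Y_a, np.ones((len(Y_a), 1))])   # affine bias term
        B, *_ = np.linalg.lstsq(A, Y_b, rcond=None)
        return B

    def apply_linear_map(B, y_a):
        """Predict a pose for person B from one observed pose of A."""
        return np.append(y_a, 1.0) @ B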

    4.4.2 Artificial Neural Net

Learning input-output relationships from recorded motion data can be considered the problem of approximating an unknown function. An artificial neural net (ANN) is known to be well suited for this [MC01]. This learning algorithm is inspired by the structure of biological neural nets, and its adaptive system changes its structure based on internal and external information.

In contrast to LR, artificial neural nets can learn from the history of the data. This makes it possible to use preceding human postures in order to calculate virtual postures. When examining only a single human pose, it is unapparent what the person's movement direction is. Falsely interpreting the direction can result in unnatural robotic behavior. But when using multiple human poses, the person's movement history can be analyzed and the synthetic humanoid's joint angle values can be set accordingly. Figure 4.10 shows how multiple human postures are mapped onto a single virtual human posture using an ANN. The


pose history is also called a sliding window, and its size (the number of postures stored in it) has to be set independently for each recorded behavior.

Figure 4.10: Multiple human postures are mapped onto a single robot/virtual human posture with an artificial neural net. The different layers of the ANN are highlighted blue, green and red.

The ANN consists of three layers: an input, a hidden and an output layer. How many neurons each layer consists of and which connectivity value is used depend on the recorded motion data; these values are set automatically by the software. All points of the human low-dimensional posture space are added to the ANN's input layer, and the neural net is trained with the supplied dataset. When using the net for prediction, several input points (the size of the sliding window) are combined to produce a single virtual human posture.

Figure 4.11: The left figure illustrates the mapping learned with an ANN. Colored regions indicate how areas with the highest mapping error correspond to the overall error diagram on the right.


The error obtained during the transformation is illustrated in figure 4.11. Colored regions indicate how the areas with the highest error correspond to the overall behavior trajectory. As mentioned earlier, the supplied motion data describes a smooth trajectory in high-dimensional space. This characteristic has to remain within the low-dimensional embedding, because a smooth behavior trajectory has proven to be well-suited as training data. Noisy input data can lead to overtrained networks that adapt to the noise and do not generalize to unseen input points. Especially for strongly non-linear embeddings overtraining can occur.

    4.4.3 Echo State Network

Pioneering approaches in reservoir computing are echo state networks (ESNs). ESNs are based on the observation that randomly created recurrent neural nets possess certain algebraic properties, and training a linear readout from them is often sufficient to achieve excellent performance [JMP07, LJ09].

Within an ESN a recurrent neural network, called the dynamic reservoir, is randomly created and remains unchanged during training. It is driven by the input data and stores in its states a non-linear transformation of the input history. This allows ESNs to develop a self-sustained temporal activation dynamic [LJ09]. The desired output signal is then generated as a linear combination with linear regression, using the training data as target output.

A mapping learned by an ESN for the arm mirroring example is shown in figure 4.12. During the first few steps the ESN has too few points in its history, resulting in large Euclidean distances between calculated and desired output values. After the input history has been built up, the error drops tremendously.

Figure 4.12: The left figure illustrates the mapping learned with an ESN. Once again, colored regions indicate how areas with the highest mapping error correspond to the overall error (diagram on the right).

ESNs can be more efficient than ANNs [LJ09], since smaller numbers of neurons are used. In contrast to that, single parameter changes can be computationally


expensive, because of the resulting size of the update cycles; only relatively small datasets should be used. An additional downside is the variety of control parameters: in the majority of cases expert knowledge is required in order to set these optimally. As an example, figure 4.13 illustrates three different reservoir sizes and their impact on the learned trajectory.

Figure 4.13: Reservoir sizes and their influence on the mapping's smoothness. The number of points stored in the ESN can have a considerable impact on the learning result. The shown reservoir sizes are (from left to right): 5, 15 and 25 points.
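A minimal numpy sketch of the idea, with a fixed random reservoir and a least-squares readout; the reservoir size, spectral radius and the omission of washout handling are simplifying assumptions, not the configuration used in this thesis.

    import numpy as np

    rng = np.random.default_rng(0)

    class ESNSketch:
        def __init__(self, n_in, n_res=50, spectral_radius=0.9):
            self.W_in = rng.uniform(-1, 1, (n_res, n_in))
            W = rng.uniform(-1, 1, (n_res, n_res))
            # scale so the echo state property is (typically) satisfied
            W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
            self.W, self.x = W, np.zeros(n_res)

        def _step(self, u):
            self.x = np.tanh(self.W_in @ u + self.W @ self.x)
            return self.x

        def fit(self, U, T):
            """Drive the reservoir with inputs U and train the linear
            readout on targets T by least squares."""
            S = np.array([self._step(u) for u in U])
            self.W_out, *_ = np.linalg.lstsq(S, T, rcond=None)

        def predict(self, u):
            return self._step(u) @ self.W_out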

4.5 Real-time Human Posture Approximation and Interaction

Once the behaviors are recorded and an interaction model has been learned, a virtual human can be used for real-time interaction (see figure 4.14). One human is replaced by a virtual human, and the generalizing model is used to have the avatar interact with the remaining interaction partner in virtual reality. Since the interaction will not be executed exactly the way it has been recorded, the virtual human needs to analyze the person's current posture. The person is once again captured with the Microsoft Kinect depth camera, and the extracted joint angle values are reduced in dimensionality. Since some dimensionality reduction algorithms do not support unseen input data, a general approach has to be used for the transformation. The well-known k-dimensional tree search structure is employed to project the observed high-dimensional human posture onto the previously created low-dimensional embedding. The algorithm operates the same way as the method introduced in chapter 3.3. The transformed point is then used as an input value for the interaction model in order to calculate the virtual human's posture.

During real-time interaction this process takes place several times per second, creating a continuous flow of approximated human postures in low-dimensional space. Especially for ESNs this feature is used to analyze temporal features of a person's behavior.
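The projection step can be sketched as follows; for brevity, a brute-force nearest neighbor search stands in for the k-d tree (the result is identical, only slower), and all names are hypothetical rather than taken from the thesis implementation. Given the recorded high-dimensional postures and their low-dimensional counterparts, an unseen Kinect posture is mapped to the embedding of its closest recorded neighbor.

#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Posture;

// Brute-force stand-in for the k-d tree lookup: find the index of the
// recorded high-dimensional posture closest to the observed one.
std::size_t nearestNeighbor(const std::vector<Posture>& recordedPostures,
                            const Posture& query) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < recordedPostures.size(); ++i) {
        double dist = 0.0;
        for (std::size_t j = 0; j < query.size(); ++j) {
            double diff = recordedPostures[i][j] - query[j];
            dist += diff * diff; // squared Euclidean distance suffices here
        }
        if (dist < bestDist) { bestDist = dist; best = i; }
    }
    return best;
}

// The low-dimensional input for the interaction model is the embedding
// of the closest recorded posture.
Posture projectToEmbedding(const std::vector<Posture>& recordedPostures,
                           const std::vector<Posture>& lowDimEmbedding,
                           const Posture& observedJointAngles) {
    return lowDimEmbedding[nearestNeighbor(recordedPostures, observedJointAngles)];
}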


5 Emotions and Behavior Modifications

In this chapter, emotional and behavioral modifications that can be added to a learned two-person interaction model are introduced. The implementation of Walt Disney's Principles of Animation within a calculated interaction model will be presented. In doing so, it will be pointed out that some of these are already included in the model, while others can be added manually. Then the conveyance of three basic emotions utilizing interaction models for synthetic humanoids will be addressed. After that, the importance of body postures for expressing emotional states will be emphasized.

5.1 Walt Disney's Principles of Animation

Between the late 1920s and the late 1930s, animations at Walt Disney's studios became more and more life-like and sophisticated. The studio started to create characters that expressed emotions and were visually appealing to the viewer's eye. It was apparent to Walt Disney that each action that was going to happen in a scene had to be unmistakably clear to the audience. For that, animators analyzed human motion in nature and gathered every detail. Eventually, they isolated and named certain animation procedures, and the newly invented practices and fundamental rules were from then on known as The Principles of Animation [Joh95]. The applicability of these in 3D computer animation has been pointed out by the work of Lasseter [Las87], who describes that all principles that have proven to be so well suited for classic 2D animation also apply in the animated three-dimensional world.

Since a virtual human or humanoid robot has to be convincing and appealing to the human eye, an excerpt of these principles has been implemented as animation filters for interaction models. The focus of the following sections is on basic concepts of how these principles can be implemented for virtual humans and humanoid robots at the same time. Since the underlying data for both target platforms is identical, the resulting motions can be exchanged between them. Because the reproduction of movements on the Nao robot needs further optimization, the examples are based only on evaluated experiments in virtual reality.

    5.1.1 Squash and Stretch

The definition of rigidity and mass of characters, and the distortion of such during an action, is known as Squash and Stretch. Objects stretch and squash during an animation depending on their mass and rigidity. This does not mean that objects necessarily have to deform. An articulated object, like the puppet in figure 5.1 for example, can fold over itself and stretch by extending out fully without deforming [Joh95].

Figure 5.1: Squash and stretch is the most important rule for life-like character animation. The figure shows that a character stretches while jumping and folds over when landing on the ground.

The most important rule for Squash and Stretch is that objects have a constant volume, regardless of whether they are stretched out or pressed together. This principle is also used in animation timing: an object is stretched to reduce the strobing effect in fast movements. This is due to the fact that the human eye stitches single frames together into a smooth animation. When objects are not overlapping, the human brain perceives separate images, destroying the illusion of movement. The underlying natural basis of Squash and Stretch can be observed in nearly all living flesh, regardless of whether it is a bony human arm or a clumsy dog; each character will show considerable movements during an action.

Concerning the interaction learning approach presented in this thesis, the principle of Squash and Stretch is already encoded in the recorded movements. Since the underlying motion data is acquired from humans, the played-back recording has a natural and life-like basis. However, each motion dataset contains noise, which would diminish the natural impression of a behavior. Hence, all datasets have to be filtered in high-dimensional space in order to preserve human movements correctly.
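The thesis does not prescribe a particular filter here, but a simple per-joint moving average illustrates the idea of smoothing in high-dimensional space; the window-based scheme and all names are assumptions made for this sketch.

#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<double> Frame; // one joint-angle vector per time step

// Moving-average smoothing applied independently to every joint-angle
// channel; 'window' frames on each side contribute to the average.
std::vector<Frame> smoothBehavior(const std::vector<Frame>& behavior, int window) {
    std::vector<Frame> result(behavior);
    int n = (int)behavior.size();
    for (int t = 0; t < n; ++t) {
        int lo = std::max(0, t - window);
        int hi = std::min(n - 1, t + window);
        for (std::size_t j = 0; j < behavior[t].size(); ++j) {
            double sum = 0.0;
            for (int k = lo; k <= hi; ++k) sum += behavior[k][j];
            result[t][j] = sum / (hi - lo + 1);
        }
    }
    return result;
}

A small window removes sensor noise while leaving the squash-and-stretch characteristics of the recorded human motion intact; too large a window would flatten exactly those characteristics.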

    5.1.2 Timing

Walt Disney paid particular attention to the time and speed of an action, since proper timing is crucial for a character's believability [Joh95]. The impression of weight is also mostly defined by an object's speed. For example, a heavy person would move slowly and lethargically, whereas an object moving very fast, like a cannon ball, indicates a light mass. The timing of a character also has a great impact on its emotional appearance: in general, a slow moving human appears to be relaxed, while a fast moving person seems nervous or excited [Las87].

In reference to interaction models, the timing can be influenced as well, but only by the user interacting with the virtual human. Since the underlying recordings have been acquired simultaneously, the temporal coherence remains within the dataset. If the execution time were altered for only one behavior, the approximated virtual human postures would refer to prior or future human postures, because the poses of the synthetic humanoid are fully managed by the interaction model. This means that if a person interacts slowly, the virtual human will move slowly as well in order to preserve the desired interaction.

    5.1.3 Anticipation

In 2D animation an action occurs in three steps: preparation, proper execution and termination. Anticipation describes the preparation of an action. Correct preparation of actions is one of the oldest rules in theatre: the attention of the audience has to be guided so that they clearly understand what is going to happen next. This also applies to people watching a cartoon or 3D animation. The amount of anticipation added by the animator has a serious impact on the action that will follow; for a very fast movement the amount would be much higher than for a slow action. If this is not done correctly, an animation can appear abrupt, unnatural and stiff. As an example, one can imagine a man starting to run: he draws back in the opposite direction, "gathering like a spring, aiming at the track" [Joh95], before jumping off.

Concerning interaction models, anticipation can also be added. For this, a filter has been implemented that changes the beginning of a recorded behavior so that the virtual human appears to build momentum. The anticipation filter analyzes a given behavior and identifies key postures during the first few seconds. After that, the movement direction is calculated. This direction is then inverted and a new starting posture is found by following the behavior trajectory backwards. Visually speaking, this appears as a backward motion of the synthetic humanoid.

Figure 5.2: The upper line shows the first steps during the defend behavior that will be introduced in chapter 7. The second line shows the virtual human repeating this movement with anticipation added.

The newly calculated pose is then used as the starting pose, and the algorithm utilizes a penalized regression spline to interpolate to the extracted key posture, creating a smooth accelerated movement. This calculation is done in high-dimensional space and only affects the first few seconds of a behavior, since the remaining part has to continue unchanged in order to keep the interaction plausible. Figure 5.2 shows how the virtual human moves during the first few time steps while executing the defend behavior. The first line displays the poses without anticipation; the second line shows the synthetic humanoid with anticipation added. As one can see, the virtual human uses its arms to build momentum: the arms are pulled back at the beginning of the behavior, then snap to the desired protective posture, and the interaction continues unchanged.
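The core of the anticipation filter can be sketched as follows, under simplifying assumptions: the initial movement direction is estimated from the first key posture and inverted to obtain a pulled-back starting pose, and a plain linear blend stands in for the penalized regression spline used in the actual filter. All names are hypothetical.

#include <cstddef>
#include <vector>

typedef std::vector<double> Frame;

// Build an anticipation prefix: pull the start pose back against the
// initial movement direction, then blend towards the extracted key
// posture. 'pullBack' controls how far the humanoid draws back;
// 'blendSteps' must be positive.
std::vector<Frame> buildAnticipationPrefix(const Frame& startPose,
                                           const Frame& keyPose,
                                           double pullBack, int blendSteps) {
    Frame pulledBack(startPose.size());
    for (std::size_t j = 0; j < startPose.size(); ++j)
        pulledBack[j] = startPose[j] - pullBack * (keyPose[j] - startPose[j]);

    // Linear blend from the pulled-back pose to the key posture; the
    // thesis uses a penalized regression spline here instead.
    std::vector<Frame> prefix;
    for (int s = 0; s <= blendSteps; ++s) {
        double a = s / (double)blendSteps;
        Frame f(startPose.size());
        for (std::size_t j = 0; j < startPose.size(); ++j)
            f[j] = (1.0 - a) * pulledBack[j] + a * keyPose[j];
        prefix.push_back(f);
    }
    return prefix; // prepended to the otherwise unchanged behavior
}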

    5.1.4 Exaggeration

In 2D animation the principle of exaggeration refers to a characteristic of a cartoon where persons are drawn on the edge of realism. Walt Disney instructed his animators to draw sad scenes even sadder and a bright character even brighter. Sometimes a character's physical features were altered, featuring extreme physical manipulations and supernatural or surreal properties. But to exaggerate an action does in general not mean that an animation has to become more violent or distorted. Instead, Walt Disney wanted to have cartoons wilder and more extreme in form while remaining true to reality [Joh95]. That is why he also described exaggeration as extreme and unnatural, but unmistakably clear.

Figure 5.3: The first line shows screenshots of a virtual human playing back the learned defense behavior without additional exaggeration added. The same behavior, more exaggerated, is shown below.

In order to produce exaggerated behaviors with the interaction model approach presented here, a filter based on low-dimensional data has been implemented. Ben Amor [Ben10] pointed out that exaggerated postures can emerge at the edges of low-dimensional posture spaces created with PCA. Since principal component analysis can be used as a dimensionality reducer for interaction models as well, this characteristic can be exploited for creating such behaviors.


Figure 5.4: The PCA space of the defense motion, showing the original behavior curve (red) and the exaggerated trajectory created through magnification. Two key postures are also displayed for both versions of the action.

In order to move low-dimensional points of a virtual human's pose further to the edge of the posture space, a magnification filter has been implemented. The magnification factor can be set depending on the desired results. Figure 5.3 shows the original and exaggerated defense behavior captured at seven time steps. As can be seen in the figure, the virtual human moves its arms higher and crouches lower to the ground. The final pose also features a straightened back and an erect head.
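A minimal sketch of such a magnification filter is given below, assuming the points are scaled away from the centroid of the low-dimensional trajectory; the thesis does not fix this reference point, so the centroid choice and all names are assumptions.

#include <cstddef>
#include <vector>

typedef std::vector<double> Point;

// Push every low-dimensional posture point away from the trajectory's
// centroid; magnification > 1 moves points towards the edge of the
// posture space, creating exaggerated poses.
void exaggerate(std::vector<Point>& trajectory, double magnification) {
    if (trajectory.empty()) return;
    Point center(trajectory[0].size(), 0.0);
    for (std::size_t i = 0; i < trajectory.size(); ++i)
        for (std::size_t j = 0; j < center.size(); ++j)
            center[j] += trajectory[i][j] / trajectory.size();
    for (std::size_t i = 0; i < trajectory.size(); ++i)
        for (std::size_t j = 0; j < center.size(); ++j)
            trajectory[i][j] = center[j] + magnification * (trajectory[i][j] - center[j]);
}

After the scaled points are expanded back to high-dimensional joint angles, the result corresponds to the trajectory magnification shown in figure 5.4.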

    5.1.5 Arcs

In the context of 2D animation, arcs describe the change from one extreme position to another. This rule has been introduced by animators to avoid mechanical movements. With some limitations, nearly all living creatures describe an arc of some kind in their actions [Joh95]. Since arcs describe the movement direction of body parts, their characteristics also encode inbetween timings, which can kill the essence of an action when not set properly.

As computer animation evolved, splines were used to create smooth behaviors, and they have also been utilized for interaction models. Since every pose of the virtual human is based on postures that humans assumed during the recording, the action has a natural basis that can be described with arcs. Different interpolation algorithms can be applied in the user interface to add additional smoothness to an action. For that, the algorithm searches for extremes in the high-dimensional joint angle curve and uses them as control points. A resulting curve can be seen in figure 5.5. Penalized regression splines have proven to be well suited, because their many parametric options allow the most customizability while providing sufficient precision.

Figure 5.5: The diagram shows a single joint angle value (right hip joint) during the first 100 time steps of the defense behavior with additional spline fitting applied. The red curve describes the original joint angles, whereas the green trajectory shows the newly fitted angles.
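The control point extraction can be sketched as a simple search for local extremes in a single joint angle curve; the penalized regression spline itself is omitted here, and the function name is hypothetical.

#include <cstddef>
#include <vector>

// Detect local extremes (minima and maxima) of a single joint-angle
// curve; the returned time step indices would serve as control points
// for the subsequent spline fitting.
std::vector<std::size_t> findExtremes(const std::vector<double>& curve) {
    std::vector<std::size_t> controlPoints;
    for (std::size_t t = 1; t + 1 < curve.size(); ++t) {
        bool isMax = curve[t] > curve[t - 1] && curve[t] > curve[t + 1];
        bool isMin = curve[t] < curve[t - 1] && curve[t] < curve[t + 1];
        if (isMax || isMin) controlPoints.push_back(t);
    }
    return controlPoints;
}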

    5.1.6 Secondary Action

A secondary action results directly from a primary action and has to be kept subordinate. As an example, one can imagine a person wearing a heavy coat who starts turning around. The primary action is the person's turning, while the secondary action is the warping movement of the coat. In this case the coat's behavior is directly dependent on the person's motion and thereby subsidiary. Secondary actions can also be expressed with facial animations when the main action of the scene is being told by the body movements. The difficulty then lies within the movement speed, since a facial expression can simply go unseen when the movement speed is too fast. Thus, the animation has to be staged obviously but still remain secondary.

Secondary actions that involve movements of body parts are always recorded, as long as the person showing the behavior displayed some. Additionally, small behaviors can be added to the main animation. A second behavior is integrated into the animation by partly adding high-dimensional data of user-defined joint angles. A bicubic spline is then used to combine both behaviors into a smooth movement. Over time the influence of the secondary action on the movement is decreased until it is not visible any more. In figure 5.6 a knee raising action has been added to a virtual human's behavior. As already mentioned, the influence of the secondary action on the main action can be set depending on the desired output; in the example a relation of one to five has been used.


Figure 5.6: The first line shows a virtual human's motion. The second row displays the same primary action with another animation added on top. As one can see, during the secondary action the virtual human raises its knee.

Thus, a fifth of the primary action is changed compared to the original behavior. Since the secondary action involves the usage of one leg, which has not been used in the original action, the high-dimensional space changes. Hence, the low-dimensional embedding differs from the previous one. Figure 5.7 shows how the low-dimensional embedding changes when the mentioned action is added.

    (a) Original action (b) A secondary action added

Figure 5.7: How a secondary action changes a low-dimensional embedding of an action can be seen in the figure. Color is used to indicate how points on the trajectory without a secondary action correspond to the curve with an additional secondary action.
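The blending idea can be sketched as follows, under stated assumptions: for a set of user-defined joints, the secondary behavior is mixed into the primary one with an influence that decays over time. A linear blend with linear decay stands in for the bicubic spline combination described above, and all names are hypothetical.

#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<double> Frame;

// Mix a secondary behavior into the primary one for user-defined joints.
// The influence starts at 'influence' (e.g. 0.2 for the 1:5 relation used
// in the example) and decays linearly, so the secondary action fades out.
void blendSecondaryAction(std::vector<Frame>& primary,
                          const std::vector<Frame>& secondary,
                          const std::vector<std::size_t>& affectedJoints,
                          double influence) {
    std::size_t n = std::min(primary.size(), secondary.size());
    for (std::size_t t = 0; t < n; ++t) {
        double w = influence * (1.0 - (double)t / (double)n);
        for (std::size_t k = 0; k < affectedJoints.size(); ++k) {
            std::size_t j = affectedJoints[k];
            primary[t][j] = (1.0 - w) * primary[t][j] + w * secondary[t][j];
        }
    }
}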


    5.2 Expressing Basic Emotions

In the work of Wallbott [Wal98] the author analyzed human emotions regarding their postural expression. He concluded that basic emotions (joy, sadness, pride, shame, fear, anger, disgust, contempt) can be observed as particular postures or movements. In his findings he also showed that these emotions are expressed the same way regardless of human culture.

It has been pointed out by various researchers that these emotions can be expressed by humanoid robots, like the Nao robot [BHML10, MC11, MLLM], as well as by virtual humans [GRM04, SR08]. Thus, both research fields work towards a common goal: to express artificial emotions realistically and convincingly. In contrast to virtual humans, most humanoid robots do not have facial features, so the expression of emotional states hinges on the question of whether an emotion can be conveyed through body postures or not.

Since the intention of this thesis is to combine the research fields of humanoid robotics and computer animation, a framework has been developed to express emotions in both worlds. The recorded behavior of a virtual human or robot is altered. That is why emotional states are, in the context of this thesis, behavioral modifications rather than self-contained expressions, designed to give the user the illusion of different emotional states. In order to express different emotions or attitudes, the joint angle values of the recorded motions are changed and reassigned to the virtual human or humanoid robot.

The filters are implemented to convey the mood of a scene, not to express an emotion at all costs, since the underlying action has to remain the same. Filters can be based on high- or low-dimensional data. Note that modifications of low-dimensional spaces affect all joint angles, whereas high-dimensional filters manipulate the recorded data of one joint at a time. Different value ranges have to be kept in mind in order to restrain a filter's influence; in other words, the resulting angle values have to remain within the physical constraints of the robot.

Additionally, joints can be excluded from further manipulation. This feature is especially useful when behaviors already display certain emotional features. The head angles, for example, should be excluded when the viewing direction is already upwards. The same applies to other joint angles; in order to limit these, the user can set parameters in the software's configuration file.

In the example where a synthetic humanoid learns to defend itself (see chapter 7), an additional emotional change can be applied. Based on the works of Wallbott [Wal98], the following three modifications can be added.

    5.2.1 Happiness

Wallbott [Wal98] described that happiness can be observed as a collection of various purposeless movements, combined with jumping, dancing for joy and clapping of hands. The body is also held erect and the head upright. The animation filter for happiness is implemented with regard to these characteristics. A penalized regression spline is used to flatten the recorded neck angles and limit the virtual human's gaze direction. Additionally, the spine rotations are set to create an upright pose. The second line in figure 5.8 displays the happiness filter applied to the defense behavior that will be learned in chapter 7; in contrast, the first line shows the same behavior without additional emotions added.

Figure 5.8: First line: the first few seconds of a behavior without an emotion. Second line: happiness added to the movement.

The figure indicates that the head is always upright during the movement and the arms are resting wide open at the side. As soon as the animation starts, the virtual human's movements appear smooth and curved, implying a cheerful mood. Also, the back is always straight and upright, which increases the impression of happiness of the virtual character even further.

    5.2.2 Sadness

According to Wallbott [Wal98], a sad person behaves motionless and passive, with the head hanging on a contracted chest. The emotion filter implementing these characteristics once again utilizes splines for motion smoothing. Since this emotion affects all body joints, the spline is applied to all joint angle values. In order to create the impression of a sad character, the following steps were implemented.

Viewing Direction. Firstly, the viewing direction has been changed so that the character is gazing at the ground. Additionally, random slow head movements have been added to decrease the body's rigidity.

Spine Rotation. Secondly, the character's back is slightly bowed forwards.

Movement Speed. A penalized regression spline is then used to smoothen the movement speed of each joint angle. For that, each hinge joint is analyzed, and extremes and turning points are used as spline control points. In doing so, the final movement is only changed slightly, but the execution appears smoother and more lethargic, implying sadness and sorrow.

The second line in figure 5.9 shows the sadness filter applied to a behavior that will be learned in chapter 7; the first line shows the same motion with no additional emotions added. As one can see, the body posture of the virtual human is slumped, with a hanging head. Slow motions and the lethargic appearance boost the expression of sadness even further.

Figure 5.9: First line: a behavior with no emotion added. Second line: sadness added to the motion.

    5.2.3 Anger

In his articles Wallbott [Wal98] showed that a person expressing anger usually exhibits a trembling body with the intention to push or strike violently away. Shaking fists, an erect head and a well expanded chest have also been documented [Wal98]. The anger filter implemented for an interaction model boosts hectic movements of the virtual human. To do so, joint angle changes between time steps are increased. Additionally, both feet are planted firmly on the ground. The last line in figure 5.10 displays six time steps of the defense behavior with the anger filter added. As can be seen in the picture, the bent arms of the synthetic humanoid are resting at the side in a protective pose. When the behavior is played back, the movements are fast and hectic, accentuating anger.


Figure 5.10: The first few seconds of the defense behavior with no additional emotion added can be seen in the first line. The second line shows the same motion with additional anger added. This behavior features a fast and hectic moving virtual human.
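As a sketch of this amplification, the fragment below increases the joint angle deltas between consecutive time steps by a gain factor and clamps the result to the joint limits, as required for reproduction on the robot. The gain-and-clamp scheme and all names are assumptions, not the exact thesis implementation.

#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<double> Frame;

// Amplify the joint angle change between consecutive time steps by
// 'gain' (> 1), making the motion appear faster and more hectic.
void angerFilter(std::vector<Frame>& behavior, double gain,
                 const Frame& lowerLimit, const Frame& upperLimit) {
    const std::vector<Frame> original(behavior);
    for (std::size_t t = 1; t < behavior.size(); ++t) {
        for (std::size_t j = 0; j < behavior[t].size(); ++j) {
            // Original frame-to-frame change, scaled up.
            double delta = original[t][j] - original[t - 1][j];
            double value = behavior[t - 1][j] + gain * delta;
            // Keep the result within the physical joint constraints.
            behavior[t][j] = std::min(upperLimit[j], std::max(lowerLimit[j], value));
        }
    }
}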

    5.3 Summary

Several fundamental rules of animation applied to interaction models have been presented in this chapter. In doing so, it has been pointed out that some are already included, while others, like exaggeration or anticipation, can be added manually. The underlying implementation of these in the form of filters has been presented. It has been shown that an implementation of Walt Disney's The Principles of Animation in the form of low- and high-dimensional filters can be achieved, whereby a distinction between full motion modification and single joint angle manipulation must be made.

In order to convey a certain mood in the virtual character or humanoid robot, three basic emotions have been implemented as filters as well. For these, high-dimensional behavioral modifications need to be made. The underlying body postures have been derived by analyzing Wallbott's studies [Wal98]. It has been pointed out that, in the field of humanoid robotics, these emotions have already been transferred to a Nao [BHML10, MC11, MLLM]; hence, another evaluation of them has not been presented.


6 Software Architecture

6.1 Overview

In the following sections, the main software components implementing the interaction learning algorithm will be explained. Additionally, filters for modifying recorded behaviors are explained in a detailed fashion. Finally, external components that are not essential for the interaction learning algorithm, but highly recommended, are introduced. Figure 6.1 gives a brief overview of the main software components implementing the two-person interaction model learning approach.

Figure 6.1: Main software components and additional libraries that build the software basis for learning interaction models.

The main software components that are crucial for the interaction learning approach developed in this thesis can be segmented into four parts:

Dimension Reducers - This component implements four main dimensionality reduction techniques, namely PCA, Manifold Sculpting, LLE and IsoMap. Additionally, Breadth First Unfolding and NeuroPca can be used.

Learning Algorithms - Within this part, different learning algorithms for mapping two low-dimensional posture spaces are implemented. They include ANNs, ESNs and Linear Regression.

Filters - Behavior modifications based on low- or high-dimensional data are combined in this software package. They include classes for basic emotion expression (happiness, sadness, anger) as well as algorithms implementing some of the Principles of Animation.

Trajectory Visualization - This component uses Qwt to display low-dimensional behavior trajectories. Additional changes to all points can be made with drag-and-drop.

Most of the main components are compiled into one file. For reasons of portability and reusability, the following features have been compiled into shared libraries: the SQL connector, the Kinect client, all interaction learning algorithms and the OGRE visualization. In the following sections, each component will be explained briefly regarding its main features. For a more detailed description, a full source code documentation can be found on the attached DVD.

    6.2 Dimensionality Reduction

Dimensionality reduction is an important step in the process of interaction learning. The software basis for this has been implemented with algorithms of the Waffles library [Gas11]. A factory design pattern is used to provide the required dimension reduction functionalities to the user interface, where each reducer can be selected from a drop-down menu. The source code fragment for this can be seen in listing 6.1. Since the calculation of low-dimensional embeddings can be very time consuming and computationally expensive, a multi-threaded approach is used to balance the workload across multiple processors. As soon as the computation is done, the user interface is notified and user feedback is created.

    6.3 Interaction Learning Algorithm

Three different learning algorithms have been implemented utilizing different machine learning techniques, namely Linear Regression, Artificial Neural Nets and Echo State Networks. The code fragment in listing 6.1 shows how two behaviors can be mapped utilizing an artificial neural net.

// Create a dimension reducer for each behavior
DimensionReducerFactory* dimReducerFactory = new DimensionReducerFactory();
DimensionReducer* stReducer = dimReducerFactory->getReducer(DimensionReducerFactory::PCA);
DimensionReducer* ndReducer = dimReducerFactory->getReducer(DimensionReducerFactory::PCA);
int targetDimension = 6;

// Reduce dimensionality of each behavior
stReducer->setData(firstPersonBehaviorData);
stReducer->transform(targetDimension);

ndReducer->setData(secondPersonBehaviorData);
ndReducer->transform(targetDimension);

// Create interaction learner and set behavior data
InteractionLearnerNeuralNet* learner = new InteractionLearnerNeuralNet();
learner->setStData(stReducer->getTransformedData());
learner->setNdData(ndReducer->getTransformedData());

// Start the learning algorithm
learner->run();

Listing 6.1: A simple example where two behaviors are reduced in dimensionality and mapped with an ANN.

    6.4 Behavior Database

In order to calculate a low-dimensional model of a behavior, the required motion data has to be provided. Each behavior is stored in a GClasses::GMatrix object as soon as it is read from the database. This relational database can be accessed using the SQL language. The client software is also used by [Ber11] to store and load robot behavior data. Code listing 6.2 shows how a connection to the database can be created.

// Create a database connection
SqlConnector* sqlCon = new SqlConnector(ip, dbName, user, credentials);

// Connect to the database
if (!sqlCon->connect())
    return;

// Create storage for a new behavior
GClasses::GMatrix* behavior = new GClasses::GMatrix(26, 0);

// Request a behavior from the database (if existing)
if (!(sqlCon->getBehaviour(name, type, behavior)))
    return;

// Close the connection
if (sqlCon->isConnected())
    sqlCon->disconnect();

Listing 6.2: C++ code to read a behavior from the motion database.

If a connector object is created and the connection is established successfully, the following functions are available:

getBehaviourList - Retrieves a list of all behaviors in the database.

getBehaviour - Requests a behavior from the database. The received data is stored in a GClasses::GMatrix object.

pushData - Pushes a new behavior to the database, providing a unique name, the type of recorded data (human motion data or robot joint angles) and a GClasses::GMatrix object containing the actual data.

dropBehaviour - Deletes a behavior by simply supplying its name.

In the database, each behavior is stored in a separate table, with ascending indexes identifying the time steps. Additionally, pelvis positions and rotations are stored for motion visualization purposes.

    6.5 Visualization

In order to visualize low-dimensional data, Qwt and Qt4 extensions have been written. Figure 6.2 shows the main user interface for behavior visualizations. The screenshot displays two low-dimensional behavior trajectories reduced in dimensionality with PCA. Each blue square is a posture obtained from the demonstrator during the recording. It can be moved in the low-dimensional posture space by the user; in doing so, additional modifications can be added.

Figure 6.2: A user interface screenshot is shown. Two behaviors have been reduced in dimensionality with PCA. Each blue square is a posture that has been executed by the human demonstrator during a recording. This point can be moved by the user in low-dimensional space.


One has to keep in mind that only two dimensions are modified at a time, since the remaining ax