Human-Robot Interaction System with Quantum-Inspired Bidirectional Associative
Memory
Naoki Masuyama∗, Chu Kiong Loo∗ and Naoyuki Kubota†
∗Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, MALAYSIA
Email: [email protected], [email protected]
†Department of Systems Design, Tokyo Metropolitan University, 6-6 Asahigaoka, Hino, Tokyo, 191-0065, JAPAN
Email: [email protected]
Abstract—This paper discusses an interaction system for a Robot Partner using Quantum-Inspired Bidirectional Associative Memory (QBAM). We have developed QBAM, which has superior Memory Capacity and Recall Reliability compared with conventional models. Thanks to these advantages, the proposed system can store a large amount of information and its relationships. Using QBAM, we construct an interaction system that associates gesture, object and voice information. In the proposed system, a Steady-State Genetic Algorithm is applied to detect objects via image processing, and Spiking Neural Networks are applied to memorize the spatio-temporal patterns of gestures. For voice recognition, we use Julius, an open-source large-vocabulary continuous speech recognition engine. The experimental results show that the proposed system contributes to the facilitation of communication with the Robot Partner.
Keywords-Human-Robot Interaction; Associative Memory; Quantum Computing;
I. INTRODUCTION
One of the significant problems in an aging society is elderly care. In fact, the number of elderly people living alone is increasing, and they lack chances for communication with other people compared with elderly people who live with family. As a result, these people face an increased probability of cognitive decline. In order to improve this situation, communication robots have been developed [1], [2]. According to relevance theory, human thought is not just transmitted, but is in fact a shared event between two people [3]. Such a shared cognitive environment is called a mutual cognitive environment [4]. Conventionally, when a robot provides a topic, it selects a suitable one based on the past history of conversation with the human or on human behavior, or it chooses general topics such as weather and news from the Internet. If we apply associative memory under a shared cognitive environment, however, the robot can provide a topic better suited to the current content of the conversation. As a result, communication between robot and human will become more active.
In the early 1980's, Hopfield proposed an auto-associative memory model to store and recall information in a manner analogous to the human brain. In the late 1980's, Kosko extended the Hopfield model to a bidirectional associative memory (BAM) [5]. These models, however, suffer from quite low Memory Capacity and Noise Tolerance. To improve the performance, a General model for BAM (GBAM) was proposed [6]. This model defines a weight matrix in each layer and develops an algorithm for learning the asymptotic stability conditions. As another approach, there is a model that applies quantum mechanics to the Hopfield model [7]. This model shows that quantum information processing in neural structures results in an exponential increase of storage capacity, and it can explain the extensive memorization and inference capabilities of humans. Inspired by this model, we have developed the Quantum-Inspired Bidirectional Associative Memory (QBAM) [8].
In this paper, we propose an interaction system for a Robot Partner using QBAM. It associates gesture, object and voice information. In Section II, we briefly introduce the structure of QBAM and its superior abilities in terms of Memory Capacity and Noise Tolerance. In Section III, we explain the overall architecture of the interaction system with QBAM and its computational intelligence technologies. Section IV presents the experimental results of communication between the Robot Partner and a human through object recognition, gesture recognition and voice recognition.
II. QUANTUM-INSPIRED BIDIRECTIONAL ASSOCIATIVE
MEMORY
This section describes the structure of the Quantum-Inspired Bidirectional Associative Memory (QBAM).
A. Structure of QBAM
Let $\{X^{(1)}, X^{(2)}, \ldots, X^{(K)}\}$ and $\{Y^{(1)}, Y^{(2)}, \ldots, Y^{(K)}\}$, for $k = 1, 2, \ldots, K$, be the bipolar pattern pairs to be stored, where $K$ denotes the number of pattern pairs, $M$ and $N$ denote the
number of neurons in the X-layer and the Y-layer, respectively, and $W^{T}_{ij}$ and $W_{ij}$ denote the weight matrices for the X-layer and the Y-layer, respectively.
• X-layer to Y-layer:
\[
X^{(k)}_j = \sum_{i=1}^{M} W^{T}_{ij}\, x^{(k)}_i \tag{1a}
\]
\[
y^{(k)}_j = \mathrm{sgn}\!\left(X^{(k)}_j\right) \tag{1b}
\]
• Y-layer to X-layer:
\[
Y^{(k)}_i = \sum_{j=1}^{N} W_{ij}\, y^{(k)}_j \tag{2a}
\]
\[
x^{(k)}_i = \mathrm{sgn}\!\left(Y^{(k)}_i\right) \tag{2b}
\]
The weight matrices $W^{T}_{ij}$ and $W_{ij}$ are defined as follows:
• X-layer to Y-layer:
\[
W^{T}_{ij} = \frac{1}{K} \sum_{j=1}^{N} \sum_{i=1}^{M} v^{T}_{j} u_{i} \tag{3}
\]
• Y-layer to X-layer:
\[
W_{ij} = \frac{1}{K} \sum_{i=1}^{M} \sum_{j=1}^{N} u^{T}_{i} v_{j} \tag{4}
\]
$u_i$ and $v_j$ are calculated by Gram-Schmidt orthogonalization according to $a_1 = A_1 / \|A_1\|$ $(p = 1)$, $b_p = A_p - \sum_{i=1}^{p-1} (a_i, A_p)\, a_i$ and $a_p = b_p / \|b_p\|$ $(2 \le p \le k)$, where $A$ denotes the vector undergoing orthogonalization, and $a$ and $b$ denote the orthonormalized vector and the orthogonalized vector, respectively.
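As a concrete illustration (ours, not the paper's code), the orthogonalization step can be sketched in Python as follows, assuming the stored patterns are linearly independent rows of a matrix:

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the rows of A in order: a_1 = A_1/||A_1||,
    b_p = A_p - sum_{i<p} (a_i, A_p) a_i, a_p = b_p/||b_p||."""
    ortho = []
    for row in np.asarray(A, dtype=float):
        b = row.copy()
        for a in ortho:
            b -= np.dot(a, row) * a          # remove the component along each earlier direction
        ortho.append(b / np.linalg.norm(b))  # assumes linear independence (nonzero norm)
    return np.array(ortho)

# Example: orthonormalize three 25-element bipolar patterns
U = gram_schmidt(np.sign(np.random.randn(3, 25)))
```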
The update rules of the weight matrices $W^{T}_{ij}$ and $W_{ij}$ are defined as Eqs. (5) and (6), where $t$ denotes the time step and the superscript $T$ denotes the transpose. $x^{(k)}_{(stored)}$ and $y^{(k)}_{(stored)}$ denote the $k$th stored patterns, and $x^{(k)}$ and $y^{(k)}$ denote the inner states of the X-layer and the Y-layer, respectively. $M$ and $N$ denote the number of neurons in the X-layer and the Y-layer, respectively. $F$ $(0 < F \le 1)$ denotes the variation amount in position for the center of the Fuzzy sets.
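For illustration, the recall dynamics of Eqs. (1) and (2) can be sketched as follows (ours; it assumes a learned $M \times N$ weight matrix W, and it keeps the previous neuron state where the net input is exactly zero, which the equations leave unspecified):

```python
import numpy as np

def sgn(v, prev):
    """Bipolar sign; keep the previous state where the net input is exactly zero."""
    s = np.sign(v)
    return np.where(s == 0, prev, s)

def recall(x, W, max_steps=20):
    """Bidirectional recall: X -> Y with W^T (Eq. (1)), Y -> X with W (Eq. (2)),
    repeated until the (x, y) pair no longer changes."""
    y = sgn(W.T @ x, np.ones(W.shape[1]))
    for _ in range(max_steps):
        x_new = sgn(W @ y, x)          # Eq. (2)
        y_new = sgn(W.T @ x_new, y)    # Eq. (1)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```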
B. Simulation Experiment
In this section, we compare QBAM with BAM [5] and GBAM [6] through simulation experiments on Memory Capacity and Noise Tolerance for each model. We set the number of stored pattern pair sets to $K = 75$ (alphabet, numbers, images, etc.). Each pattern is a 25-element (5 by 5) bipolar pattern, so both the X-layer and the Y-layer have 25 neurons.
• Memory Capacity

Memory Capacity is an important performance element of an associative memory. We consider that if the percentage of successful recall over $K$ pair sets is over 90%, the $K$ pair sets can be stored in the memory. It is known that with $n$ neurons in both the X-layer and the Y-layer and without noise, the capacity of BAM is not larger than $0.15n$, and the capacity of GBAM is around $1.0n$. In Fig. 1, GBAM falls below 90% around $K = 26$. QBAM, however, can store all pair sets.
Figure 4. The architecture of the robot system.

Figure 5. Example of Computational Intelligence: (a) Object Recognition, (b) Gesture Recognition.

• Noise Tolerance

Noise tolerance is also a significant property of an associative memory. We measure noise tolerance by randomly adding noise to the input data in the X-layer. In Fig. 2, we set the number of pair sets to $K = 3$, because this value is the maximum memory capacity of BAM for $n = 25$. In Figs. 2 and 3, GBAM's correct recall rate drops once the noise rate reaches 60%. On the other hand, QBAM shows outstanding noise resistance even at high noise rates.
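The noise-tolerance protocol can be sketched as follows (our illustration; `recall_xy` stands for any model's X-to-Y recall function, e.g. the one sketched in Section II-A):

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_noise(pattern, rate):
    """Flip a fraction `rate` of the elements of a bipolar pattern."""
    noisy = pattern.copy()
    idx = rng.choice(pattern.size, size=int(rate * pattern.size), replace=False)
    noisy[idx] *= -1
    return noisy

def correct_recall_rate(pairs, recall_xy, rate):
    """Fraction of stored (x, y) pairs whose y is recalled perfectly
    when the X-layer input is corrupted at the given noise rate."""
    hits = sum(np.array_equal(recall_xy(flip_noise(x, rate)), y) for x, y in pairs)
    return hits / len(pairs)
```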
III. SYSTEM ARCHITECTURE AND COMPUTATIONAL
INTELLIGENCE
We have developed the Quantum-Inspired Bidirectional Associative Memory and its application [8], and conducted simulation experiments on Memory Capacity and Noise Tolerance in comparison with conventional methods. As a result, we have confirmed that QBAM has superior ability, and we therefore apply it in the proposed system.
A. Interaction System with Associative Memory
We developed an interactive communication system with associative memory. Fig. 4 shows the structure of the system. The system is composed of the Robot Partner, a Microsoft Kinect, a microphone and a server PC. The Kinect captures RGB data and distance data. Using the input data received from the Kinect, the server PC detects object color and shape with a Steady-State Genetic Algorithm (SSGA) and recognizes gestures with Spiking Neural Networks (SNNs) (Fig. 5). For voice recognition, we apply Julius, an open-source large-vocabulary continuous speech recognition engine [9]. It works in real time, and its recognition accuracy is over 90% in a 20,000-word reading test. The microphone collects the human voice for Julius, which is connected to the server PC by TCP/IP in module mode. In this system, we apply a Japanese language model to Julius, so it works as a Japanese recognition engine. The server PC also calculates the relationships between objects, gestures and words with QBAM. Then, based on these relationships, the server PC sends an utterance or a behavior order to the Robot Partner by TCP/IP.
• X-layer to Y-layer:
\[
W^{T(t+1)}_{ij} =
\begin{cases}
W^{T(t)}_{ij} - F, & \text{if } x^{(k)}_{(stored)} \displaystyle\sum_{i=1}^{M} W^{T(t)}_{ij} x^{(k)} \le 0 \text{ and } W^{T(t)}_{ij} \ge 0 \\[4pt]
W^{T(t)}_{ij} + F, & \text{if } x^{(k)}_{(stored)} \displaystyle\sum_{i=1}^{M} W^{T(t)}_{ij} x^{(k)} \le 0 \text{ and } W^{T(t)}_{ij} < 0 \\[4pt]
W^{T(t)}_{ij}, & \text{otherwise}
\end{cases} \tag{5}
\]
• Y-layer to X-layer:
\[
W^{(t+1)}_{ij} =
\begin{cases}
W^{(t)}_{ij} - F, & \text{if } y^{(k)}_{(stored)} \displaystyle\sum_{j=1}^{N} W^{(t)}_{ij} y^{(k)} \le 0 \text{ and } W^{(t)}_{ij} \ge 0 \\[4pt]
W^{(t)}_{ij} + F, & \text{if } y^{(k)}_{(stored)} \displaystyle\sum_{j=1}^{N} W^{(t)}_{ij} y^{(k)} \le 0 \text{ and } W^{(t)}_{ij} < 0 \\[4pt]
W^{(t)}_{ij}, & \text{otherwise}
\end{cases} \tag{6}
\]
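A minimal numpy sketch of one sweep of Eq. (5) (ours; the printed condition is ambiguous after extraction, so this reading compares each Y-neuron's stored target with its net input, and Eq. (6) is obtained symmetrically):

```python
import numpy as np

def update_WT(WT, x_stored, y_stored, F):
    """One Eq. (5)-style sweep over the X->Y weights WT (shape M x N).

    Where a Y-neuron's stored target times its net input is non-positive
    (a recall failure), each incoming weight is nudged by -F if it is
    non-negative and by +F if it is negative; other weights are unchanged."""
    net = x_stored @ WT                    # net input to each Y-layer neuron
    fail = (y_stored * net) <= 0           # recall-failure mask, shape (N,)
    nudge = np.where(WT >= 0, -F, F)       # per-weight correction direction
    return np.where(fail[None, :], WT + nudge, WT)
```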
Figure 1. The result of Memory Capacity in K = 75.
Figure 2. The result of Noise Tolerance in K = 3.
Figure 3. The result of Noise Tolerance in K = 20.
Figure 6. Examples of Robot Partner behaviors (Lower Bye-Bye, Upper Bye-Bye and Up & Down).
Fig. 6 shows examples of the Robot Partner's behaviors.
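For reference, receiving recognition results from Julius over TCP/IP might look like the sketch below (ours; it assumes Julius was started with the `-module` option on its default port 10500, that each XML message is terminated by a line holding a single ".", and that the output decodes as UTF-8, all of which should be checked against the local Julius configuration):

```python
import socket

HOST, PORT = "127.0.0.1", 10500   # default module-mode port (assumed)

with socket.create_connection((HOST, PORT)) as sock:
    buf = ""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break                                     # Julius closed the connection
        buf += chunk.decode("utf-8", errors="replace")
        # Module-mode messages are XML fragments separated by a lone "." line
        while "\n.\n" in buf:
            message, buf = buf.split("\n.\n", 1)
            if "<RECOGOUT>" in message:               # a completed recognition result
                print("recognition result:", message)
```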
B. Multi-modal Communication with Robot Partner
A Robot Partner with a human-like body and human-like abilities could in principle be capable of intuitive multimodal communication with humans [2]. Moreover, it would be possible to transfer information effectively and robustly: multiple modalities, such as gesture and utterance, can each represent several kinds of information, and at the same time they can supplement one another effectively.
C. Object Recognition
Here we describe the method for object recognition. We focus on color-based object and shape recognition using SSGA based on template matching [4]. We use an octagonal template for detecting a target, in which the $j$th point of the $i$th template is represented by $\left(g_{i,1} + g_{i,j}\cos(g_{i,j+m}),\; g_{i,2} + g_{i,j}\sin(g_{i,j+m})\right)$, $i = 1, 2, \ldots, G$, $j = 3, 4, \ldots, m+2$. $O_i\,(= (g_{i,1}, g_{i,2}))$ is the centre of a candidate template on the image. $G$ and $m$ are the number of candidate templates and the number of searching points used in a template, respectively. A superscript $O$ stands for a parameter for object recognition. Therefore, a candidate template is composed of the numerical parameters $(g_{i,1}, g_{i,2}, \ldots, g_{i,2m+2})$. The fitness value is calculated as follows:
\[
f_i = C_{Target} - \eta \cdot C_{Other} \tag{7}
\]
where $\eta$ is a coefficient for the penalty $(\eta > 0)$, and $C_{Target}$ and $C_{Other}$ denote the number of pixels of the colors corresponding to the target and to other colors included in the template, respectively. The target color is selected according to the pixel color that occupies most of the template candidate, so that the largest area of a single color is extracted on the reduced color space of the image.
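As an illustration of Eq. (7), the sketch below (ours; it evaluates only the template's search points on a color-label image, a simplification of counting all pixels inside the template) computes the fitness of one candidate:

```python
import numpy as np

def search_points(g, m):
    """Points of one octagonal candidate g = (g_1, ..., g_{2m+2}) with
    centre (g[0], g[1]); the j-th point lies at
    (g[0] + g[j] cos(g[j+m]), g[1] + g[j] sin(g[j+m]))  (0-based genes)."""
    pts = []
    for j in range(2, m + 2):
        x = g[0] + g[j] * np.cos(g[j + m])
        y = g[1] + g[j] * np.sin(g[j + m])
        pts.append((int(round(y)), int(round(x))))   # (row, col) on the image
    return pts

def fitness(label_img, pts, target_color, eta=0.5):
    """Eq. (7): f = C_target - eta * C_other over the colour labels at pts."""
    inside = [label_img[r, c] for r, c in pts
              if 0 <= r < label_img.shape[0] and 0 <= c < label_img.shape[1]]
    c_target = sum(1 for c in inside if c == target_color)
    return c_target - eta * (len(inside) - c_target)
```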
Furthermore, we apply the k-means algorithm for the clustering of candidate templates in order to find several objects simultaneously. The inputs to the k-means algorithm are the central positions of the template candidates $u_i = (g_{i,1}, g_{i,2})$, $i = 1, 2, \ldots, G$, and the number of clusters is $K$. When the reference vector of the $k$th cluster is represented by $r_k = (r_{k,1}, r_{k,2})$, the Euclidean distance between the $i$th input vector $u_i$ and the $k$th reference vector is defined as $d_{i,k} = \|u_i - r_k\|$. Next, the reference vector minimizing the distance $d_{i,k}$ is selected by $c_i = \arg\min_k \|u_i - r_k\|$, where $c_i$ is the number of the cluster to which the $i$th input belongs. After selecting the nearest reference vector for each input, the $k$th reference vector is updated with the average of the inputs belonging to the $k$th cluster. If no update is performed during the clustering process, the clustering is complete. Crossover and selection are then performed with the template candidates in each cluster. Therefore, SSGA tries to find different objects within each cluster according to the spatial distribution of objects in the image.
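A plain k-means over the candidate centres, matching the description above, could be sketched as follows (ours):

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster the candidate-template centres u_i = (g_{i,1}, g_{i,2}).

    Stops when no reference vector is updated, as described above."""
    rng = np.random.default_rng(seed)
    refs = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        d = np.linalg.norm(points[:, None, :] - refs[None, :, :], axis=2)  # d_{i,k}
        c = d.argmin(axis=1)                                               # c_i
        new_refs = np.array([points[c == j].mean(axis=0) if np.any(c == j)
                             else refs[j] for j in range(k)])
        if np.allclose(new_refs, refs):      # no update performed -> complete
            break
        refs = new_refs
    return refs, c
```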
D. Gesture Recognition
First, the internal state $h_i(t)$ is calculated as follows:
\[
h_i(t) = \tanh\!\left(h^{syn}_i(t) + h^{ext}_i(t) + h^{ref}_i(t)\right) \tag{8}
\]
Here, a hyperbolic tangent is used to avoid the bursting of neuronal firing. $h^{ext}_i(t)$ is the input to the $i$th neuron from the external environment, and $h^{syn}_i(t)$, which includes the output pulses from other neurons, is calculated by
\[
h^{syn}_i(t) = \gamma^{syn} \cdot h_i(t-1) + \sum_{j=1, j \ne i}^{N} w_{j,i} \cdot h^{EPSP}_j(t) \tag{9}
\]
Furthermore, $h^{ref}_i(t)$ indicates the refractoriness factor of the neuron, $w_{j,i}$ is a weight coefficient from the $j$th to the $i$th neuron, $h^{EPSP}_j(t)$ is the excitatory postsynaptic potential (EPSP) approximately transmitted from the $j$th neuron at the discrete time $t$, $N$ is the number of neurons, and $\gamma^{syn}$ is the temporal discount rate. The presynaptic spike output is transmitted to the connected neurons through the EPSP, which is calculated as follows:
\[
h^{EPSP}_j(t) = \sum_{n=0}^{T} \kappa^n\, p_j(t-n) \tag{10}
\]
where $\kappa$ is the discount rate $(0 < \kappa < 1.0)$, $p_j(t)$ is the output of the $j$th neuron at the discrete time $t$, and $T$ is the length of the time sequence to be considered. If the neuron fires, $R$ is subtracted from the refractoriness value as follows:
\[
h^{ref}_i(t) =
\begin{cases}
\gamma^{ref} \cdot h^{ref}_i(t-1) - R, & \text{if } p_i(t-1) = 1 \\
\gamma^{ref} \cdot h^{ref}_i(t-1), & \text{otherwise}
\end{cases} \tag{11}
\]
Figure 7. Spiking neurons arranged on the image.
where $\gamma^{ref}$ is the discount rate. When the internal potential of the $i$th neuron is larger than the predefined threshold, a pulse is output as follows:
\[
p_i(t) =
\begin{cases}
1, & \text{if } h_i(t) \ge q_i \\
0, & \text{otherwise}
\end{cases} \tag{12}
\]
where $q_i$ is the threshold for firing. The weight parameters are trained based on the temporal Hebbian learning rule as follows:
\[
w_{j,i} \leftarrow \tanh\!\left(\gamma^{wgt} \cdot w_{j,i} + \xi^{wgt} \cdot h^{EPSP}_j(t-1) \cdot h^{EPSP}_j(t)\right) \tag{13}
\]
where $\gamma^{wgt}$ is the discount rate and $\xi^{wgt}$ is the learning rate.
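Putting Eqs. (8)-(13) together, a compact simulation sketch (ours; the EPSP of Eq. (10) is computed incrementally from past pulses, Eq. (13) is implemented as printed, and all parameter values are placeholders) is:

```python
import numpy as np

def simulate_snn(ext, w, gamma_syn=0.1, gamma_ref=0.9, kappa=0.5,
                 R=1.0, q=0.5, gamma_wgt=0.9, xi_wgt=0.1, train=False):
    """ext: (steps, N) external inputs h^ext_i(t); w: (N, N) weights w[j, i].
    Returns the (steps, N) pulse outputs p_i(t)."""
    steps, N = ext.shape
    h = np.zeros(N); href = np.zeros(N)
    epsp = np.zeros(N); p = np.zeros(N)
    pulses = np.zeros((steps, N))
    for t in range(steps):
        epsp_prev = epsp
        epsp = kappa * epsp + p                                # Eq. (10), from past pulses
        hsyn = gamma_syn * h + (epsp @ w - np.diag(w) * epsp)  # Eq. (9), excluding j == i
        href = gamma_ref * href - R * p                        # Eq. (11)
        h = np.tanh(hsyn + ext[t] + href)                      # Eq. (8)
        p = (h >= q).astype(float)                             # Eq. (12)
        pulses[t] = p
        if train:  # Eq. (13), literal reading: both EPSP factors indexed by j
            w = np.tanh(gamma_wgt * w + xi_wgt * (epsp_prev * epsp)[:, None])
    return pulses
```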
A Self-Organizing Map (SOM) is often applied to extract relationships among observed data, since it can ascertain the hidden topological structure of the data. The inputs to the SOM are given as the weighted sum of pulse outputs from the neurons, $v = (v_1, v_2, \ldots, v_N)$, where $v_i$ is the state of the $i$th neuron. In order to consider the temporal pattern, we use $h^{EPSP}_i(t)$ as $v_i$, although the EPSP is normally used when the presynaptic spike output is transmitted. When the $i$th reference vector of the SOM is represented by $r_i = (r_{1,i}, r_{2,i}, \ldots, r_{N,i})$ and the number of reference vectors (output units) is $M$, the Euclidean distance between an input vector and the $i$th reference vector is defined as $d_i = \|v - r_i\|$. Next, the $k$th output unit minimizing the distance $d_i$ is selected by $k = \arg\min_i \|v - r_i\|$. Furthermore, the reference vector of the $i$th output unit is trained by
\[
r_i \leftarrow r_i + \xi^{SOM} \cdot \zeta^{SOM}_{k,i} \cdot (v - r_i) \tag{14}
\]
where $\xi^{SOM}$ is a learning rate $(0 < \xi^{SOM} < 1.0)$ and $\zeta^{SOM}_{k,i}$ is a neighbourhood function $(0 < \zeta^{SOM}_{k,i} < 1.0)$.
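One training step of Eq. (14) can be sketched as follows (ours; the paper does not specify the neighbourhood function, so a Gaussian over output-unit indices is assumed):

```python
import numpy as np

def som_step(refs, v, xi=0.3, sigma=1.0):
    """Eq. (14): move the reference vectors toward the input v.

    refs: (M, N) reference vectors; v: (N,) EPSP-based input vector."""
    d = np.linalg.norm(refs - v, axis=1)            # d_i = ||v - r_i||
    k = d.argmin()                                  # winning output unit
    zeta = np.exp(-((np.arange(len(refs)) - k) ** 2) / (2 * sigma ** 2))
    return refs + xi * zeta[:, None] * (v - refs)   # neighbourhood-weighted update
```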
The robot extracts human hand motion from a series of images using SSGA, in which the maximal number of images is $T_G$. The sequence of hand positions is represented by $G(t) = (G_x(t), G_y(t))$, where $t = 1, 2, \ldots, T_G$. Here, the spiking neurons are arranged on a planar grid (Fig. 7) and $N = 45$. Using the value of a human hand position, the input to the $i$th neuron is calculated by the Gaussian membership function as follows:
\[
h^{ext}_i(t) = \exp\!\left(-\frac{\|c_i - G(t)\|^2}{2\sigma^2}\right) \tag{15}
\]
where $c_i = (c_{x,i}, c_{y,i})$ is the position of the $i$th spiking neuron on the image, and $\sigma$ is the standard deviation.
Table I
RELATIONSHIP BETWEEN OBJECT, GESTURE AND WORD

ID   Object            Gesture          Word
0    No Object         No Gesture       No Word
1    Red Circle        Lower Bye-Bye    RED CIRCLE
2    Green Triangle    Upper Bye-Bye    GREEN TRIANGLE
3    Blue Rectangle    Up & Down        BLUE RECTANGLE
4    Blue Rectangle    –                –
The sequence of pulse outputs $p_i(t)$ is obtained using the human hand positions $G(t)$. Because the neurons adjacent to the trajectory of the hand position fire easily as a result of the temporal Hebbian learning, the SNNs can memorize the temporal firing patterns of gestures.
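The encoding of Eq. (15) can be sketched as follows (ours; the 9 x 5 arrangement of the N = 45 neurons is an assumption about Fig. 7):

```python
import numpy as np

def external_input(centers, g, sigma=1.5):
    """Eq. (15): Gaussian encoding of a hand position G(t) = g over the
    neuron positions c_i given as rows of `centers` (shape (N, 2))."""
    return np.exp(-np.sum((centers - g) ** 2, axis=1) / (2 * sigma ** 2))

# N = 45 spiking neurons on a planar grid (9 x 5 layout assumed)
xs, ys = np.meshgrid(np.arange(9.0), np.arange(5.0))
centers = np.stack([xs.ravel(), ys.ravel()], axis=1)
h_ext = external_input(centers, g=np.array([4.0, 2.0]))
```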
IV. EXPERIMENTAL RESULTS
This section presents the experimental results of communication between the Robot Partner and a human using the proposed model through object recognition, gesture recognition and voice recognition. We defined five types of relationship for the associative memory between object, gesture and word, as shown in Table I. For example, depending on the situation, “Red Circle” as input ID 1 can be associated with “Lower Bye-Bye” or “RED CIRCLE” as output ID 1. Here, each input and output is composed of bipolar patterns.
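As an illustration of how such relationships could be stored, the sketch below (ours; the random 25-element bipolar codes are placeholders for the patterns actually used) builds pattern pairs in the style of Table I:

```python
import numpy as np

rng = np.random.default_rng(42)

def bipolar(n=25):
    """A random 25-element (5 x 5) bipolar pattern, as in the simulations."""
    return rng.choice([-1, 1], size=n)

objects = {name: bipolar() for name in
           ["No Object", "Red Circle", "Green Triangle", "Blue Rectangle"]}
gestures = {name: bipolar() for name in
            ["No Gesture", "Lower Bye-Bye", "Upper Bye-Bye", "Up & Down"]}

# One stored association from Table I: object ID 1 <-> gesture ID 1
pairs = [(objects["Red Circle"], gestures["Lower Bye-Bye"])]
```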
Fig. 8(a) shows the sequential input of object, gesture and word. Figs. 8(b), 8(c) and 8(d) show the sequential output of the Relationship associated by BAM, GBAM and QBAM, respectively. According to Table I, the output waveform should show the same form as the input one, except for input ID 4. We define that if non-defined information is input to the system, it is regarded as no input. In Fig. 8(b), due to its quite low Memory Capacity, BAM cannot recall the correct output. In Fig. 8(c), GBAM improves on BAM, but its recall process still has some failures. On the other hand, in Fig. 8(d), the output of QBAM shows the same waveform as the input one, except for input ID 4.
Figs. 9, 10 and 11 show the recall rates between input and output for BAM and QBAM. In each figure, the axis labels on the right side represent the input information, and the axis labels on the lower side represent the recalled information. In the BAM results in each figure, almost all of the outputs converge to local minima. In contrast, the results of QBAM follow the relationships of Table I correctly. From the results of the experiment with the robot system, we regard QBAM as an effective method for the associative communication system with the Robot Partner.
Figure 8. The result of Relationships: (a) Input ID, (b) BAM, (c) GBAM, (d) QBAM.
Figure 9. The result of recall rate for (a) BAM and (b) QBAM (In: Object, Out: Gesture and Word).
Figure 10. The result of recall rate for (a) BAM and (b) QBAM (In: Gesture, Out: Word and Object).
Figure 11. The result of recall rate for (a) BAM and (b) QBAM (In: Word, Out: Object and Gesture).

V. CONCLUSION

This paper has proposed an interaction system for a Robot Partner using QBAM. Thanks to the superior abilities of QBAM, the experimental results show that the system can store much more information than is possible with the conventional methods. If we apply other types of information, such as daily conversation, instead of simple information, the system can create more mature communication with the Robot Partner.
As future work, we will extend QBAM to a Quantum-Inspired Multidirectional Associative Memory, which would be even more effective for the interaction system with the Robot Partner.
ACKNOWLEDGMENT
This research is supported by the UM RU Operation grant (Project No. RU014-2013).
REFERENCES
[1] J. Cassell, “Embodied conversational agents: representation and intelligence in user interfaces,” AI Magazine, vol. 22, no. 4, p. 67, 2001.
[2] N. Kubota and Y. Toda, “Multimodal communication for human-friendly robot partners in informationally structured space,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 6, pp. 1142–1151, 2012.
[3] D. Sperber and D. Wilson, Relevance: Communication and Cognition. Oxford Univ. Press, 1995.
[4] A. Yorita and N. Kubota, “Cognitive development in partner robots for information support to elderly people,” IEEE Transactions on Autonomous Mental Development, vol. 3, no. 1, pp. 64–73, 2011.
[5] B. Kosko, “Adaptive bidirectional associative memories,” Applied Optics, vol. 26, no. 23, pp. 4947–4960, 1987.
[6] H. Shi, Y. Zhao, and X. Zhuang, “A general model for bidirectional associative memories,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 4, pp. 511–519, 1998.
[7] G. G. Rigatos and S. G. Tzafestas, “Parallelization of a fuzzy control algorithm using quantum computation,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 4, pp. 451–460, 2002.
[8] N. Masuyama, C. K. Loo, and N. Kubota, “Quantum mechanics inspired bidirectional associative memory for human robot interaction,” in 3rd International Workshop on Advanced Computational Intelligence and Intelligent Informatics, 2013.
[9] A. Lee and T. Kawahara, “Recent development of open-source speech recognition engine Julius,” in Proceedings of APSIPA ASC 2009, 2009, pp. 131–137.