Human-Robot Interaction System with Quantum-Inspired Bidirectional Associative
Memory
Naoki Masuyama∗, Chu Kiong Loo∗ and Naoyuki Kubota†
∗Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, MALAYSIA
Email: [email protected], [email protected]
†Department of Systems Design, Tokyo Metropolitan University, 6-6 Asahigaoka, Hino, Tokyo, 191-0065, JAPAN
Email: [email protected]
Abstract—This paper discusses an interaction system for a Robot Partner using Quantum-Inspired Bidirectional Associative Memory (QBAM). We have developed QBAM, which has superior Memory Capacity and Recall Reliability compared with conventional models. Thanks to these advantages, the proposed system can store a large amount of information and its relationships. Using QBAM, we construct an interaction system that associates gesture, object and voice information. In the proposed system, a Steady-State Genetic Algorithm is applied to detect objects via image processing, and Spiking Neural Networks are applied to memorize the spatio-temporal patterns of gestures. For voice recognition, we use Julius, an open-source large-vocabulary continuous speech recognition engine. The experimental results show that the proposed system contributes to the facilitation of communication with the Robot Partner.
Keywords-Human-Robot Interaction; Associative Memory; Quantum Computing;
I. INTRODUCTION
One of the significant problems in an aging society is elderly care. In fact, the number of elderly people living alone is increasing, and they lack chances for communication with other people compared with elderly people who live with family. As a result, these people face an increased probability of cognitive decline. In order to improve this situation, communication robots have been developed [1], [2]. According to relevance theory, human thought is not just transmitted, but is in fact a shared event between two people [3]. Such a shared cognitive environment is called a mutual cognitive environment [4]. Conventionally, when a robot provides a topic, it selects a suitable one based on the past history of conversation with the human or on human behavior, or it chooses general topics such as weather and news from the Internet. If we apply associative memory under a shared cognitive environment, however, the robot can provide a topic better suited to the current content of the conversation. As a result, communication between robot and human will become more active.
In the early 1980's, Hopfield proposed an auto-associative memory model to store and recall information in a manner analogous to the human brain. In the late 1980's, Kosko extended the Hopfield model to a bidirectional associative memory (BAM) [5]. These models, however, suffer from quite low Memory Capacity and Noise Tolerance. To improve the performance, a General model for BAM (GBAM) was proposed [6]. This model defines a weight matrix in each layer and develops an algorithm for learning the asymptotic stability conditions. As another approach, there is a model that applies quantum mechanics to the Hopfield model [7]. This model shows that quantum information processing in neural structures results in an exponential increase of storage capacity, and it can explain the extensive memorization and inference capabilities of humans. Inspired by this model, we have developed the Quantum-Inspired Bidirectional Associative Memory (QBAM) [8].
In this paper, we propose an interaction system for a Robot Partner using QBAM. It associates gesture, object and voice information. In Section II, we briefly introduce the structure of QBAM and its superior abilities in terms of Memory Capacity and Noise Tolerance. In Section III, we explain the overall architecture of the interaction system with QBAM and its computational intelligence technologies. Section IV presents the experimental results of communication between the Robot Partner and a human through object recognition, gesture recognition and voice recognition.
II. QUANTUM-INSPIRED BIDIRECTIONAL ASSOCIATIVE
MEMORY
This section describes the structure of the Quantum-Inspired Bidirectional Associative Memory (QBAM).
A. Structure of QBAM
Let $\{X^{(1)}, X^{(2)}, \ldots, X^{(K)}\}$ and $\{Y^{(1)}, Y^{(2)}, \ldots, Y^{(K)}\}$, for $k = 1, 2, \ldots, K$, be the bipolar pattern pairs to be stored, where $K$ denotes the number of pattern pairs, $M$ and $N$ denote the
number of neurons in the X-layer and the Y-layer, respectively, and $W^{T}_{ij}$ and $W_{ij}$ denote the weight matrices for the X-layer and the Y-layer, respectively.
• X-layer to Y-layer:
\[
X^{(k)}_j = \sum_{i=1}^{M} W^{T}_{ij}\, x^{(k)}_i \tag{1a}
\]
\[
y^{(k)}_j = \mathrm{sgn}\!\left(X^{(k)}_j\right) \tag{1b}
\]
• Y-layer to X-layer:
\[
Y^{(k)}_i = \sum_{j=1}^{N} W_{ij}\, y^{(k)}_j \tag{2a}
\]
\[
x^{(k)}_i = \mathrm{sgn}\!\left(Y^{(k)}_i\right) \tag{2b}
\]
The weight matrices $W^{T}_{ij}$ and $W_{ij}$ are defined as follows:
• X-layer to Y-layer:
\[
W^{T}_{ij} = \frac{1}{K} \sum_{j=1}^{N} \sum_{i=1}^{M} v^{T}_{j} u_{i} \tag{3}
\]
• Y-layer to X-layer:
\[
W_{ij} = \frac{1}{K} \sum_{i=1}^{M} \sum_{j=1}^{N} u^{T}_{i} v_{j} \tag{4}
\]
$u_i$ and $v_j$ are calculated by Gram-Schmidt orthogonalization according to $a_1 = A_1 / \|A_1\|$ $(p = 1)$, $b_p = A_p - \sum_{i=1}^{p-1} (a_i, A_p)\, a_i$ and $a_p = b_p / \|b_p\|$ $(2 \le p \le k)$, where $A$ denotes the vector undergoing orthogonalization, and $a$ and $b$ denote the orthonormalized vector and the orthogonalized vector, respectively.
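As a concrete illustration (ours, not the paper's code), the orthogonalization step can be sketched in Python as follows, assuming the stored patterns are linearly independent rows of a matrix:

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the rows of A in order: a_1 = A_1/||A_1||,
    b_p = A_p - sum_{i<p} (a_i, A_p) a_i, a_p = b_p/||b_p||."""
    ortho = []
    for row in np.asarray(A, dtype=float):
        b = row.copy()
        for a in ortho:
            b -= np.dot(a, row) * a          # remove the component along each earlier direction
        ortho.append(b / np.linalg.norm(b))  # assumes linear independence (nonzero norm)
    return np.array(ortho)

# Example: orthonormalize three 25-element bipolar patterns
U = gram_schmidt(np.sign(np.random.randn(3, 25)))
```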
The update rules of the weight matrices $W^{T}_{ij}$ and $W_{ij}$ are defined as Eqs. (5) and (6), where $t$ denotes the time step and the superscript $T$ denotes the transpose. $x^{(k)}_{(stored)}$ and $y^{(k)}_{(stored)}$ denote the $k$th stored patterns, and $x^{(k)}$ and $y^{(k)}$ denote the inner states of the X-layer and the Y-layer, respectively. $M$ and $N$ denote the number of neurons in the X-layer and the Y-layer, respectively. $F$ $(0 < F \le 1)$ denotes the variation amount in position for the center of the Fuzzy sets.
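For illustration, the recall dynamics of Eqs. (1) and (2) can be sketched as follows (ours; it assumes a learned $M \times N$ weight matrix W, and it keeps the previous neuron state where the net input is exactly zero, which the equations leave unspecified):

```python
import numpy as np

def sgn(v, prev):
    """Bipolar sign; keep the previous state where the net input is exactly zero."""
    s = np.sign(v)
    return np.where(s == 0, prev, s)

def recall(x, W, max_steps=20):
    """Bidirectional recall: X -> Y with W^T (Eq. (1)), Y -> X with W (Eq. (2)),
    repeated until the (x, y) pair no longer changes."""
    y = sgn(W.T @ x, np.ones(W.shape[1]))
    for _ in range(max_steps):
        x_new = sgn(W @ y, x)          # Eq. (2)
        y_new = sgn(W.T @ x_new, y)    # Eq. (1)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```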
B. Simulation Experiment
In this section, we compare QBAM with BAM [5] and GBAM [6] through simulation experiments on Memory Capacity and Noise Tolerance for each model. We set the number of stored pattern pair sets to $K = 75$ (alphabet, numbers, images, etc.). Each pattern is a 25-element (5 by 5) bipolar pattern, so both the X-layer and the Y-layer have 25 neurons.
• Memory Capacity

Memory Capacity is an important performance element of an associative memory. We consider that if the percentage of successful recall over $K$ pair sets is over 90%, the $K$ pair sets can be stored in the memory. It is known that with $n$ neurons in both the X-layer and the Y-layer and without noise, the capacity of BAM is not larger than $0.15n$, and the capacity of GBAM is around $1.0n$. In Fig. 1, GBAM falls below 90% around $K = 26$. QBAM, however, can store all pair sets.
Figure 4. The architecture of the robot system.

Figure 5. Example of Computational Intelligence: (a) Object Recognition, (b) Gesture Recognition.

• Noise Tolerance

Noise tolerance is also a significant property of an associative memory. We measure noise tolerance by randomly adding noise to the input data in the X-layer. In Fig. 2, we set the number of pair sets to $K = 3$, because this value is the maximum memory capacity of BAM for $n = 25$. In Figs. 2 and 3, GBAM's correct recall rate drops once the noise rate reaches 60%. On the other hand, QBAM shows outstanding noise resistance even at high noise rates.
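The noise-tolerance protocol can be sketched as follows (our illustration; `recall_xy` stands for any model's X-to-Y recall function, e.g. the one sketched in Section II-A):

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_noise(pattern, rate):
    """Flip a fraction `rate` of the elements of a bipolar pattern."""
    noisy = pattern.copy()
    idx = rng.choice(pattern.size, size=int(rate * pattern.size), replace=False)
    noisy[idx] *= -1
    return noisy

def correct_recall_rate(pairs, recall_xy, rate):
    """Fraction of stored (x, y) pairs whose y is recalled perfectly
    when the X-layer input is corrupted at the given noise rate."""
    hits = sum(np.array_equal(recall_xy(flip_noise(x, rate)), y) for x, y in pairs)
    return hits / len(pairs)
```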
III. SYSTEM ARCHITECTURE AND COMPUTATIONAL
INTELLIGENCE
We have developed the Quantum-Inspired Bidirectional Associative Memory and its application [8], and conducted simulation experiments on Memory Capacity and Noise Tolerance in comparison with conventional methods. As a result, we have confirmed that QBAM has superior ability, and we therefore apply it in the proposed system.
A. Interaction System with Associative Memory
We developed an interactive communication system with associative memory. Fig. 4 shows the structure of the system. The system is composed of the Robot Partner, a Microsoft Kinect, a microphone and a server PC. The Kinect captures RGB data and distance data. Using the input data received from the Kinect, the server PC detects object color and shape with a Steady-State Genetic Algorithm (SSGA) and recognizes gestures with Spiking Neural Networks (SNNs) (Fig. 5). For voice recognition, we apply Julius, an open-source large-vocabulary continuous speech recognition engine [9]. It works in real time, and its recognition accuracy is over 90% in a 20,000-word reading test. The microphone collects the human voice for Julius, which is connected to the server PC by TCP/IP in module mode. In this system, we apply a Japanese language model to Julius, so it works as a Japanese recognition engine. The server PC also calculates the relationships between objects, gestures and words with QBAM. Then, based on these relationships, the server PC sends an utterance or a behavior order to the Robot Partner by TCP/IP.
• X-layer to Y-layer:
\[
W^{T(t+1)}_{ij} =
\begin{cases}
W^{T(t)}_{ij} - F, & \text{if } x^{(k)}_{(stored)} \displaystyle\sum_{i=1}^{M} W^{T(t)}_{ij} x^{(k)} \le 0 \text{ and } W^{T(t)}_{ij} \ge 0 \\[4pt]
W^{T(t)}_{ij} + F, & \text{if } x^{(k)}_{(stored)} \displaystyle\sum_{i=1}^{M} W^{T(t)}_{ij} x^{(k)} \le 0 \text{ and } W^{T(t)}_{ij} < 0 \\[4pt]
W^{T(t)}_{ij}, & \text{otherwise}
\end{cases} \tag{5}
\]
• Y-layer to X-layer:
\[
W^{(t+1)}_{ij} =
\begin{cases}
W^{(t)}_{ij} - F, & \text{if } y^{(k)}_{(stored)} \displaystyle\sum_{j=1}^{N} W^{(t)}_{ij} y^{(k)} \le 0 \text{ and } W^{(t)}_{ij} \ge 0 \\[4pt]
W^{(t)}_{ij} + F, & \text{if } y^{(k)}_{(stored)} \displaystyle\sum_{j=1}^{N} W^{(t)}_{ij} y^{(k)} \le 0 \text{ and } W^{(t)}_{ij} < 0 \\[4pt]
W^{(t)}_{ij}, & \text{otherwise}
\end{cases} \tag{6}
\]
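A minimal numpy sketch of one sweep of Eq. (5) (ours; the printed condition is ambiguous after extraction, so this reading compares each Y-neuron's stored target with its net input, and Eq. (6) is obtained symmetrically):

```python
import numpy as np

def update_WT(WT, x_stored, y_stored, F):
    """One Eq. (5)-style sweep over the X->Y weights WT (shape M x N).

    Where a Y-neuron's stored target times its net input is non-positive
    (a recall failure), each incoming weight is nudged by -F if it is
    non-negative and by +F if it is negative; other weights are unchanged."""
    net = x_stored @ WT                    # net input to each Y-layer neuron
    fail = (y_stored * net) <= 0           # recall-failure mask, shape (N,)
    nudge = np.where(WT >= 0, -F, F)       # per-weight correction direction
    return np.where(fail[None, :], WT + nudge, WT)
```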
Figure 1. The result of Memory Capacity in K = 75.
Figure 2. The result of Noise Tolerance in K = 3.
Figure 3. The result of Noise Tolerance in K = 20.
Figure 6. Examples of Robot Partner behaviors (Lower Bye-Bye, Upper Bye-Bye and Up & Down).
Fig. 6 shows examples of the Robot Partner's behaviors.
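For reference, receiving recognition results from Julius over TCP/IP might look like the sketch below (ours; it assumes Julius was started with the `-module` option on its default port 10500, that each XML message is terminated by a line holding a single ".", and that the output decodes as UTF-8, all of which should be checked against the local Julius configuration):

```python
import socket

HOST, PORT = "127.0.0.1", 10500   # default module-mode port (assumed)

with socket.create_connection((HOST, PORT)) as sock:
    buf = ""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break                                     # Julius closed the connection
        buf += chunk.decode("utf-8", errors="replace")
        # Module-mode messages are XML fragments separated by a lone "." line
        while "\n.\n" in buf:
            message, buf = buf.split("\n.\n", 1)
            if "<RECOGOUT>" in message:               # a completed recognition result
                print("recognition result:", message)
```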
B. Multi-modal Communication with Robot Partner
A Robot Partner with a human-like body and human-like abilities could in principle be capable of intuitive multimodal communication with humans [2]. Moreover, it would be possible to transfer information effectively and robustly: multiple modalities, such as gesture and utterance, can each represent several kinds of information, and at the same time they can supplement one another effectively.
C. Object Recognition
Here we describe the method for object recognition. We focus on color-based object and shape recognition using SSGA based on template matching [4]. We use an octagonal template for detecting a target, in which the $j$th point of the $i$th template is represented by $\left(g_{i,1} + g_{i,j}\cos(g_{i,j+m}),\; g_{i,2} + g_{i,j}\sin(g_{i,j+m})\right)$, $i = 1, 2, \ldots, G$, $j = 3, 4, \ldots, m+2$. $O_i\,(= (g_{i,1}, g_{i,2}))$ is the centre of a candidate template on the image. $G$ and $m$ are the number of candidate templates and the number of searching points used in a template, respectively. A superscript $O$ stands for a parameter for object recognition. Therefore, a candidate template is composed of the numerical parameters $(g_{i,1}, g_{i,2}, \ldots, g_{i,2m+2})$. The fitness value is calculated as follows:
\[
f_i = C_{Target} - \eta \cdot C_{Other} \tag{7}
\]
where $\eta$ is a coefficient for the penalty $(\eta > 0)$, and $C_{Target}$ and $C_{Other}$ denote the number of pixels of the colors corresponding to the target and to other colors included in the template, respectively. The target color is selected according to the pixel color that occupies most of the template candidate, so that the largest area of a single color is extracted on the reduced color space of the image.
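As an illustration of Eq. (7), the sketch below (ours; it evaluates only the template's search points on a color-label image, a simplification of counting all pixels inside the template) computes the fitness of one candidate:

```python
import numpy as np

def search_points(g, m):
    """Points of one octagonal candidate g = (g_1, ..., g_{2m+2}) with
    centre (g[0], g[1]); the j-th point lies at
    (g[0] + g[j] cos(g[j+m]), g[1] + g[j] sin(g[j+m]))  (0-based genes)."""
    pts = []
    for j in range(2, m + 2):
        x = g[0] + g[j] * np.cos(g[j + m])
        y = g[1] + g[j] * np.sin(g[j + m])
        pts.append((int(round(y)), int(round(x))))   # (row, col) on the image
    return pts

def fitness(label_img, pts, target_color, eta=0.5):
    """Eq. (7): f = C_target - eta * C_other over the colour labels at pts."""
    inside = [label_img[r, c] for r, c in pts
              if 0 <= r < label_img.shape[0] and 0 <= c < label_img.shape[1]]
    c_target = sum(1 for c in inside if c == target_color)
    return c_target - eta * (len(inside) - c_target)
```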
Furthermore, we apply the k-means algorithm for the clustering of candidate templates in order to find several objects simultaneously. The inputs to the k-means algorithm are the central positions of the template candidates $u_i = (g_{i,1}, g_{i,2})$, $i = 1, 2, \ldots, G$, and the number of clusters is $K$. When the reference vector of the $k$th cluster is represented by $r_k = (r_{k,1}, r_{k,2})$, the Euclidean distance between the $i$th input vector $u_i$ and the $k$th reference vector is defined as $d_{i,k} = \|u_i - r_k\|$. Next, the reference vector minimizing the distance $d_{i,k}$ is selected by $c_i = \arg\min_k \|u_i - r_k\|$, where $c_i$ is the number of the cluster to which the $i$th input belongs. After selecting the nearest reference vector for each input, the $k$th reference vector is updated with the average of the inputs belonging to the $k$th cluster. If no update is performed during the clustering process, the clustering is complete. Crossover and selection are then performed with the template candidates in each cluster. Therefore, SSGA tries to find different objects within each cluster according to the spatial distribution of objects in the image.
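A plain k-means over the candidate centres, matching the description above, could be sketched as follows (ours):

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster the candidate-template centres u_i = (g_{i,1}, g_{i,2}).

    Stops when no reference vector is updated, as described above."""
    rng = np.random.default_rng(seed)
    refs = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        d = np.linalg.norm(points[:, None, :] - refs[None, :, :], axis=2)  # d_{i,k}
        c = d.argmin(axis=1)                                               # c_i
        new_refs = np.array([points[c == j].mean(axis=0) if np.any(c == j)
                             else refs[j] for j in range(k)])
        if np.allclose(new_refs, refs):      # no update performed -> complete
            break
        refs = new_refs
    return refs, c
```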
D. Gesture Recognition
First, the internal state $h_i(t)$ is calculated as follows:
\[
h_i(t) = \tanh\!\left(h^{syn}_i(t) + h^{ext}_i(t) + h^{ref}_i(t)\right) \tag{8}
\]
Here, a hyperbolic tangent is used to avoid the bursting of neuronal firing. $h^{ext}_i(t)$ is the input to the $i$th neuron from the external environment, and $h^{syn}_i(t)$, which includes the output pulses from other neurons, is calculated by
\[
h^{syn}_i(t) = \gamma^{syn} \cdot h_i(t-1) + \sum_{j=1, j \ne i}^{N} w_{j,i} \cdot h^{EPSP}_j(t) \tag{9}
\]
Furthermore, $h^{ref}_i(t)$ indicates the refractoriness factor of the neuron, $w_{j,i}$ is a weight coefficient from the $j$th to the $i$th neuron, $h^{EPSP}_j(t)$ is the excitatory postsynaptic potential (EPSP) approximately transmitted from the $j$th neuron at the discrete time $t$, $N$ is the number of neurons, and $\gamma^{syn}$ is the temporal discount rate. The presynaptic spike output is transmitted to the connected neurons through the EPSP, which is calculated as follows:
\[
h^{EPSP}_j(t) = \sum_{n=0}^{T} \kappa^n\, p_j(t-n) \tag{10}
\]
where $\kappa$ is the discount rate $(0 < \kappa < 1.0)$, $p_j(t)$ is the output of the $j$th neuron at the discrete time $t$, and $T$ is the length of the time sequence to be considered. If the neuron fires, $R$ is subtracted from the refractoriness value as follows:
\[
h^{ref}_i(t) =
\begin{cases}
\gamma^{ref} \cdot h^{ref}_i(t-1) - R, & \text{if } p_i(t-1) = 1 \\
\gamma^{ref} \cdot h^{ref}_i(t-1), & \text{otherwise}
\end{cases} \tag{11}
\]
Figure 7. Spiking neurons arranged on the image.
where $\gamma^{ref}$ is the discount rate. When the internal potential of the $i$th neuron is larger than the predefined threshold, a pulse is output as follows:
\[
p_i(t) =
\begin{cases}
1, & \text{if } h_i(t) \ge q_i \\
0, & \text{otherwise}
\end{cases} \tag{12}
\]
where $q_i$ is the threshold for firing. The weight parameters are trained based on the temporal Hebbian learning rule as follows:
\[
w_{j,i} \leftarrow \tanh\!\left(\gamma^{wgt} \cdot w_{j,i} + \xi^{wgt} \cdot h^{EPSP}_j(t-1) \cdot h^{EPSP}_j(t)\right) \tag{13}
\]
where $\gamma^{wgt}$ is the discount rate and $\xi^{wgt}$ is the learning rate.
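Putting Eqs. (8)-(13) together, a compact simulation sketch (ours; the EPSP of Eq. (10) is computed incrementally from past pulses, Eq. (13) is implemented as printed, and all parameter values are placeholders) is:

```python
import numpy as np

def simulate_snn(ext, w, gamma_syn=0.1, gamma_ref=0.9, kappa=0.5,
                 R=1.0, q=0.5, gamma_wgt=0.9, xi_wgt=0.1, train=False):
    """ext: (steps, N) external inputs h^ext_i(t); w: (N, N) weights w[j, i].
    Returns the (steps, N) pulse outputs p_i(t)."""
    steps, N = ext.shape
    h = np.zeros(N); href = np.zeros(N)
    epsp = np.zeros(N); p = np.zeros(N)
    pulses = np.zeros((steps, N))
    for t in range(steps):
        epsp_prev = epsp
        epsp = kappa * epsp + p                                # Eq. (10), from past pulses
        hsyn = gamma_syn * h + (epsp @ w - np.diag(w) * epsp)  # Eq. (9), excluding j == i
        href = gamma_ref * href - R * p                        # Eq. (11)
        h = np.tanh(hsyn + ext[t] + href)                      # Eq. (8)
        p = (h >= q).astype(float)                             # Eq. (12)
        pulses[t] = p
        if train:  # Eq. (13), literal reading: both EPSP factors indexed by j
            w = np.tanh(gamma_wgt * w + xi_wgt * (epsp_prev * epsp)[:, None])
    return pulses
```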
A Self-Organizing Map (SOM) is often applied to extract relationships among observed data, since it can ascertain the hidden topological structure of the data. The inputs to the SOM are given as the weighted sum of pulse outputs from the neurons, $v = (v_1, v_2, \ldots, v_N)$, where $v_i$ is the state of the $i$th neuron. In order to consider the temporal pattern, we use $h^{EPSP}_i(t)$ as $v_i$, although the EPSP is normally used when the presynaptic spike output is transmitted. When the $i$th reference vector of the SOM is represented by $r_i = (r_{1,i}, r_{2,i}, \ldots, r_{N,i})$ and the number of reference vectors (output units) is $M$, the Euclidean distance between an input vector and the $i$th reference vector is defined as $d_i = \|v - r_i\|$. Next, the $k$th output unit minimizing the distance $d_i$ is selected by $k = \arg\min_i \|v - r_i\|$. Furthermore, the reference vector of the $i$th output unit is trained by
\[
r_i \leftarrow r_i + \xi^{SOM} \cdot \zeta^{SOM}_{k,i} \cdot (v - r_i) \tag{14}
\]
where $\xi^{SOM}$ is a learning rate $(0 < \xi^{SOM} < 1.0)$ and $\zeta^{SOM}_{k,i}$ is a neighbourhood function $(0 < \zeta^{SOM}_{k,i} < 1.0)$.
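One training step of Eq. (14) can be sketched as follows (ours; the paper does not specify the neighbourhood function, so a Gaussian over output-unit indices is assumed):

```python
import numpy as np

def som_step(refs, v, xi=0.3, sigma=1.0):
    """Eq. (14): move the reference vectors toward the input v.

    refs: (M, N) reference vectors; v: (N,) EPSP-based input vector."""
    d = np.linalg.norm(refs - v, axis=1)            # d_i = ||v - r_i||
    k = d.argmin()                                  # winning output unit
    zeta = np.exp(-((np.arange(len(refs)) - k) ** 2) / (2 * sigma ** 2))
    return refs + xi * zeta[:, None] * (v - refs)   # neighbourhood-weighted update
```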
The robot extracts human hand motion from a series of images using SSGA, in which the maximal number of images is $T_G$. The sequence of hand positions is represented by $G(t) = (G_x(t), G_y(t))$, where $t = 1, 2, \ldots, T_G$. Here, the spiking neurons are arranged on a planar grid (Fig. 7) and $N = 45$. Using the value of a human hand position, the input to the $i$th neuron is calculated by the Gaussian membership function as follows:
\[
h^{ext}_i(t) = \exp\!\left(-\frac{\|c_i - G(t)\|^2}{2\sigma^2}\right) \tag{15}
\]
where $c_i = (c_{x,i}, c_{y,i})$ is the position of the $i$th spiking neuron on the image, and $\sigma$ is the standard deviation.
Table I
RELATIONSHIP BETWEEN OBJECT, GESTURE AND WORD

ID   Object            Gesture          Word
0    No Object         No Gesture       No Word
1    Red Circle        Lower Bye-Bye    RED CIRCLE
2    Green Triangle    Upper Bye-Bye    GREEN TRIANGLE
3    Blue Rectangle    Up & Down        BLUE RECTANGLE
4    Blue Rectangle    –                –
The sequence of pulse outputs $p_i(t)$ is obtained using the human hand positions $G(t)$. Because the neurons adjacent to the trajectory of the hand position fire easily as a result of the temporal Hebbian learning, the SNNs can memorize the temporal firing patterns of gestures.
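The encoding of Eq. (15) can be sketched as follows (ours; the 9 x 5 arrangement of the N = 45 neurons is an assumption about Fig. 7):

```python
import numpy as np

def external_input(centers, g, sigma=1.5):
    """Eq. (15): Gaussian encoding of a hand position G(t) = g over the
    neuron positions c_i given as rows of `centers` (shape (N, 2))."""
    return np.exp(-np.sum((centers - g) ** 2, axis=1) / (2 * sigma ** 2))

# N = 45 spiking neurons on a planar grid (9 x 5 layout assumed)
xs, ys = np.meshgrid(np.arange(9.0), np.arange(5.0))
centers = np.stack([xs.ravel(), ys.ravel()], axis=1)
h_ext = external_input(centers, g=np.array([4.0, 2.0]))
```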
IV. EXPERIMENTAL RESULTS
This section presents the experimental results of communication between the Robot Partner and a human using the proposed model through object recognition, gesture recognition and voice recognition. We defined five types of relationship for the associative memory between object, gesture and word, as shown in Table I. For example, depending on the situation, “Red Circle” as input ID 1 can be associated with “Lower Bye-Bye” or “RED CIRCLE” as output ID 1. Here, each input and output is composed of bipolar patterns.
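As an illustration of how such relationships could be stored, the sketch below (ours; the random 25-element bipolar codes are placeholders for the patterns actually used) builds pattern pairs in the style of Table I:

```python
import numpy as np

rng = np.random.default_rng(42)

def bipolar(n=25):
    """A random 25-element (5 x 5) bipolar pattern, as in the simulations."""
    return rng.choice([-1, 1], size=n)

objects = {name: bipolar() for name in
           ["No Object", "Red Circle", "Green Triangle", "Blue Rectangle"]}
gestures = {name: bipolar() for name in
            ["No Gesture", "Lower Bye-Bye", "Upper Bye-Bye", "Up & Down"]}

# One stored association from Table I: object ID 1 <-> gesture ID 1
pairs = [(objects["Red Circle"], gestures["Lower Bye-Bye"])]
```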
Fig. 8(a) shows the sequential input of object, gesture and word. Figs. 8(b), 8(c) and 8(d) show the sequential output of the Relationship associated by BAM, GBAM and QBAM, respectively. According to Table I, the output waveform should show the same form as the input one, except for input ID 4. We define that if non-defined information is input to the system, it is regarded as no input. In Fig. 8(b), due to its quite low Memory Capacity, BAM cannot recall the correct output. In Fig. 8(c), GBAM improves on BAM, but its recall process still has some failures. On the other hand, in Fig. 8(d), the output of QBAM shows the same waveform as the input one, except for input ID 4.
Figs. 9, 10 and 11 show the recall rates between input and output for BAM and QBAM. In each figure, the axis labels on the right side represent the input information, and the axis labels on the lower side represent the recalled information. In the BAM results in each figure, almost all of the outputs converge to local minima. In contrast, the results of QBAM follow the relationships of Table I correctly. From the results of the experiment with the robot system, we regard QBAM as an effective method for the associative communication system with the Robot Partner.
Figure 8. The result of Relationships: (a) Input ID, (b) BAM, (c) GBAM, (d) QBAM.
Figure 9. The result of recall rate for (a) BAM and (b) QBAM (In: Object, Out: Gesture and Word).
Figure 10. The result of recall rate for (a) BAM and (b) QBAM (In: Gesture, Out: Word and Object).
Figure 11. The result of recall rate for (a) BAM and (b) QBAM (In: Word, Out: Object and Gesture).

V. CONCLUSION

This paper has proposed an interaction system for a Robot Partner using QBAM. Thanks to the superior abilities of QBAM, the experimental results show that the system can store much more information than is possible with the conventional methods. If we apply other types of information, such as daily conversation, instead of simple information, the system can create more mature communication with the Robot Partner.
As future work, we will extend QBAM to a Quantum-Inspired Multidirectional Associative Memory, which would be even more effective for the interaction system with the Robot Partner.
ACKNOWLEDGMENT
This research is supported by the UM RU Operation grant (Project No. RU014-2013).
REFERENCES
[1] J. Cassell, “Embodied conversational agents: representation and intelligence in user interfaces,” AI Magazine, vol. 22, no. 4, p. 67, 2001.
[2] N. Kubota and Y. Toda, “Multimodal communication for human-friendly robot partners in informationally structured space,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 6, pp. 1142–1151, 2012.
[3] D. Sperber and D. Wilson, Relevance: Communication and Cognition. Oxford Univ. Press, 1995.
[4] A. Yorita and N. Kubota, “Cognitive development in partner robots for information support to elderly people,” IEEE Transactions on Autonomous Mental Development, vol. 3, no. 1, pp. 64–73, 2011.
[5] B. Kosko, “Adaptive bidirectional associative memories,” Applied Optics, vol. 26, no. 23, pp. 4947–4960, 1987.
[6] H. Shi, Y. Zhao, and X. Zhuang, “A general model for bidirectional associative memories,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 4, pp. 511–519, 1998.
[7] G. G. Rigatos and S. G. Tzafestas, “Parallelization of a fuzzy control algorithm using quantum computation,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 4, pp. 451–460, 2002.
[8] N. Masuyama, C. K. Loo, and N. Kubota, “Quantum mechanics inspired bidirectional associative memory for human robot interaction,” in 3rd International Workshop on Advanced Computational Intelligence and Intelligent Informatics, 2013.
[9] A. Lee and T. Kawahara, “Recent development of open-source speech recognition engine Julius,” in Proceedings of APSIPA ASC 2009, 2009, pp. 131–137.