Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
저 시-비 리- 경 지 2.0 한민
는 아래 조건 르는 경 에 한하여 게
l 저 물 복제, 포, 전송, 전시, 공연 송할 수 습니다.
다 과 같 조건 라야 합니다:
l 하는, 저 물 나 포 경 , 저 물에 적 된 허락조건 명확하게 나타내어야 합니다.
l 저 터 허가를 면 러한 조건들 적 되지 않습니다.
저 에 른 리는 내 에 하여 향 지 않습니다.
것 허락규약(Legal Code) 해하 쉽게 약한 것 니다.
Disclaimer
저 시. 하는 원저 를 시하여야 합니다.
비 리. 하는 저 물 리 목적 할 수 없습니다.
경 지. 하는 저 물 개 , 형 또는 가공할 수 없습니다.
Master’s Thesis of Engineering
Use of Probabilistic Machine
Learning for Modeling and State
Estimation of Soft Robot Systems
비모수 기계학습 방법을 이용한 소프트 로봇 시스
템의 모델링
February 2020
Graduate School of Engineering
Seoul National University
Mechanical Engineering Major
DongWook Kim
비모수 기계학습 방법을 이용한 소프트
로봇 시스템의 모델링
Modeling Soft Robot Systems Using
Nonparametric Machine Learning Techniques
Advisor Yong-Lae Park
Submitting a master’s thesis of Public
Administration
October 2019
Graduate School of Engineering
Seoul National University Mechanical Engineering Major
DongWook Kim
Confirming the master’s thesis written by
DongWook Kim
December 2019
Chair Lee, Kun Woo (Seal)
Vice Chair Park, Yong-Lae (Seal)
Examiner Ahn, Sung-Hoon (Seal)
i
Abstract Due to hysteresis and stochastic behavior of soft robots, it is
challenging to model, control, and estimate the real state of them.
Reflecting previous attempts on modeling hysteresis and borrowing
the intuitiveness of the Bayesian network, we propose a novel way
to model hysteresis from a Bayesian perspective. By assuming that
current output distribution of the hysteretic system follows a
Gaussian distribution of mean and covariance is comprised of only on
the previous input-output tuples and the current input, we construct
a hysteresis Bayesian network that can efficiently model transition
and sensory model and best estimate the states using extended
Kalman filtering. To reflect stochastic behavior, we use Gaussian
process regression to provide a probabilistic model for the function
approximator of transition and sensor model, also variances for
Kalman filtering. Furthermore, we use Bayesian linear regression
with Gaussian process regression prior to compensating for
statistical disturbances and noises. Simulation and real-world soft
robot application results show that our method works better than
state-of-the-art soft robot modeling techniques, especially for the
noisy environment. In conclusion, the proposed method can
effectively, rapidly construct the model and estimate the state while
maintaining robustness against disturbances and noises.
Keyword : Soft Robotics, Learning-based System Modeling,
Nonparametric Machine Learning, Gaussian Process Regression
Student Number : 2018-27197
ii
Table of Contents
Chapter 1. Introduction ............................................................ 1
Chapter 2. Mathematical Preliminaries .................................... 5
2.1. Bayesian Network and Hidden Markov Model
2.2. Gaussian Process Regression
2.3. Reproducing Kernel Hilbert Space and Kernel Functions
Chapter 3. Hysteresis Bayesian Network ............................... 9
3.1. Fundamental Assumptions and Properties of Hysteresis
3.2. Hysteretic System Modeling
3.3. Gaussian Process Regression as a Function Approximator
Chapter 4. State Estimation ................................................... 21
4.1. State Estimator Using Extended Kalman Filtering
4.2. Observability of Hysteresis Bayesian Network
4.3. Robust Redesign Using Bayesian Linear Regression
Chapter 5. Application ........................................................... 32
5.1. Numerical Examples
5.2. Modeling PneuNet-Type Actuator for Proprioceptive Sensing
Chapter 6. Discussion ............................................................ 42
iii
6.1. Robust Redesign and Noise Level
6.2. Issues in Real-Time Implementation
6.3. Fundamental Limitations of Learning and Filtering
Chapter 7. Conclusion ............................................................ 47
Bibliography ........................................................................... 48
Abstract in Korean ................................................................ 55
1
Chapter 1. Introduction
Despite the radical development of soft robotics, there are not
many scenes in our society where soft robots are playing a significant
role yet. Soft robot systems developed so far have been made for
wearable [1], [2], physical Human-Robot Interaction (p-HRI) [3],
[4], and disaster relief [5], and have focused people’s attention on
its unique locomotion and behavior [6]-[8]. However, common
problems with soft robots make researchers hesitant to use them.
First, because of the use of soft materials such as silicone rubber,
soft robot systems have an infinite number of state variables in
theory, so it is challenging to model the robotics system itself [9].
Second, even if the model is given, their system behavior is
stochastic so that users cannot plan the control in a deterministic
view [10]. Third, soft robots suffer in hysteresis due to their
viscoelastic behavior of the soft material [11]. Typically, the
hysteresis phenomenon is known to have a property that the current
output is proportional to all inputs in the past, and it is known to be
very difficult to analyze [12]. Therefore, at present, it is difficult to
model a system showing hysteresis properties having an explicit
solution. So although many people are interested in the research of
soft robotics, they are confronted with these limitations.
Many studies have attempted to identify the nature of hysteresis
in the past. Hysteresis modeling methods can be roughly classified
into three types: operator method, differential equation method, and
2
machine learning method. First, representative examples of operator
methods are the Preisach model using relay operators [13], the
Prandtl-Ishlinskii (PI) model [14] and the Kransnosel’skii-
Pokrovskii (KP) model [15] using play operators, the Koopman
operator [16], and the Maxwell-Slip model [17] based on backlash
and friction operator. The operator based methods had been the
interest of nonlinear control theory, so the Lyapunov based nonlinear
control design such as adaptive and robust control has been widely
developed [18]-[20]. However, it is difficult to make various
hysteresis curves because they all make the hysteresis curve with
the weighted summation of the basis operator. Besides, many of them
are assuming hysteresis is rate-independent, where hysteresis in
soft robot systems is typically not. Second, representative models
using differential equations are the Bouc-wen model [21], the
Duhem model [22], and the polynomial nonlinear state-space
(PNLSS) model [23], and they are also easy to apply nonlinear
control such as backstepping, T-S fuzzy control because they use
the model given by the state-space-like system equations [24]-
[26]. However, they share the same disadvantage with the operator
based model that the number of hyperparameters is minimal, so it
limits the type of the hysteresis curve. Finally, researchers started
to use machine learning, which is an Artificial Neural Network (ANN)
that is an optimizer for operator based model [27], Recurrent Neural
Network (RNN) which is easy to analyze time-dependent sequence
[28], Time-Delay Artificial Neural Network (TDANN) for the
3
streamline of RNN. Deep learning has the advantage of being able to
model whatever the theoretical complexity is, but since it is
eventually modeled as a black box, it does not provide us the meaning
of each network. Further, deep-network based methods using Adam
optimizers tend to show chaotic behavior when they are faced with a
new input that has never been included in a training region. Moreover,
hyperparameters of the deep network are determined with the brute
force (i.e., the best result after trying several combinations), so it is
not known why certain hyperparameter combinations are optimal.
Nevertheless, recent developments of soft robots are using deep
learning because it is more natural and more effective to create
models than the previous two methods [28]-[30].
Then, is there any way to be useful both in terms of control and
accuracy while preserving all of the inherent features of hysteresis?
Considering the most basic definition that the output of hysteresis is
determined by referring to all inputs from starting point to the
present, we can easily infer that the computation burden becomes
more significant when operating time gets longer. So we concluded
that appropriate assumptions are needed to analyze hysteresis. In the
meantime, we found that there exists an assumption to a similar
problem raised in the control and automation field, which is the
Markov assumption defined in the dynamic Bayesian network. The
Markov assumption says that the current state depends only on the
finite number of previous states. Furthermore, the first-order
Markov chain assumes that the current state only depends on the
4
previous states and not on the other states [31]. By mirroring the
concept of the Markov chain, we made a proposition that the output
distribution of the hysteretic system is only dependent at the present
input, immediately preceding input and the latest output. Also, under
this assumption, the hysteretic system can be similarly written into
terms of a hidden Markov model (HMM), which consists of a
transition model, sensor model, and initial state distribution. We
propose a Bayesian network that follows this property as a
Hysteresis Bayesian Network (HBN).
In this thesis, we introduce the HBN as a novel way to model the
hysteresis in a probabilistic view. First, we show that the output error
of the proposed HBN is bounded so that it could be modeled in a
probability distribution formula. Second, the bounded uncertainty is
treated as a residual term so that the probability distribution is
approximated by Gaussian process regression (GPR). Third, the
uncertainty of the dynamic and sensor model is compensated by using
extended Kalman filtering (EKF) for the sub-optimal state estimator
of the HBN. Fourth, we introduce Bayesian linear regression (BLR)
to strengthen the robustness of the algorithm. The performance of
the HBN is investigated by applying it to the artificial hysteretic
system and the actual soft robot system. We also examined whether
previous hysteresis analysis methods can be expressed through HBN.
5
Chapter 2. Mathematical Preliminaries
2.1. Bayesian Network and Hidden Markov Model
A Bayesian network is a directed acyclic graph that represents a
probability distributions of correlated random variables. Each node in
the graph is a random variable, and they have a conditional probability
distribution on its parent nodes. If the relations between nodes are
time-dependent, they are called dynamic model which consists of
state variables Xt , evidence variables Et , realization of evidence
variables et. As mentioned, the first-order Markov chain assumes
that the current state depends only on the previous state. Following
this rule, we can define HMM state transition map in a more simplified
way ( P(Xt|X0:t−1) = P(Xt|Xt−1) , where Xa:b = {Xa, Xa+1, … , Xb} ). Also,
following the sensor Markov assumption that the evidence Et
depends only on the current state, the sensor model (or observation
model) can also be simplified (P(Et|X0:t, E1:t−1) = P(Et|Xt)). Finally, a
full joint distribution of the HMM can be written as follows:
P(X0:t, E1:t) = P(X0)∏P(Xi|Xi−1)P(Ei|Xi)
t
i=1
(1)
By using exact inference methods (i.e., Viterbi algorithm) or
sampling methods (i.e., Gibbs sampling, Monte-Carlo approximation),
we can derive some marginal distributions of the Bayesian network.
Also, some special inferences exist such as filtering, smoothing, and
prediction.
6
2.2. Gaussian Process Regression
If we think about nonlinear regression, our goal is to know the
relationship of inputs X and outputs Y expressed in a nonlinear
mapping f: X → Y, when there exists a dataset D = (X, Y). In this case,
curve fitting is possible using polynomial basis functions in the sense
of maximum likelihood estimation (MLE) or maximum a posteriori
(MAP). While this approach is quite strong, GPR has higher
compatibility to obtain high-level features because GPR is a general
linear regression with an infinite number of basis functions. Rather
than finding an explicit form of the function, the GPR finds the
probability distribution of the nonlinear function f in nonparametric
scheme, which represents the relationship between input X and
output Y.
The multivariate Gaussian distribution is the core technique of
explaining GPR. Let f be the result value of the known function of the
training case, and let f∗ be the function value of the test case. And
their respective input values are X and X∗. Then the joint distribution
of these two cases can be written as follows:
[ff∗] ~N ([
μμ∗
] , [Σ Σ∗
Σ∗ Σ∗∗]) (2)
where μ and μ∗ are the means, and Σ, Σ∗, Σ∗∗ are the covariances. As
we know the train results f, the conditional distribution of f∗ given f
is written as:
f∗|f~N(μ∗ + Σ∗TΣ−1(f − μ), Σ∗∗ − Σ∗
TΣ−1Σ∗) (3)
Borrowing the terms of multivariate Gaussians, the GPR model is
7
constructed by the mean function m(x), which is the mean of the
desired Gaussian process (GP), not the distinct form of the function
itself, and the covariance function k(x, x′), which is an autocorrelation
function of a mean-zero Gaussian process. The relation between the
functions constructing GP and m(x), k(x, x′) can be written as:
f~GP(m(x), k(x, x′)) (4)
The most popular kernel function is squared exponential kernel,
which is expressed as:
k(x, x′) = σY2 exp (−
(x − x′)2
2λ2) (5)
where σY and λ are the hyperparameters to optimize. As a sequel to
multivariate Gaussian case, the posterior distribution given training
data D can be written as:
f|D~GP(mD(x), kD(x, x′))
mD(x) = m(x) + Σ(X, x)TΣ−1(f − m)
kD(x, x′) = k(x, x′) − Σ(X, x)TΣ−1Σ(X, x′)
(6)
where X is training dataset and x is the test input, mD, kD are mean
and kernel function when dataset D is given, and Σ(X,∙) is the kernel
covariance matrix given all training dataset X . Therefore, the
estimated value Y′ for the new input X′ is the value obtained by
substituting X′ for the mean function of the input, and the uncertainty
is the value assigned to the covariance function.
The optimization of hyperparameters are obtained by maximizing the
log likelihood of the training dataset, which is given as
8
log 𝑝(𝑦|𝑋) = −1
2yT(K + σn
2I)−1y −1
2log|𝐾 + 𝜎𝑛
2𝐼| + α (7)
where α = −𝑛
2log 2𝜋. This result is also available by observing that
the mean of the output y is zero, and the covariance is given K with
the noisy observation covariance candidate σn.
2.3. Reproducing Kernel Hilbert Space and Kernel Function
GPR can not only be driven by the posterior distribution of the GP
but also from the representer theorem that minimizes the following
cost function and find mapping f(∙) = ∑ αik(∙, xi)ni=1 :
J[f] =1
2σn2∑(yi − f(xi))
2n
i=1
+1
2‖f‖H
2
=: E(f|D) + R(f)
(8)
Here, D = {(x1, y1), … , (xn, yn)} is a training set, ‖∙‖H2 is a Hilbert norm,
σn2 is a covariance of the noise level, and k(∙,∙) is a kernel function.
Function E and R are the mean-squared-error of the supervised
learning, and the regularization term respectively. That is, GPR is a
generalized linear regression that comprises features that has the
same number of the training dataset. Larger dataset means the
regressor can represent a variety of features, but training time
increases together (by the calculation of matrix inverse) which is a
trade-off between accuracy and time cost.
9
Chapter 3. Hysteresis Bayesian Network
3.1. Fundamental Assumptions and Properties of Hysteresis
In this section, we construct a GPR model for the function y = f(xt−k:t),
where xt−k:t ≔ [𝑥𝑡−𝑘, 𝑥𝑡−𝑘+1, … , 𝑥𝑡]𝑇. First, definition of the hysteresis
modeling and fundamental assumptions need to be introduced.
Throughout the whole paper, we express the system (or function) in
a continuous space, discrete-time domain. If the system is defined
in a continuous-time domain, it could be converted to discrete-time
domain by sampling with no aliasing occurs. Let us think of a simple
example of a 1-dimensional hysteresis curve between input x ∈ ℝ
and output y ∈ ℝ, and xt denotes the value of x at discretized time t.
Hysteresis is a causal mapping that the output y is dependent on the
whole past inputs of x and the initial condition: yt = 𝑓(𝑥0:𝑡, 𝑦0). For
modeling simplicity and without loss of generality, we have the
following three assumptions:
Assumption 1. Hysteresis output y is dependent only on the past k +
1 inputs of x satisfying t ≥ k, i.e., there exists a function f: ℝk+1 → ℝ
such that yt = 𝑓(𝑥𝑡−𝑘:𝑡).
Assumption 2. A difference between two time-adjacent inputs is
bounded by some positive real-valued number η ∈ ℝ+, i.e.,
∆𝑥𝑡 ≔ ‖𝑥𝑡+1 − 𝑥𝑡‖ ≤ η (9)
Assumption 3. Hysteresis function yt = 𝑓(𝑥𝑡−𝑘:𝑡) is locally Lipschitz
10
with respect to xt−k:t, i.e., for all xt−k:t𝐴 , xt−k:t
𝐵 ∈ ℝk+1:
‖𝑓(𝑥𝑡−𝑘:𝑡𝐴 ) − 𝑓(𝑥𝑡−𝑘:𝑡
𝐵 )‖ ≤ 𝐿‖𝑥𝑡−𝑘:𝑡𝐴 − 𝑥𝑡−𝑘:𝑡
𝐵 ‖ (10)
with Lipschitz constant L ∈ ℝ+.
Assumption 1 is a necessary condition for modeling simplicity that
the size of the input vector is constrained to k + 1. If not, its size
increases when the operation time gets longer, then it is hard to
model the static function f. Assumption 2 restricts the maximum
evolution of the input variable, which is also interpreted as a dynamic
bandwidth. Assumption 3 constraints the type of the hysteresis
function to locally Lipschitz (i.e., continuously differentiable) type
only.
11
Figure 1. If two different solutions for different input sequence meet
in a single point, the next input-output tuple can be estimated within
a certain interval.
12
3.2. Hysteretic System Modeling
We interpret these defined assumptions in a two-dimension space,
where two axes are x and y. Consider the red line (labeled A) and
blue line (labeled B) in Figure 1. Each line contains the history of
inputs, and its corresponding outputs which are the calculation result
of the hysteretic function yt = 𝑓(𝑥𝑡−𝑘:𝑡). Now, consider the situation
that each trajectory intersects at time t, (i.e., (xt𝐴, 𝑦𝑡
𝐴) = (𝑥𝑡𝐵, 𝑦𝑡
𝐵)).
Then, from Assumption 2, the next input xt+1𝐴 and xt+1
𝐵 must be
defined in the interval [xt𝐴 − 𝜂, 𝑥𝑡
𝐴 + 𝜂] . If the next inputs of two
trajectory A and B are the same again, even if their value at t were
the same, their output value at t + 1 may different. First, we define
the difference of input over k + 1 time interval:
∆𝒙𝒕 ≔ 𝑥𝑡−𝑘+1:𝑡+1 − xt−k:t ∈ ℝk+1 (11)
Now, we can easily show that difference between yt+1 and yt for any
trajectory is bounded using multi-dimensional mean value theorem:
‖𝑦𝑡+1 − 𝑦𝑡‖ = ‖𝑓(𝑥𝑡−𝑘+1:𝑡+1) − 𝑓(𝑥𝑡−𝑘:𝑡)‖
= ‖∇𝑓(𝑥𝑡−𝑘:𝑡 + 𝛼∆𝒙𝒕) ∙ ∆𝒙𝒕‖, ∃α ∈ [0,1]
≤ ‖∆𝒙𝒕‖‖∇f(xt−k:t + 𝛼∆𝒙𝒕)‖
= η√𝑘 + 1‖∇𝑓(𝑥𝑡−𝑘:𝑡 + 𝛼∆𝒙𝒕)‖
(12)
Substituting the result of Equation (12) to the difference between the
next output of A and B in Figure 1 can be obtained by using
Assumption 3:
13
‖𝑦𝑡+1𝐴 − 𝑦𝑡+1
𝐵 ‖
= ‖𝑓(𝑥𝑡−𝑘+1:𝑡+1𝐴 ) − 𝑓(𝑥𝑡−𝑘+1:𝑡+1
𝐵 )‖
= ‖𝑓(𝑥𝑡−𝑘:𝑡𝐴 + ∆𝒙𝒕
𝑨) − 𝑓(𝑥𝑡−𝑘:𝑡𝐴 ) − 𝑓(𝒙𝒕−𝒌:𝒕
𝑩 + ∆𝒙𝒕𝑩)
+ 𝑓(𝑥𝑡−𝑘:𝑡𝐵 )‖
≤ η√𝑘 + 1(‖∇𝑓(𝑥𝑡−𝑘:𝑡𝐴 + 𝛼∆𝒙𝒕
𝑨)‖ + ‖∇𝑓(𝑥𝑡−𝑘:𝑡𝐵 + 𝛼∆𝒙𝒕
𝑩)‖)
≤ η√𝑘 + 1(L + M), ∃(α, β) ∈ [0,1]
(13)
where L,M are the Lipschitz constants. The result shows that if
values of xt, 𝑦𝑡 and xt+1 are identically defined for two different
solutions A and B of the hysteretic system yt = 𝑓(𝑥𝑡−𝑘:𝑡), their next
output’s distance ‖𝑦𝑡+1𝐴 − 𝑦𝑡+1
𝐵 ‖ is bounded with a function of
maximum state evolution η and time-dependency k. If k = 1 or k =
2, which means that there is no hysteresis (yt = 𝑓(𝑥𝑡) or weak time
dependency yt = 𝑓(𝑥𝑡 , 𝑥𝑡−1) ), fixing xt, 𝑦𝑡 , and xt+1 leads identical
solution, which is trivial. As a result, hysteretic system can be
expressed as a function of xt, 𝑦𝑡, and xt+1 with the residual term ϵt,
which is bounded by η√𝑘 + 1(𝐿 + 𝑀) . If the target function has a
smooth, continuous form, we can assume that the residual can be
treated as an uncertainty term of the GP, which is a covariance term.
If the target function do not shows Gaussian behavior, we can use a
kernel trick by defining a new mapping that transports the hysteresis
output y to the latent variable z, which its latent space shows a
Gaussian behavior. Gaussian process that uses the result of latent
space is called warped Gaussian process. In summary, the hysteresis
function can be simplified as:
14
𝑦𝑡 = 𝑓(𝑥𝑡−𝑘:𝑡) = h(xt−1, yt−1, xt) + ϵt (14)
If the target function function has a smooth, continuous form, we can
assume that the residual can be treated as an uncertainty term of the
GP, which is a covariance term. However, if the target is non-smooth,
we can introduce a latent variable as a residual term, and the latent
variable ζ has some prior distribution p(ζ), and the function h can be
written as a marginal:
h(xt−1, yt−1, xt) = ∫𝑔(𝑥𝑡−1, 𝑥𝑡, 𝑦𝑡−1, 𝜁𝑡)𝑝(𝜁𝑡)𝑑𝜁𝑡 (15)
Wang and Neal proved that introducing a latent variable in the target
function can represent non-Gaussian, heteroscedastic residuals. If
we are modeling general nonlinear function f that can have any
strange property, we should model our residual that might not follow
Gaussian distribution, and ζ can realize it. Now, we propose the full
system consisting of a dynamic model and sensor model both having
hysteresis property.
Proposition 1. Consider our hysteretic system has a hysteretic
transition and sensory model. If Assumptions 1-3 hold, then the
hysteretic system is dependent on a finite number of variables as
shown below:
𝑥𝑘 = 𝐻𝐷[𝑥𝑘−1, 𝑥𝑘−2, 𝑢𝑘−1, 𝑢𝑘−2] + 𝑤𝑘−1
𝑦𝑘 = 𝐻𝑆[𝑥𝑘−1, 𝑦𝑘−1, 𝑥𝑘] + 𝑣𝑘
(16)
Where Ht and Hs are nonlinear probability distribution functions,
each representing the transition model and sensor model,
respectively, x is the system state, u is the system input, and y is
15
the system output, and w, v are the bounded residual term induced
by past time-dependent properties.
Proof. If Assumptions holds and the relation between two variables
of input x and output y is hysteretic, tuple {xk+1, 𝑦𝑘+1} will only
depend on {xk, 𝑦𝑘}. As xk induces yk, we can conclude that yk+1 is a
function of yk, 𝑥𝑘+1, and xk. In other words, the current output of the
hysteretic system depends only on its preceding output, and two
recent inputs. Returning to the system Equation 16, first, the
transition model is a hysteretic function of xk−1 and uk−1. Then, xk
will depend only on xk−1(indicating both preceding output and recent
inputs), xk−2, 𝑢𝑘−1 and uk−2 (two recent inputs) by reflecting the
above representation. Second, the sensor model is a hysteretic
function of xk, so yk will depend on yk−1 (preceding output), xk, and
xk−1 (two recent inputs).
16
Figure 2. Hysteresis Bayesian Network (HBN) for hysteretic
systems is under critical assumptions that limit the correlation
between nodes.
17
Finally, we can construct the Bayesian network reflecting the relation
proposed on Proposition 1 We call this new Bayesian network as
`Hysteresis Bayesian Network (HBN),' which is shown in Figure 2.
3.3. Gaussian Process Regression as a Function Approximator
To represent a GPR for our target function yt = 𝑓(𝑥𝑡−𝑘:𝑡), we use our
result from the previous section, given in Equation (14). Construction
of the kernel function is specified into three steps: initial setting,
warping step, and advantaging step. Each step holds the prior
knowledge of what we know about the hysteresis phenomenon.
First, our initial setting is a squared-exponential kernel, which is
widely used in GPR. However, a dimension of the inputs of xt and yt
may not match, so it is more acceptable to use a normalized metric,
also known as the Mahalanobis distance. The kernel looks like:
𝑘(𝑥, 𝑦) =𝜎𝑁
2
2(−𝑒𝑥𝑝((𝑥 − 𝑦)𝑆−1(𝑥 − 𝑦)𝑇)) (17)
where matrix S is the covariance matrix for the training dataset.
Second, we warp the input space into the new feature space using
the neural-network like function. We consider a hyperbolic tangent
function as a warping function. With f(x) = ∑𝑏𝑖 tanh(𝑥 + 𝑎𝑖) + 𝑐𝑖 as a
warping function, the kernel looks like:
𝑘(𝑥, 𝑦) =𝜎𝑁
2
2(−𝑒𝑥𝑝((𝑓(𝑥) − 𝑓(𝑦))𝑆−1(𝑓(𝑥) − 𝑓(𝑦))𝑇)) (18)
Finally, we give an advantage term to similar input data. If we
generally think about the hysteresis, we conclude that the behavior
18
of the function is different when the function input is increasing and
decreasing. That is the fundamental property of the hysteresis. So if
we give advantage to the covariance of the combination of same
gradients, and give penalty to the covariance of the combination of
different gradients, we can model the hysteresis more practically.
This is given in the below kernel function, which is the advantage
term.
𝑘(𝑥, 𝑦) =𝜎𝑁
2
2(1 + 𝑠𝑔𝑛(�̇�)𝑠𝑔𝑛(�̇�)𝑇)(−𝑒𝑥𝑝((𝑓(𝑥) − 𝑓(𝑦))𝑆−1(𝑓(𝑥) − 𝑓(𝑦))𝑇)) (19)
where �̇�, �̇� are the direction of the x, y. The effect of advantaging
term 1 + sgn(�̇�)𝑠𝑔𝑛(�̇�)𝑇 shows up in the RKHS [33-37]. Let sgn(�̇�) =
𝛾𝑖 and sgn(�̇�) = 𝛾𝑗 . Then, the target function f(x, y, �̇�, �̇�) =
∑ 𝛼𝑖𝑘(𝑥, 𝑦, �̇�, �̇�)𝑛𝑗=1 can be obtained by the hyperparameter optimization.
Then, the loss function JL[𝑓] is given as:
𝑘𝐽𝐿[𝑓] =1
2‖𝑓‖ℋ
2 +1
2𝜎𝑛2
× ∑(𝑦𝑖 − ∑1
2𝛼𝑗
𝑛
𝑗=1
(1 + 𝛾𝑖𝛾𝑗)𝑘𝐴𝑆𝐸(𝑥, 𝑦))
𝑛
𝑖=1
2
=1
2‖𝑓‖ℋ
2 +1
2𝜎𝑛2
× ∑(𝛾𝑖𝑦𝑖 − ∑1
2𝛼𝑗
𝑛
𝑗=1
(𝛾𝑖 + 𝛾𝑗)𝑘𝐴𝑆𝐸(𝑥, 𝑦))
𝑛
𝑖=1
2
(20)
This is possible because we can multiply γi2 = 1 to the both left and
19
right terms. Then, we can classify the problem into three cases,
where γi = 𝛾𝑗 = 1, 𝛾𝑖 = 𝛾𝑗 = −1, and γi ≠ 𝛾𝑗 . If both advantages are
valued 1, then the loss function attempts to minimize ∑(𝑦𝑖 − 𝑓(𝑥𝑖))2,
which is the original supervised learning problem. If both advantages
are valued -1, the loss function changes to ∑(−𝑦𝑖 + 𝑓(𝑥𝑖))2, which is
also a supervised learning problem. However, if the advantages are
valued opposite, the value γi + 𝛾𝑗 = 0 means that it minimizes
∑(𝑦𝑖 − 0)2 , which learns nothing. In conclusion, data comprises of
ascending advantage does not affect the learning of descending
advantage, and vice versa.
3.4. Exploration Strategy of Data Collection
Data collection to train GPR is implemented iteratively to reduce the
uncertainty level of the system dynamics relation. First, we start with
a random exploration value that is sampled from uniform distribution.
After a few iterations, we introduce an uncertainty level of the GPR
prediction and give more bias to explore the region that has a higher
uncertainty level. Detailed collection steps are enumerated as
following:
Step 1) For the first k iterations, inputs are determined
randomly, satisfying the Assumptions 1-3. Train GPR of the
translation dynamics and sensor relation.
Step 2) For the next l iterations, inputs are determined to
minimize the uncertainty level of the prediction of the next state.
20
First, find −η ≤ ϵ ≤ η that has a maximum uncertainty level of Ht.
Then, implement ut = 𝑢𝑡−1 + 𝛼𝜖 + (1 − 𝛼)𝑤 where w is sampled from
uniform distribution. Update GPR every iteration, and increase value
α.
21
Chapter 4. State Estimation
4.1. State Estimator Using Extended Kalman Filtering
One of the exciting features of the hidden Markov model is filtering,
which best estimates the current state given all the observations from
the initial time to present. In normal hidden Markov model, the exact
inference for the filtering is given as:
𝑃(𝑋𝑡+1 |𝐸1:𝑡+1) = 𝛼𝑃(𝐸𝑡+1 |𝑋𝑡+1)∑𝑃(𝑋𝑡+1|𝑥𝑡)𝑃(𝑥𝑡|𝐸1:𝑡) (21)
where α is a normalization coefficient. As we can see from the above
Equation, the filtering process requires two steps: state prediction
and measurement update. As transition model and sensor model is
given in GPR form, we can use EKF as a filtering strategy. What all
we need is to calculate the first derivative of the trained GPR
functions. Here, we assume that there are no external disturbances
or noises in model Hs and Ht in the HBN. First, we construct a
state-space model for the hysteretic system of Equation 16 with new
state variables:
[
𝑥1[𝑘 + 1]𝑥2[𝑘 + 1]𝑥3[𝑘 + 1]𝑥4[𝑘 + 1]
] =
[
𝑢[𝑘]
𝐻𝐷(𝑥1[𝑘], 𝑥2[𝑘], 𝑥3[𝑘], 𝑢[𝑘]) 𝑥2[𝑘]
𝐻𝑆(𝑥2[𝑘], 𝑥3[𝑘], 𝑥4[𝑘]) ] + [
0𝑤[𝑘]
00
]
𝑦[𝑘] = 𝑥4[𝑘] + 𝑣[𝑘]
(22)
22
Figure 3. (a) By definition of hysteresis, all inputs should be related
with the output. (b) By our proposition, hysteresis relation can be
simplified having three inputs.
23
Figure 4. Block diagram representation of Equation 22. New state
variables are introduced to make latent variables visible.
24
Graphical representation of Equation 22 is shown in Figure 3 and 4.
To express the time delay input (corresponding to uk−1 and uk−2 of
Equation 16), we introduced an artificial state x1 in Equation 22 as a
time delay of the current input x1[𝑘 + 1] = 𝑢[𝑘] . State x2 exactly
expresses xk in Equation 16, and x3 is a single time delay of x2 to
express two preceding states (corresponding to xk−2 of Equation 16).
The last state x4 is the output variable which refers to x2 and x3,
and x4 which corresponds to xk, 𝑥𝑘−1, and yk−1 in Equation 16,
respectively. By our previous discussion, our interest is y[k] = x4[𝑘],
which is the sensory output.
The most crucial step in this algorithm is a linearization step to
perform EKF. Implementing derivatives of GPR result to each
corresponding states and linearization, we get the following linearized
system equation in a matrix form:
[
𝑥1[𝑘 + 1]𝑥2[𝑘 + 1]𝑥3[𝑘 + 1]𝑥4[𝑘 + 1]
] = [
0 0 0 0𝐴21 𝐴22 𝐴23 00 1 0 00 𝐴42 𝐴43 𝐴44
] [
𝑥1[𝑘]𝑥2[𝑘]𝑥3[𝑘]𝑥4[𝑘]
] + [
1𝐵2
00
]𝑢[𝑘]
+ [
0𝑤2
00
]
𝑦[𝑘] = [0 0 0 1] [
𝑥1[𝑘]𝑥2[𝑘]𝑥3[𝑘]𝑥4[𝑘]
] + 𝑣
(23)
where A∗∗ and B2 are the first derivatives of Hs and Ht by their
25
corresponding states. Here, w2 and v are the model uncertainty
induced from GPR modeling, and they are originated from the
variance level of the GPR output. This variance value contains the
modeling error of Ht and Hs. Now, we can infer that even if there
could exist a modeling error from our proposition, GPR variance
values show us how the model is inaccurate in a certain region so
then EKF can compensate the error by its best filtering.
The last step is to implement EKF to the proposed matrix equation.
Since there may exist faulty due to the singularity, we should have
some assumptions on the EKF.
Assumption 4. Given sub-optimal state estimation, the following
conditions are made:
𝑟𝐼 ≤ 𝑅𝑘 , 𝑞𝐼 ≤ 𝑄𝑘
‖𝐹𝑘‖ ≤ 𝑓
‖𝐾𝑘‖ ≤ 𝑘
p1𝐼 ≤ 𝑃𝑘\𝑘 ≤ 𝑝2𝐼
q1𝐼 ≤ 𝑃𝑘\𝑘−1 ≤ 𝑞2𝐼
(24)
for positive constants.
If Fk is non-singular for all k ≥ 0 and Assumption 4 holds, then EKF
error is bounded by some function. All the bounding variables are
pre-determined by screening the behavior of the system. To avoid
generating singular matrix, some additional calculation like singular
value decomposition (SVD) are used to overcome this issue. Also,
we put small artificial noise to A∗∗ and B∗ to avoid its values to be
26
zero.
4.2. Observability of Hysteresis Bayesian Network
One of the prerequisites of implementing EKF to the system is to
verify the observability of the linearized system. Here is the
definition of the uniform complete observability for the linearized
system [39]:
Definition 1. The matrix pair (A,C) is uniformly completely
observable if there exist positive constants ρ1 and ρ2 , positive
integer h such that for all k ≥ h, and bounded symmetric positive
definite matrix R, the following inequality holds:
(25)
here, Φ(s, k) = A(s)A(s + 1)…A(k − 1) is the state transition matrix of
the system where the system is defined by x(k) = A(k)x(k − 1) and
y(k) = C(k)x(k). Our proposed system has a time-varying matrix A.
Now, we prove that the HBN system is uniformly completely
observable. By definition 1, if there exist positive constants ρ1 and
ρ2, and positive integer h satisfying the constraints, the system is
uniformly completely observable. Let positive integer h = 3. Then,
the Equation 25 can be written as:
27
(26)
where I4 is 4x4 identity matrix, v, A and C are defined as:
(27)
For each s = k, k − 1, k − 2, k − 3, the calculation of Equation 25 can be
generalized by following forms, knowing that the result matrix is
symmetric:
(28)
Where A∗, 𝐵∗, and C∗ are the generalized time-varying terms. Now,
we show Ψ∗(𝑘) = Ψ(𝑘) + Ψ(𝑘 − 1) + Ψ(𝑘 − 2) + Ψ(𝑘 − 3) is positive-
28
definite matrix. Let Di(𝑘) be the determinant of ith upper-left sub-
matrices of Ψ∗(𝑘). The ith upper-left sub-matrix contains rows and
columns of 1st to ith components of the original matrix. Each values
are calculated as following:
(29)
Note that the structure of D3 and D4 are the same. A matrix is
positive-definite if and only if determinants associated with all
upper-left sub-matrices are positive [32]. As a result, matrix Ψ∗(𝑘)
is positive-definite matrix. Finally, we can choose ρ1 = inf(𝜆) 𝑣−1 and
ρ2 = sup(𝜆) 𝑣−1 where λ is the set of eigenvalue of Ψ∗(𝑘). As a result,
the linearized HBN system is uniformly completely observable.
4.3. Robust Redesign Using Bayesian Linear Regression
We now consider the case that there exists external disturbances or
noises in the model. In this section, we do some modification to the
linearized matrix equation (the HBN matrix form) to be robust against
disturbances and noises. In EKF, disturbances and noise levels are
important to confirm their reliability, and all disturbances and noises
are assumed to follow the Gaussian distribution with zero mean and
29
some covariance level. The easiest and fastest way to solve the noise
issue is to inspect the background noise level and add to the EKF
variances. However, there are two problems. First, capturing
background noise level do not work when the variance level varies
within the magnitude of states or inputs. Second, GPR has a limit to
capture the noise because GPR assumes that the system has
independent identically distributed Gaussian noise with constant
variance (noise is captured by prior noisy observations, where given
cov(x, y) = Σ(x, y) + σn2𝛿𝑥𝑦). And what GPR provide users as a variance
level of the output is the model uncertainty level (epistemic
uncertainty), not the statistical uncertainty (aleatoric uncertainty).
As a result, we use Bayesian linear regression (BLR) to overcome
this issue [40]. If we use the GPR output and variance value as a
prior data of the BLR, we can check the dependency of variance level
within other state variables. Data constructing BLR is determined by
using data that falls in the range of up to ±2.5% of the maximum value
of each state in the current state. For example, if the maximum value
of the state is 10 and the current state value is 3.4, the coverage is
[3.15,3.65].
Typically, in order to make both posterior and likelihood to be
multivariate Gaussian, we convert GPR information to inverse-
Wishart distribution parameters. Consider the matrix representation
of BLR like Y = XB + E , where both Y and E are nxm matrices,
where rows are different system equations, and columns are different
observations. Let the covariance matrix of E is given as Σ. Then, the
30
prior distribution of this system is given as:
(30)
Where �̅� is a vectorization of �̅�, which is a prior weights, ν and V0
are components of inverse-Wishart distribution (degrees of freedom
and scale matrix), and Sx is the empirical variance of the dataset.
Distribution IW and N expresses the inverse-Wishart distribution
and Normal distribution, respectively. Using GPR as a prior, V0
should be equal to νΣ̂, where Σ̂ is the variance level of the learning
uncertainty from GPR. Also, �̅� is a gradient of GPR likelihood, which
is previously written as A∗∗ and B2 in HBN matrix equation. Then,
the posterior distribution using given dataset D = {X, Y} can be
written as:
(31)
where n is the number of a dataset. We determine the new system
matrix using the mean and covariance value of the posterior
distribution. Then, the resulting linearized system contains more
useful information: uncertainty from learning incompleteness
(epistemic uncertainty), and disturbance, noise issues from the real
31
world (aleatoric uncertainty). This kind of approach had been done
by some model-based reinforcement learning researchers,
attempting to control robots by demonstrations [41-43].
32
Chapter 5. Application
5.1. Numerical Simulations
The proposed hysteresis modeling method was validated using
simulation. Two artificial hysteretic systems mimicking various
hysteresis model were constructed like shown as following:
(32)
(33)
where Equation 32 is the form of Maxwell-Slip model-like system,
and Equation 33 is the general Prandtl-Ishlinskii model. Term η is a
noise level, which is a random variable following a uniform
distribution between [-0.5,0.5].
33
Figure 5. Artificial hysteresis of Equation 32
Figure 6. Artificial hysteresis of Equation 33.
34
Figure 5 and Figure 6 show the realization of hysteresis system
Equation 32 and 33 where the input of the system was in the range
uk ∈ [0,1]. First, the generated train input signal was put into the
systems and training dataset of the transition model and sensor model
were gathered. The same input signal was used 10 times to get
various trajectories to see the effect of disturbances and noises, so
each trajectory was slightly different. Second, hyperparameter
optimization for GPR was processed to both transition and sensor
models. Kernel function was defined in Equation 19, acquisition
function was Expected-improvement-plus, Quasi-newton convex
optimization is used, and the total number of iteration to find noise
level was 10. Then, another test input signal was prepared by
randomly generating a signal between [0,1] while maintaining
discrete continuity. This testing signals were not trained, but used
for testing the performance only. The performance criteria were
comparing root-mean-squared error (RMSE) and normalized root-
mean-squared error (nRMSE) between the estimated signal and the
reference signal. Three different methods were compared: using only
EKF, using EKF with BLR, and using RNN for comparison study. For
the first example, we inspected the influence of disturbance and noise
level by changing the amplitude of noise level. All RMSE and nRMSE
values were calculated by averaging results of three each attempts.
5.2. Modeling PneuNet-Type Actuator for Proprioceptive
35
Sensing
A simple integrated soft actuator and sensor system composed of
pneumatic bending actuator with embedded microfluidic soft sensor
was manufactured and HBN was implemented to validate the
performance [44]. To sense the bending by measuring the strain, a
multi-material strain sensor was placed on the top surface of the
actuator for amplified signal. The bottom surface is composed of
Dragon Skin 30 which is stiffer than the top surface. The stiffness
difference between the top and the bottom surface produced bending
motion when the air input was given (Figure 7).
36
Figure 7. A combination of pneumatic bending actuator and
microfluidic sensor system.
37
Random input air pressure was applied to the actuator system while
maintaining discrete continuity. Then, the location of the end-
effector of the actuator was recorded by Kinect. The output voltage
of the soft sensor was saved simultaneously. Data tuple {Input, state,
output} were collected for five minutes, and transition, sensor GPR
model was trained. Finally, the HBN localization algorithm was
applied to evaluate the performance, RMSE and nRMSE were
calculated to evaluate the result.
38
Figure 8. State estimation of artificial hysteresis system Equation 32.
(a) with RNN (comparison study), (b) EKF, (c) EKF+BLR.
Figure 9. State estimation of artificial hysteresis system Equation 33.
(a) with RNN (comparison study), (b) EKF, (c) EKF+BLR.
39
5.3. Numerical Simulations Result
The RMSE and nRMSE values for each simulation cases are shown
in Table 1. Both values were improved when our proposed method is
used, primarily when the BLR is used to make a linearized model.
Figure 8 and Figure 9 showed the reference and estimation signals
for two artificial hysteresis systems. The tracking quality has
improved when using EKF and EKF+BLR, especially for the second
example. RNN failed to track references due to high noise level. In
order to inspect the influence of noise level, the nRMSE value of
different noise gain was recorded, and the results are shown in Table
2 The result showed that EKF+BLR error was still bounded when
the noise gain reached 10, but the signal diverged when RNN was
used (written in NaN). When the noise gain was 0, the nRMSE value
of EKF+BLR was larger than the case of when BLR was not used.
This is a reasonable result because zero noise gain means that the
transition and sensor model is deterministic, which implies that BLR
cannot be effectively used.
40
Figure 10. Online state estimation of soft actuator systems. (a) with
RNN (comparison study), (b) EKF, (c) EKF+BLR.
41
5.4. Actuator Modeling Results
The online state estimation of a real soft actuator-sensor integrated
system was implemented, and the result is shown in Figure 10 The
RMSE and nRMSE values of the result is shown in Table 3 The
performance was significantly improved when the HBN model is used,
primarily when BLR was implemented to the system. Even though the
soft sensor's signal-to-noise ratio was significantly high, our model
captured the noise level efficiently and reflected when processing the
EKF.
42
Chapter 6. Discussion
6.1. Robust Design and Influence of Noise Level
When designing an EKF, it is essential to figure it out the proper
covariance matrices of disturbance and noise to prevent the faulty.
This statement can be validated in Table 2 where different values of
noise gain were used as a simulation system. Using EKF+BLR, the
nRMSE values were significantly improved because BLR used real
simulated data and reflected their noise level directly, where using
EKF only was not able to consider the noise level itself. After all, the
GPR variance value shows only the modeling uncertainty level, not
the noise level directly. GPR is robust against noisy observations and
attempts to learn the noise level itself by additional hyperparameter.
As a result, the GPR covariance level does not give users how the
noise influences the system, and the noise information is ignored by
the noise hyperparameter.
43
Figure 11. Reference state and its disturbance variance level of the
system using (a) EKF only and (b) EKF + BLR.
44
Figure 11 shows the effect of BLR analyzing noise level efficiently.
Relying only on the GPR variance level does not give users how much
the noise level is (conversely, the low variance level proves that GPR
learning is very well done). In contrast, BLR result gives users how
severe the noise level is, reflecting that the noise and disturbance
level of the system are positively correlated with input and state level.
The correlation coefficient between state and variance level was
0.0477 and 0.6295 for EKF and EKF+BLR, respectively. However,
using BLR consumes additional calculation time so the users must
heuristically judge whether they should use EKF and BLR together
or not, considering the nRMSE of both cases and their noise level of
the system. Furthermore, use of Bayesian neural network or
bootstrap ensemble method could be the alternative of BLR.
6.2. Issues in Real-Time Implementation
It is important to use the state estimator in on-line to do some
complicated work such as feedback control, and some issues
appeared to implement the proposed method on-line. As we assumed
to model the system in the discrete-time domain, the sampling
frequency of the operating system is fundamental. So the users must
set the sampling frequency of the system equal for both cases:
gathering training dataset and real-time state estimation. Due to the
delay of microcontroller and computing time of the software, the
sampling frequency of the system is smaller in the testing case,
where EKF and BLR has a matrix inversion computing. As a result,
45
we set an artificial time delay when gathering the training dataset,
and it worked well to match the sampling frequency of both training
and testing cases.
6.3. Fundamental Limitations of Learning and EKF
The proposed method showed powerful results where simulation
results showed a dramatic improvement in estimation error. Also,
BLR helped the algorithm work more robustly by providing
disturbance and noise level directly, rather than the way of GPR
results. However, there are some fundamental limitations in our
approach, which is originated from the `learning' perspective. The
reason why control theory researchers are hesitant to use deep
learning is that it cannot provide certainty of a stability issue. The
stability issues of the proposed method can be eased when
Assumption 3 provided, but as transition and sensor model is ̀ trained'
by supervised learning method (GPR), we cannot assure that training
is perfectly done in every operating region. This underlying problem
is non-solvable, where many deep learning researchers are still
trying to provide completeness criteria in machine learning such as
using importance sampling [45] or data-driven optimal control
theory [38]. For example, interesting work has been done by
Beckers et al., where they constructed a stabilizing feedback control
strategy with a GPR based feed-forward model of Euler-Lagrange
systems [46]. State-of-the-art soft robot modeling and controlling
papers have this issue in common, for example, modeling dynamics
46
using RNN [28,29]. Another concern in this paper is the use of EKF,
where few properties regarding convergence and stability test of
EKF have been done. Some studies have done some stability
problems given some assumptions [47,48], but the result may vary
depending on the properties of the system. To guarantee the
minimum criteria for the stability, we introduced Assumption 3.
Nevertheless, EKF is a popular localization algorithm that is widely
used in simultaneous localization and mapping (SLAM) applications.
Furthermore, variations of Kalman filtering can also be applied to this
system, for example, unscented Kalman filtering (UKF) which does
not explicitly derive the linearized system but find the mean and
covariance value using unscented transformation [49].
47
Chapter 7. Conclusion
In this paper, we introduced a novel method of localization in soft
robot systems using HBN. We proposed that distribution of the
hysteresis output is dependent on its current and previous inputs, and
previous output. This proposition was expanded, and a new Bayesian
network was established. Simulation and real-world application
results supported the reliability of the proposed algorithm, and noise
level dependency was inspected. Since many of existing soft robots
are slowly operating systems, our method could work well in terms
of stability and learning feasibility.
Future work will focus on designing a control system of soft robots
using HBN. Control policies will be added to HBN, and its structure
will be slightly changed. By introducing stochastic control theory in
the sense of reinforcement learning (RL), we can get optimal control
strategies while guaranteeing that the state of the robot system is
sub-optimal by EKF. Since there exists a model-based RL methods
such as PILCO, it will follow a similar way to construct the control
policy. However, since Gaussian Process dynamics of stochastic
states and inputs may lead non-Gaussian state prediction, advanced
techniques such as moment matching is prerequisite. Finally, to be
ready for mass production, a transfer learning based calibration
method will be useful for learning mappings [50,51].
48
Bibliography
[1] H. In, B. B. Kang, M. Sin, and K.-J. Cho, “Exo-glove: A wearable
robot for the hand with a soft tendon routing system,” IEEE Robotics
& Automation Magazine, vol. 22, no. 1, pp. 97–105, 2015.
[2] P. Polygerinos, Z. Wang, K. C. Galloway, R. J. Wood, and C. J.
Walsh, “Soft robotic glove for combined assistance and at-home
rehabilitation,” Robotics and Autonomous Systems, vol. 73, pp. 135–
143, 2015.
[3] M. Manti, V. Cacucciolo, and M. Cianchetti, “Stiffening in soft
robotics: A review of the state of the art,” IEEE Robotics &
Automation Magazine, vol. 23, no. 3, pp. 93–106, 2016.
[4] J. W. Booth, D. Shah, J. C. Case, E. L. White, M. C. Yuen, O.
CyrChoiniere, and R. Kramer-Bottiglio, “Omniskins: Robotic skins
that turn inanimate objects into multifunctional robots,” Science
Robotics, vol. 3, no. 22, p. eaat1853, 2018.
[5] S. Kim, C. Laschi, and B. Trimmer, “Soft robotics: a bioinspired
evolution in robotics,” Trends in biotechnology, vol. 31, no. 5, pp.
287– 294, 2013.
[6] N. N. Goldberg, X. Huang, C. Majidi, A. Novelia, O. M. O’Reilly, D.
A. Paley, and W. L. Scott, “On planar discrete elastic rod models for
the locomotion of soft robots,” Soft robotics, 2019.
[7] D. Ortiz, N. Gravish, and M. T. Tolley, “Soft robot actuation
strategies for locomotion in granular substrates,” IEEE Robotics and
Automation Letters, vol. 4, no. 3, pp. 2630–2636, 2019.
49
[8] D. Kim, J. I. Kim, and Y.-L. Park, “A simple tripod mobile robot
using soft membrane vibration actuators,” IEEE Robotics and
Automation Letters, 2019.
[9] R. K. Katzschmann, M. Thieffry, O. Goury, A. Kruszewski, T.-M.
Guerra, C. Duriez, and D. Rus, “Dynamically closed-loop controlled
soft robotic arm using a reduced order finite element model with state
observer,” in 2019 2nd IEEE International Conference on Soft
Robotics (RoboSoft). IEEE, 2019, pp. 717–724.
[10] B. Yu, J. d. G. Fernandez, and T. Tan, “Probabilistic kinematic
model ́of a robotic catheter for 3d position control,” Soft robotics,
vol. 6, no. 2, pp. 184–194, 2019.
[11] H.-S. Shin, J. Ryu, C. Majidi, and Y.-L. Park, “Enhanced
performance of microfluidic soft pressure sensors with embedded
solid microspheres,” Journal of Micromechanics and
Microengineering, vol. 26, no. 2, p. 025011, 2016.
[12] V. Hassani, T. Tjahjowidodo, and T. N. Do, “A survey on
hysteresis modeling, identification and control,” Mechanical systems
and signal processing, vol. 49, no. 1-2, pp. 209–233, 2014.
[13] A. Bilski and M. Twardy, “Hysteresis modeling using a preisach
operator,” International Journal of Electronics and
Telecommunications, vol. 56, no. 4, pp. 473–478, 2010.
[14] M. Rakotondrabe, “Classical prandtl-ishlinskii modeling and
inverse multiplicative structure to compensate hysteresis in
piezoactuators,” in 2012 American Control Conference (ACC). IEEE,
2012, pp. 1646– 1651.
50
[15] M. Brokate and J. Sprekels, “Hysteresis operators,” in
Hysteresis and phase transitions. Springer, 1996, pp. 22–121.
[16] D. Bruder, B. Gillespie, C. D. Remy, and R. Vasudevan, “Modeling
and control of soft robots using the koopman operator and model
predictive control,” arXiv preprint arXiv:1902.02827, 2019.
[17] Y. Liu, D. Desong, N. Qi, and J. Zhao, “A distributed parameter
maxwell-slip model for the hysteresis in piezoelectric actuators,”
IEEE Transactions on Industrial Electronics, 2018.
[18] R. Changhai, S. Lining, R. Weibin, and C. Liguo, “Adaptive
inverse control for piezoelectric actuator with dominant hysteresis,”
in Proceedings of the 2004 IEEE International Conference on Control
Applications, 2004., vol. 2. IEEE, 2004, pp. 973–976.
[19] R. B. Gorbet, K. A. Morris, and D. W. Wang, “Passivity-based
stability and control of hysteresis in smart actuators,” IEEE
Transactions on control systems technology, vol. 9, no. 1, pp. 5–16,
2001.
[20] M. Al Janaideh, S. Rakheja, and C.-Y. Su, “An analytical
generalized prandtl–ishlinskii model inversion for hysteresis
compensation in micropositioning control,” IEEE/ASME Transactions
on mechatronics, vol. 16, no. 4, pp. 734–744, 2010.
[21] M. S. Sofla, S. M. Rezaei, M. Zareinejad, and M. Saadat,
“Hysteresisobserver based robust tracking control of piezoelectric
actuators,” in Proceedings of the 2010 American Control Conference.
IEEE, 2010, pp. 4187–4192.
[22] W.-F. Xie, J. Fu, H. Yao, and C.-Y. Su, “Observer based control
51
of piezoelectric actuators with classical duhem modeled hysteresis,”
in 2009 American Control Conference. IEEE, 2009, pp. 4221–4226.
[23] J. Paduart, L. Lauwers, J. Swevers, K. Smolders, J. Schoukens,
and R. Pintelon, “Identification of nonlinear systems using polynomial
nonlinear state space models,” Automatica, vol. 46, no. 4, pp. 647–
656, 2010.
[24] C. Ru, L. Chen, B. Shao, W. Rong, and L. Sun, “A hysteresis
compensation method of piezoelectric actuator: Model, identification
and control,” Control Engineering Practice, vol. 17, no. 9, pp. 1107–
1114, 2009.
[25] X. Chen, Y. Feng, and C.-Y. Su, “Adaptive control for
continuous-time systems with actuator and sensor hysteresis,”
Automatica, vol. 64, pp. 196–207, 2016.
[26] N. Vafamand, M. H. Asemani, A. Khayatiyan, M. H. Khooban,
and T. Dragicevi ˇ c, “Ts fuzzy model-based controller design for a
class of ́ nonlinear systems including nonsmooth functions,” IEEE
Transactions on Systems, Man, and Cybernetics: Systems, no. 99, pp.
1–12, 2017.
[27] D. Kim and Y.-L. Park, “Contact localization and force
estimation of soft tactile sensors using artificial intelligence,” in 2018
IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). IEEE, 2018, pp. 7480–7485.
[28] T. G. Thuruthel, B. Shih, C. Laschi, and M. T. Tolley, “Soft robot
perception using embedded soft sensors and recurrent neural
networks,” Science Robotics, vol. 4, no. 26, p. eaav1488, 2019.
52
[29] T. G. Thuruthel, E. Falotico, F. Renda, and C. Laschi, “Model-
based reinforcement learning for closed-loop dynamic control of soft
robotic manipulators,” IEEE Transactions on Robotics, vol. 35, no. 1,
pp. 124– 134, 2018.
[30] M. T. Gillespie, C. M. Best, E. C. Townsend, D. Wingate, and M.
D. Killpack, “Learning nonlinear dynamic models of soft robots for
model predictive control with neural networks,” in 2018 IEEE
International Conference on Soft Robotics (RoboSoft). IEEE, 2018,
pp. 39–45.
[31] S. J. Russell and P. Norvig, Artificial intelligence: a modern
approach. Malaysia; Pearson Education Limited,, 2016.
[32] H. K. Khalil and J. W. Grizzle, Nonlinear systems. Prentice hall
Upper Saddle River, NJ, 2002, vol. 3.
[33] C. Wang and R. M. Neal, “Gaussian process regression with
heteroscedastic or non-gaussian residuals,” arXiv preprint
arXiv:1212.6246, 2012.
[34] M. Deisenroth and C. E. Rasmussen, “Pilco: A model-based and
data-efficient approach to policy search,” in Proceedings of the 28th
International Conference on machine learning (ICML-11), 2011, pp.
465–472.
[35] S. Choi, M. Jadaliha, J. Choi, and S. Oh, “Distributed gaussian
process regression under localization uncertainty,” Journal of
Dynamic Systems, Measurement, and Control, vol. 137, no. 3, p.
031007, 2015.
[36] K. Lee, S. Choi, and S. Oh, “Inverse reinforcement learning with
53
leveraged gaussian processes,” in 2016 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS). IEEE, 2016,
pp. 3907–3912.
[37] T. Horii, Y. Nagai, L. Natale, F. Giovannini, G. Metta, and M.
Asada, “Compensation for tactile hysteresis using gaussian process
with sensory markov property,” in 2014 IEEE-RAS International
Conference on Humanoid Robots. IEEE, 2014, pp. 993–998.
[38] Y. Jiang and Z.-P. Jiang, Robust adaptive dynamic programming.
John Wiley & Sons, 2017.
[39] Q. Zhang, “On stability of the kalman filter for discrete time
output error systems,” Systems & Control Letters, vol. 107, pp. 84–
91, 2017.
[40] P. D. Hoff, A first course in Bayesian statistical methods.
Springer, 2009, vol. 580.
[41] J. Fu, S. Levine, and P. Abbeel, “One-shot learning of
manipulation skills with online dynamics adaptation and neural
network priors,” in 2016 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 4019–4026.
[42] S. Levine and P. Abbeel, “Learning neural network policies with
guided policy search under unknown dynamics,” in Advances in
Neural Information Processing Systems, 2014, pp. 1071–1079.
[43] W. H. Montgomery and S. Levine, “Guided policy search via
approximate mirror descent,” in Advances in Neural Information
Processing Systems, 2016, pp. 4008–4016.
[44] M. Park, Y. Ohm, D. Kim, and Y.-L. Park, “Multi-material soft
54
strain sensors with high gauge factors for proprioceptive sensing of
soft bending actuators,” in 2019 2nd IEEE International Conference
on Soft Robotics (RoboSoft). IEEE, 2019, pp. 384–390.
[45] S. Levine and V. Koltun, “Guided policy search,” in International
Conference on Machine Learning, 2013, pp. 1–9.
[46] T. Beckers, D. Kulic, and S. Hirche, “Stable gaussian process
based ́tracking control of euler–lagrange systems,” Automatica, vol.
103, pp. 390–397, 2019.
[47] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen, “Stochastic
stability of the discrete-time extended kalman filter,” IEEE
Transactions on Automatic control, vol. 44, no. 4, pp. 714–728, 1999.
[48] K. Reif and R. Unbehauen, “The extended kalman filter as an
exponential observer for nonlinear systems,” IEEE Transactions on
Signal processing, vol. 47, no. 8, pp. 2324–2328, 1999.
[49] J. Ko and D. Fox, “Gp-bayesfilters: Bayesian filtering using
gaussian process prediction and observation models,” Autonomous
Robots, vol. 27, no. 1, pp. 75–90, 2009.
[50] M. P. Deisenroth, “Learning to control a low-cost manipulator
using data-efficient reinforcement learning,” Robotics: Science and
Systems, pp. 57–64, 2011.
[51] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE
Transactions on knowledge and data engineering, vol. 22, no. 10, pp.
1345– 1359, 2009.
55
Abstract
소프트 로봇의 이력현상과 스토캐스틱 반응 때문에 이를 모델링, 예측,
제어하는 것은 매우 어려운 과제이다. 베이지안 네트워크의 특징과 기존
의 딥러닝 방법을 보완하여, 확률론적 방법으로 이력 현상을 모델링하는
기법을 제시한다. 이력현상이 3개의 파라미터에 의존하는 가우스 과정으
로 모델링하여, 이력 베이지안 네트워크인 HBN을 구축한다. 이는 로봇
의 상태 변화 모델과 센서 모델을 효과적으로 모델링할 수 있다. 특히,
로봇의 스토캐스틱 현상을 반영하기 위해 가우스 과정 회귀 모델을 이용
하고, 상태 예측으로 확장 칼만 필터를 이용한다. 또한 노이즈에 대항하
기 위해 베이즈 선형 모델을 사후 확률로 사용하는 방법을 제시한다. 시
뮬레이션과 실제 로봇 시스템의 적용으로 제시한 방법이 유효함을 검증
한다. 이 방법을 통해 강인함을 유지함과 동시에 소프트 로봇을 모델링
할 수 있음을 알 수 있다.