Disclaimers-space.snu.ac.kr/bitstream/10371/166410/1/000000160957.pdf · 2020. 5. 18. · soft robotics, they are confronted with these limitations. Many studies have attempted to

저 시-비 리- 경 지 2.0 한민

는 아래 조건 르는 경 에 한하여 게

l 저 물 복제, 포, 전송, 전시, 공연 송할 수 습니다.

다 과 같 조건 라야 합니다:

l 하는, 저 물 나 포 경 , 저 물에 적 된 허락조건 명확하게 나타내어야 합니다.

l 저 터 허가를 면 러한 조건들 적 되지 않습니다.

저 에 른 리는 내 에 하여 향 지 않습니다.

것 허락규약(Legal Code) 해하 쉽게 약한 것 니다.

Disclaimer

저 시. 하는 원저 를 시하여야 합니다.

비 리. 하는 저 물 리 목적 할 수 없습니다.

경 지. 하는 저 물 개 , 형 또는 가공할 수 없습니다.

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/legalcode

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/

Master’s Thesis of Engineering

Use of Probabilistic Machine

Learning for Modeling and State

Estimation of Soft Robot Systems

비모수 기계학습 방법을 이용한 소프트 로봇 시스

템의 모델링

February 2020

Graduate School of Engineering

Seoul National University

Mechanical Engineering Major

DongWook Kim

비모수 기계학습 방법을 이용한 소프트

로봇 시스템의 모델링

Modeling Soft Robot Systems Using

Nonparametric Machine Learning Techniques

Advisor Yong-Lae Park

Submitting a master’s thesis of Public

Administration

October 2019

Graduate School of Engineering

Seoul National University Mechanical Engineering Major

DongWook Kim

Confirming the master’s thesis written by

DongWook Kim

December 2019

Chair Lee, Kun Woo (Seal)

Vice Chair Park, Yong-Lae (Seal)

Examiner Ahn, Sung-Hoon (Seal)

i

Abstract Due to hysteresis and stochastic behavior of soft robots, it is

challenging to model, control, and estimate the real state of them.

Reflecting previous attempts on modeling hysteresis and borrowing

the intuitiveness of the Bayesian network, we propose a novel way

to model hysteresis from a Bayesian perspective. By assuming that

current output distribution of the hysteretic system follows a

Gaussian distribution of mean and covariance is comprised of only on

the previous input-output tuples and the current input, we construct

a hysteresis Bayesian network that can efficiently model transition

and sensory model and best estimate the states using extended

Kalman filtering. To reflect stochastic behavior, we use Gaussian

process regression to provide a probabilistic model for the function

approximator of transition and sensor model, also variances for

Kalman filtering. Furthermore, we use Bayesian linear regression

with Gaussian process regression prior to compensating for

statistical disturbances and noises. Simulation and real-world soft

robot application results show that our method works better than

state-of-the-art soft robot modeling techniques, especially for the

noisy environment. In conclusion, the proposed method can

effectively, rapidly construct the model and estimate the state while

maintaining robustness against disturbances and noises.

Keyword : Soft Robotics, Learning-based System Modeling,

Nonparametric Machine Learning, Gaussian Process Regression

Student Number : 2018-27197

ii

Table of Contents

Chapter 1. Introduction ............................................................ 1

Chapter 2. Mathematical Preliminaries .................................... 5

2.1. Bayesian Network and Hidden Markov Model

2.2. Gaussian Process Regression

2.3. Reproducing Kernel Hilbert Space and Kernel Functions

Chapter 3. Hysteresis Bayesian Network ............................... 9

3.1. Fundamental Assumptions and Properties of Hysteresis

3.2. Hysteretic System Modeling

3.3. Gaussian Process Regression as a Function Approximator

Chapter 4. State Estimation ................................................... 21

4.1. State Estimator Using Extended Kalman Filtering

4.2. Observability of Hysteresis Bayesian Network

4.3. Robust Redesign Using Bayesian Linear Regression

Chapter 5. Application ........................................................... 32

5.1. Numerical Examples

5.2. Modeling PneuNet-Type Actuator for Proprioceptive Sensing

Chapter 6. Discussion ............................................................ 42

iii

6.1. Robust Redesign and Noise Level

6.2. Issues in Real-Time Implementation

6.3. Fundamental Limitations of Learning and Filtering

Chapter 7. Conclusion ............................................................ 47

Bibliography ........................................................................... 48

Abstract in Korean ................................................................ 55

１

Chapter 1. Introduction

Despite the radical development of soft robotics, there are not

many scenes in our society where soft robots are playing a significant

role yet. Soft robot systems developed so far have been made for

wearable [1], [2], physical Human-Robot Interaction (p-HRI) [3],

[4], and disaster relief [5], and have focused people’s attention on

its unique locomotion and behavior [6]-[8]. However, common

problems with soft robots make researchers hesitant to use them.

First, because of the use of soft materials such as silicone rubber,

soft robot systems have an infinite number of state variables in

theory, so it is challenging to model the robotics system itself [9].

Second, even if the model is given, their system behavior is

stochastic so that users cannot plan the control in a deterministic

view [10]. Third, soft robots suffer in hysteresis due to their

viscoelastic behavior of the soft material [11]. Typically, the

hysteresis phenomenon is known to have a property that the current

output is proportional to all inputs in the past, and it is known to be

very difficult to analyze [12]. Therefore, at present, it is difficult to

model a system showing hysteresis properties having an explicit

solution. So although many people are interested in the research of

soft robotics, they are confronted with these limitations.

Many studies have attempted to identify the nature of hysteresis

in the past. Hysteresis modeling methods can be roughly classified

into three types: operator method, differential equation method, and

２

machine learning method. First, representative examples of operator

methods are the Preisach model using relay operators [13], the

Prandtl-Ishlinskii (PI) model [14] and the Kransnosel’skii-

Pokrovskii (KP) model [15] using play operators, the Koopman

operator [16], and the Maxwell-Slip model [17] based on backlash

and friction operator. The operator based methods had been the

interest of nonlinear control theory, so the Lyapunov based nonlinear

control design such as adaptive and robust control has been widely

developed [18]-[20]. However, it is difficult to make various

hysteresis curves because they all make the hysteresis curve with

the weighted summation of the basis operator. Besides, many of them

are assuming hysteresis is rate-independent, where hysteresis in

soft robot systems is typically not. Second, representative models

using differential equations are the Bouc-wen model [21], the

Duhem model [22], and the polynomial nonlinear state-space

(PNLSS) model [23], and they are also easy to apply nonlinear

control such as backstepping, T-S fuzzy control because they use

the model given by the state-space-like system equations [24]-

[26]. However, they share the same disadvantage with the operator

based model that the number of hyperparameters is minimal, so it

limits the type of the hysteresis curve. Finally, researchers started

to use machine learning, which is an Artificial Neural Network (ANN)

that is an optimizer for operator based model [27], Recurrent Neural

Network (RNN) which is easy to analyze time-dependent sequence

[28], Time-Delay Artificial Neural Network (TDANN) for the

３

streamline of RNN. Deep learning has the advantage of being able to

model whatever the theoretical complexity is, but since it is

eventually modeled as a black box, it does not provide us the meaning

of each network. Further, deep-network based methods using Adam

optimizers tend to show chaotic behavior when they are faced with a

new input that has never been included in a training region. Moreover,

hyperparameters of the deep network are determined with the brute

force (i.e., the best result after trying several combinations), so it is

not known why certain hyperparameter combinations are optimal.

Nevertheless, recent developments of soft robots are using deep

learning because it is more natural and more effective to create

models than the previous two methods [28]-[30].

Then, is there any way to be useful both in terms of control and

accuracy while preserving all of the inherent features of hysteresis?

Considering the most basic definition that the output of hysteresis is

determined by referring to all inputs from starting point to the

present, we can easily infer that the computation burden becomes

more significant when operating time gets longer. So we concluded

that appropriate assumptions are needed to analyze hysteresis. In the

meantime, we found that there exists an assumption to a similar

problem raised in the control and automation field, which is the

Markov assumption defined in the dynamic Bayesian network. The

Markov assumption says that the current state depends only on the

finite number of previous states. Furthermore, the first-order

Markov chain assumes that the current state only depends on the

４

previous states and not on the other states [31]. By mirroring the

concept of the Markov chain, we made a proposition that the output

distribution of the hysteretic system is only dependent at the present

input, immediately preceding input and the latest output. Also, under

this assumption, the hysteretic system can be similarly written into

terms of a hidden Markov model (HMM), which consists of a

transition model, sensor model, and initial state distribution. We

propose a Bayesian network that follows this property as a

Hysteresis Bayesian Network (HBN).

In this thesis, we introduce the HBN as a novel way to model the

hysteresis in a probabilistic view. First, we show that the output error

of the proposed HBN is bounded so that it could be modeled in a

probability distribution formula. Second, the bounded uncertainty is

treated as a residual term so that the probability distribution is

approximated by Gaussian process regression (GPR). Third, the

uncertainty of the dynamic and sensor model is compensated by using

extended Kalman filtering (EKF) for the sub-optimal state estimator

of the HBN. Fourth, we introduce Bayesian linear regression (BLR)

to strengthen the robustness of the algorithm. The performance of

the HBN is investigated by applying it to the artificial hysteretic

system and the actual soft robot system. We also examined whether

previous hysteresis analysis methods can be expressed through HBN.

５

Chapter 2. Mathematical Preliminaries

2.1. Bayesian Network and Hidden Markov Model

A Bayesian network is a directed acyclic graph that represents a

probability distributions of correlated random variables. Each node in

the graph is a random variable, and they have a conditional probability

distribution on its parent nodes. If the relations between nodes are

time-dependent, they are called dynamic model which consists of

state variables Xt , evidence variables Et , realization of evidence

variables et. As mentioned, the first-order Markov chain assumes

that the current state depends only on the previous state. Following

this rule, we can define HMM state transition map in a more simplified

way ( P(Xt|X0:t−1) = P(Xt|Xt−1) , where Xa:b = {Xa, Xa+1, … , Xb} ). Also,

following the sensor Markov assumption that the evidence Et

depends only on the current state, the sensor model (or observation

model) can also be simplified (P(Et|X0:t, E1:t−1) = P(Et|Xt)). Finally, a

full joint distribution of the HMM can be written as follows:

P(X0:t, E1:t) = P(X0)∏P(Xi|Xi−1)P(Ei|Xi)

t

i=1

(1)

By using exact inference methods (i.e., Viterbi algorithm) or

sampling methods (i.e., Gibbs sampling, Monte-Carlo approximation),

we can derive some marginal distributions of the Bayesian network.

Also, some special inferences exist such as filtering, smoothing, and

prediction.

６

2.2. Gaussian Process Regression

If we think about nonlinear regression, our goal is to know the

relationship of inputs X and outputs Y expressed in a nonlinear

mapping f: X → Y, when there exists a dataset D = (X, Y). In this case,

curve fitting is possible using polynomial basis functions in the sense

of maximum likelihood estimation (MLE) or maximum a posteriori

(MAP). While this approach is quite strong, GPR has higher

compatibility to obtain high-level features because GPR is a general

linear regression with an infinite number of basis functions. Rather

than finding an explicit form of the function, the GPR finds the

probability distribution of the nonlinear function f in nonparametric

scheme, which represents the relationship between input X and

output Y.

The multivariate Gaussian distribution is the core technique of

explaining GPR. Let f be the result value of the known function of the

training case, and let f∗ be the function value of the test case. And

their respective input values are X and X∗. Then the joint distribution

of these two cases can be written as follows:

[ff∗] ~N ([

μμ∗

] , [Σ Σ∗

Σ∗ Σ∗∗]) (2)

where μ and μ∗ are the means, and Σ, Σ∗, Σ∗∗ are the covariances. As

we know the train results f, the conditional distribution of f∗ given f

is written as:

f∗|f~N(μ∗ + Σ∗TΣ−1(f − μ), Σ∗∗ − Σ∗

TΣ−1Σ∗) (3)

Borrowing the terms of multivariate Gaussians, the GPR model is

７

constructed by the mean function m(x), which is the mean of the

desired Gaussian process (GP), not the distinct form of the function

itself, and the covariance function k(x, x′), which is an autocorrelation

function of a mean-zero Gaussian process. The relation between the

functions constructing GP and m(x), k(x, x′) can be written as:

f~GP(m(x), k(x, x′)) (4)

The most popular kernel function is squared exponential kernel,

which is expressed as:

k(x, x′) = σY2 exp (−

(x − x′)2

2λ2) (5)

where σY and λ are the hyperparameters to optimize. As a sequel to

multivariate Gaussian case, the posterior distribution given training

data D can be written as:

f|D~GP(mD(x), kD(x, x′))

mD(x) = m(x) + Σ(X, x)TΣ−1(f − m)

kD(x, x′) = k(x, x′) − Σ(X, x)TΣ−1Σ(X, x′)

(6)

where X is training dataset and x is the test input, mD, kD are mean

and kernel function when dataset D is given, and Σ(X,∙) is the kernel

covariance matrix given all training dataset X . Therefore, the

estimated value Y′ for the new input X′ is the value obtained by

substituting X′ for the mean function of the input, and the uncertainty

is the value assigned to the covariance function.

The optimization of hyperparameters are obtained by maximizing the

log likelihood of the training dataset, which is given as

８

log 𝑝(𝑦|𝑋) = −1

2yT(K + σn

2I)−1y −1

2log|𝐾 + 𝜎𝑛

2𝐼| + α (7)

where α = −𝑛

2log 2𝜋. This result is also available by observing that

the mean of the output y is zero, and the covariance is given K with

the noisy observation covariance candidate σn.

2.3. Reproducing Kernel Hilbert Space and Kernel Function

GPR can not only be driven by the posterior distribution of the GP

but also from the representer theorem that minimizes the following

cost function and find mapping f(∙) = ∑ αik(∙, xi)ni=1 :

J[f] =1

2σn2∑(yi − f(xi))

2n

i=1

+1

2‖f‖H

2

=: E(f|D) + R(f)

(8)

Here, D = {(x1, y1), … , (xn, yn)} is a training set, ‖∙‖H2 is a Hilbert norm,

σn2 is a covariance of the noise level, and k(∙,∙) is a kernel function.

Function E and R are the mean-squared-error of the supervised

learning, and the regularization term respectively. That is, GPR is a

generalized linear regression that comprises features that has the

same number of the training dataset. Larger dataset means the

regressor can represent a variety of features, but training time

increases together (by the calculation of matrix inverse) which is a

trade-off between accuracy and time cost.

９

Chapter 3. Hysteresis Bayesian Network

3.1. Fundamental Assumptions and Properties of Hysteresis

In this section, we construct a GPR model for the function y = f(xt−k:t),

where xt−k:t ≔ [𝑥𝑡−𝑘, 𝑥𝑡−𝑘+1, … , 𝑥𝑡]𝑇. First, definition of the hysteresis

modeling and fundamental assumptions need to be introduced.

Throughout the whole paper, we express the system (or function) in

a continuous space, discrete-time domain. If the system is defined

in a continuous-time domain, it could be converted to discrete-time

domain by sampling with no aliasing occurs. Let us think of a simple

example of a 1-dimensional hysteresis curve between input x ∈ ℝ

and output y ∈ ℝ, and xt denotes the value of x at discretized time t.

Hysteresis is a causal mapping that the output y is dependent on the

whole past inputs of x and the initial condition: yt = 𝑓(𝑥0:𝑡, 𝑦0). For

modeling simplicity and without loss of generality, we have the

following three assumptions:

Assumption 1. Hysteresis output y is dependent only on the past k +

1 inputs of x satisfying t ≥ k, i.e., there exists a function f: ℝk+1 → ℝ

such that yt = 𝑓(𝑥𝑡−𝑘:𝑡).

Assumption 2. A difference between two time-adjacent inputs is

bounded by some positive real-valued number η ∈ ℝ+, i.e.,

∆𝑥𝑡 ≔ ‖𝑥𝑡+1 − 𝑥𝑡‖ ≤ η (9)

Assumption 3. Hysteresis function yt = 𝑓(𝑥𝑡−𝑘:𝑡) is locally Lipschitz

１０

with respect to xt−k:t, i.e., for all xt−k:t𝐴 , xt−k:t

𝐵 ∈ ℝk+1:

‖𝑓(𝑥𝑡−𝑘:𝑡𝐴 ) − 𝑓(𝑥𝑡−𝑘:𝑡

𝐵 )‖ ≤ 𝐿‖𝑥𝑡−𝑘:𝑡𝐴 − 𝑥𝑡−𝑘:𝑡

𝐵 ‖ (10)

with Lipschitz constant L ∈ ℝ+.

Assumption 1 is a necessary condition for modeling simplicity that

the size of the input vector is constrained to k + 1. If not, its size

increases when the operation time gets longer, then it is hard to

model the static function f. Assumption 2 restricts the maximum

evolution of the input variable, which is also interpreted as a dynamic

bandwidth. Assumption 3 constraints the type of the hysteresis

function to locally Lipschitz (i.e., continuously differentiable) type

only.

１１

Figure 1. If two different solutions for different input sequence meet

in a single point, the next input-output tuple can be estimated within

a certain interval.

１２

3.2. Hysteretic System Modeling

We interpret these defined assumptions in a two-dimension space,

where two axes are x and y. Consider the red line (labeled A) and

blue line (labeled B) in Figure 1. Each line contains the history of

inputs, and its corresponding outputs which are the calculation result

of the hysteretic function yt = 𝑓(𝑥𝑡−𝑘:𝑡). Now, consider the situation

that each trajectory intersects at time t, (i.e., (xt𝐴, 𝑦𝑡

𝐴) = (𝑥𝑡𝐵, 𝑦𝑡

𝐵)).

Then, from Assumption 2, the next input xt+1𝐴 and xt+1

𝐵 must be

defined in the interval [xt𝐴 − 𝜂, 𝑥𝑡

𝐴 + 𝜂] . If the next inputs of two

trajectory A and B are the same again, even if their value at t were

the same, their output value at t + 1 may different. First, we define

the difference of input over k + 1 time interval:

∆𝒙𝒕 ≔ 𝑥𝑡−𝑘+1:𝑡+1 − xt−k:t ∈ ℝk+1 (11)

Now, we can easily show that difference between yt+1 and yt for any

trajectory is bounded using multi-dimensional mean value theorem:

‖𝑦𝑡+1 − 𝑦𝑡‖ = ‖𝑓(𝑥𝑡−𝑘+1:𝑡+1) − 𝑓(𝑥𝑡−𝑘:𝑡)‖

= ‖∇𝑓(𝑥𝑡−𝑘:𝑡 + 𝛼∆𝒙𝒕) ∙ ∆𝒙𝒕‖, ∃α ∈ [0,1]

≤ ‖∆𝒙𝒕‖‖∇f(xt−k:t + 𝛼∆𝒙𝒕)‖

= η√𝑘 + 1‖∇𝑓(𝑥𝑡−𝑘:𝑡 + 𝛼∆𝒙𝒕)‖

(12)

Substituting the result of Equation (12) to the difference between the

next output of A and B in Figure 1 can be obtained by using

Assumption 3:

１３

‖𝑦𝑡+1𝐴 − 𝑦𝑡+1

𝐵 ‖

= ‖𝑓(𝑥𝑡−𝑘+1:𝑡+1𝐴 ) − 𝑓(𝑥𝑡−𝑘+1:𝑡+1

𝐵 )‖

= ‖𝑓(𝑥𝑡−𝑘:𝑡𝐴 + ∆𝒙𝒕

𝑨) − 𝑓(𝑥𝑡−𝑘:𝑡𝐴 ) − 𝑓(𝒙𝒕−𝒌:𝒕

𝑩 + ∆𝒙𝒕𝑩)

+ 𝑓(𝑥𝑡−𝑘:𝑡𝐵 )‖

≤ η√𝑘 + 1(‖∇𝑓(𝑥𝑡−𝑘:𝑡𝐴 + 𝛼∆𝒙𝒕

𝑨)‖ + ‖∇𝑓(𝑥𝑡−𝑘:𝑡𝐵 + 𝛼∆𝒙𝒕

𝑩)‖)

≤ η√𝑘 + 1(L + M), ∃(α, β) ∈ [0,1]

(13)

where L,M are the Lipschitz constants. The result shows that if

values of xt, 𝑦𝑡 and xt+1 are identically defined for two different

solutions A and B of the hysteretic system yt = 𝑓(𝑥𝑡−𝑘:𝑡), their next

output’s distance ‖𝑦𝑡+1𝐴 − 𝑦𝑡+1

𝐵 ‖ is bounded with a function of

maximum state evolution η and time-dependency k. If k = 1 or k =

2, which means that there is no hysteresis (yt = 𝑓(𝑥𝑡) or weak time

dependency yt = 𝑓(𝑥𝑡 , 𝑥𝑡−1) ), fixing xt, 𝑦𝑡 , and xt+1 leads identical

solution, which is trivial. As a result, hysteretic system can be

expressed as a function of xt, 𝑦𝑡, and xt+1 with the residual term ϵt,

which is bounded by η√𝑘 + 1(𝐿 + 𝑀) . If the target function has a

smooth, continuous form, we can assume that the residual can be

treated as an uncertainty term of the GP, which is a covariance term.

If the target function do not shows Gaussian behavior, we can use a

kernel trick by defining a new mapping that transports the hysteresis

output y to the latent variable z, which its latent space shows a

Gaussian behavior. Gaussian process that uses the result of latent

space is called warped Gaussian process. In summary, the hysteresis

function can be simplified as:

１４

𝑦𝑡 = 𝑓(𝑥𝑡−𝑘:𝑡) = h(xt−1, yt−1, xt) + ϵt (14)

If the target function function has a smooth, continuous form, we can

assume that the residual can be treated as an uncertainty term of the

GP, which is a covariance term. However, if the target is non-smooth,

we can introduce a latent variable as a residual term, and the latent

variable ζ has some prior distribution p(ζ), and the function h can be

written as a marginal:

h(xt−1, yt−1, xt) = ∫𝑔(𝑥𝑡−1, 𝑥𝑡, 𝑦𝑡−1, 𝜁𝑡)𝑝(𝜁𝑡)𝑑𝜁𝑡 (15)

Wang and Neal proved that introducing a latent variable in the target

function can represent non-Gaussian, heteroscedastic residuals. If

we are modeling general nonlinear function f that can have any

strange property, we should model our residual that might not follow

Gaussian distribution, and ζ can realize it. Now, we propose the full

system consisting of a dynamic model and sensor model both having

hysteresis property.

Proposition 1. Consider our hysteretic system has a hysteretic

transition and sensory model. If Assumptions 1-3 hold, then the

hysteretic system is dependent on a finite number of variables as

shown below:

𝑥𝑘 = 𝐻𝐷[𝑥𝑘−1, 𝑥𝑘−2, 𝑢𝑘−1, 𝑢𝑘−2] + 𝑤𝑘−1

𝑦𝑘 = 𝐻𝑆[𝑥𝑘−1, 𝑦𝑘−1, 𝑥𝑘] + 𝑣𝑘

(16)

Where Ht and Hs are nonlinear probability distribution functions,

each representing the transition model and sensor model,

respectively, x is the system state, u is the system input, and y is

１５

the system output, and w, v are the bounded residual term induced

by past time-dependent properties.

Proof. If Assumptions holds and the relation between two variables

of input x and output y is hysteretic, tuple {xk+1, 𝑦𝑘+1} will only

depend on {xk, 𝑦𝑘}. As xk induces yk, we can conclude that yk+1 is a

function of yk, 𝑥𝑘+1, and xk. In other words, the current output of the

hysteretic system depends only on its preceding output, and two

recent inputs. Returning to the system Equation 16, first, the

transition model is a hysteretic function of xk−1 and uk−1. Then, xk

will depend only on xk−1(indicating both preceding output and recent

inputs), xk−2, 𝑢𝑘−1 and uk−2 (two recent inputs) by reflecting the

above representation. Second, the sensor model is a hysteretic

function of xk, so yk will depend on yk−1 (preceding output), xk, and

xk−1 (two recent inputs).

１６

Figure 2. Hysteresis Bayesian Network (HBN) for hysteretic

systems is under critical assumptions that limit the correlation

between nodes.

１７

Finally, we can construct the Bayesian network reflecting the relation

proposed on Proposition 1 We call this new Bayesian network as

`Hysteresis Bayesian Network (HBN),' which is shown in Figure 2.

3.3. Gaussian Process Regression as a Function Approximator

To represent a GPR for our target function yt = 𝑓(𝑥𝑡−𝑘:𝑡), we use our

result from the previous section, given in Equation (14). Construction

of the kernel function is specified into three steps: initial setting,

warping step, and advantaging step. Each step holds the prior

knowledge of what we know about the hysteresis phenomenon.

First, our initial setting is a squared-exponential kernel, which is

widely used in GPR. However, a dimension of the inputs of xt and yt

may not match, so it is more acceptable to use a normalized metric,

also known as the Mahalanobis distance. The kernel looks like:

𝑘(𝑥, 𝑦) =𝜎𝑁

2

2(−𝑒𝑥𝑝((𝑥 − 𝑦)𝑆−1(𝑥 − 𝑦)𝑇)) (17)

where matrix S is the covariance matrix for the training dataset.

Second, we warp the input space into the new feature space using

the neural-network like function. We consider a hyperbolic tangent

function as a warping function. With f(x) = ∑𝑏𝑖 tanh(𝑥 + 𝑎𝑖) + 𝑐𝑖 as a

warping function, the kernel looks like:


2

2(−𝑒𝑥𝑝((𝑓(𝑥) − 𝑓(𝑦))𝑆−1(𝑓(𝑥) − 𝑓(𝑦))𝑇)) (18)

Finally, we give an advantage term to similar input data. If we

generally think about the hysteresis, we conclude that the behavior

１８

of the function is different when the function input is increasing and

decreasing. That is the fundamental property of the hysteresis. So if

we give advantage to the covariance of the combination of same

gradients, and give penalty to the covariance of the combination of

different gradients, we can model the hysteresis more practically.

This is given in the below kernel function, which is the advantage

term.


2

2(1 + 𝑠𝑔𝑛(�̇�)𝑠𝑔𝑛(�̇�)𝑇)(−𝑒𝑥𝑝((𝑓(𝑥) − 𝑓(𝑦))𝑆−1(𝑓(𝑥) − 𝑓(𝑦))𝑇)) (19)

where �̇�, �̇� are the direction of the x, y. The effect of advantaging

term 1 + sgn(�̇�)𝑠𝑔𝑛(�̇�)𝑇 shows up in the RKHS [33-37]. Let sgn(�̇�) =

𝛾𝑖 and sgn(�̇�) = 𝛾𝑗 . Then, the target function f(x, y, �̇�, �̇�) =

∑ 𝛼𝑖𝑘(𝑥, 𝑦, �̇�, �̇�)𝑛𝑗=1 can be obtained by the hyperparameter optimization.

Then, the loss function JL[𝑓] is given as:

𝑘𝐽𝐿[𝑓] =1

2‖𝑓‖ℋ

2 +1

2𝜎𝑛2

× ∑(𝑦𝑖 − ∑1

2𝛼𝑗

𝑛

𝑗=1

(1 + 𝛾𝑖𝛾𝑗)𝑘𝐴𝑆𝐸(𝑥, 𝑦))

𝑛

𝑖=1

2

=1

2‖𝑓‖ℋ

2 +1

2𝜎𝑛2

× ∑(𝛾𝑖𝑦𝑖 − ∑1

2𝛼𝑗

𝑛

𝑗=1

(𝛾𝑖 + 𝛾𝑗)𝑘𝐴𝑆𝐸(𝑥, 𝑦))

𝑛

𝑖=1

2

(20)

This is possible because we can multiply γi2 = 1 to the both left and

１９

right terms. Then, we can classify the problem into three cases,

where γi = 𝛾𝑗 = 1, 𝛾𝑖 = 𝛾𝑗 = −1, and γi ≠ 𝛾𝑗 . If both advantages are

valued 1, then the loss function attempts to minimize ∑(𝑦𝑖 − 𝑓(𝑥𝑖))2,

which is the original supervised learning problem. If both advantages

are valued -1, the loss function changes to ∑(−𝑦𝑖 + 𝑓(𝑥𝑖))2, which is

also a supervised learning problem. However, if the advantages are

valued opposite, the value γi + 𝛾𝑗 = 0 means that it minimizes

∑(𝑦𝑖 − 0)2 , which learns nothing. In conclusion, data comprises of

ascending advantage does not affect the learning of descending

advantage, and vice versa.

3.4. Exploration Strategy of Data Collection

Data collection to train GPR is implemented iteratively to reduce the

uncertainty level of the system dynamics relation. First, we start with

a random exploration value that is sampled from uniform distribution.

After a few iterations, we introduce an uncertainty level of the GPR

prediction and give more bias to explore the region that has a higher

uncertainty level. Detailed collection steps are enumerated as

following:

Step 1) For the first k iterations, inputs are determined

randomly, satisfying the Assumptions 1-3. Train GPR of the

translation dynamics and sensor relation.

Step 2) For the next l iterations, inputs are determined to

minimize the uncertainty level of the prediction of the next state.

２０

First, find −η ≤ ϵ ≤ η that has a maximum uncertainty level of Ht.

Then, implement ut = 𝑢𝑡−1 + 𝛼𝜖 + (1 − 𝛼)𝑤 where w is sampled from

uniform distribution. Update GPR every iteration, and increase value

α.

２１

Chapter 4. State Estimation

4.1. State Estimator Using Extended Kalman Filtering

One of the exciting features of the hidden Markov model is filtering,

which best estimates the current state given all the observations from

the initial time to present. In normal hidden Markov model, the exact

inference for the filtering is given as:

𝑃(𝑋𝑡+1 |𝐸1:𝑡+1) = 𝛼𝑃(𝐸𝑡+1 |𝑋𝑡+1)∑𝑃(𝑋𝑡+1|𝑥𝑡)𝑃(𝑥𝑡|𝐸1:𝑡) (21)

where α is a normalization coefficient. As we can see from the above

Equation, the filtering process requires two steps: state prediction

and measurement update. As transition model and sensor model is

given in GPR form, we can use EKF as a filtering strategy. What all

we need is to calculate the first derivative of the trained GPR

functions. Here, we assume that there are no external disturbances

or noises in model Hs and Ht in the HBN. First, we construct a

state-space model for the hysteretic system of Equation 16 with new

state variables:

[

𝑥1[𝑘 + 1]𝑥2[𝑘 + 1]𝑥3[𝑘 + 1]𝑥4[𝑘 + 1]

] =

[

𝑢[𝑘]

𝐻𝐷(𝑥1[𝑘], 𝑥2[𝑘], 𝑥3[𝑘], 𝑢[𝑘]) 𝑥2[𝑘]

𝐻𝑆(𝑥2[𝑘], 𝑥3[𝑘], 𝑥4[𝑘]) ] + [

0𝑤[𝑘]

00

]

𝑦[𝑘] = 𝑥4[𝑘] + 𝑣[𝑘]

(22)

２２

Figure 3. (a) By definition of hysteresis, all inputs should be related

with the output. (b) By our proposition, hysteresis relation can be

simplified having three inputs.

２３

Figure 4. Block diagram representation of Equation 22. New state

variables are introduced to make latent variables visible.

２４

Graphical representation of Equation 22 is shown in Figure 3 and 4.

To express the time delay input (corresponding to uk−1 and uk−2 of

Equation 16), we introduced an artificial state x1 in Equation 22 as a

time delay of the current input x1[𝑘 + 1] = 𝑢[𝑘] . State x2 exactly

expresses xk in Equation 16, and x3 is a single time delay of x2 to

express two preceding states (corresponding to xk−2 of Equation 16).

The last state x4 is the output variable which refers to x2 and x3,

and x4 which corresponds to xk, 𝑥𝑘−1, and yk−1 in Equation 16,

respectively. By our previous discussion, our interest is y[k] = x4[𝑘],

which is the sensory output.

The most crucial step in this algorithm is a linearization step to

perform EKF. Implementing derivatives of GPR result to each

corresponding states and linearization, we get the following linearized

system equation in a matrix form:

[

𝑥1[𝑘 + 1]𝑥2[𝑘 + 1]𝑥3[𝑘 + 1]𝑥4[𝑘 + 1]

] = [

0 0 0 0𝐴21 𝐴22 𝐴23 00 1 0 00 𝐴42 𝐴43 𝐴44

] [

𝑥1[𝑘]𝑥2[𝑘]𝑥3[𝑘]𝑥4[𝑘]

] + [

1𝐵2

00

]𝑢[𝑘]

+ [

0𝑤2

00

]

𝑦[𝑘] = [0 0 0 1] [

𝑥1[𝑘]𝑥2[𝑘]𝑥3[𝑘]𝑥4[𝑘]

] + 𝑣

(23)

where A∗∗ and B2 are the first derivatives of Hs and Ht by their

２５

corresponding states. Here, w2 and v are the model uncertainty

induced from GPR modeling, and they are originated from the

variance level of the GPR output. This variance value contains the

modeling error of Ht and Hs. Now, we can infer that even if there

could exist a modeling error from our proposition, GPR variance

values show us how the model is inaccurate in a certain region so

then EKF can compensate the error by its best filtering.

The last step is to implement EKF to the proposed matrix equation.

Since there may exist faulty due to the singularity, we should have

some assumptions on the EKF.

Assumption 4. Given sub-optimal state estimation, the following

conditions are made:

𝑟𝐼 ≤ 𝑅𝑘 , 𝑞𝐼 ≤ 𝑄𝑘

‖𝐹𝑘‖ ≤ 𝑓

‖𝐾𝑘‖ ≤ 𝑘

p1𝐼 ≤ 𝑃𝑘\𝑘 ≤ 𝑝2𝐼

q1𝐼 ≤ 𝑃𝑘\𝑘−1 ≤ 𝑞2𝐼

(24)

for positive constants.

If Fk is non-singular for all k ≥ 0 and Assumption 4 holds, then EKF

error is bounded by some function. All the bounding variables are

pre-determined by screening the behavior of the system. To avoid

generating singular matrix, some additional calculation like singular

value decomposition (SVD) are used to overcome this issue. Also,

we put small artificial noise to A∗∗ and B∗ to avoid its values to be

２６

zero.

4.2. Observability of Hysteresis Bayesian Network

One of the prerequisites of implementing EKF to the system is to

verify the observability of the linearized system. Here is the

definition of the uniform complete observability for the linearized

system [39]:

Definition 1. The matrix pair (A,C) is uniformly completely

observable if there exist positive constants ρ1 and ρ2 , positive

integer h such that for all k ≥ h, and bounded symmetric positive

definite matrix R, the following inequality holds:

(25)

here, Φ(s, k) = A(s)A(s + 1)…A(k − 1) is the state transition matrix of

the system where the system is defined by x(k) = A(k)x(k − 1) and

y(k) = C(k)x(k). Our proposed system has a time-varying matrix A.

Now, we prove that the HBN system is uniformly completely

observable. By definition 1, if there exist positive constants ρ1 and

ρ2, and positive integer h satisfying the constraints, the system is

uniformly completely observable. Let positive integer h = 3. Then,

the Equation 25 can be written as:

２７

(26)

where I4 is 4x4 identity matrix, v, A and C are defined as:

(27)

For each s = k, k − 1, k − 2, k − 3, the calculation of Equation 25 can be

generalized by following forms, knowing that the result matrix is

symmetric:

(28)

Where A∗, 𝐵∗, and C∗ are the generalized time-varying terms. Now,

we show Ψ∗(𝑘) = Ψ(𝑘) + Ψ(𝑘 − 1) + Ψ(𝑘 − 2) + Ψ(𝑘 − 3) is positive-

２８

definite matrix. Let Di(𝑘) be the determinant of ith upper-left sub-

matrices of Ψ∗(𝑘). The ith upper-left sub-matrix contains rows and

columns of 1st to ith components of the original matrix. Each values

are calculated as following:

(29)

Note that the structure of D3 and D4 are the same. A matrix is

positive-definite if and only if determinants associated with all

upper-left sub-matrices are positive [32]. As a result, matrix Ψ∗(𝑘)

is positive-definite matrix. Finally, we can choose ρ1 = inf(𝜆) 𝑣−1 and

ρ2 = sup(𝜆) 𝑣−1 where λ is the set of eigenvalue of Ψ∗(𝑘). As a result,

the linearized HBN system is uniformly completely observable.

4.3. Robust Redesign Using Bayesian Linear Regression

We now consider the case that there exists external disturbances or

noises in the model. In this section, we do some modification to the

linearized matrix equation (the HBN matrix form) to be robust against

disturbances and noises. In EKF, disturbances and noise levels are

important to confirm their reliability, and all disturbances and noises

are assumed to follow the Gaussian distribution with zero mean and

２９

some covariance level. The easiest and fastest way to solve the noise

issue is to inspect the background noise level and add to the EKF

variances. However, there are two problems. First, capturing

background noise level do not work when the variance level varies

within the magnitude of states or inputs. Second, GPR has a limit to

capture the noise because GPR assumes that the system has

independent identically distributed Gaussian noise with constant

variance (noise is captured by prior noisy observations, where given

cov(x, y) = Σ(x, y) + σn2𝛿𝑥𝑦). And what GPR provide users as a variance

level of the output is the model uncertainty level (epistemic

uncertainty), not the statistical uncertainty (aleatoric uncertainty).

As a result, we use Bayesian linear regression (BLR) to overcome

this issue [40]. If we use the GPR output and variance value as a

prior data of the BLR, we can check the dependency of variance level

within other state variables. Data constructing BLR is determined by

using data that falls in the range of up to ±2.5% of the maximum value

of each state in the current state. For example, if the maximum value

of the state is 10 and the current state value is 3.4, the coverage is

[3.15,3.65].

Typically, in order to make both posterior and likelihood to be

multivariate Gaussian, we convert GPR information to inverse-

Wishart distribution parameters. Consider the matrix representation

of BLR like Y = XB + E , where both Y and E are nxm matrices,

where rows are different system equations, and columns are different

observations. Let the covariance matrix of E is given as Σ. Then, the

３０

prior distribution of this system is given as:

(30)

Where �̅� is a vectorization of �̅�, which is a prior weights, ν and V0

are components of inverse-Wishart distribution (degrees of freedom

and scale matrix), and Sx is the empirical variance of the dataset.

Distribution IW and N expresses the inverse-Wishart distribution

and Normal distribution, respectively. Using GPR as a prior, V0

should be equal to νΣ̂, where Σ̂ is the variance level of the learning

uncertainty from GPR. Also, �̅� is a gradient of GPR likelihood, which

is previously written as A∗∗ and B2 in HBN matrix equation. Then,

the posterior distribution using given dataset D = {X, Y} can be

written as:

(31)

where n is the number of a dataset. We determine the new system

matrix using the mean and covariance value of the posterior

distribution. Then, the resulting linearized system contains more

useful information: uncertainty from learning incompleteness

(epistemic uncertainty), and disturbance, noise issues from the real

３１

world (aleatoric uncertainty). This kind of approach had been done

by some model-based reinforcement learning researchers,

attempting to control robots by demonstrations [41-43].

３２

Chapter 5. Application

5.1. Numerical Simulations

The proposed hysteresis modeling method was validated using

simulation. Two artificial hysteretic systems mimicking various

hysteresis model were constructed like shown as following:

(32)

(33)

where Equation 32 is the form of Maxwell-Slip model-like system,

and Equation 33 is the general Prandtl-Ishlinskii model. Term η is a

noise level, which is a random variable following a uniform

distribution between [-0.5,0.5].

３３

Figure 5. Artificial hysteresis of Equation 32

Figure 6. Artificial hysteresis of Equation 33.

３４

Figure 5 and Figure 6 show the realization of hysteresis system

Equation 32 and 33 where the input of the system was in the range

uk ∈ [0,1]. First, the generated train input signal was put into the

systems and training dataset of the transition model and sensor model

were gathered. The same input signal was used 10 times to get

various trajectories to see the effect of disturbances and noises, so

each trajectory was slightly different. Second, hyperparameter

optimization for GPR was processed to both transition and sensor

models. Kernel function was defined in Equation 19, acquisition

function was Expected-improvement-plus, Quasi-newton convex

optimization is used, and the total number of iteration to find noise

level was 10. Then, another test input signal was prepared by

randomly generating a signal between [0,1] while maintaining

discrete continuity. This testing signals were not trained, but used

for testing the performance only. The performance criteria were

comparing root-mean-squared error (RMSE) and normalized root-

mean-squared error (nRMSE) between the estimated signal and the

reference signal. Three different methods were compared: using only

EKF, using EKF with BLR, and using RNN for comparison study. For

the first example, we inspected the influence of disturbance and noise

level by changing the amplitude of noise level. All RMSE and nRMSE

values were calculated by averaging results of three each attempts.

5.2. Modeling PneuNet-Type Actuator for Proprioceptive

３５

Sensing

A simple integrated soft actuator and sensor system composed of

pneumatic bending actuator with embedded microfluidic soft sensor

was manufactured and HBN was implemented to validate the

performance [44]. To sense the bending by measuring the strain, a

multi-material strain sensor was placed on the top surface of the

actuator for amplified signal. The bottom surface is composed of

Dragon Skin 30 which is stiffer than the top surface. The stiffness

difference between the top and the bottom surface produced bending

motion when the air input was given (Figure 7).

３６

Figure 7. A combination of pneumatic bending actuator and

microfluidic sensor system.

３７

Random input air pressure was applied to the actuator system while

maintaining discrete continuity. Then, the location of the end-

effector of the actuator was recorded by Kinect. The output voltage

of the soft sensor was saved simultaneously. Data tuple {Input, state,

output} were collected for five minutes, and transition, sensor GPR

model was trained. Finally, the HBN localization algorithm was

applied to evaluate the performance, RMSE and nRMSE were

calculated to evaluate the result.

３８

Figure 8. State estimation of artificial hysteresis system Equation 32.

(a) with RNN (comparison study), (b) EKF, (c) EKF+BLR.

Figure 9. State estimation of artificial hysteresis system Equation 33.

(a) with RNN (comparison study), (b) EKF, (c) EKF+BLR.

３９

5.3. Numerical Simulations Result

The RMSE and nRMSE values for each simulation cases are shown

in Table 1. Both values were improved when our proposed method is

used, primarily when the BLR is used to make a linearized model.

Figure 8 and Figure 9 showed the reference and estimation signals

for two artificial hysteresis systems. The tracking quality has

improved when using EKF and EKF+BLR, especially for the second

example. RNN failed to track references due to high noise level. In

order to inspect the influence of noise level, the nRMSE value of

different noise gain was recorded, and the results are shown in Table

2 The result showed that EKF+BLR error was still bounded when

the noise gain reached 10, but the signal diverged when RNN was

used (written in NaN). When the noise gain was 0, the nRMSE value

of EKF+BLR was larger than the case of when BLR was not used.

This is a reasonable result because zero noise gain means that the

transition and sensor model is deterministic, which implies that BLR

cannot be effectively used.

４０

Figure 10. Online state estimation of soft actuator systems. (a) with

RNN (comparison study), (b) EKF, (c) EKF+BLR.

４１

5.4. Actuator Modeling Results

The online state estimation of a real soft actuator-sensor integrated

system was implemented, and the result is shown in Figure 10 The

RMSE and nRMSE values of the result is shown in Table 3 The

performance was significantly improved when the HBN model is used,

primarily when BLR was implemented to the system. Even though the

soft sensor's signal-to-noise ratio was significantly high, our model

captured the noise level efficiently and reflected when processing the

EKF.

４２

Chapter 6. Discussion

6.1. Robust Design and Influence of Noise Level

When designing an EKF, it is essential to figure it out the proper

covariance matrices of disturbance and noise to prevent the faulty.

This statement can be validated in Table 2 where different values of

noise gain were used as a simulation system. Using EKF+BLR, the

nRMSE values were significantly improved because BLR used real

simulated data and reflected their noise level directly, where using

EKF only was not able to consider the noise level itself. After all, the

GPR variance value shows only the modeling uncertainty level, not

the noise level directly. GPR is robust against noisy observations and

attempts to learn the noise level itself by additional hyperparameter.

As a result, the GPR covariance level does not give users how the

noise influences the system, and the noise information is ignored by

the noise hyperparameter.

４３

Figure 11. Reference state and its disturbance variance level of the

system using (a) EKF only and (b) EKF + BLR.

４４

Figure 11 shows the effect of BLR analyzing noise level efficiently.

Relying only on the GPR variance level does not give users how much

the noise level is (conversely, the low variance level proves that GPR

learning is very well done). In contrast, BLR result gives users how

severe the noise level is, reflecting that the noise and disturbance

level of the system are positively correlated with input and state level.

The correlation coefficient between state and variance level was

0.0477 and 0.6295 for EKF and EKF+BLR, respectively. However,

using BLR consumes additional calculation time so the users must

heuristically judge whether they should use EKF and BLR together

or not, considering the nRMSE of both cases and their noise level of

the system. Furthermore, use of Bayesian neural network or

bootstrap ensemble method could be the alternative of BLR.

6.2. Issues in Real-Time Implementation

It is important to use the state estimator in on-line to do some

complicated work such as feedback control, and some issues

appeared to implement the proposed method on-line. As we assumed

to model the system in the discrete-time domain, the sampling

frequency of the operating system is fundamental. So the users must

set the sampling frequency of the system equal for both cases:

gathering training dataset and real-time state estimation. Due to the

delay of microcontroller and computing time of the software, the

sampling frequency of the system is smaller in the testing case,

where EKF and BLR has a matrix inversion computing. As a result,

４５

we set an artificial time delay when gathering the training dataset,

and it worked well to match the sampling frequency of both training

and testing cases.

6.3. Fundamental Limitations of Learning and EKF

The proposed method showed powerful results where simulation

results showed a dramatic improvement in estimation error. Also,

BLR helped the algorithm work more robustly by providing

disturbance and noise level directly, rather than the way of GPR

results. However, there are some fundamental limitations in our

approach, which is originated from the `learning' perspective. The

reason why control theory researchers are hesitant to use deep

learning is that it cannot provide certainty of a stability issue. The

stability issues of the proposed method can be eased when

Assumption 3 provided, but as transition and sensor model is ̀ trained'

by supervised learning method (GPR), we cannot assure that training

is perfectly done in every operating region. This underlying problem

is non-solvable, where many deep learning researchers are still

trying to provide completeness criteria in machine learning such as

using importance sampling [45] or data-driven optimal control

theory [38]. For example, interesting work has been done by

Beckers et al., where they constructed a stabilizing feedback control

strategy with a GPR based feed-forward model of Euler-Lagrange

systems [46]. State-of-the-art soft robot modeling and controlling

papers have this issue in common, for example, modeling dynamics

４６

using RNN [28,29]. Another concern in this paper is the use of EKF,

where few properties regarding convergence and stability test of

EKF have been done. Some studies have done some stability

problems given some assumptions [47,48], but the result may vary

depending on the properties of the system. To guarantee the

minimum criteria for the stability, we introduced Assumption 3.

Nevertheless, EKF is a popular localization algorithm that is widely

used in simultaneous localization and mapping (SLAM) applications.

Furthermore, variations of Kalman filtering can also be applied to this

system, for example, unscented Kalman filtering (UKF) which does

not explicitly derive the linearized system but find the mean and

covariance value using unscented transformation [49].

４７

Chapter 7. Conclusion

In this paper, we introduced a novel method of localization in soft

robot systems using HBN. We proposed that distribution of the

hysteresis output is dependent on its current and previous inputs, and

previous output. This proposition was expanded, and a new Bayesian

network was established. Simulation and real-world application

results supported the reliability of the proposed algorithm, and noise

level dependency was inspected. Since many of existing soft robots

are slowly operating systems, our method could work well in terms

of stability and learning feasibility.

Future work will focus on designing a control system of soft robots

using HBN. Control policies will be added to HBN, and its structure

will be slightly changed. By introducing stochastic control theory in

the sense of reinforcement learning (RL), we can get optimal control

strategies while guaranteeing that the state of the robot system is

sub-optimal by EKF. Since there exists a model-based RL methods

such as PILCO, it will follow a similar way to construct the control

policy. However, since Gaussian Process dynamics of stochastic

states and inputs may lead non-Gaussian state prediction, advanced

techniques such as moment matching is prerequisite. Finally, to be

ready for mass production, a transfer learning based calibration

method will be useful for learning mappings [50,51].

４８

Bibliography

[1] H. In, B. B. Kang, M. Sin, and K.-J. Cho, “Exo-glove: A wearable

robot for the hand with a soft tendon routing system,” IEEE Robotics

& Automation Magazine, vol. 22, no. 1, pp. 97–105, 2015.

[2] P. Polygerinos, Z. Wang, K. C. Galloway, R. J. Wood, and C. J.

Walsh, “Soft robotic glove for combined assistance and at-home

rehabilitation,” Robotics and Autonomous Systems, vol. 73, pp. 135–

143, 2015.

[3] M. Manti, V. Cacucciolo, and M. Cianchetti, “Stiffening in soft

robotics: A review of the state of the art,” IEEE Robotics &

Automation Magazine, vol. 23, no. 3, pp. 93–106, 2016.

[4] J. W. Booth, D. Shah, J. C. Case, E. L. White, M. C. Yuen, O.

CyrChoiniere, and R. Kramer-Bottiglio, “Omniskins: Robotic skins

that turn inanimate objects into multifunctional robots,” Science

Robotics, vol. 3, no. 22, p. eaat1853, 2018.

[5] S. Kim, C. Laschi, and B. Trimmer, “Soft robotics: a bioinspired

evolution in robotics,” Trends in biotechnology, vol. 31, no. 5, pp.

287– 294, 2013.

[6] N. N. Goldberg, X. Huang, C. Majidi, A. Novelia, O. M. O’Reilly, D.

A. Paley, and W. L. Scott, “On planar discrete elastic rod models for

the locomotion of soft robots,” Soft robotics, 2019.

[7] D. Ortiz, N. Gravish, and M. T. Tolley, “Soft robot actuation

strategies for locomotion in granular substrates,” IEEE Robotics and

Automation Letters, vol. 4, no. 3, pp. 2630–2636, 2019.

４９

[8] D. Kim, J. I. Kim, and Y.-L. Park, “A simple tripod mobile robot

using soft membrane vibration actuators,” IEEE Robotics and

Automation Letters, 2019.

[9] R. K. Katzschmann, M. Thieffry, O. Goury, A. Kruszewski, T.-M.

Guerra, C. Duriez, and D. Rus, “Dynamically closed-loop controlled

soft robotic arm using a reduced order finite element model with state

observer,” in 2019 2nd IEEE International Conference on Soft

Robotics (RoboSoft). IEEE, 2019, pp. 717–724.

[10] B. Yu, J. d. G. Fernandez, and T. Tan, “Probabilistic kinematic

model ́of a robotic catheter for 3d position control,” Soft robotics,

vol. 6, no. 2, pp. 184–194, 2019.

[11] H.-S. Shin, J. Ryu, C. Majidi, and Y.-L. Park, “Enhanced

performance of microfluidic soft pressure sensors with embedded

solid microspheres,” Journal of Micromechanics and

Microengineering, vol. 26, no. 2, p. 025011, 2016.

[12] V. Hassani, T. Tjahjowidodo, and T. N. Do, “A survey on

hysteresis modeling, identification and control,” Mechanical systems

and signal processing, vol. 49, no. 1-2, pp. 209–233, 2014.

[13] A. Bilski and M. Twardy, “Hysteresis modeling using a preisach

operator,” International Journal of Electronics and

Telecommunications, vol. 56, no. 4, pp. 473–478, 2010.

[14] M. Rakotondrabe, “Classical prandtl-ishlinskii modeling and

inverse multiplicative structure to compensate hysteresis in

piezoactuators,” in 2012 American Control Conference (ACC). IEEE,

2012, pp. 1646– 1651.

５０

[15] M. Brokate and J. Sprekels, “Hysteresis operators,” in

Hysteresis and phase transitions. Springer, 1996, pp. 22–121.

[16] D. Bruder, B. Gillespie, C. D. Remy, and R. Vasudevan, “Modeling

and control of soft robots using the koopman operator and model

predictive control,” arXiv preprint arXiv:1902.02827, 2019.

[17] Y. Liu, D. Desong, N. Qi, and J. Zhao, “A distributed parameter

maxwell-slip model for the hysteresis in piezoelectric actuators,”

IEEE Transactions on Industrial Electronics, 2018.

[18] R. Changhai, S. Lining, R. Weibin, and C. Liguo, “Adaptive

inverse control for piezoelectric actuator with dominant hysteresis,”

in Proceedings of the 2004 IEEE International Conference on Control

Applications, 2004., vol. 2. IEEE, 2004, pp. 973–976.

[19] R. B. Gorbet, K. A. Morris, and D. W. Wang, “Passivity-based

stability and control of hysteresis in smart actuators,” IEEE

Transactions on control systems technology, vol. 9, no. 1, pp. 5–16,

2001.

[20] M. Al Janaideh, S. Rakheja, and C.-Y. Su, “An analytical

generalized prandtl–ishlinskii model inversion for hysteresis

compensation in micropositioning control,” IEEE/ASME Transactions

on mechatronics, vol. 16, no. 4, pp. 734–744, 2010.

[21] M. S. Sofla, S. M. Rezaei, M. Zareinejad, and M. Saadat,

“Hysteresisobserver based robust tracking control of piezoelectric

actuators,” in Proceedings of the 2010 American Control Conference.

IEEE, 2010, pp. 4187–4192.

[22] W.-F. Xie, J. Fu, H. Yao, and C.-Y. Su, “Observer based control

５１

of piezoelectric actuators with classical duhem modeled hysteresis,”

in 2009 American Control Conference. IEEE, 2009, pp. 4221–4226.

[23] J. Paduart, L. Lauwers, J. Swevers, K. Smolders, J. Schoukens,

and R. Pintelon, “Identification of nonlinear systems using polynomial

nonlinear state space models,” Automatica, vol. 46, no. 4, pp. 647–

656, 2010.

[24] C. Ru, L. Chen, B. Shao, W. Rong, and L. Sun, “A hysteresis

compensation method of piezoelectric actuator: Model, identification

and control,” Control Engineering Practice, vol. 17, no. 9, pp. 1107–

1114, 2009.

[25] X. Chen, Y. Feng, and C.-Y. Su, “Adaptive control for

continuous-time systems with actuator and sensor hysteresis,”

Automatica, vol. 64, pp. 196–207, 2016.

[26] N. Vafamand, M. H. Asemani, A. Khayatiyan, M. H. Khooban,

and T. Dragicevi ˇ c, “Ts fuzzy model-based controller design for a

class of ́ nonlinear systems including nonsmooth functions,” IEEE

Transactions on Systems, Man, and Cybernetics: Systems, no. 99, pp.

1–12, 2017.

[27] D. Kim and Y.-L. Park, “Contact localization and force

estimation of soft tactile sensors using artificial intelligence,” in 2018

IEEE/RSJ International Conference on Intelligent Robots and

Systems (IROS). IEEE, 2018, pp. 7480–7485.

[28] T. G. Thuruthel, B. Shih, C. Laschi, and M. T. Tolley, “Soft robot

perception using embedded soft sensors and recurrent neural

networks,” Science Robotics, vol. 4, no. 26, p. eaav1488, 2019.

５２

[29] T. G. Thuruthel, E. Falotico, F. Renda, and C. Laschi, “Model-

based reinforcement learning for closed-loop dynamic control of soft

robotic manipulators,” IEEE Transactions on Robotics, vol. 35, no. 1,

pp. 124– 134, 2018.

[30] M. T. Gillespie, C. M. Best, E. C. Townsend, D. Wingate, and M.

D. Killpack, “Learning nonlinear dynamic models of soft robots for

model predictive control with neural networks,” in 2018 IEEE

International Conference on Soft Robotics (RoboSoft). IEEE, 2018,

pp. 39–45.

[31] S. J. Russell and P. Norvig, Artificial intelligence: a modern

approach. Malaysia; Pearson Education Limited,, 2016.

[32] H. K. Khalil and J. W. Grizzle, Nonlinear systems. Prentice hall

Upper Saddle River, NJ, 2002, vol. 3.

[33] C. Wang and R. M. Neal, “Gaussian process regression with

heteroscedastic or non-gaussian residuals,” arXiv preprint

arXiv:1212.6246, 2012.

[34] M. Deisenroth and C. E. Rasmussen, “Pilco: A model-based and

data-efficient approach to policy search,” in Proceedings of the 28th

International Conference on machine learning (ICML-11), 2011, pp.

465–472.

[35] S. Choi, M. Jadaliha, J. Choi, and S. Oh, “Distributed gaussian

process regression under localization uncertainty,” Journal of

Dynamic Systems, Measurement, and Control, vol. 137, no. 3, p.

031007, 2015.

[36] K. Lee, S. Choi, and S. Oh, “Inverse reinforcement learning with

５３

leveraged gaussian processes,” in 2016 IEEE/RSJ International

Conference on Intelligent Robots and Systems (IROS). IEEE, 2016,

pp. 3907–3912.

[37] T. Horii, Y. Nagai, L. Natale, F. Giovannini, G. Metta, and M.

Asada, “Compensation for tactile hysteresis using gaussian process

with sensory markov property,” in 2014 IEEE-RAS International

Conference on Humanoid Robots. IEEE, 2014, pp. 993–998.

[38] Y. Jiang and Z.-P. Jiang, Robust adaptive dynamic programming.

John Wiley & Sons, 2017.

[39] Q. Zhang, “On stability of the kalman filter for discrete time

output error systems,” Systems & Control Letters, vol. 107, pp. 84–

91, 2017.

[40] P. D. Hoff, A first course in Bayesian statistical methods.

Springer, 2009, vol. 580.

[41] J. Fu, S. Levine, and P. Abbeel, “One-shot learning of

manipulation skills with online dynamics adaptation and neural

network priors,” in 2016 IEEE/RSJ International Conference on

Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 4019–4026.

[42] S. Levine and P. Abbeel, “Learning neural network policies with

guided policy search under unknown dynamics,” in Advances in

Neural Information Processing Systems, 2014, pp. 1071–1079.

[43] W. H. Montgomery and S. Levine, “Guided policy search via

approximate mirror descent,” in Advances in Neural Information

Processing Systems, 2016, pp. 4008–4016.

[44] M. Park, Y. Ohm, D. Kim, and Y.-L. Park, “Multi-material soft

５４

strain sensors with high gauge factors for proprioceptive sensing of

soft bending actuators,” in 2019 2nd IEEE International Conference

on Soft Robotics (RoboSoft). IEEE, 2019, pp. 384–390.

[45] S. Levine and V. Koltun, “Guided policy search,” in International

Conference on Machine Learning, 2013, pp. 1–9.

[46] T. Beckers, D. Kulic, and S. Hirche, “Stable gaussian process

based ́tracking control of euler–lagrange systems,” Automatica, vol.

103, pp. 390–397, 2019.

[47] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen, “Stochastic

stability of the discrete-time extended kalman filter,” IEEE

Transactions on Automatic control, vol. 44, no. 4, pp. 714–728, 1999.

[48] K. Reif and R. Unbehauen, “The extended kalman filter as an

exponential observer for nonlinear systems,” IEEE Transactions on

Signal processing, vol. 47, no. 8, pp. 2324–2328, 1999.

[49] J. Ko and D. Fox, “Gp-bayesfilters: Bayesian filtering using

gaussian process prediction and observation models,” Autonomous

Robots, vol. 27, no. 1, pp. 75–90, 2009.

[50] M. P. Deisenroth, “Learning to control a low-cost manipulator

using data-efficient reinforcement learning,” Robotics: Science and

Systems, pp. 57–64, 2011.

[51] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE

Transactions on knowledge and data engineering, vol. 22, no. 10, pp.

1345– 1359, 2009.

５５

Abstract

소프트 로봇의 이력현상과 스토캐스틱 반응 때문에 이를 모델링, 예측,

제어하는 것은 매우 어려운 과제이다. 베이지안 네트워크의 특징과 기존

의 딥러닝 방법을 보완하여, 확률론적 방법으로 이력 현상을 모델링하는

기법을 제시한다. 이력현상이 3개의 파라미터에 의존하는 가우스 과정으

로 모델링하여, 이력 베이지안 네트워크인 HBN을 구축한다. 이는 로봇

의 상태 변화 모델과 센서 모델을 효과적으로 모델링할 수 있다. 특히,

로봇의 스토캐스틱 현상을 반영하기 위해 가우스 과정 회귀 모델을 이용

하고, 상태 예측으로 확장 칼만 필터를 이용한다. 또한 노이즈에 대항하

기 위해 베이즈 선형 모델을 사후 확률로 사용하는 방법을 제시한다. 시

뮬레이션과 실제 로봇 시스템의 적용으로 제시한 방법이 유효함을 검증

한다. 이 방법을 통해 강인함을 유지함과 동시에 소프트 로봇을 모델링

할 수 있음을 알 수 있다.

Documents

Disclaimers-space.snu.ac.kr/bitstream/10371/166410/1/000000160957.pdf · 2020. 5. 18. · soft robotics, they are confronted with these limitations. Many studies have attempted to