Scuola Politecnica e delle Scienze di Base
Master's Degree in Computer Engineering
Master's Thesis in Distributed Systems

Reliability Assessment of Microservice Architectures

Academic Year 2017/2018
Supervisor: Ch.mo Prof. Stefano Russo
Co-supervisor: Prof. Roberto Pietrantuono
Candidate: Antonio Guerriero, matr. M63000564
Dedicated to those who have always been by my side, to my family, to my friends, to Zia Rita, to Nonno Antonio, to Milly.
Index

Index .......... III
Introduction .......... 5
Chapter 1: Background .......... 7
  1.1 Microservices Architectures .......... 7
    1.1.1 Example .......... 8
  1.2 Reliability Assessment .......... 10
  1.3 Adaptive Web Sampling .......... 11
    1.3.1 Sampling setup .......... 11
    1.3.2 Design .......... 12
  1.4 Related Work .......... 13
Chapter 2: MART strategy .......... 17
  2.1 MART overview .......... 17
    2.1.1 Assumptions .......... 18
  2.2 Test generation algorithm .......... 18
    2.2.1 Domain Interpretation .......... 18
    2.2.2 Weights Matrix determination .......... 22
    2.2.3 Testing strategy .......... 25
    2.2.4 Estimation .......... 26
    2.2.5 Active Set Update .......... 28
    2.2.6 Algorithm implementation .......... 29
  2.3 Probability Update .......... 35
  2.4 Formulation with dynamic sampler selection .......... 36
Chapter 3: Simulation of the test generation algorithm .......... 39
  3.1 Simulation Scenarios .......... 39
    3.1.1 Population generators .......... 41
  3.2 Evaluation Criteria .......... 42
  3.3 Empirical correction of Estimator .......... 42
  3.4 Sensitivity Analysis .......... 45
    3.4.1 Sensitivity Analysis in static implementation .......... 45
    3.4.2 Sensitivity Analysis in dynamic implementation .......... 50
  3.5 Results .......... 59
    3.5.1 MSE .......... 59
    3.5.2 Sample Variance .......... 66
    3.5.3 Failing Point Number .......... 73
    3.5.4 Considerations .......... 80
Chapter 4: Experimentation .......... 82
  4.1 Pet Clinic .......... 82
  4.2 MART setup .......... 83
  4.3 Functions .......... 85
    4.3.1 True Reliability calculation function .......... 85
    4.3.2 Update functions .......... 86
    4.3.3 Reliability Assessment function .......... 87
    4.3.4 Operational Testing function .......... 87
    4.3.5 Distance between profiles function .......... 88
  4.4 Experimental design .......... 88
    4.4.1 Experimental scenarios .......... 88
    4.4.2 Evaluation criteria .......... 90
    4.4.3 True Reliability estimation .......... 90
  4.5 Results .......... 91
    4.5.1 Experiment 1 .......... 91
    4.5.2 Experiment 2 .......... 93
    4.5.3 Experiment 3 .......... 94
    4.5.4 Experiment 4 .......... 96
    4.5.5 Further considerations .......... 97
  4.6 ANOVA .......... 97
Conclusions .......... 101
Bibliography .......... 102
Introduction
This Thesis addresses an open problem: the Reliability Assessment of Microservice
Architectures (MSA). Nowadays, many software architectures are defined according to
this style, especially by companies providing on-demand software services; a relevant
example is Netflix [1], the world's leading internet entertainment service.
Microservice architectures are typically developed by means of an agile approach (as in
DevOps), with frequent deliveries of software services (up to several releases per day).
In such a dynamic scenario, service reliability may change over time, due to changes in
service provisioning and/or in the way services are used by customers (the operational
profile). The goal is to determine how these changes may affect the reliability of the entire
system. Hence, we deal with the specific problem of online reliability assessment, i.e.,
assessment performed while the system is in the operational phase. All these considerations
require the ability to provide reliability estimates that are, at the same time, accurate and efficient.
There are different techniques for reliability assessment, for example those based on
conceptual models of the system under test. In this Thesis, the assessment is instead
performed by testing, while the system under test is online and in its usage environment.
This approach has rarely been used, because it is difficult to obtain the exact operational
profile of a system, but in this Thesis a technique to overcome this limit is presented.
Starting from the tester's preliminary knowledge, the information about the operational
profile and the proneness to failure of a given service is progressively refined during
execution. This information can make the difference in reliability assessment; in fact, with
agile development and in-vivo testing, there is an implicit feedback mechanism that allows
us to update the system representation.
Traditional operational testing does not consider several characteristics of this kind of
software system, in particular:
1. Variability of Microservices, caused by the high frequency of releases;
2. Variability of the operational profile;
3. Testing budget constraints.
The technique developed in this Thesis, called Microservice Adaptive Reliability
Testing (MART), exploits information coming from the field and an advanced
sampling algorithm to obtain an updated, accurate and efficient estimate of reliability.
In particular, a new testing algorithm is developed, based on an adaptive sampling
procedure particularly suited for rare and clustered populations [2], such as faults in
operational software systems.
The Thesis is organized in four chapters. First, Background and Related Work are
presented, in order to explain the preliminary knowledge and to survey the existing
literature about reliability assessment of MSA. The second chapter presents the
formulation, describing the conceptual transformation from the sampling scheme to the
MART strategy. The third chapter reports the results of the simulation, carried out to
evaluate the behavior of the algorithm under specific assumptions and to determine its
best configuration. The fourth chapter deals with the experimentation, in which MART is
validated and evaluated on a real system.
Chapter 1: Background
This Chapter presents the knowledge needed as background for the rest of the Thesis. In
particular, it focuses on three points: Microservice Architectures, reliability assessment,
and the sampling strategy defined by Thompson in [2].
To verify that reliability assessment of Microservice Architectures is indeed an open
problem, the current literature about these subjects is examined in the related work
section.
1.1 Microservices Architectures

As described in the article by Paolo Di Francesco, Patricia Lago and Ivano Malavolta [3], the most
recurring definition is the one provided by J. Lewis and M. Fowler [4]: "the microservice
architectural style is an approach to developing a single application as a suite of small
services, each running in its own process and communicating with lightweight
mechanisms, often an HTTP resource API. These services are built around business
capabilities and independently deployable by fully automated deployment machinery.
There is a bare minimum of centralized management of these services, which may be
written in different programming languages and use different data storage technologies".
Microservice Architectures lend themselves to extremely agile development processes,
for which automated continuous development, integration, testing, release,
monitoring and feedback are a cornerstone. All these features are supported by the
underlying cloud-based technologies, such as containers, which alleviate several
manual tasks and save people time.
Microservice Architectures are characterized by a simple communication infrastructure; in
fact, they are usually based on REST, so that every service can be reached through a URI
using HTTP.
Several technologies are being developed to implement MSA-based applications. In this
Thesis, the framework based on Spring is adopted. Specifically, Spring Boot, built on the
Spring framework, is used to accelerate and facilitate application development. Spring
Cloud builds on Spring Boot by providing a set of libraries that enhance the behavior
of an application when included [5].
Another important characteristic is the use of Netflix OSS [6], the open source
Microservice components implemented by Netflix, a company at the forefront of managing
and implementing Microservice systems.
Spring Cloud Netflix [7] is a project that offers Netflix OSS integrations for Spring Boot
apps through auto-configuration and binding to the Spring Environment and other Spring
programming model idioms.
1.1.1 Example
Microservice Architecture (MSA) arises from the broader area of service-oriented
architecture (SOA). As described in [8] and [9], there are several differences between
SOA and MSA. The following example, taken from [10], shows a simple application
structured according to the MSA style compared to an SOA, in order to highlight the main
difference between the two - a detailed comparison is beyond the scope of this Thesis. The
example concerns an e-commerce system: in an SOA there are two main roles, a service
provider and a service consumer, and a software agent can play both roles. The consumer
layer is the point where consumers (human users, other services or third parties) interact
with the SOA, while the provider layer consists of all the services defined within the SOA,
in detail: a service to manage the order, a service to manage the inventory and a service to
manage the shipping. Figure 1 shows a quick view of an SOA.
Communication is based on the Enterprise Service Bus (ESB), which allows communication
via a common bus consisting of a variety of point-to-point connections
between providers and consumers. In addition, the data storage is shared among all the
services.
Figure 1: SOA representation
On the other hand, in an MSA, services should be independently deployable, i.e. it should
be possible to shut down a service when it is not required in the system with no impact on
other services. Figure 2 shows a view of an MSA.
Figure 2: MSA representation
This architecture makes all the services independent, each one implemented as a
microservice, and the communication mechanism (usually REST-based) is simpler than in
the SOA case.
1.2 Reliability Assessment

Dependability is a software quality factor, defined as "the trustworthiness of a computer
system such that reliance can justifiably be placed on the service it delivers" [11]; it is
composed of five attributes: Reliability, Availability, Safety, Integrity and Maintainability.
Reliability is a Dependability attribute with different definitions, but the most commonly
used in engineering applications is: "the characteristic of an item expressed by the
probability that it will perform a required function under stated conditions for a stated
period of time" [12]. In the most general case, reliability can be defined as R(t, τ), the
probability that the system is in proper service in the interval [t, t+τ], given that it was in
proper service at time t:
R(t, τ) = P(no failure in (t, t+τ] | proper service at time t)
In particular, when the interval is [0, t], reliability can be described as:
R(t) = P(no failure in (0, t])
Defining F(t) as the unreliability, i.e. the Cumulative Distribution Function (CDF) of the
time to failure, reliability is computed as:
R(t) = 1 − F(t)
Defining λ(t) as the failure rate of a system, measured as the number of failures per hour,
in the case of a constant failure rate λ it is possible to calculate:
R(t) = e^(−λt)
This is called the "exponential failure law": in a system with constant failure rate,
reliability decreases exponentially over time.
In the considered systems, a failure is perceived only when a request is submitted to a
Microservice; thus, the discrete notion of reliability is introduced:
Reliability = 1 − Probability of Failure On Demand
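As an illustration, the two formulas above can be computed directly; this is a minimal sketch, and the failure rate and failure probability values used below are purely hypothetical, not taken from the Thesis:

```python
import math

def reliability_exponential(lam: float, t: float) -> float:
    """Exponential failure law R(t) = exp(-lam * t), for a constant failure rate lam."""
    return math.exp(-lam * t)

def reliability_on_demand(p_failure_on_demand: float) -> float:
    """Discrete reliability: 1 - probability of failure on demand."""
    return 1.0 - p_failure_on_demand

# Hypothetical values: 0.001 failures/hour over a 100-hour mission,
# and a 2% probability of failure per request.
print(reliability_exponential(0.001, 100))  # about 0.905
print(reliability_on_demand(0.02))
```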
What does reliability assessment mean? It is a way to quantify the reliability of a system.
There are different techniques to do this: for example, reliability assessment can be
performed by means of conceptual models of the system under test, but in this Thesis the
assessment is carried out by testing. Testing effectiveness and efficiency are strongly
influenced by the test case generation strategy. In the following section, the adaptive web
sampling scheme is introduced, which is used to formulate the test generation algorithm
of the MART strategy.
1.3 Adaptive Web Sampling

Sampling is a statistical technique to infer characteristics about an entire population from a
set of samples (a set of observations, in our case a set of test cases).
The considered sampling algorithm is Adaptive Web Sampling Without Replacement,
described by Thompson in [2]. This choice depends on several factors: in software systems,
failures form a rare population, and all the information on the operational profile and failure
probability can be encoded so as to "drive" the sampling toward the most significant samples.
It is worth examining Thompson's idea in more depth: he considers a network
characterized by a certain number of nodes and links between them; the goal of adaptive
web sampling is to estimate a variable y by selecting n nodes out of the N available.
This sampling is performed with a mixture distribution; in particular, nodes are chosen
either randomly or depending on the link weights.
1.3.1 Sampling setup
In this formulation, the author considers a population made of a set of labeled units (1, 2,
..., N); every label is associated with one or more variables of interest yi. A pair of values
is attached to the network:
• yi: the value associated with the i-th node;
• wij: the value associated with any pair of nodes i, j (it indicates whether a link exists
between i and j and, if so, its weight).
A sample s is defined as a subset of units from the population: it comprises both a sample
of nodes s(1) and a sample of pairs of nodes s(2). The design is adaptive if it depends on any
of the variables of interest in the sample.
The original data, defined as D0, is the sequence of sample units, in the order selected,
together with their associated values. It is assumed that the minimal sufficient statistic
consists only of the set of labels of the distinct units selected, with the associated y values.
Finally, the reduced data is defined as D_r = {(i, y_i), ((j, k), w_jk) : i ∈ s(1), (j, k) ∈ s(2)}.
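A toy illustration of this setup follows; all values (five labeled units, y_i marking failing units, a sparse weight matrix stored as a dict of ordered pairs) are hypothetical and chosen only to show the data structures involved:

```python
# Population of N labeled units; y[i] is the variable of interest of unit i
# (here 1 = failing unit, 0 = non-failing) and w[(i, j)] is the link weight
# from i to j. All values are hypothetical.
N = 5
y = {1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
w = {(2, 3): 1, (3, 2): 1, (1, 4): 1}  # absent pairs have weight 0

def reduced_data(node_sample, pair_sample):
    """Reduced data D_r: labels of distinct selected units with their y values,
    plus the observed weights for the selected pairs."""
    d_nodes = {i: y[i] for i in node_sample}
    d_pairs = {jk: w.get(jk, 0) for jk in pair_sample}
    return d_nodes, d_pairs

print(reduced_data({2, 3}, {(2, 3)}))  # ({2: 1, 3: 1}, {(2, 3): 1})
```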
1.3.2 Design

An initial sample s0 is selected according to some design p0. At the k-th step, the sample sk is
taken, depending on values associated with the current active set ak. The active set is defined
as a subsequence or subset of the current sample, together with any associated variables of
interest. Thus, the selection distribution p_k(s_k | a_k, y_ak, w_ak) is defined.
The idea is to select the next sample according to a mixture probability d: with probability
d, the next unit is taken using a distribution based on the unit values or on the graph
structure of the active set; with probability 1−d, another distribution is used, for example
one based on the sampling frame or on the spatial structure of the population. This can be
realized by a mixture distribution.
In [2], with probability d a link out of the active set is selected at random, while with
probability 1−d the next unit is taken completely at random. The probability d may depend
on the values in the active set; in particular, if there are no outgoing links from the active
set, the next unit must be taken randomly.
The adaptive selection can be made unit by unit or in waves. One of the most important
features of this approach is its flexibility, obtained through the use of the mixture
distribution and through the allocation of part of the effort to the initial sample. This
flexibility balances the way in which the population is explored: going deeper by
following links, or going wide with only one or a few waves.
The current sample is denoted by s_ck; the number of units in the active set a_k is n_ak,
and in the current sample n_ck. The next set of units s_k is selected with probability
q(s_k | a_k, y_ak, w_ak); in the wave-based case, s_ckt denotes the current sample when
the t-th unit of the k-th wave is being selected.
When the t-th unit in the k-th wave is selected, w_akt+ is defined as the total number of
links out, or the total of the weight values, from the active set a_k to units not in the
current sample s_ckt; that is, w_akt+ = Σ w_ij over i ∈ a_k and j ∉ s_ckt.
Thus, for each unit i in the sample, yi and wi+ are observed; moreover, for each pair (i, j),
with i and j both in the sample, wij and wji are observed.
Now consider the case in which a unit not already in the sample is taken: assuming that
the active set contains one or more units having links (or positive weight values) out to
unit i, with total weight w_aki = Σ_{j ∈ a_k} w_ji, the probability that unit i is taken is:
q_kti = d · (w_aki / w_akt+) + (1 − d) · 1/(N − n_ckt)
where d is between 0 and 1; if there are no links out from the active set, only the second
term of the equation is considered.
The equation selects, with probability d, a random unit linked to the active set, or, with
probability 1−d, a random unit among those that are not part of the current sample.
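The mixture selection step above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from [2] or from the Thesis: units are labeled 1..N and weights are stored as a dict keyed by ordered pairs.

```python
import random

def selection_prob(i, active_set, current_sample, w, N, d):
    """Probability q_kti of taking unit i, not yet in the current sample:
    q_kti = d * w_aki / w_akt+  +  (1 - d) / (N - n_ckt)."""
    outside = [u for u in range(1, N + 1) if u not in current_sample]
    # w_akt+: total weight from the active set to units outside the current sample
    w_out = sum(w.get((a, u), 0) for a in active_set for u in outside)
    # w_aki: total weight from the active set into unit i
    w_i = sum(w.get((a, i), 0) for a in active_set)
    uniform = 1.0 / (N - len(current_sample))
    if w_out == 0:
        return uniform  # no outgoing links: purely random selection
    return d * (w_i / w_out) + (1 - d) * uniform

def draw_next_unit(active_set, current_sample, w, N, d, rng=random):
    """Draw one unit outside the current sample according to q_kti."""
    candidates = [u for u in range(1, N + 1) if u not in current_sample]
    probs = [selection_prob(u, active_set, current_sample, w, N, d)
             for u in candidates]
    return rng.choices(candidates, weights=probs, k=1)[0]
```

With d = 0.5, an active set {1}, links w_12 = 2 and w_13 = 1, and a population of N = 4 units, the probabilities of units 2, 3 and 4 are 1/2, 1/3 and 1/6 respectively; note that they sum to one, as required for a proper selection distribution.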
The overall sample selection probability is p(s) = p0 · Π_{k=1..K} Π_{t=1..n_k} q_kti_kt, where:
• i_kt is the t-th unit in the k-th wave;
• the k-th wave, in the order selected, is s_k = (i_k1, ..., i_kn_k);
• n_k is the size of the k-th wave;
• p0 is the selection probability of the initial sample;
• K is the number of waves.
The link weight can be associated with the selection probability of the current unit,
p(i | s_ckt, a_k, y_ak, w_ak). In the most general context, w is a continuous weight variable
and the extraction probability is:
q_kti = d · p(i | s_ckt, a_k, y_ak, w_ak) + (1 − d) · p(i | s_ckt)
It is possible to add a step in which the obtained selection is accepted or rejected, based on
a certain value, such as the value of y or the out-degree. As previously stated, d can be
replaced with a probability d(k, t, a_k, y_ak, w_ak), so that it depends on the nodes and links
in the active set or changes as the sample selection progresses.
1.4 Related Work

A considerable number of works about Microservice Software Architectures have been
published in the last few years. Besides architectural and design issues, researchers started
targeting quality concerns and how this new architectural style impacts them. Among the
quality attributes of interest, performance and maintainability are the most investigated
ones, followed by security-related studies, according to a recent mapping study [3].
Reliability is considered in only a few studies, and always in its broader sense related to
dependability (i.e., fault tolerance, robustness, resiliency, anomaly detection) – no study
deals with reliability meant as the probability of failure-free operation for a specified time
and environment. Moreover, reliability-related considerations rarely appear as the main
proposal of a research work, but more often as a side concern in a design-related proposal.
For instance, in [13] the authors propose a novel architecture that enables scalable and
resilient self-management of microservice applications on the cloud, in which continuous
management monitors application and infrastructural metrics to provide automated and
responsive reactions to failures (health management) and to changing environmental
conditions (auto-scaling), minimizing human intervention. In [14] the authors define a
prototype framework for software service emergence, called Mordor, with the scope of
enabling service emergence in a pervasive environment. In [15] a formal model for multi-phase
live testing is defined, to test changes or new features in the production environment.
In [16] the key challenges that impede realizing the full promise of containerizing
infrastructure services are identified. In [17] the possibility of using Microservice patterns
for IoT applications is described. In [18] the authors introduce a software tool to compose
one or more arbitrary Docker containers in order to realize systems of Microservices. In
[19] the author proposes a new Cloudware PaaS platform based on the microservice
architecture and lightweight container technology, on which traditional software is
deployable without any modification. In [20] a framework for systematically testing the
failure-handling capabilities of microservices is defined. In [21] the author introduces an
automated tool able to understand the service architecture and topology, and to inject
faults in order to assess the fault tolerance and resiliency of the system. In [22] the authors
present some related cloud
computing patterns and discuss their adaptations for the implementation of IMS or other
telecommunication systems. In [23] three cloud microservices are presented that can
substantially accelerate the development and evolution of location- and context-based
applications. In [24] learning-based testing (LBT) is used to evaluate the functional
correctness of distributed systems and their robustness to injected faults. In [25]
experiences about migrating a monolithic on-premise software architecture to
microservices are reported, with positive and negative considerations. In [26] the authors
propose a decentralized message bus to be used as a communication tool between services.
In these articles, the problems of Microservice implementation, deployment and
improvement are central, but with no reference to reliability assessment; however, from
them emerges the need to evaluate the environment characteristics and the general
dependability of the deployed system.
Enlarging the scope to quality assessment in general (in particular Reliability, Performance
and Security), in all cases there are references to evaluation rather than to assessment of
the quality attributes considered.
Once again, in the scope of reliability assessment of Microservice Architectures there are
no references. For this reason, the subject matter of this Thesis is an absolute novelty.
In MSAs, as well as in many other scenarios, the assumption of an operational profile
known at development time is easily violated. In addition, in MSA the changes to the
software itself need to be considered too, as continuous service upgrades occur. The
method proposed here is conceived to encompass both the updates of the profile and of the
services' software failure probability, generating new tests at run-time in order to assess
the actual reliability depending on the current usage and deployed software. Indeed,
generating and executing tests at run-time further stresses the second issue always
raised against operational testing, namely the high cost required to expose failures
beyond the high-occurrence ones. To this aim, evolutions of operational testing could be
considered, which improve the fault detection ability by a partition-based approach and
through adaptation. For instance, Cai et al. published several papers on Adaptive Testing,
in which the assignment of the next test to a partition is based on the outcomes of previous
tests, with the overall goal of reducing the variance of the reliability estimator, as in [27].
The profile (assumed known) is defined on partitions, and selection within partitions is
done by simple random sampling with replacement (SRSWR). Adaptiveness is also
exploited in [28], where importance sampling is used to allocate tests toward more
failure-prone partitions. Adaptive random testing (ART) [29] also exploits adaptiveness, as test
outcomes are used to evenly distribute the next tests across the input domain, but it aims at
increasing the number of exposed failures rather than at assessing reliability. Besides
adaptiveness, the sampling procedure is another key to improving efficiency while
preserving the unbiasedness of the estimate. In [30], the authors introduce a family of
sampling-based algorithms that, exploiting the testers' knowledge about partitions, enable the
usage of more efficient sampling strategies than SRSWR/SRSWOR. The proposed method
includes a new sampling-based testing algorithm, conceived to quickly detect clusters of
faults with a very scarce testing budget, hence suitable for run-time testing, while
dynamically considering the updated operational profile and the service versions
deployed at assessment time. Estimation efficiency (i.e., small variance) and accuracy
(w.r.t. the real reliability at assessment time) are both pursued thanks to these features.
Chapter 2: MART strategy
This Chapter focuses on MART: in particular, the testing strategy is formulated, detailing
the test generation algorithm, the reliability estimation, and the update of knowledge about
the MSA application.

2.1 MART overview

MART is a testing technique that exploits the sampling-based testing strategy and field
information to update the knowledge about the application under test and to improve the
assessment task.
Figure 3: MART
As described in Figure 3, MART is characterized by two principal steps: the first step is
performed at development time and consists of Initialization, in which the input domain is
partitioned and interpreted as a network, linking test frames to each other, and of the
Mapping of occurrence probabilities to each partition, to define the "suspected"
operational profile.
The second step is performed at run-time, in which the reliability assessment is carried out
on demand and the probability update is made cyclically. These operations are fundamental
to adapt the adopted operational profile to the true one.
2.1.1 Assumptions
To guarantee the correct usage of MART, the following assumptions are made:
• The application can be monitored: it is possible to collect the requests to
microservices and their responses;
• The code is frozen during the assessment operations;
• Perfect oracle: it is always possible to recognize whether a failure occurred or not;
• The test case domain can be partitioned;
• Requests are independent (this assumption derives from the independent nature of
Microservices).
2.2 Test generation algorithm

This section reports the formulation of the test generation algorithm of MART, which may
involve many optimizations that can be used to improve the behavior of the sampling
technique. The formulation starts with the domain interpretation, in which the test case
domain is partitioned and interpreted as a network, defining the weight matrix.
2.2.1 Domain Interpretation
To partition the input domain, the concept of test frame is introduced, defined as an
element of the Cartesian product of equivalence classes.
For example, a method OP1(inputClass1, inputClass2, ..., inputClassM) is characterized
by M different input classes. Each of these comprises a certain number of instances; for
instance, inputClass1 may assume 5 values:
• inputClass1,1 = in range positive integer;
• inputClass1,2 = in range negative integer;
• inputClass1,3 = out of range positive integer;
• inputClass1,4 = out of range negative integer;
• inputClass1,5 = 0.
The test frame is a combination of these classes, for example: OP1(inputClass1,4,
inputClass2,3, … inputClassM,K).
In this way, the input domain can be represented as a network by identifying possible
links between test frames, for example based on the Microservice they belong to.
Example
Three methods, described in Table 1, are considered to build a test frame example.
Table 1: example methods
• Login(String username, String password): username made of at least 8 characters; password made of at least 8 characters, with at least one number and one special character.
• Select(int selection, int flag, boolean value): selection ∈ [0, 12], flag ∈ {0, 1, 2}.
• Register(String username, String password, int age): username made of at least 8 characters; password made of at least 8 characters, with at least one number and one special character; age ∈ [16, 99].
The following input classes are defined:
InputUsername with K = 3 values:
• String ≥ 8 characters -> Valid class;
• String < 8 characters -> Invalid class;
• Username is not a String -> Invalid class.
InputPassword with K = 5 values:
• String ≥ 8 characters, with at least one number and one special character -> Valid class;
• String ≥ 8 characters, without numbers -> Invalid class;
• String ≥ 8 characters, without special characters -> Invalid class;
• String < 8 characters -> Invalid class;
• String with only special characters -> Invalid class.
InputSelection with K = 4 values:
• In range Integer -> Valid class;
• Out of range Integer > 12 -> Invalid class;
• Negative out of range Integer -> Invalid class;
• Values different from Integer (ex.: character) -> Invalid class
InputFlag with K = 5 values:
• Integer = 0 -> Valid class;
• Integer = 1 -> Valid class;
• Integer = 2 -> Valid class;
• Integer value different from {0, 1, 2} -> Invalid class;
• Value different from Integer -> Invalid class.
InputValue with K = 3 values:
• Value = true, 1 -> Valid class;
• Value = false, 0 -> Valid class;
• Value different from {true, false, 0, 1} -> Invalid class.
InputAge with K = 4 values:
• In range Integer -> Valid class;
• Out of range Integer > 99 -> Invalid class;
• Out of range Integer < 16 -> Invalid class;
• Values different from Integer (ex.: character) -> Invalid class.
Thus, the test frames shown in Table 2 can be defined.
Table 2: test frames example (F: failing, NF: non-failing)
1 Login(InputUsername1, InputPassword1); NF
2 Login(InputUsername2, InputPassword1); F
3 Login(InputUsername3, InputPassword1); F
4 Login(InputUsername1, InputPassword2); F
5 …
6 Login(InputUsername3, InputPassword5); F
7 Select(InputSelection1, InputFlag1, InputValue1); NF
8 Select(InputSelection2, InputFlag1, InputValue1); F
9 Select(InputSelection3, InputFlag1, InputValue1); F
10 Select(InputSelection4, InputFlag1, InputValue1); F
11 Select(InputSelection1, InputFlag2, InputValue1); NF
12 …
13 Select(InputSelection4, InputFlag5, InputValue3); F
14 Register(InputUsername1, InputPassword1, InputAge1); NF
15 Register(InputUsername2, InputPassword1, InputAge1); F
16 Register(InputUsername3, InputPassword1, InputAge1); F
17 …
18 Register(InputUsername3, InputPassword5, InputAge4); F
To perform the reliability assessment, it is necessary to represent the system adequately, so
as to encode the information about the operational profile and the probability of failure
within the test frames. Two values are attached to each test frame:
• the occurrence probability, between 0 and 1 and such that Σ_i OP_i = 1, which represents
the probability that a test case taken from the corresponding partition is executed
during system operation;
• the failure probability, between 0 and 1, which represents the failure proneness of the
partition's test cases.
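Under the stated assumptions, the construction of the test frames and of the two attached probabilities can be sketched as follows (a minimal, illustrative Python sketch; the class labels such as `user_valid` are hypothetical names for the equivalence classes of the Login example above, and a 0/1 partitioning is assumed for illustration):

```python
from itertools import product

# Hypothetical sketch: the test frames of the Login method are built as the
# Cartesian product of its input equivalence classes; each frame then gets
# an occurrence probability (uniform here) and a tester-assigned failure
# probability (a 0/1 "perfect partitioning" is assumed for illustration).
username_classes = ["user_valid", "user_short", "user_not_string"]       # K = 3
password_classes = ["pwd_valid", "pwd_no_number", "pwd_no_special",
                    "pwd_short", "pwd_only_special"]                     # K = 5

frames = [("Login", u, p) for u, p in product(username_classes, password_classes)]

occurrence = {f: 1.0 / len(frames) for f in frames}  # sums to 1 over the frames
failure = {f: 0.0 if (f[1], f[2]) == ("user_valid", "pwd_valid") else 1.0
           for f in frames}
```

As required above, the occurrence probabilities are built so that their sum over all frames is 1.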
The way in which these values are encoded determines the impact of the MART strategy.
2.2.2 Weights Matrix determination
This Section focuses on how to define the links between test frames and how to encode
the information about their failure proneness, which is then exploited for the reliability assessment.
Test frames failure probability
In the domain interpretation, test frames are introduced as partitions of test cases, and two
probabilities are attached to them. In particular, the concept of failure probability is
introduced: with a perfect partitioning there are only test frames with failure probability 0
and test frames with failure probability 1, meaning that, within a partition, either all test
cases fail or none does. This situation is unrealistic; therefore, test frames with failure
probability in ]0, 1[ are considered, and this information is exploited to "drive" the testing
strategy towards the test cases more prone to failure.
Weight calculation
To determine the weight matrix, a notion of distance between test frames is defined. Several
techniques can be considered for this purpose; for example, the distance can be computed
starting from the signatures of the test frames and calculating their Hamming distance. In
the rest of the Thesis, each test frame is associated with a distance factor, related to the
number of differing input classes, and the distance between two test frames is calculated as
the absolute value of the difference of their distance factors.
As previously stated, the weights encode the tester's information, so as to "drive" the
algorithm towards the test frames more prone to failure. To this aim, it is necessary to
exploit and combine all the available information: one approach is based on the joint
probability, that is, configuring the weights according to the probability that the next
selected node fails given that the previously selected node has exposed a failure.
In case of failure, the testing strategy should prefer destination nodes whose joint
probability with the source node is maximum. The joint probability is Pr(D ∩ S) =
Pr(D|S) · Pr(S), where Pr(S) is the failure probability of the source S, Pr(D) is the failure
probability of the destination D, and Pr(D|S) has to be determined as a function of the distance.
Pr(D|S) represents the "belief" that D fails given that S has failed. This belief increases as
the distance k decreases; recalling that k is the absolute value of the difference between
distance factors, it is always an integer greater than or equal to zero. Thus, this probability is
defined as Pr(D|S) = Pr(D) · f(k), with f(k) increasing as k decreases (with a subsequent
normalization): in particular, f(k) = 1/k is considered, so that the weight is inversely
proportional to the distance and, when k > 0, the product with Pr(D) is between 0 and 1.
If k = 0, this quantity is not calculated, because there is no link.
The assumption is that the "belief" Pr(S) remains unchanged (independently of the
observation of failures). Removing this assumption, the performance can be improved,
but for the moment the static case is considered.
In conclusion, the determination of the weights is based on the definition of the probabilities
associated with the different couples of test frames. Considering a couple of test frames
(destination, source), the weights are calculated as:

Pr(D ∩ S) = Pr(D|S) · Pr(S) = Pr(D) · f(k) · Pr(S)

where k is the distance between the nodes and f(k) = 1/k.
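A minimal sketch of this weight computation, assuming per-frame failure probabilities and distance factors (the numeric values are illustrative, not the ones of the following example):

```python
# Sketch of the weight computation under the stated assumptions: each test
# frame carries a failure probability Pr and a distance factor df; the
# distance k between two frames is |df_D - df_S| and f(k) = 1/k. Frames at
# distance 0 are not linked (weight 0). Values are illustrative.
frames = [
    {"pr": 0.4, "df": 1},   # node 0
    {"pr": 0.2, "df": 2},   # node 1
    {"pr": 0.6, "df": 3},   # node 2
]

def weight(src, dst):
    k = abs(dst["df"] - src["df"])
    if k == 0:
        return 0.0                       # no link between frames at distance 0
    return dst["pr"] * (1.0 / k) * src["pr"]   # Pr(D) * f(k) * Pr(S)

W = [[weight(s, d) for d in frames] for s in frames]
```

Note that W is symmetric, because both |df_D − df_S| and the product Pr(D) · Pr(S) are symmetric in the two frames.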
Example
Figure 4: Example
Figure 4 is considered as a simple example of network: the distance between every couple
of nodes is 1 (k_ij = 1, ∀i, ∀j). Since the assignment of weights is achieved through the
joint probability calculation Pr(D ∩ S) = Pr(D|S) · Pr(S) = Pr(D) · f(k) · Pr(S), and
considering f(k) = 1/k, the weights are:
1. About node 0:
a. W1 = 0.08;
b. W2 = 0.24;
c. W3 = 0.32.
2. About node 1:
a. W0 = 0.08;
b. W2 = 0.03.
3. About node 2:
a. W0 = 0.24.
4. About node 3:
a. W0 = 0.32.
It is possible to notice that, starting from node 0, it is more likely to reach nodes 2 and 3
rather than node 1; likewise, starting from node 1, it is more likely to reach node 0 than
node 2.
In Table 3 the matrix built on the obtained weights is shown, considering sources on rows
and destinations on columns.
Table 3: weight matrix
S\D   0     1     2     3
0     0     0.08  0.24  0.32
1     0.08  0     0.03  0
2     0.24  0     0     0
3     0.32  0     0     0
Weights Matrix computation
To calculate the weights matrix, it is necessary to consider the following three steps:
• Test frames acquisition;
• Joint probability computation for each couple of test frames;
• Matrix population.
The executive steps are performed by two classes, TestFrameDistance and TestFrameLoader,
and are represented in the activity diagram in Figure 5.
Figure 5: Distance Matrix Population
2.2.3 Testing strategy
In this Section, the concepts of test frame and weight matrix are used to define the testing
strategy. The strategy is based on Thompson's idea of adaptive web sampling without
replacement, described in Section 1.3.
The testing strategy is characterized by two-stage sampling: in the first stage, the sampling
unit is the test frame; in the second stage, a test case is generated randomly from the
selected partition. Each generated test case is executed and its outcome is collected for the
estimation.
As in adaptive web sampling, this testing strategy consists of two principal steps: first, n0
test frames (with n0 ≥ 1) are selected using Simple Random Sampling Without
Replacement (SRSWOR), building the "initial sample"; then, the remaining
units are selected with a mixture distribution.
The first step is necessary to populate the active set, i.e., the set of test frames extracted so
far, which constitutes the knowledge base for the second step.
The mixture distribution can be read as "the use of two different samplers", namely a
Weight Based Sampler (WBS) and a Simple Random Sampler (SRS). At each step only one
sampler is used: WBS is selected with probability d, while SRS is selected with
probability 1 − d.
Once the initial sample is built, the extraction probability is:

q_ki = d · (w_{a_k,i} / w_{a_k,+}) + (1 − d) · 1 / (N − n_{a_k})

where:
• q_ki is the probability of extracting the test frame i at step k;
• a_k is the active set at the kth step, containing all sampled test frames with the information about their outgoing links;
• w_{a_k,i} is the total weight of the links going from the active set to the test frame i, and w_{a_k,+} is the total weight of all the links going out of the active set;
• N is the cardinality of the test frame population;
• n_{a_k} is the sample dimension at the kth step.
The parameters to be defined are:
• d: between 0 and 1;
• w_ij: static weights corresponding to the (i, j) element of the weight matrix.
The term d · w_{a_k,i} / w_{a_k,+} represents the probability of taking a unit with the
Weight Based Sampler, which selects the next test frame among the ones linked with the
active set, with probability proportional to the link weight.
If there are no outgoing links from the active set, the next test frame is selected with the
Simple Random Sampler.
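A single step of this mixture sampling can be sketched as follows (a minimal Python sketch under the stated assumptions; `next_frame`, `link_weights` and `remaining` are hypothetical names, and the second-stage random generation of a concrete test case from the selected frame is omitted):

```python
import random

def next_frame(d, link_weights, remaining):
    """One mixture-sampling step (sketch). link_weights maps each not-yet-
    sampled frame to the total weight of the links reaching it from the
    active set; remaining is the set of frames not sampled yet."""
    total_w = sum(link_weights.get(f, 0.0) for f in remaining)
    if total_w > 0 and random.random() < d:
        # Weight Based Sampler: proportional to outgoing-link weight.
        frames = list(remaining)
        weights = [link_weights.get(f, 0.0) for f in frames]
        return random.choices(frames, weights=weights)[0]
    # Simple Random Sampler: uniform over the remaining frames (also used
    # when the active set has no outgoing links).
    return random.choice(list(remaining))
```

With d close to 1 the sampler almost always follows the links of the active set; with d close to 0 it degenerates into simple random sampling.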
After the definition of the testing strategy, it is important to introduce how the reliability
of the system is estimated.
2.2.4 Estimation
The estimation represents a crucial problem; indeed, estimation in sampling with varying
probabilities is still an open problem. Several solutions are reported, for instance, in [32]
and [33].
Before introducing the estimation problem, it is necessary to identify the variable to be
estimated. With this testing strategy, one test case is selected from each test frame; to
estimate the reliability, the variable x_i = p_i · y_i is considered, where p_i is the occurrence
probability (the probability that at run time an input will be taken from the corresponding
test frame) and y_i is the outcome of the current test case (1 in case of failure, 0 otherwise).
This variable is used to estimate the unreliability, defined as φ = Σ_{i=1}^{N} x_i, and
consequently to obtain the reliability R = 1 − φ.
The estimator used in this case is an adaptation of the estimator based on conditional
selection described in [2]. This estimator is presented in two versions: the first refers to an
initial sample of dimension greater than one, the second is particularized to an initial
sample of unitary dimension.
Estimation with initial sample dimension greater than one
The reliability estimation is performed in three steps. In the first step, the total estimator of
SRSWOR is considered for the initial sample:

t_0 = (N / n_0) · Σ_{i=1}^{n_0} x_i.
In the following step, the estimator z_i is considered as a total estimator of x:

z_i = Σ_{j∈a_k} x_j + x_i / q_ki = Σ_{j∈a_k} p_j · y_j + p_i · y_i / q_ki.
Finally, the unreliability is calculated as:

φ̂ = (1/n) · [ n_0 · t_0 + Σ_{i=n_0+1}^{n} z_i ]

and the reliability is calculated as R̂ = 1 − φ̂.
To calculate the analytic variance, the following variance estimator is adopted [2]:

var(φ̂) = (n_0/n)² · v_1 + ((n − n_0)/n)² · v_2

where:
- v_1 = ((N − n_0) / (N · n_0)) · v_0, where v_0 is the sample variance of the initial sample;
- v_2 = Σ_{i=n_0+1}^{n} (z_i − z̄)² / ((n − n_0) · (n − n_0 − 1) · N²), where z̄ = Σ_{i=n_0+1}^{n} z_i / (n − n_0).
v_0 is calculated as described in [31]:

v_0 = (1/(n_0 − 1)) · Σ_{i=1}^{n_0} (z_i − m(z))², where m(z) = (1/n_0) · Σ_{i=1}^{n_0} z_i.
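The three estimation steps above can be sketched as follows (illustrative Python; `estimate_unreliability` and its arguments are hypothetical names, with `a_sum[i]` denoting the sum of x over the active set at the step in which frame i was drawn):

```python
# Sketch of the three estimation steps for n0 > 1, following the formulas
# above: x[i] = p_i * y_i for each sampled frame and q[i] is the extraction
# probability of frame i (entries for the initial sample are unused).
def estimate_unreliability(N, n0, x, q, a_sum):
    n = len(x)
    t0 = (N / n0) * sum(x[:n0])                         # SRSWOR total estimator
    z = [a_sum[i] + x[i] / q[i] for i in range(n0, n)]  # conditional-selection z_i
    phi = (1.0 / n) * (n0 * t0 + sum(z))                # estimated unreliability
    return phi, 1.0 - phi                               # (unreliability, reliability)
```

For example, with N = 10, n0 = 2, x = [0.1, 0.0, 0.05], q_3 = 0.5 and an active-set sum of 0 for the third draw, t_0 = 0.5, z_3 = 0.1 and φ̂ = (1/3)(2 · 0.5 + 0.1).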
Estimation with initial sample dimension unitary
This case is a particularization of the previous one: the total estimator of the initial sample
becomes

t_0 = N · x_1 = N · p_1 · y_1,

and z_i is defined as:

z_i = Σ_{j∈a_k} x_j + x_i / q_ki = Σ_{j∈a_k} p_j · y_j + p_i · y_i / q_ki.
Finally, the unreliability is calculated as:

φ̂ = (1/n) · [ t_0 + Σ_{i=2}^{n} z_i ]

with the reliability calculated as R̂ = 1 − φ̂.
The variance estimator is simpler than the one used previously:

var(φ̂) = Σ_{i=1}^{n} (z_i − t_0)² / (n · (n − 1) · N²).
2.2.5 Active Set Update
This Section focuses on the Active Set update operations; this procedure is relevant to
understand how the selected test frame affects the next sampling step.
The information about the links is stored in an array where each position represents a test
frame: the contained value is 0 if there is no outgoing link from the Active Set to it,
otherwise it is the link weight. Each time a test frame is selected, the information
about the outgoing links from the Active Set is updated:
1. The row relative to the selected test frame is taken from the weight matrix.
2. Each element of this row is summed with the homologous value in the Active Set
outgoing links array.
3. Since the testing technique is without replacement, there are no self-links; as a
consequence, the value 0 is stored in the cells of the test frames already selected.
4. All values of the Active Set outgoing links array are normalized.
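The four update steps can be sketched as follows (illustrative Python; `update_outgoing` is a hypothetical name):

```python
def update_outgoing(links, W, selected, sampled):
    """Sketch of the Active Set outgoing-links update: add the weight-matrix
    row of the newly selected frame, zero the cells of already-sampled frames
    (no self-links, sampling without replacement), then normalize."""
    row = W[selected]                              # step 1: row of the selected frame
    links = [l + r for l, r in zip(links, row)]    # step 2: element-wise sum
    for i in sampled:
        links[i] = 0.0                             # step 3: no links back into the sample
    total = sum(links)
    if total > 0:
        links = [l / total for l in links]         # step 4: normalization
    return links
```

Applied to the weight matrix of Table 3 after selecting node 0, the array becomes [0, 0.125, 0.375, 0.5], so the Weight Based Sampler would most likely pick node 3 next.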
At each step, when the new test frame is taken with the Weight Based Sampler, it is selected
from the test frames linked to the active set with a probability proportional to the values in
the Active Set outgoing links array (test frames with bigger weights are taken with greater
probability). If there are no outgoing links from the active set, the next test frame is
selected with Simple Random Sampling.
2.2.6 Algorithm implementation
The classes used to implement the described algorithm are represented in Figure 6.
Figure 6: Class Diagram
The testing strategy has been implemented in two different versions: Figure 7 describes the
version with unitary initial sample, while Figure 8 shows the version with variable initial
sample dimension. The latter aims to provide the weight based sampling with a more
representative Active Set: test frames (and consequently test cases) are selected until a
failure is found or a limit is reached (maxInitialSampleSize, set in our case to 25% of n).
Figure 7: Implementation with unitary initial sample
Figure 8: Implementation with variable initial sample
Example
To well understand how test generation algorithm of MART works, an example based on a
dummy test frames network is presented. In this case an initial sample formed by only a
test frame is considered (first implementation).
Table 4: test frames with attached failure probability and occurrence probability
TestFrame                                      Failure probability   Occurrence probability
Login(InputUsername1, InputPassword1) 0 0.05
Login(InputUsername2, InputPassword1) 0 0.05
Login(InputUsername3, InputPassword1) 0 0.05
Login(InputUsername1, InputPassword2) 0 0.05
Login(InputUsername2, InputPassword2) 0 0.05
Login(InputUsername3, InputPassword2) 0 0.05
Login(InputUsername1, InputPassword3) 1 0.05
Login(InputUsername2, InputPassword3) 0 0.05
Login(InputUsername3, InputPassword3) 0 0.05
Login(InputUsername1, InputPassword4) 0 0.05
Login(InputUsername2, InputPassword4) 0 0.05
Login(InputUsername3, InputPassword4) 0 0.05
Login(InputUsername1, InputPassword5) 0 0.05
Login(InputUsername2, InputPassword5) 0 0.05
Login(InputUsername3, InputPassword5) 1 0.05
Login(InputUsername4, InputPassword1) 0 0.05
Login(InputUsername4, InputPassword2) 0 0.05
Login(InputUsername4, InputPassword3) 1 0.05
Login(InputUsername4, InputPassword4) 1 0.05
Login(InputUsername4, InputPassword5) 1 0.05
Table 4 shows all the obtained test frames: the failure probability of each one is a binary
value (0 or 1), resulting from a perfect partitioning of the test cases; the occurrence
probability is the same for each test frame. The distance between test frames is calculated
on their signatures (Hamming distance). The resulting network is represented in Figure 9.
Recalling that the weights are calculated as the joint probability described in Section 2.2,
there are links only between test frames with failure probability 1.
Assuming to run the algorithm with d = 0.8, two possible executions are presented.
Execution 1: in this execution, represented in Figure 10, it is possible to evaluate how the
testing strategy works when the first sampled point is a failing point; in this case 4 out of 5
failing points are taken.
Figure 9: Example Network
Figure 10: First sampled test frame is failing
Execution 2: in this execution, represented in Figure 11, it is possible to evaluate the
testing strategy when a non-failing point is taken as the first sample. In this case samples
are selected randomly until a failing sample is taken; this behavior derives from the
absence of outgoing links from the Active Set in the second and third steps.
Figure 11: First sampled test frame is not failing
2.3 Probability Update
The testing algorithm exploits updated information about the application under test.
This information, as described in the previous Section, is encoded through probabilities.
The operational profile is defined as the set of occurrence probabilities (OP) assigned to the
test frames. These values are such that Σ_i OP_i = 1; OP_i represents the probability that
a "Client" performs a test case contained in the ith test frame. The failure probability,
instead, is a value between 0 and 1 attached to each test frame, encoding the probability
that a test case taken from that set exposes a failure.
The operational profile and the failure probability of each test frame are updated at run
time by evaluating the request-response couples generated by the system under test.
For this purpose, monitoring tools can be used, like MetroFunnel [34], which has been
modified to also capture the payload, so as to properly monitor the examined system.
The operational profile update is realized using a sliding window of size MAX, which
represents the maximum number of test cases that can be considered for this purpose. The
expression that regulates the operational profile update is the following:

OP_new_i = OP_old_i · [ p + (1 − p) · (1 − x/MAX) ] + OP_obs_i · (1 − p) · (x/MAX)
Where:
• OP_old_i is the occurrence probability of the ith test frame at the previous update
step (at the first execution it is the one specified by the tester);
• OP_obs_i is the occurrence probability calculated with a frequentist approach, as the
ratio between the test cases belonging to the ith test frame and the total executed test cases;
• p is a value between 0 and 1 that represents the minimum percentage of history
preserved in the update operations (in the examined case, 50%);
• x is the number of executed test cases (at most MAX).
This approach guarantees that the sum of the occurrence probabilities remains unitary,
since:

Σ_i OP_old_i = 1 and Σ_i OP_obs_i = 1.
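The update expression can be sketched as follows (`update_op` is a hypothetical name); since both the old and the observed profiles sum to 1, the updated profile sums to 1 as well:

```python
def update_op(op_old, op_obs, p, x, MAX):
    """Sliding-window update of the occurrence probabilities (sketch).
    p is the minimum preserved fraction of history, x the number of test
    cases observed in the window (at most MAX)."""
    w = (1.0 - p) * (x / MAX)
    return [old * (p + (1.0 - p) * (1.0 - x / MAX)) + obs * w
            for old, obs in zip(op_old, op_obs)]
```

For example, with p = 0.5 and a half-full window (x = MAX/2), each updated value is 75% history and 25% observation.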
This technique guarantees that changes in the true operational profile are seen by the
update function within few steps. This property is the reason for using a sliding window:
if the entire history were considered, changes of the true operational profile would be
harder to detect.
The described mechanism is an implementation of the feedback mechanism (Figure 12), a
feature of agile development: the operational profile can effectively be updated cyclically
during the system execution.
Figure 12: Operational Profile Feedback
The failure probability of each test frame is updated in the same way:

FP_new_i = FP_old_i · [ p + (1 − p) · (1 − x/MAX) ] + FP_obs_i · (1 − p) · (x/MAX)

where FP is the failure probability.
Other update approaches can be used, like black-box ones, where adjustments to the
frequentist or Bayesian estimators are made upon profile changes, or white-box approaches,
where the control flow transfer among components is captured. However, investigating the
best update strategy is outside the scope of this work and is a matter of future work.
2.4 Formulation with dynamic sampler selection
In this Section another version of the test generation algorithm of MART is described; in
particular, it is a pseudo-adaptive version, in which an initial value of d, named d0 (with
d0 ∈ [0.5, 1[), is specified. It encodes the trust the tester has in the WBS compared with
the SRS.
The idea is based on the recent sampling history of each of the two samplers, as in the
algorithms used for processors' branch prediction. The architecture considered for this
purpose is described in Figure 13.
Figure 13: Dynamic d logic
In this approach, the remaining part of d (that is, 1 − d0) is encoded into the cells of the
shift registers. The idea is to distribute this quantity over the cells in descending order; for
example, the configuration for a shift register of dimension four is represented in
Figure 14.
At each execution of the relative sampler, the values in the cells are shifted to the left, and
the new value is inserted at position 0. The shift register update logic assures that d is always
Figure 14: Sampler Shift Register
between 0 and 1: it takes the maximum value (1) when the WBS register is full and the
other one is empty; it takes the minimum value (d0 − (1 − d0)) when the SRS register is
full and the other one is empty.
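Since Figure 14 is not reproduced here, the following sketch only assumes that the cell weights are positive, descending, and sum to 1 − d0; the function name and the concrete weights are hypothetical:

```python
# Sketch of the dynamic-d logic under stated assumptions: each shift register
# holds a 1 in the cells corresponding to recent uses of its sampler; WBS
# hits push d up, SRS hits push it down, each by the weight of its cell.
def dynamic_d(d0, wbs_reg, srs_reg, cell_weights):
    bonus = sum(w for bit, w in zip(wbs_reg, cell_weights) if bit)
    malus = sum(w for bit, w in zip(srs_reg, cell_weights) if bit)
    return d0 + bonus - malus

d0 = 0.8
cells = [0.08, 0.06, 0.04, 0.02]   # descending, summing to 1 - d0 = 0.2
# d = 1 when the WBS register is full and the SRS one is empty,
# d = d0 - (1 - d0) = 0.6 in the opposite situation.
assert abs(dynamic_d(d0, [1, 1, 1, 1], [0, 0, 0, 0], cells) - 1.0) < 1e-9
assert abs(dynamic_d(d0, [0, 0, 0, 0], [1, 1, 1, 1], cells) - 0.6) < 1e-9
```

This reproduces the two extreme values of d stated above; the intermediate behavior depends on the exact cell weights of Figure 14, which are assumed here.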
The estimator used in this approach is the same estimator used in the static formulation.
Chapter 3: Simulation of the test generation algorithm
This Chapter focuses on evaluating the test generation algorithm by simulation. The aim
is to tune the parameters of the two versions of the test generation algorithm of MART
(with static and dynamic d), and then to evaluate their performance against Simple Random
Sampling Without Replacement.
3.1 Simulation Scenarios
To obtain the simulation scenarios, three main factors about the population and the
problem size are considered:
• Type of partitioning: it represents how test cases are partitioned into test frames. This
information is encoded as the test frame's failure probability, in particular as a
Correct/Failing pair of values, for which three settings are considered: 0/1, 0.25/0.75 and
0.1/0.9. This probability is necessary to determine the weight associated with each
network link. In the 0/1 case, a test frame believed to be failing (correct) contains
only failing (correct) test cases, hence its failure probability (as the proportion of
failing test cases) is 1 (0). This corresponds to a Perfect Partitioning; if, instead, the
failing test frames are organized in clusters, a Perfect Clustered Partitioning is
obtained. The pairs 0.25/0.75 and 0.1/0.9 mean that the partitioning is not accurate,
because failures are distributed over all test frames: they generate the Close to Uniform
Partitioning (0.25/0.75) and the Close to Perfect Partitioning (0.1/0.9); if the failing
test frames are organized in clusters, the population distribution is Clustered.
• Failing test frame proportion: this is the proportion of failing test frames over the
total, for which two values are considered: 0.1 and 0.2.
• Total number of test frames (N): two orders of magnitude are tried: N = 100 and
N = 1000.
The combination of the first two factors generates 12 different configurations, shown in
Table 5. A completely uniform population distribution is then added, in which the failure
probability of each test frame is a random value between 0 and 1; as a consequence,
failures are uniformly distributed among test frames (this represents an ideal case).
Table 5: Configurations
Configuration   Type of partitioning   Failing test frame proportion
1 Uniform (Random) 0.5
2 Close to uniform (0.25/0.75) 0.1
3 Close to perfect partitioning (0.1/0.9) 0.1
4 Clustered (0.25/0.75) 0.1
5 Clustered (0.1/0.9) 0.1
6 Perfect 0.1
7 Perfect Clustered (1/0) 0.1
8 Close to uniform (0.25/0.75) 0.2
9 Close to perfect partitioning (0.1/0.9) 0.2
10 Clustered (0.25/0.75) 0.2
11 Clustered (0.1/0.9) 0.2
12 Perfect 0.2
13 Perfect Clustered 0.2
Adding the last factor, N, to these combinations, 26 different scenarios are obtained.
The assessment is made at 9 checkpoints: n1 = 0.1N, n2 = 0.2N, …, n9 = 0.9N.
For the described scenarios, a uniform operational profile is adopted. The populations
from 1 to 7 are defined in ascending order of compatibility with the testing strategy (from
worst to best case). The same consideration also applies to the populations from 8 to 13,
but with a different failure distribution.
3.1.1 Population generators
Different population distributions are generated using the following functions:
• generatePopulationAndMatrix: generates a random set of test frames with random
failure probabilities and random distance factors (between 0 and maxdistance); the
occurrence probabilities can be random or equiprobable.
• generatePopulationAndMatrixBinary: generates a random set of test frames with
failure probabilities chosen between two selected values ((0/1), (0.1/0.9), (0.25/0.75)),
respecting a failure proportion (the proportion of frames with the high failure
probability over the total). Random distance factors (between 0 and maxdistance) and
the occurrence probabilities (random or equiprobable) are also generated.
• generatePopulationAndMatrixCluster: generates a set of test frames following the
policy defined below, with random distance factors (between 0 and maxdistance) and
occurrence probabilities that can be random or equiprobable.
The steps used to determine the failure probabilities of a set of N test frames with clustered
distribution are the following:
1. A t% of the test frames is considered as failing; this t%-th part of N is called X.
2. The lowest failure probability is assigned to all points.
3. Each cluster of failing points is made of a certain percentage (in this case 10% and
20% are considered) of X's cardinality, called T.
4. |X|/T points are chosen randomly as centroids.
5. Finally, for each centroid, the T test frames at minimum distance are chosen and
assigned the maximum failure probability.
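The steps above can be sketched as follows (illustrative Python; `clustered_failure_probs` is a hypothetical name, `t` is the failing fraction, `cluster_frac` the cluster-size percentage, and 0.1/0.9 are used as low/high failure probabilities):

```python
import random

def clustered_failure_probs(df, t, cluster_frac, low=0.1, high=0.9):
    """Sketch of the clustered-population generation: df holds the distance
    factor of each of the N frames; a fraction t of them becomes failing,
    grouped in clusters of T frames around randomly chosen centroids."""
    N = len(df)
    X = int(t * N)                      # step 1: number of failing frames
    T = max(1, int(cluster_frac * X))   # step 3: cluster size
    probs = [low] * N                   # step 2: lowest probability everywhere
    centroids = random.sample(range(N), X // T)      # step 4
    for c in centroids:                 # step 5: T nearest frames per centroid
        nearest = sorted(range(N), key=lambda i: abs(df[i] - df[c]))[:T]
        for i in nearest:
            probs[i] = high
    return probs
```

Each centroid is among its own nearest frames (distance 0), so every cluster contains its centroid; overlapping clusters may yield slightly fewer than X failing frames.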
3.2 Evaluation Criteria
Accuracy and efficiency are considered as evaluation criteria, estimated as follows. A
simulation scenario j is repeated 100 times; denote with r one of such repetitions. At the
end of each repetition, the reliability estimate R̂_{r,j} is computed by the technique under
assessment, as well as the true reliability R_j. In simulation, it is known in advance which
inputs t are failure points (hence, R_j = 1 − Σ_{t∈T} p_t · δ_t, where δ_t is 1 if the input is a
failure point and 0 otherwise).
and 0 otherwise).
For each scenario j, the sample mean (denoted as M), the sample variance (var) and the
mean squared error (MSE) are computed:
• M(R̂_j) = (1/100) · Σ_{r=1}^{100} R̂_{r,j};
• MSE(R̂_j) = (1/100) · Σ_{r=1}^{100} (R̂_{r,j} − R_j)²;
• var(R̂_j) = (1/99) · Σ_{r=1}^{100} (R̂_{r,j} − M(R̂_j))².
The comparison of estimation accuracy is done by looking at the MSE; the comparison of
efficiency by looking at the sample variance. Lastly, the average number of detected failing
points (NFP) is also considered.
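Under the definitions above, the three metrics can be computed as in this minimal sketch (`metrics` is a hypothetical name):

```python
def metrics(estimates, true_R):
    """Sample mean, MSE and sample variance of the reliability estimates
    collected over the repetitions of one scenario (sketch)."""
    r = len(estimates)
    m = sum(estimates) / r                                  # M
    mse = sum((e - true_R) ** 2 for e in estimates) / r     # MSE
    var = sum((e - m) ** 2 for e in estimates) / (r - 1)    # sample variance
    return m, mse, var
```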
3.3 Empirical correction of the Estimator
The simulations showed that, for small values of n, z_i at each pass is strongly influenced
both by the order in which test frames are selected and by the number of observed failures.
This behavior is a consequence of the working conditions foreseen for MART: since it is
conceived to work under a scarce testing budget, the formulation considers at most one
test case per test frame. The unreliability is calculated as φ̂ = Σ_i p_i · y_i, where y_i is a
binary value, because only one test case is taken from each test frame. A consequence is
the possibility that a test case taken from a test frame with "high" failure probability
(e.g., 0.9) does not fail, and that a test case taken from a test frame with "low" failure
probability (e.g., 0.1) fails, causing underestimation or overestimation respectively. This
phenomenon is more evident for small values of n.
For the discussed problem, a reliability overestimation is more dangerous than an
underestimation; thus the idea is to adjust the estimate to avoid this condition.
The estimate defined in the previous Chapter is based on the mean of the estimates
calculated with z_i, which is influenced by the presence of "outliers". This influence is very
heavy when small values of n are considered, but the mean is still the most representative
value of the set.
The simulations show that the testing strategy takes the test frames with the highest failure
probability in the initial part of the testing; this implies that the probability of obtaining an
underestimation of the unreliability (overestimation of the reliability) increases.
The idea is to calculate the mean of the estimated values in a single algorithm execution
and use it to consider only:
• values that differ from the mean by no more than 90% in overestimation;
• values that differ from the mean by no more than 10% in underestimation.
In other words, a window with two limit values is considered: the upper bound is the mean
plus 90% of the mean, the lower bound is the mean minus 10% of the mean.
This idea does not affect the unbiasedness of the estimators; in fact, the operation consists
in calculating the mean over the values that better represent the examined population. The
adjustment is particularly relevant for small values of n, for which outliers have more
impact.
The final solution combines the two estimators described above; the choice depends on the value of n. It is fundamental to define what small and big values of n mean. It is observed that the adjusted estimator performs well when n is between 0 and 30-40%. This consideration leads to dividing the n values as in Figure 15: for 0-34% n is small and the adjusted estimator is used; for 66-100% n is big and the estimators defined in the previous chapter are used.
For n between 34% and 66% a linear combination of both estimators is used (Figure 15: Different Estimator usage). This linear combination is realized using a "coefficient of correction" defined as coc = (n% − 0.34) × 3, and the estimation is calculated as:
estimation = (1 − coc) × correctedEstimation + coc × standardEstimation.
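The selection rule can be sketched as follows (illustrative only; nPct is assumed to be the budget n expressed as a fraction of the number of test frames, and outside the 34%-66% window the pure estimators are used as described above):

```java
// Sketch of the final estimator: adjusted estimator for small n,
// standard estimator for big n, and a linear blend in between driven by
// the "coefficient of correction" coc = (n% - 0.34) * 3.
public class CombinedEstimator {

    /** nPct: testing budget as a fraction of the test frames (0..1). */
    static double estimate(double nPct, double correctedEstimation, double standardEstimation) {
        if (nPct <= 0.34) {
            return correctedEstimation;        // small n: adjusted estimator only
        }
        if (nPct >= 0.66) {
            return standardEstimation;         // big n: standard estimator only
        }
        double coc = (nPct - 0.34) * 3;        // grows from 0 as n% moves past 34%
        return (1 - coc) * correctedEstimation + coc * standardEstimation;
    }
}
```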
Figure 16 shows the trend of the MSE, which in particular converges to 0 as n increases. For this purpose, configurations 1, 2 and 10 are considered.
Figure 16: Results of the adjusted estimator based on MSE (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 10).
3.4 Sensitivity Analysis
3.4.1 Sensitivity Analysis in static implementation
The value of d is chosen by comparing simulation results obtained for four values: 0.2, 0.4, 0.6 and 0.8. These represent the trust in the weight-based sampling compared to simple random sampling.
The sensitivity analysis considers only five configurations: 1, 2, 8 and 9, which represent extreme cases, and configuration 12 as a best-case example.
The evaluation criteria are MSE and Sample Variance.
Figure 17: MSE comparison between d = 0.2, d = 0.4, d = 0.6 and d = 0.8, considering the most significant configurations (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9, (e) Configuration 12).
From the results shown in Figure 17, the best value of d with respect to the MSE is 0.8, in particular for small values of n.
Figure 18: Sample Variance comparison between d = 0.2, d = 0.4, d = 0.6 and d = 0.8, considering the most significant configurations (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9, (e) Configuration 12).
As shown in Figure 18, the sample variance is only slightly influenced by the value of d in the different configurations; in fact, except for a few cases, the variance values have the same order of magnitude. In configurations 1 and 2, better sample variance values are observed for d = 0.2 and d = 0.4, while in configurations 9 and 12 the better values are for d = 0.6 and d = 0.8.
The chosen value of d is 0.8, because it offers the best trade-off between MSE and variance.
3.4.2 Sensitivity Analysis in dynamic implementation
In this case, the sensitivity analysis is performed on two parameters: d0 (values between 0.5 and 0.9 with step 0.1) and the shift register size (3, 4 and 5).
Sensitivity Analysis on d0
The value of d0 is chosen by comparing simulation results obtained for five different values between 0.5 and 0.9 with step 0.1. MSE, Sample Variance and Number of Failing Points (NFP) are the evaluation criteria used for the sensitivity analysis.
The sensitivity analysis considers only four configurations, 1, 2, 8 and 9, which represent the limit cases.
As shown in Figures 19, 20 and 21, three different trends can be observed:
• the MSE has a downward trend as d0 increases;
• the Sample Variance roughly grows as d0 increases;
• the NFP is almost constant for all d0 values.
Considerations about the limit cases are also interesting; in fact, for these there are substantial differences among the values of d0.
Figure 19: Sensitivity Analysis on d0 for MSE, where Ad1 is the implementation with dynamic d and unitary initial sample, and Ad2 is the implementation with dynamic d and variable initial sample (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9).
As previously described, the MSE has a downward trend as d0 increases. As shown in Figure 19, the best values of d0 are 0.8 and 0.9.
Figure 20: Sensitivity Analysis on d0 for Sample Variance, where Ad1 is the implementation with dynamic d and unitary initial sample, and Ad2 is the implementation with dynamic d and variable initial sample (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9).
Figure 20 shows that the Sample Variance is better for small values of d0, while 0.9 is the worst case.
Figure 21: Sensitivity Analysis on d0 for NFP, where Ad1 is the implementation with dynamic d and unitary initial sample, and Ad2 is the implementation with dynamic d and variable initial sample (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9).
As for the NFP, the performances shown in Figure 21 are more or less the same for every value of d0.
Considering all these observations, the selected value of d0 is 0.8, which offers a good trade-off between MSE and Sample Variance.
Sensitivity Analysis on Shift Register
The analysis is carried out on three values of the shift register dimension: 3, 4 and 5.
The values associated with each cell are organized as in Figure 22.
Figure 22: Shift registers values
As in the previous case, simulations are performed for configurations 1, 2, 8 and 9. In terms of MSE and Sample Variance, the results for the three dimensions differ only slightly.
In the case of the MSE, the best Shift Register dimension is 4, because it gives better performance also in the limit cases.
Figure 23: Sensitivity Analysis on Shift Register dimension for MSE (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9).
As shown in Figure 23, performance is better for SR = 3 and SR = 4. As with the MSE, the Sample Variance is more or less the same in all cases, except for the limit cases.
Figure 24: Sensitivity Analysis on Shift Register dimension for Sample Variance (panels: (a) Configuration 1, (b) Configuration 2, (c) Configuration 8, (d) Configuration 9).
Observing Figure 24, SR = 5 performs best, while SR = 4 and SR = 3 show the same performance. It is important to note that the differences are in the third decimal digit, hence very small. The chosen value is SR = 4, because it is the best value in the case of the MSE, with acceptable values for the variance (second best).
3.5 Results
Results are based on four different variants of the MART algorithm plus the SRS case:
1. Static d with n0 = 1 (1);
2. Static d with n0 ≥ 1 (2);
3. Simple Random Sampling (SRS);
4. Dynamic d with n0 = 1 (Ad1);
5. Dynamic d with n0 ≥ 1 (Ad2).
3.5.1 MSE
To evaluate the differences between the approaches, it is possible to observe the histograms in Figure 25. The evaluation of the MSE is important to assess how much the obtained values deviate from the true value.
Figure 25: MSE simulation results for each Configuration (panels (a)-(m): Configurations 1-13).
In panels (a), (b) and (h) SRS is better than the other algorithms. This result depends on the nature of the configurations: configuration 1 corresponds to a uniform distribution of failures across partitions, while configurations 2 and 8, as defined in Section 3.1, are close to a uniform failure distribution.
In all other cases SRS gives worse results; in particular, for small values of n the algorithm with the unitary initial sample (1) is the best.
3.5.2 Sample Variance
Sample Variance is important to evaluate the goodness of estimate; in particular it
underline how much estimate is representative of population mean. All results are shown
in Figure 26.
Figure 26: Sample Variance simulation results for each Configuration (panels (a)-(m): Configurations 1-13).
In Figure 26, SRS presents the worst sample variance values in all configurations. The results of the other algorithms are very close to each other, except in panels (a), (c) and (h), in which the static techniques are better for small values of n.
3.5.3 Failing Point Number
This quantity explains the trend of different techniques to expose failures, all results are
shown in Figure 27.
Figure 27: NFP simulation results for each Configuration (panels (a)-(m): Configurations 1-13).
In Figure 27, SRS presents the worst values in all configurations; in panels (a), (b) and (h) it achieves its best values, which are however still worse than those of the other techniques. The NFP values of the other four techniques are practically the same across all configurations.
3.5.4 Considerations
The first consideration is about efficiency and accuracy of SRS compared with the four
different testing strategy versions. For this purpose, configurations 1, 2 and 8, represented
in graphs (a), (b) and (h), are considered. In these configurations MSE is better in SRS, this
result depends on the uniform distribution of failures (or close to uniform distribution in
case 0.25/0.75), that is an ideal case away from the real world. On the other hand, SRS is
worse than four versions of test generation algorithm of MART about sample variance,
where the difference is very strong.
For all other configurations our techniques are better than SRS both about sample variance
and except for few isolated point about MSE.
At last the NFP values are considered, where the four implementations of test generation
algorithm are globally better.
It is now useful to determine which of the four versions of the MART test generation algorithm is the best. The first step is the comparison between the different initial sample dimensions. Results show that the two techniques with unitary initial sample are globally better, both for MSE and for sample variance.
This consideration leads to the comparison between the technique with static d and the technique with dynamic d, both with unitary initial sample, keeping in mind that the differences are very slight. For the MSE, the values are more or less the same: in some configurations (e.g., 5) the dynamic d is better, while in others (e.g., 6) the static d is better. For the variance, the static d is better on average.
In conclusion, the MART test generation algorithm with static d is considered the best choice, not only for its Sample Variance but also for its simpler formulation compared to the dynamic one.
Chapter 4: Experimentation
The objectives of the experimentation, evaluated by applying MART to a real system, are:
• to demonstrate the validity of the update operations, considering operational profiles at different distances from the true one;
• to verify the advantage of using MART rather than operational testing;
• to verify the behavior of MART when the true operational profile changes.
From the simulation, a set of desirable working conditions emerges under which the MART technique is expected to work better than SRS:
• clustered failures;
• scarce testing budget;
• good belief of the tester about the system: the probability values assigned by the tester are close to the real ones;
• good partitioning: each partition is composed mostly of failing or mostly of non-failing test cases.
These will be considered in the interpretation of the experimental results.
4.1 Pet Clinic
The application considered for the experimentation is Pet Clinic [35], a system to manage owners, pets, vets and visits of a veterinary clinic. This microservice architecture is based on Spring Cloud Netflix technology [7], which provides the necessary integration between the Spring environment and Netflix OSS [6].
The architecture includes the following services:
• Admin Server (Spring Boot Admin): is a project to manage and monitor Spring
Boot applications [36].
• Api Gateway Application (Zuul Server): Zuul is the front door for all requests from
devices and web sites to the backend of the Netflix streaming application [37]. As
an edge service application, Zuul is built to enable dynamic routing, monitoring,
resiliency and security. It also has the ability to route requests to multiple Amazon
Auto Scaling Groups as appropriate.
• ConfigServerApplication: The Server provides an HTTP, resource-based API for
external configuration (name-value pairs, or equivalent YAML content) [38].
• CustomersServiceApplication, VetsServiceApplication, VisitsServiceApplication:
microservices for owners, pets, vets and visits management.
• DiscoveryServerApplication (Eureka Server): Eureka is a REST (Representational
State Transfer) based service that is primarily used in the AWS cloud for locating
services for the purpose of load balancing and failover of middle-tier servers [39].
• Monitoring (AspectJ): for the monitoring of specific "aspects" [40].
• Tracing Server (Zipkin): Zipkin is a distributed tracing system. It helps gather
timing data needed to troubleshoot latency problems in microservice architectures.
It manages both the collection and lookup of this data [41].
The system is considered in a reduced version, because the admin server and the tracing server are not useful for the experimentation objectives.
All Pet Clinic services are executed locally with Docker [42], an open-source project that automates system deployment using Linux containers.
4.2 MART setup
The setup stage consists in defining all the information necessary for the algorithm execution. The representative structure of the test frames is characterized by the following fields:
• Service: name of the microservice that exposes the method;
• Distance Factor: integer number to encode the distance between test frames owned
by the same Service;
• Linked Service: a way to encode logic links between different Services;
• Test frame: the name of the test frame, represented as an encoded URL, built as the Cartesian product of Input Classes;
• Type: the REST request type (GET, POST, …);
• Failure Probability: the probability that a test case taken from this test frame exposes a failure;
• Occurrence Probability: the probability that one of the test cases included in the test frame is run during system execution;
• Payload: the coding of the payload associated with a method, represented in JSON.
All this information must be defined so that all the data structures necessary for the MART execution can be derived automatically.
4.2.1 Weight Matrix determination
To obtain the weight matrix automatically, the coding of the “distance information” is
defined on three fields:
• the field Service determines whether there is a link between test frames, depending on the service they belong to (the assumption that there are no self-links is guaranteed);
• the field Distance Factor determines the distance between test frames for which a link is defined;
• the field Linked Service determines whether there is a link of fixed weight (in this case 2) between methods of different services.
All the other operations carried out for the weight matrix definition are the same as those described in subsection 2.2.2, respecting the absence of self-links and the inverse proportionality of weights with respect to distance.
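A possible sketch of this derivation, assuming the three fields are available as parallel arrays (the names and the exact intra-service rule are illustrative; the thesis code is not shown here):

```java
// Illustrative weight-matrix construction: the weight is the inverse of the
// Distance Factor gap within a service, a fixed weight of 2 across
// explicitly linked services, and 0 on the diagonal (no self-links).
public class WeightMatrixSketch {

    static double[][] buildWeights(String[] service, int[] distanceFactor, String[] linkedService) {
        int n = service.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;                  // no self-links
                if (service[i].equals(service[j])) {
                    int d = Math.abs(distanceFactor[i] - distanceFactor[j]);
                    w[i][j] = d == 0 ? 1.0 : 1.0 / d;  // weight inversely proportional to distance
                } else if (service[j].equals(linkedService[i])) {
                    w[i][j] = 2.0;                     // fixed weight for linked services
                }
            }
        }
        return w;
    }
}
```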
4.2.2 Test frame definition
As described in subsection 2.2.1, test frames directly depend on Input Classes; the latter are described in order to realize the partitioning of test cases. This representation uses the following fields:
• Name: instance name of an input class;
• Type: type of input, that can be chosen from different values;
• Min: the interpretation of this field depends on the specified type;
• Max: the interpretation of this field depends on the specified type.
Input type and values are encoded in the field Type. Possible values are:
• Range: integer value between Min and Max;
• Lower: integer value lower than Min;
• Greater: integer value greater than Min;
• Different: every value of the type that differs from the one specified in Min;
• Symbol: special character specified in Min;
• S_range: alphanumeric string with length between Min and Max;
• S_greater: alphanumeric string with length greater than Min;
• Empty: empty input;
• N_range: numeric string with length between Min and Max;
• N_greater: numeric string with length greater than Min.
In this implementation, test cases are randomly generated using inputs obtained from the different Input Class instances.
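A minimal sketch of such type-driven generation, covering only a few of the Type values listed above (class and method names are hypothetical, and the cap on the Greater type is an arbitrary illustrative choice):

```java
import java.util.Random;

// Sketch of input generation driven by the Input Class fields (Type, Min, Max).
public class InputGenerator {
    private static final Random rnd = new Random();
    private static final String ALPHANUM =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    static String generate(String type, int min, int max) {
        switch (type) {
            case "Range":                   // integer value between Min and Max
                return Integer.toString(min + rnd.nextInt(max - min + 1));
            case "Greater":                 // integer strictly greater than Min (cap is illustrative)
                return Integer.toString(min + 1 + rnd.nextInt(1000));
            case "S_range": {               // alphanumeric string with length between Min and Max
                int len = min + rnd.nextInt(max - min + 1);
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < len; i++) {
                    sb.append(ALPHANUM.charAt(rnd.nextInt(ALPHANUM.length())));
                }
                return sb.toString();
            }
            case "Empty":                   // empty input
                return "";
            default:
                throw new IllegalArgumentException("type not sketched: " + type);
        }
    }
}
```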
4.3 Functions
Each operation necessary for the experimentation is implemented by a function. All functions are collected in a Java application. Since Pet Clinic is a REST-based application, the functions are implemented as a Client that sends requests and receives responses; the Client evaluates them and calculates the objective values. The examined system is an interactive application: this means that, to observe their behavior, the services have to be stimulated externally. The following subsections describe the implementation of the functions.
4.3.1 True Reliability calculation function
The first step is to calculate the true operational profile (the operational profile is the set of occurrence probabilities attached to each test frame): first, 10,000 test cases are executed, collecting the outcome of each one. Then the True Reliability is calculated from the number of failed test cases (F) and the total number of executed test cases (T) as:
TrueReliability = 1 − F/T
Figure 28: True Reliability Function
The True Reliability calculation function is described in Figure 28. In the implemented
function all requests are made by a single Client.
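The frequentist computation can be sketched as follows, where the failsOnRun predicate stands in for sending a request through the single Client and checking its outcome (names are hypothetical, not the thesis code):

```java
import java.util.function.Predicate;

// Sketch of the True Reliability function: run T test cases drawn under
// the true profile and compute R = 1 - F/T.
public class TrueReliability {

    static double assess(int totalRuns, Predicate<Integer> failsOnRun) {
        int failed = 0;
        for (int t = 0; t < totalRuns; t++) {
            if (failsOnRun.test(t)) failed++;   // collect the outcome of each test case
        }
        return 1.0 - (double) failed / totalRuns;
    }
}
```

For instance, 564 failures out of 10,000 runs yield a true reliability of 0.9436, matching the first profile reported in subsection 4.4.3.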
4.3.2 Update functions
The run-time assessment requires the ability to monitor and update the usage profile and the failure probability of each test frame. Common monitoring tools, such as Wireshark, Amazon CloudWatch and Nagios, can be used to gather data. In this case, a tool called MetroFunnel [34], tailored for microservice applications, is customized for this purpose by adding payload information. Alternatively, given the interactive nature of the system, all the workload can be generated by a unique Client; in this way it is simple to collect requests and responses at a single point.
The probability update is implemented, as described in Section 2.5, in the Loop Requests and Monitoring Parsing functions: the first is based on the single-Client idea, while the second implements log parsing with the modified MetroFunnel implementation.
4.3.3 Reliability Assessment function
This paragraph describes the reliability assessment function, in which MART is applied to obtain the reliability of the system. To run the algorithm, the steps described in Chapter 2 must be executed:
• the test case domain is partitioned into test frames; this is realized considering the different methods and the Cartesian product of the input classes;
• the second step consists of assigning to each test frame both an occurrence probability and a failure probability.
After these two steps, the reliability assessment is executed.
The previous chapters presented different versions of the MART test generation algorithm, but the experimentation is based on the first version, with unitary initial sample and static d.
This procedure is implemented by the reliability assessment by MART function.
4.3.4 Operational Testing function
A benchmark algorithm is considered to evaluate the MART performance; in this case it is Operational Testing. The latter executes pseudo-random requests based on the current operational profile and calculates the reliability in a frequentist way, as in the True Operational Profile case:
Reliability = 1 − FailedTestCasesNumber / ExecutedTestCasesNumber
This procedure is implemented by the reliability assessment by operational testing function.
4.3.5 Distance between profiles function
The distance between operational profiles is calculated as the sum of the absolute differences between the old and the new occurrence probability of homologous test frames:

distance = Σ_{i=1}^{TF_dim} |occProb_i^old − occProb_i^new|

where:
• TF_dim is the cardinality of the test frame set;
• occProb_i^old is the occurrence probability of test frame i in the old operational profile;
• occProb_i^new is the occurrence probability of test frame i in the new operational profile.
This procedure is implemented by Distance Calculation function.
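A direct sketch of this computation over two profiles stored as parallel arrays (illustrative names):

```java
// Sketch of the profile distance: sum of absolute differences between
// homologous occurrence probabilities.
public class ProfileDistance {

    static double distance(double[] oldProfile, double[] newProfile) {
        double d = 0.0;
        for (int i = 0; i < oldProfile.length; i++) {
            d += Math.abs(oldProfile[i] - newProfile[i]);  // |occProb_old - occProb_new|
        }
        return d;
    }
}
```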
4.4 Experimental design
4.4.1 Experimental scenarios
To define the experimental scenarios, the test frames are analyzed to provide a realistic testing-time characterization. Specifically, 30 test cases are executed for each test frame, collecting the outcomes. The results (in terms of failing/correct test cases) support distinguishing more or less failure-prone test frames. Based on them, the test frames are divided into three categories, and an initial failure probability is assigned to each one:
• First Category: 25 test frames which exhibited no failure. To each test frame of this category an initial failure probability fi = ε = 0.01 is assigned, with i denoting the test frame. The ε (instead of assigning 0) represents the uncertainty due to the limited number of observations;
• Second Category: 46 test frames, which failed in every one of the 30 executions. Specularly, the initial failure probability for these test frames is fi = 1 − ε = 0.99;
• Third Category: 191 test frames, the remaining ones, which failed sporadically. Based on the observed proportion of failures, approximately 1 failure every 10 requests, the initial probability is set to fi = 0.1.
To demonstrate that MART is able to detect variations of the true operational profile, two different "true profiles" are considered; they are obtained by assigning to each category a percentage, which represents the probability of selecting an input from a test frame belonging to that category:
• for the first profile, 80% is assigned to the First Category, 15% to the Second Category and 5% to the Third Category;
• for the second profile, 55% is assigned to the First Category, 35% to the Second Category and 10% to the Third Category.
The probability attached to the test frames of the same category is equal (there is equiprobability inside categories).
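The construction of such a profile, splitting each category's mass equally among its frames, can be sketched as (illustrative names, not the thesis code):

```java
// Sketch of building a "true profile" from category percentages:
// each category's total occurrence probability is split equally among
// its test frames (equiprobability inside categories).
public class ProfileBuilder {

    /** categorySizes[c] = number of test frames in category c,
     *  categoryMass[c]  = total occurrence probability assigned to category c. */
    static double[] buildProfile(int[] categorySizes, double[] categoryMass) {
        int total = 0;
        for (int s : categorySizes) total += s;
        double[] profile = new double[total];
        int idx = 0;
        for (int c = 0; c < categorySizes.length; c++) {
            double p = categoryMass[c] / categorySizes[c];  // equal share inside the category
            for (int k = 0; k < categorySizes[c]; k++) profile[idx++] = p;
        }
        return profile;
    }
}
```

With the first profile (25/46/191 frames, masses 0.80/0.15/0.05) each First Category frame receives probability 0.80/25 = 0.032, and the whole profile sums to 1.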
The following factors are defined according to a Design of Experiments plan:
• technique: operational testing or MART;
• true profile: the true operational profile;
• used profile: three operational profiles, which differ from the first true operational profile by 10%, 50% and 90%;
• n: the number of executed test cases, expressed as a percentage of the number of test frames; the considered values are 20%, 40% and 70%;
• K: the number of experiment repetitions, fixed to 30;
• step: the number of different experiments executed in a session;
• update cycles: the number of updates (each consisting of 5,000 test cases) executed between each step change.
In conclusion, four different experiments are considered: in the first three, MART is evaluated with a used profile that differs from the true one by a certain amount; these experiments serve to observe the impact of the update compared to operational testing. In the fourth, MART is evaluated in the case of a true operational profile change.
The first three experiments consist of three steps and 1 update cycle for each step change; they refer to the first true operational profile (80%, 15%, 5%).
The fourth experiment consists of five steps and 1 update cycle for each step change. In this experiment the assessment considers: for the first two steps, the first true operational profile; for step 3, both profiles; for the last two steps, the second profile.
These combinations generate 42 different scenarios.
4.4.2 Evaluation criteria
As in the simulation, accuracy and efficiency metrics are used. However, an experimental scenario j is repeated 30 times instead of 100. At the end of each repetition r, the true reliability R is computed by preliminarily running 10,000 test cases under the true profile and using R = 1 − F/T, with F being the number of observed failures and T = 10,000. For each scenario j, the MSE and the Sample Variance are used as accuracy and efficiency metrics.
The number of experiment runs is: (2 techniques × 3 profiles × 3 n × 30 repetitions) + (2 techniques × 1 profile × 3 n × 30 repetitions) = 720 runs. The first addend refers to the first three experiments, the second to the fourth experiment.
For MART, all runs of the first addend are executed for three consecutive steps and all runs of the second for five steps, for a total of 1,260 effective runs. For operational testing (applicable only at step 1) there are (15 scenarios × 30 repetitions) = 360 runs, for a grand total of 1,620 effective runs.
4.4.3 True Reliability estimation
For the first profile, the true reliability is 0.9436 calculated on 10000 tests executed
independently, with 564 failed test cases.
For the second profile, the true reliability is 0.8979 calculated on 10000 tests executed
independently, with 1021 failed test cases.
As highlighted by the true reliability estimation, the reliability is strongly influenced by the percentage of occurrence probability assigned to the third category; in fact, the unreliability is very close to this value. This result means that a good partitioning has been performed.
4.5 Results
This section presents the results of all the experiments. The code necessary for the experiment execution is written in Java. The experiments are performed on a MacBook Pro 15'', with an Intel Core i7 2.5 GHz CPU and 16 GB of 1600 MHz DDR3 RAM, Java version "1.8.0_101" and Docker version 17.12.0-ce, build c97c6d6.
4.5.1 Experiment 1
The operational profile considered in this experiment differs from the first true operational profile by 10% and is obtained considering the following input distribution:
• 75% for the first category,
• 17.5% for the second category,
• 7.5% for the third category.
Figure 29: MSE of each technique at each Step, for different n%
Figure 30: Sample Variance of each technique at each Step, for different n%
The wrong operational profile is evaluated at step 1: as shown in Figures 29 and 30, operational testing is worse than MART, both for MSE and for sample variance.
The considerations are more or less the same for all considered values of n.
Only the MSE increases in the 40% case for MART: this depends on various factors, such as the combination of the two formulated estimators (see Section 3.3) and the peculiar nature of the algorithm, which exposes more failures consecutively, causing microservice crashes and violating the independence of requests. The latter effect is particularly evident for big values of n; this also demonstrates the power of MART, since when the independent-requests assumption is respected there is a performance improvement.
An update phase, in which the generated traffic is observed to obtain an adjustment of the operational profile, is necessary to pass to the next step. In this case, the difference between the initial operational profile and the one obtained by the update is 0.07, an improvement of 3%.
In step 2 the superiority of MART is even more evident, both for MSE and for Sample Variance; in fact, the results of operational testing are the same as those obtained in the first step. This is a direct consequence of the update function, which is absent in the operational testing technique.
In step 3 the difference from the operational profile obtained in the previous step after the global update is very small, ~1%, which is a good value considering that there are 262 test frames.
Compared to the previous step the MSE is lower, which implies a constant improvement of the MART performance compared to operational testing.
4.5.2 Experiment 2
The operational profile considered differs from the first true operational profile by 50%:
• 57.5% for the first category,
• 40% for the second category,
• 2.5% for the third category.
The operational profile considered in the previous case leads to an underestimation of the true system reliability; in this case, instead, the operational profile is built so as to obtain an overestimation of the system reliability.
Figure 31: MSE of each technique at each Step, for different n%
Figure 32: Sample Variance of each technique at each Step, for different n%
Results are represented in Figures 31 and 32. At step 1 the MSE values are close, in particular for n = 70%, while MART shows an important superiority in terms of Sample Variance.
After an update cycle, the operational profile improves by 25% with respect to the true one, with a general decreasing trend of the MSE values.
After another update cycle, there is an improvement of 12%, with a decrease of the estimated reliability value for all n percentages.
The comparative considerations about the algorithms' performance are the same as in the previous experiment.
4.5.3 Experiment 3
The operational profile considered differs from the first true operational profile by 90%:
• 35% for the first category,
• 55% for the second category,
• 10% for the third category.
Experiment results are reported in Figures 33 and 34.
Figure 33: MSE of each technique at each Step, for different n%
Figure 34: Sample Variance of each technique at each Step, for different n%
In the first step, operational testing has better MSE values than MART for small n, but worse Sample Variance values.
After an update cycle, the operational profile improves by 45% with respect to the true one, with a very significant improvement in the reliability estimation in the case of MART.
After another update cycle, the operational profile improves by a further 22%. In this case the trend of convergence towards the true reliability value is evident.
The considerations are the same as in the previous two experiments, confirming that the technique implemented in this thesis outperforms the compared ones.
4.5.4 Experiment 4
The fourth experiment has a different formulation. The objective is to observe how MART behaves when the true operational profile varies. The aim is to verify not only the superiority of MART over operational testing, but also the effectiveness of the operational profile update.
After three steps the true operational profile changes; the ability of MART to adapt to this variation is then evaluated by adding two further steps.
Figure 35: MSE of each technique at each Step, for different n%
Figure 36: Sample Variance of each technique at each Step, for different n%
Figures 35 and 36 show that the results of the first three steps are roughly the same as those of experiment 2.
Step 4 is not an actual step, but a re-evaluation of step 3 against the new true operational profile; in this step there is a very large increase of the MSE.
In steps 5 and 6, after their operational profile updates, the estimated value tends towards the true value, as shown by the decreasing MSE.
The Sample Variance is roughly the same across all steps.
4.5.5 Further considerations
Regarding the failure probability update, the results are relevant: the probabilities assigned by default are respected, except for a few test frames. In most cases the 0.99 probabilities are increased, while the 0.1 and 0.01 ones are reduced. Because of the violation of the request independence assumption, in a few cases these last two values show a slight increase (~0.1).
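A minimal sketch of how such a default probability could be blended with observed outcomes is given below; the weighting rule is an illustrative assumption, not the exact update formula adopted by MART:

```python
def update_failure_prob(prior_p, failures, executions, prior_weight=10):
    """Blend a default failure probability with observed outcomes.
    The prior is treated as worth `prior_weight` pseudo-observations
    (an illustrative assumption, not the thesis's exact rule)."""
    return (prior_weight * prior_p + failures) / (prior_weight + executions)

# A frame with default 0.99 that keeps failing is pushed slightly upward,
# while a frame with default 0.1 that never fails is pushed downward.
print(update_failure_prob(0.99, failures=5, executions=5))
print(update_failure_prob(0.10, failures=0, executions=5))
```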
4.6 ANOVA
For statistical significance, a one-way analysis of variance (ANOVA) test is conducted. First, the properties of the test data are checked, in particular the normality and homoscedasticity of both the MSE and Sample Variance residuals, in order to determine which type of ANOVA to apply.
Figure 37: Normality of MSE residuals
Figure 38: Normality of Sample Variance residuals
The Shapiro-Wilk test is run to verify the normality of the residuals; the results are reported in Figures 37 and 38: the null hypothesis that the data come from a normal distribution is rejected for the MSE with a p-value < 0.0001 and for the Sample Variance with a p-value = 0.0003.
Homoscedasticity is checked with Levene's test, since it is less sensitive to non-normality.
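These preliminary checks can be reproduced, for example, with `scipy.stats`; the residuals below are synthetic, skewed data used only to illustrate the procedure, not the thesis's measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic, deliberately skewed residuals (illustrative data only).
mse_residuals = rng.lognormal(mean=0.0, sigma=1.0, size=60)
sv_residuals = rng.lognormal(mean=0.0, sigma=1.0, size=60)

# Shapiro-Wilk: H0 = residuals come from a normal distribution.
w_stat, p_norm = stats.shapiro(mse_residuals)
# Levene: H0 = the groups have equal variances (homoscedasticity).
l_stat, p_var = stats.levene(mse_residuals, sv_residuals)

print(f"Shapiro-Wilk p = {p_norm:.4g}")  # small p -> reject normality
print(f"Levene p = {p_var:.4g}")
```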
Figure 39: Levene's Test
As shown in Figure 39, homoscedasticity is also rejected, with a p-value = 0.0003 in the MSE case and a p-value < 0.0001 in the Sample Variance case.
Thus, the non-parametric Wilcoxon test is adopted in both cases.
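As an illustration, a Wilcoxon signed-rank test on paired MSE values can be run with `scipy.stats`; the numbers below are invented for the example, with MART consistently lower, and do not correspond to the thesis's measurements:

```python
from scipy import stats

# Paired MSE values per experimental configuration (illustrative only).
mse_mart = [0.0021, 0.0018, 0.0025, 0.0019, 0.0022, 0.0017, 0.0020, 0.0023]
mse_op   = [0.0060, 0.0055, 0.0071, 0.0049, 0.0065, 0.0058, 0.0052, 0.0067]

# Wilcoxon signed-rank test: H0 = no difference between the paired samples.
stat, p = stats.wilcoxon(mse_mart, mse_op)
print(f"p-value = {p:.4f}")  # small p -> the techniques differ significantly
```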
Figure 40: Wilcoxon’s test
As shown in Figure 40, the hypothesis of no difference among the techniques is rejected with a p-value < 0.0001 in both cases: the differences between the two techniques are statistically significant, and the considerations reported above are valid.
Conclusions
Reliability assessment in MSA calls for run-time approaches. The run-time testing method
presented in this thesis, based on MART, supports the on-demand assessment of reliability
pursuing the accuracy and efficiency of the estimate during the operational phase. The results suggest that both the run-time adaptivity to the observed profile and failing behavior, and the testing-time adaptivity implemented by MART (which allows failures to be spotted with few tests while preserving the unbiasedness of the estimate), are good starting points for further elaboration. Improvements can be achieved by investigating what other
information can be useful to expedite the assessment (e.g., about service interactions), by
exploring other approaches for the information update (e.g., Bayesian updates), and/or by
exploring different partitioning criteria. More extensive experiments are also planned to
improve generalization in terms of experimental subjects and operational profiles.
MART is a valid technique for the reliability assessment of Microservice Architectures. This assertion is robust: the results are positive, and they were obtained under conditions very close to a real case.