
Technical Report: Analysis of the (1, λ)-σ-Self-Adaptation

Evolution Strategy with Repair by Projection Applied to a

Conically Constrained Problem

Technical Report

Patrick Spettel and Hans-Georg Beyer

{Patrick.Spettel,Hans-Georg.Beyer}@fhv.at

Research Center Process and Product Engineering
Vorarlberg University of Applied Sciences
Hochschulstr. 1, 6850 Dornbirn, Austria

April 6, 2018


Abstract

The behavior of the (1, λ)-σ-Self-Adaptation Evolution Strategy ((1, λ)-σSA-ES) applied to a conically constrained problem is analyzed. For handling infeasible offspring, a repair approach that projects infeasible offspring onto the boundary of the feasibility region is considered. Closed-form approximations are derived for the expected changes of an individual's parameter vector and mutation strength from one generation to the next. For analyzing the strategy's behavior over multiple generations, deterministic evolution equations are derived. It is shown that these evolution equations, together with the approximate one-generation expressions, allow the evolution dynamics to be predicted approximately in closed form. The derived approximations are compared to simulations in order to visualize the approximation quality.


Acknowledgments

This work was supported by the Austrian Science Fund FWF under grant P29651-N32.


Contents

List of Symbols and Abbreviations

1 Introduction

2 Problem and Algorithm
  2.1 The Problem
  2.2 The Algorithm
    2.2.1 Core
    2.2.2 Projection

3 Analysis
  3.1 The Theoretical Analysis
    3.1.1 The Dynamical Systems Approach
    3.1.2 The Microscopic Aspects
      3.1.2.1 Derivation of the x Progress Rate
        3.1.2.1.1 The Exact PQ(q) and pQ(q) Functions
        3.1.2.1.2 The Approximate PQ(q) and pQ(q) Functions
          3.1.2.1.2.1 The Offspring Density Before Projection in x Direction
          3.1.2.1.2.2 The Offspring Density Before Projection in r Direction and its Normal Approximation
          3.1.2.1.2.3 The Derived PQ(q) Upper Bound as an Approximation for PQ(q)
          3.1.2.1.2.4 The PQ(q) Approximation in the Case that the Probability of Feasible Offspring Tends to 1
          3.1.2.1.2.5 The PQ(q) Approximation
          3.1.2.1.2.6 The Approximate x Progress Rate in the Case that the Probability of Feasible Offspring Tends to 1
          3.1.2.1.2.7 The Approximate x Progress Rate in the Case that the Probability of Feasible Offspring Tends to 0
          3.1.2.1.2.8 The Approximate x Progress Rate - Combination Using the Best Offspring's Feasibility Probability
      3.1.2.2 Derivation of the r Progress Rate
        3.1.2.2.1 The Approximate r Progress Rate in the Case that the Probability of Feasible Offspring Tends to 0
        3.1.2.2.2 The Approximate r Progress Rate in the Case that the Probability of Feasible Offspring Tends to 1
        3.1.2.2.3 The Approximate r Progress Rate - Combination Using the Best Offspring's Feasibility Probability
      3.1.2.3 Derivation of the SAR
        3.1.2.3.1 The Approximate SAR in the Case that the Probability of Feasible Offspring Tends to 0
        3.1.2.3.2 The Approximate SAR in the Case that the Probability of Feasible Offspring Tends to 1
        3.1.2.3.3 The Approximate SAR - Combination Using the Best Offspring's Feasibility Probability
    3.1.3 The Evolution Dynamics in the Deterministic Approximation
      3.1.3.1 The Evolution Equations
      3.1.3.2 The ES in the Stationary State

4 Conclusion

Bibliography


List of Symbols and Abbreviations

σSA σ-Self-Adaptation

CAS Computer Algebra System

ES Evolution Strategy

i.i.d. independent and identically distributed

SAR Self-Adaptation Response


Chapter 1

Introduction

As constrained optimization is important in many practical applications, the theoretical understanding of Evolution Strategies (ESs) with constraint handling is of interest in current research. In [3], the (1, λ)-ES with constraint handling by resampling is analyzed for a single linear constraint. This is extended in [2] with the analysis of repair by projection for a single linear constraint. In [8], the repair approach analyzed in [2] is compared with an approach that reflects infeasible points into the feasible region and an approach that truncates infeasible points. A variant of the (1, λ)-ES for a conically constrained problem is considered in [4]. There, offspring are discarded until feasible ones are obtained. The goal of this chapter is to extend the analysis of the local progress measures in [4] with an analysis for a repair approach based on projection. The repair is performed by projecting infeasible offspring onto the boundary of the feasible region, minimizing the Euclidean distance to the constraint boundary. For the mutation strength control, σ-Self-Adaptation (σSA) is considered.

The chapter is organized as follows. The optimization problem and the algorithm under consideration are described in Section 2.1 and Section 2.2, respectively. Next, the theoretical results are presented. To this end, the theoretical analysis method used is explained in Section 3.1.1. This is followed by the investigation of the algorithm's microscopic aspects (one-generation behavior) in Section 3.1.2 and then by the analysis of the algorithm's macroscopic behavior (multi-generation behavior, i.e., the evolution dynamics) in Section 3.1.3. For both the microscopic and the macroscopic behavior, closed-form approximations under asymptotic assumptions are derived. The approximations are compared to simulations.


Chapter 2

Problem and Algorithm

2.1 The Problem

The optimization problem under consideration is

f(x) = x1 → min! (2.1)

subject to the constraints

x1^2 − ξ ∑_{k=2}^{N} xk^2 ≥ 0 (2.2)

x1 ≥ 0 (2.3)

where x = (x1, . . . , xN)^T ∈ R^N and ξ > 0.

Figure 2.1 shows the optimization problem for the case of 2 dimensions. The cone boundary and the projection lines are visualized. The equation for the cone boundary is a direct consequence of the problem definition (Equation (2.2)):

x1^2 − ξ ∑_{k=2}^{N} xk^2 = 0
x1^2 = ξ ∑_{k=2}^{N} xk^2
r := √(∑_{k=2}^{N} xk^2) = x1/√ξ. (2.4)

This reads

x2 = ± x1/√ξ (2.5)

for the case N = 2. A direction vector of the cone boundary is therefore

(1, 1/√ξ)^T. (2.6)

Counterclockwise rotation of this vector by 90 degrees results in a direction vector of the projection line

(−1/√ξ, 1)^T. (2.7)


The projection line for a given value q (as shown in the figure) can be expressed in parametric form using Equations (2.6) and (2.7), resulting in

x1 = q + t/√ξ (2.8)
x2 = q/√ξ − t (2.9)

for t ∈ R. Solving Equation (2.8) for t and substituting it into Equation (2.9) yields

x2 = q/√ξ − √ξ(x1 − q) (2.10)
   = −√ξ x1 + q(√ξ + 1/√ξ). (2.11)

For q = 0, this results in x2 = −√ξ x1. Additionally, in Figure 2.1, an offspring x with its parent x and the corresponding mutation vector z scaled by the mutation strength σ are shown. The offspring's x1 and x2 values after the projection step are indicated by q and qr, respectively.

The 2-dimensional case can be generalized to N dimensions due to symmetry considerations. A point in the search space can be uniquely described by its distance x from 0 in the x1 direction (the cone axis) and its distance r from the cone's axis. In the further analyses, this is called the (x, r)^T-space. It is visualized in Figure 2.2. Because the distance from the cone's axis is nonnegative, only half of the cone needs to be considered. The equations of the cone boundary and the projection lines are derived analogously to the case of 2 dimensions. Similar to Figure 2.1, an offspring x and its parent x together with the scaled mutation vector σz are visualized. The values q and qr indicate the x and r values, respectively, after the projection step.
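The problem definition and the (x, r)^T-representation above can be sketched in a few lines of code. The following is a minimal illustration (the helper names `f`, `is_feasible`, and `to_x_r` are ours, not from the report) of the objective (2.1), the feasibility test (2.2)-(2.3), and the mapping of a search point to the (x, r)^T-space:

```python
import numpy as np

def f(x):
    """Objective (2.1): minimize the first coordinate."""
    return x[0]

def is_feasible(x, xi):
    """Feasibility per the constraints (2.2) and (2.3)."""
    return x[0] >= 0.0 and x[0] ** 2 - xi * np.sum(x[1:] ** 2) >= 0.0

def to_x_r(x):
    """Map a search point to the (x, r)^T-space: position along the
    cone axis and distance from the cone axis, cf. (2.4)."""
    return x[0], np.sqrt(np.sum(x[1:] ** 2))

xi = 2.0
x = np.array([3.0, 1.0, 2.0])  # x1 = 3, r = sqrt(5) > x1/sqrt(xi): infeasible
print(f(x), is_feasible(x, xi), to_x_r(x))
```

A point is feasible exactly when r ≤ x1/√ξ, which is the form used throughout the analysis.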



Figure 2.1: The conically constrained optimization problem in two dimensions. The coordinate axes are x1 and x2. The cone boundary is visualized (lines with equations x2 = x1/√ξ and x2 = −x1/√ξ). The feasible region comprises the region inside and on the cone. It is horizontally hatched. The lines orthogonal to the cone boundary indicate the projection lines. Points on these lines are projected onto the cone boundary at the particular line-cone intersection. Every point that is to the left of the projection lines at the origin is projected to the origin. The parental candidate solution is indicated by x. One particular offspring is shown and denoted by x. The corresponding mutation vector is σz. The x1 and x2 values of the offspring after the projection step are indicated by q and qr, respectively.



Figure 2.2: The conically constrained optimization problem in N dimensions shown in the (x, r)^T-space. The coordinate axes are x and r, where x is along the cone's axis and r is the distance from the cone's axis. The cone boundary is visualized (line with equation r = x/√ξ). The feasible region comprises the region inside and on the cone. It is horizontally hatched. The lines orthogonal to the cone boundary indicate the projection lines. Points on these lines are projected onto the cone boundary at the particular line-cone intersection. Every point that is to the left of the projection line at the origin is projected to the origin. The parental candidate solution is indicated by x. One particular offspring is shown and denoted by x. The corresponding mutation vector is σz. The x and r values of the offspring after the projection step are the values q and qr, respectively.

2.2 The Algorithm

2.2.1 Core

The considered algorithm is a (1, λ)-σSA-ES. The pseudo-code is shown in Algorithm 1. After initialization (Lines 1 to 2), the generation loop is entered. In Lines 6 to 15, λ offspring are generated. For every offspring, its mutation strength σl is determined by mutating the parental mutation strength σ(g) using a log-normal distribution (Line 7). This mutation strength is then used to sample the offspring's parameter vector from a multivariate normal distribution with mean x(g) and standard deviation σl (Line 8). Then, it is determined whether repair is necessary. Repair is necessary if the offspring is infeasible. If the generated offspring is infeasible, its parameter vector is projected onto the point on the boundary of the feasible region that minimizes the Euclidean distance to the offspring point (Lines 9 to 11). That is, if the offspring


is not feasible, the optimization problem

xΠ = argmin_{x′} ||x′ − x||_2
s.t. x′1^2 − ξ ∑_{k=2}^{N} x′k^2 ≥ 0
     x′1 ≥ 0 (2.12)

must be solved. In (2.12), x is the individual to be projected. For this, a convenience function

xΠ = projectOntoCone(x) (2.13)

is introduced, returning the minimizer xΠ of the problem (2.12). Section 2.2.2 presents a geometrical approach for deriving a closed-form solution to the projection optimization problem (2.12). The values x(g), r(g), ql, qrl, q1;λ, and qr1;λ are only needed in the theoretical analysis and can be removed in practical applications of the ES. They are computed in the algorithm in Lines 4, 5, 13, 14, 19, and 20, respectively. In Line 12, the offspring's fitness is determined. After the procreation step, the next generation's parental individual x(g+1) (Line 17) and the next generation's mutation strength σ(g+1) (Line 18) are set to the corresponding values of the offspring with the smallest objective function value. The update of the generation counter ends one iteration of the generation loop. The generation loop is exited if the defined termination criteria are met (for example, if a maximum number of generations is reached, if a σ threshold is reached, etc.).

Algorithm 1 Pseudo-code of the (1, λ)-σSA-ES with repair by projection applied to the conically constrained problem.

 1: Initialize x(0), σ(0), τ, λ
 2: g ← 0
 3: repeat
 4:   x(g) = (x(g))1
 5:   r(g) = √(∑_{k=2}^{N} (x(g))k^2)
 6:   for l ← 1 to λ do
 7:     σl ← σ(g) e^(τN(0,1))
 8:     xl ← x(g) + σl N(0, I)
 9:     if not isFeasible(xl) then        ▷ see Algorithm 2
10:       xl ← projectOntoCone(xl)        ▷ see Equations (2.12) and (2.13)
11:     end if
12:     fl ← f(xl) = (xl)1
13:     ql = (xl)1
14:     qrl = √(∑_{k=2}^{N} (xl)k^2)
15:   end for
16:   Sort offspring according to fl in ascending order
17:   x(g+1) ← x1;λ
18:   σ(g+1) ← σ1;λ
19:   q1;λ = (x(g+1))1
20:   qr1;λ = √(∑_{k=2}^{N} (x(g+1))k^2)
21:   g ← g + 1
22: until termination criteria are met


Algorithm 2 Feasibility check

1: function isFeasible(x)
2:   return (x1 ≥ 0 ∧ x1^2 − ξ ∑_{k=2}^{N} xk^2 ≥ 0)

3: end function
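Algorithms 1 and 2 translate directly into executable form. The sketch below implements one pass through the generation loop in Python; it is an illustrative reimplementation under the report's definitions (the function names are ours), with the feasibility check of Algorithm 2 inlined:

```python
import numpy as np

def project_onto_cone(x, xi):
    """Repair by projection, cf. Equations (2.14) and (2.26): project
    onto the cone boundary, or onto the origin if e_c^T x <= 0."""
    r = np.sqrt(np.sum(x[1:] ** 2))
    if r == 0.0:  # on the cone axis: feasible iff x1 >= 0
        return x.copy() if x[0] >= 0.0 else np.zeros_like(x)
    ec = np.concatenate(([1.0], x[1:] / (np.sqrt(xi) * r)))
    ec /= np.sqrt(1.0 + 1.0 / xi)  # normalized boundary direction, cf. (2.24)
    t = ec @ x
    return t * ec if t > 0.0 else np.zeros_like(x)

def one_generation(x_parent, sigma, xi, lam, tau, rng):
    """One pass through the generation loop of Algorithm 1 (a sketch)."""
    best = None
    for _ in range(lam):
        sigma_l = sigma * np.exp(tau * rng.standard_normal())           # Line 7
        x_l = x_parent + sigma_l * rng.standard_normal(x_parent.shape)  # Line 8
        feasible = x_l[0] >= 0 and x_l[0] ** 2 - xi * np.sum(x_l[1:] ** 2) >= 0
        if not feasible:                                                # Lines 9-11
            x_l = project_onto_cone(x_l, xi)
        f_l = x_l[0]                                                    # Line 12: f(x) = x1
        if best is None or f_l < best[0]:
            best = (f_l, x_l, sigma_l)
    return best[1], best[2]  # x^(g+1) and sigma^(g+1) of the best offspring
```

Keeping only the best of λ offspring and inheriting its mutation strength is exactly the comma-selection plus σSA mechanism of Lines 16 to 18.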

Figure 2.3 shows an example of the x- and r-dynamics generated by running Algorithm 1 (solid line) in comparison with the iteration of the closed-form approximation of the iterative system (dotted line) that is derived in the following sections.

[Figure 2.3: two log-scale plots of the x- and r-dynamics over generations g = 0, ..., 40000 for N = 400, ξ = 1, x(0) = 100, r(0) = x(0)/(10√ξ), σ(0) = 0.0001, λ = 20, τ = 1/√(2N); solid line: real run, dotted line: iterated by approximation.]

Figure 2.3: Comparison of the x- and r-dynamics of a real ES run (solid line) with the iteration of the closed-form approximation of the iterative system (dotted line) that is derived in the following sections.

2.2.2 Projection

In this section, a geometrical approach for the projection of points that are outside of the feasible region onto the cone boundary is described. Figure 2.4 shows a visualization of the 2D-plane spanned by an offspring x and the x1 coordinate axis. The equation for the cone boundary is a direct consequence of the problem definition (Equation (2.2)) as shown in Equation (2.4). The projected vector is indicated by xΠ. The unit vectors e1, ex, er, and ec are introduced. They are unit vectors in the direction of the x1 axis, the x vector, the r vector, and the cone boundary, respectively. The projection line is indicated by the dashed line. It indicates the shortest Euclidean distance from x to the cone boundary. The goal is to compute the parameter vector after projection, xΠ. By inspecting Figure 2.4, one can see that the projected vector xΠ can be expressed as

xΠ = (ec^T x) ec   if ec^T x > 0
xΠ = 0             otherwise. (2.14)

From Equation (2.2), the equation of the cone boundary r = x/√ξ follows. Using this, the vector ec can be expressed as a linear combination of the vectors e1 and er. After normalization (such that ||ec|| = 1), this writes

ec = (e1 + (1/√ξ) er) / √(1 + 1/ξ). (2.15)

The unit vectors e1 = (1, 0, . . . , 0)^T and ex = x/||x|| are known. Therefore, the goal is to express er as a linear combination of e1 and ex. Additionally, er should be a unit vector and it should be orthogonal to e1.

Figure 2.4: The 2D-plane spanned by an offspring x and the x1 coordinate axis. Vectors introduced for expressing the projection onto the feasible region of Figure 2.2 are visualized. The vectors e1, ex, er, and ec are unit vectors of the corresponding vectors. The dashed line indicates the orthogonal projection of x onto the cone boundary.

Formally, a and b are to be determined such that

er = a e1 + b ex (2.16)
e1^T er = 0 (2.17)
||er|| = 1. (2.18)

Using ||e1||^2 = e1^T e1 and Equations (2.16) and (2.17), it follows that

e1^T er = e1^T (a e1 + b ex)
0 = a + b e1^T ex
a = −b e1^T ex. (2.19)

Inserting Equation (2.19) into Equation (2.16) yields

er = −b (e1^T ex) e1 + b ex. (2.20)

With Equation (2.18),

1 = ||er||^2
1 = er^T er
1 = b[−(e1^T ex) e1 + ex]^T b[−(e1^T ex) e1 + ex]
1 = b^2 [(e1^T ex)^2 − 2((e1^T ex) e1)^T ex + 1]
1 = b^2 [(e1^T ex)^2 − 2(e1^T ex)^2 + 1]
1 = b^2 [1 − (e1^T ex)^2]
b = 1/√(1 − (e1^T ex)^2) (2.21)


follows. Insertion of Equation (2.21) into Equation (2.19) results in

a = −(e1^T ex)/√(1 − (e1^T ex)^2). (2.22)

Now, Equations (2.16), (2.21), and (2.22) yield

er = −[(e1^T ex)/√(1 − (e1^T ex)^2)] e1 + [1/√(1 − (e1^T ex)^2)] ex
   = (−(x)1, 0, . . . , 0)^T · 1/(||x|| √(1 − ((x)1/||x||)^2)) + ((x)1, (x)2, . . . , (x)N)^T · 1/(||x|| √(1 − ((x)1/||x||)^2))
   = (0, (x)2, . . . , (x)N)^T · 1/(||x|| √(1 − ((x)1/||x||)^2))
   = (0, (x)2, . . . , (x)N)^T · 1/√(||x||^2 − (x)1^2)
   = (0, (x)2, . . . , (x)N)^T · 1/√((x)1^2 + · · · + (x)N^2 − (x)1^2)
   = (0, (x)2, . . . , (x)N)^T · 1/√((x)2^2 + · · · + (x)N^2)
   = (0, (x)2, . . . , (x)N)^T / ||r||. (2.23)

Inserting Equation (2.23) into Equation (2.15) results in

ec = (1, (x)2/(√ξ ||r||), . . . , (x)N/(√ξ ||r||))^T / √(1 + 1/ξ). (2.24)

Using Equation (2.24), it follows that

ec^T x = ((x)1 + ((x)2^2 + · · · + (x)N^2)/(√ξ ||r||)) / √(1 + 1/ξ)
       = ((x)1 + ||r||^2/(√ξ ||r||)) / √(1 + 1/ξ)
       = ((x)1 + ||r||/√ξ) / √(1 + 1/ξ)
       = ((x)1 √ξ + ||r||) / √(ξ + 1). (2.25)

Use of Equations (2.14), (2.24), and (2.25) yields for the case ec^T x > 0

xΠ = (ec^T x) ec
   = 1/(1 + 1/ξ) ((x)1 + ((x)2^2 + · · · + (x)N^2)/(√ξ ||r||)) (1, (x)2/(√ξ ||r||), . . . , (x)N/(√ξ ||r||))^T. (2.26)


The first vector component of xΠ writes

q = (xΠ)1 = 1/(1 + 1/ξ) ((x)1 + ((x)2^2 + · · · + (x)N^2)/(√ξ ||r||)) = ξ/(ξ + 1) ((x)1 + ||r||/√ξ). (2.27)

The k-th vector component of xΠ for k ∈ {2, . . . , N} writes

(xΠ)k = ξ/(ξ + 1) ((x)1 + ||r||/√ξ) (x)k/(√ξ ||r||) = ξ/(ξ + 1) ((x)1/(√ξ ||r||) + 1/ξ) (x)k. (2.28)

And the projected distance from the cone axis ||rΠ|| writes

qr = ||rΠ|| = √(∑_{k=2}^{N} (xΠ)k^2)
   = √(∑_{k=2}^{N} (ξ/(ξ + 1))^2 ((x)1/(√ξ ||r||) + 1/ξ)^2 (x)k^2)
   = ξ/(ξ + 1) ((x)1/(√ξ ||r||) + 1/ξ) ||r||. (2.29)

Note that in terms of the (x, r)^T representation, the offspring x can be expressed as (e1^T x, er^T x)^T = ((x)1, ||r||)^T. This is exactly what is expected. The first component is the value in the direction of the cone axis. The second component is the distance from the cone axis.
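The closed forms (2.27) and (2.29) can be cross-checked numerically against the direct projection (2.14) with ec from (2.24). A small self-check (the concrete point and variable names are ours, chosen for illustration):

```python
import numpy as np

xi = 3.0
# A hypothetical infeasible offspring (r > x1/sqrt(xi)).
x = np.array([1.0, 2.0, -1.5, 0.5])
r = np.sqrt(np.sum(x[1:] ** 2))
assert x[0] ** 2 - xi * r ** 2 < 0  # infeasible

# Direct projection via e_c (Equations (2.24) and (2.14)).
ec = np.concatenate(([1.0], x[1:] / (np.sqrt(xi) * r))) / np.sqrt(1 + 1 / xi)
x_proj = (ec @ x) * ec

# Closed forms (2.27) and (2.29).
q = xi / (xi + 1) * (x[0] + r / np.sqrt(xi))
qr = xi / (xi + 1) * (x[0] / (np.sqrt(xi) * r) + 1 / xi) * r

assert np.isclose(x_proj[0], q)
assert np.isclose(np.sqrt(np.sum(x_proj[1:] ** 2)), qr)
assert np.isclose(qr, q / np.sqrt(xi))  # projected point lies on the boundary
```

The last assertion reflects a property visible from (2.27) and (2.29): the projected point satisfies qr = q/√ξ, i.e., it lies exactly on the cone boundary.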


Chapter 3

Analysis

3.1 The Theoretical Analysis

This section presents the theoretical analysis of the (1, λ)-σSA-ES (Algorithm 1) applied to the conically constrained problem as defined in Section 2.1. To this end, the method used for the analysis, the dynamical systems approach, is first introduced in Section 3.1.1. There, the so-called evolution equations for the prediction of the evolution dynamics are defined at a high level. Then, approximations for all the parts needed for the evolution equations are derived in a step-by-step manner in Section 3.1.2. Finally, everything is put together in Section 3.1.3 to present the evolution dynamics and steady-state considerations. The derived approximations are compared to real ES simulations in order to show the quality of the approximations.

3.1.1 The Dynamical Systems Approach

For the analysis of the (1, λ)-σSA-ES, the (x, r)^T-modeling described in Section 2.1 is used. The goal is to compute the evolution dynamics of the (1, λ)-σSA-ES. For doing this, the dynamical systems approach presented in [5] is used. This approach models specific state variables of the strategy over time. For the problem under consideration, there are three random variables that describe the system (assuming constant exogenous parameters): the x and r values of the current parental individual and the current mutation strength σ. The transition between consecutive states of the evolution strategy can then be modeled as a Markov process

(x(g+1), r(g+1), σ(g+1))^T ← (x(g), r(g), σ(g))^T. (3.1)

Deriving closed-form expressions for the transition equations (so-called Chapman-Kolmogorov equations, see, e.g., [6]) is often not possible. Instead, approximate equations for the change of all the state variables are usually derived. The change of the random state variables is expressed in two parts. The first part is the expected change and the second part comprises the stochastic fluctuations. Making use of functions that express the change (ϕx, ϕr, ψ), so-called evolution equations describe the system. They read

x(g+1) = x(g) − ϕx(x(g), r(g), σ(g)) + εx(x(g), r(g), σ(g)) (3.2)

r(g+1) = r(g) − ϕr(x(g), r(g), σ(g)) + εr(x(g), r(g), σ(g)) (3.3)

σ(g+1) = σ(g) + σ(g)ψ(x(g), r(g), σ(g)) + εσ(x(g), r(g), σ(g)) (3.4)


and are stochastic difference equations. The expected changes in the parameter space are expressed as so-called progress rates. They are defined as

ϕx(x(g), r(g), σ(g)) := E[x(g) − x(g+1) |x(g), r(g), σ(g)] (3.5)

ϕr(x(g), r(g), σ(g)) := E[r(g) − r(g+1) |x(g), r(g), σ(g)]. (3.6)

Normalized variants are introduced as

ϕ∗x(·) := N ϕx(·)/x(g) (3.7)

and

ϕ∗r(·) := N ϕr(·)/r(g). (3.8)

Similarly, a normalized variant of the mutation strength is introduced as

σ∗ := N σ/r(g). (3.9)

Evolution of the mutation strength is performed by multiplication with a log-normally distributed random variable. Therefore, its relative expected change is of interest. This yields a slightly different (in comparison to the progress rates in the parameter space) progress measure for the mutation strength, the so-called Self-Adaptation Response (SAR). It is defined as

ψ(x(g), r(g), σ(g)) := E[(σ(g+1) − σ(g))/σ(g) | x(g), r(g), σ(g)]. (3.10)

The fluctuations are represented as random variables εx, εr, and εσ with necessarily E[εx] = 0, E[εr] = 0, and E[εσ] = 0. Fluctuation terms are treated in detail in [9]. For the analysis in this chapter, it is assumed that it is sufficient to consider the dynamics without fluctuations in order to approximately model the evolution dynamics of the strategy in the asymptotic limit case N → ∞. Such evolution equations without fluctuation terms are referred to as deterministic evolution equations. They are usually less complex to deal with in theoretical analyses. They read

x(g+1) = x(g) − ϕx(x(g), r(g), σ(g)) (3.11)

r(g+1) = r(g) − ϕr(x(g), r(g), σ(g)) (3.12)

σ(g+1) = σ(g) + σ(g)ψ(x(g), r(g), σ(g)). (3.13)

As it turns out, it is possible to approximately predict the mean value evolution dynamics of the evolution strategy using these evolution equations. The next step is to derive expressions for the functions ϕx, ϕr, and ψ.
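Once expressions for ϕx, ϕr, and ψ are available, iterating the deterministic evolution equations (3.11)-(3.13) is straightforward. The sketch below shows the iteration scheme with placeholder progress functions standing in for the closed-form approximations derived later; the constant normalized rates used here are purely illustrative and are not results of the analysis:

```python
def iterate_dynamics(x0, r0, sigma0, generations, phi_x, phi_r, psi):
    """Iterate the deterministic evolution equations (3.11)-(3.13).
    phi_x, phi_r, psi map a state (x, r, sigma) to the progress
    rates and the SAR, respectively."""
    x, r, sigma = x0, r0, sigma0
    trajectory = [(x, r, sigma)]
    for _ in range(generations):
        x, r, sigma = (x - phi_x(x, r, sigma),
                       r - phi_r(x, r, sigma),
                       sigma + sigma * psi(x, r, sigma))
        trajectory.append((x, r, sigma))
    return trajectory

# Purely illustrative placeholders with constant normalized rates;
# the closed-form approximations derived below would be used instead.
N = 100
traj = iterate_dynamics(
    100.0, 10.0, 1.0, 50,
    phi_x=lambda x, r, s: 0.1 * x / N,  # phi_x = phi_x^* x / N, cf. (3.7)
    phi_r=lambda x, r, s: 0.1 * r / N,  # phi_r = phi_r^* r / N, cf. (3.8)
    psi=lambda x, r, s: -0.001)         # constant SAR stand-in
```

With constant positive normalized rates, x and r decay geometrically, which is the qualitative log-linear behavior visible in Figure 2.3.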

3.1.2 The Microscopic Aspects

The microscopic aspects deal with the local analysis of the behavior of the evolution strategy. This means that, given the current parental individual and the current mutation strength, a step from one generation to the next is considered. The progress rates express the expected change of the parental individual in the parameter space, and the expected relative change of the mutation strength is given by the SAR. These measures are derived in the next sections for the algorithm and the problem under consideration.


3.1.2.1 Derivation of the x Progress Rate

From the definition of the progress rate (Equation (3.5)) and the pseudo-code of the ES (Algorithm 1, Lines 4 and 19), it follows that

ϕx(x(g), r(g), σ(g)) = x(g) − E[x(g+1) |x(g), r(g), σ(g)] (3.14)

= x(g) − E[q1;λ |x(g), r(g), σ(g)]. (3.15)

This means that the expectation of q1;λ, i.e., the x value after (possible) projection of the best offspring, is needed for the derivation of the progress rate in x direction

E[q1;λ | x(g), r(g), σ(g)] := E[q1;λ] = ∫_{q=0}^{q=∞} q pq1;λ(q) dq (3.16)
= λ ∫_{q=0}^{q=∞} q pQ(q) [1 − PQ(q)]^(λ−1) dq. (3.17)

Equation (3.16) follows directly from the definition of expectation, where pq1;λ(q) := pq1;λ(q | x(g), r(g), σ(g)) indicates the probability density function of the q value of the best offspring (the offspring with the smallest q value). The random variable Q denotes the random x values after projection. The step to Equation (3.17) follows from the calculation of pq1;λ(q). Because the objective function (Equation (2.1)) is defined to return an individual's x value, pq1;λ(q) is the probability density function of the minimal q value among λ values. This calculation is well known in order statistics (see, e.g., [1]). A short derivation is presented here for the case under consideration. One arbitrarily selected mutation out of the total λ mutations has probability density pQ(q) := pQ(q | x(g), r(g), σ(g)) for its projected value q. For this particular q value to be the smallest, all other λ − 1 values must have larger q values. Because they are statistically independent, this results in the probability [1 − PQ(q)]^(λ−1) that all other mutations are larger than q. PQ(q) := PQ(q | x(g), r(g), σ(g)) denotes the cumulative distribution function of the random variable Q of the q values, i.e., PQ(q) = Pr[Q ≤ q]. The density for the mutation considered is thus pQ(q)[1 − PQ(q)]^(λ−1). Since there are λ possibilities for the best q value, one obtains

pq1;λ(q) = λ pQ(q) [1 − PQ(q)]^(λ−1) (3.18)

where

PQ(q) = Pr[Q ≤ q] = ∫_{q′=0}^{q′=q} pQ(q′) dq′. (3.19)

Note that the lower bound is 0 because the q values are defined to be the x values after projection. Now, to proceed further with Equation (3.17), the cumulative distribution function PQ(q) and the corresponding probability density function pQ(q) need to be derived.
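The order-statistics argument behind (3.18) can be checked empirically: integrating (3.18) gives Pr[q1;λ ≤ q] = 1 − [1 − PQ(q)]^λ. The following Monte Carlo sketch verifies this identity for uniform random variables, chosen only for convenience since then PQ(q) = q:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, trials = 20, 200_000

# Minimum of lam i.i.d. U(0, 1) draws per trial, i.e., the 1;lam order statistic.
samples = rng.uniform(0.0, 1.0, size=(trials, lam)).min(axis=1)

q = 0.05
empirical = np.mean(samples <= q)
predicted = 1.0 - (1.0 - q) ** lam  # since P_Q(q) = q for U(0, 1)
assert abs(empirical - predicted) < 0.01
```

The same identity holds for any continuous distribution of Q, which is what Equation (3.17) exploits.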

3.1.2.1.1 The Exact PQ(q) and pQ(q) Functions   The cumulative distribution function PQ(q) and its corresponding probability density function pQ(q) of projected values in the direction of the cone axis are derived in this section. The approach followed is to compute Pr[Q ≤ q] by integration. By doing this, PQ(q) = Pr[Q ≤ q] is derived. Then, pQ(q) follows by computing the derivative, using the fact that pQ(q) = d/dq PQ(q). For computing Pr[Q ≤ q], consider the visualization in Figure 3.1. The probability Pr[Q ≤ q] expressed in terms of the offspring before projection is the area of the horizontally hatched region. To see this, consider an arbitrary


offspring with values (x, r)^T before projection and values (q, qr)^T after projection. If the offspring is feasible, i.e., r ≤ x/√ξ, then q = x holds because the offspring is not projected. This results in the condition

r ≤ x/√ξ ∧ x ≤ q. (3.20)

Note that this is the condition for the area of the horizontally hatched part of the feasible region. For the case that the offspring is infeasible, i.e., r > x/√ξ, the offspring is projected onto the cone. The equation for the projection is given in Equation (2.27). Requiring q ≤ q together with Equation (2.27) yields

ξ/(ξ + 1) (x + r/√ξ) ≤ q (3.21)
x + r/√ξ ≤ ((ξ + 1)/ξ) q (3.22)
r ≤ −√ξ x + q(√ξ + 1/√ξ). (3.23)

Together with the infeasibility condition

r > x/√ξ (3.24)

this results in

r > x/√ξ ∧ r ≤ −√ξ x + q(√ξ + 1/√ξ) (3.25)

which describes the remaining part of the horizontally hatched region. The area of the horizontally hatched region can be obtained by integration. To this end, the area of the doubly hatched region is subtracted from the area of the horizontally hatched and doubly hatched regions. Formally, this reads

PQ(q) =

∫ x=q

x=−∞

∫ r=∞

r=0p1;1(x, r) dr dx

−∫ r=∞

r=q/√ξ

∫ x=q

x=−(1/√ξ)r+(1+1/ξ)q

p1;1(x, r) dx dr

(3.26)

=

∫ x=q

x=−∞

∫ r=∞

r=0p1;1(x, r) dr dx

−∫ x=q

x=−∞

∫ r=∞

r=−√ξx+(√ξ+1/

√ξ)qp1;1(x, r) dr dx.

(3.27)
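Before approximating these integrals analytically, Equation (3.26) can be sanity-checked by Monte Carlo simulation: sample offspring (x, r) before projection, project the infeasible ones using q = (ξ/(ξ + 1))(x + r/√ξ) (the projection used in Equation (3.21)), and estimate Pr[Q ≤ q] empirically. A minimal sketch in Python; the parameter values are illustrative and the σ mutation is ignored, i.e., σl = σ(g) is assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
N, xi = 10, 2.0                      # illustrative (hypothetical) values
x_g, r_g, sigma_g = 1.0, 0.5, 0.1    # parental state; sigma mutation ignored

# Offspring before projection: x mutation along the cone axis and the
# distance r from the cone axis (norm of the remaining N-1 components).
M = 100_000
x = x_g + sigma_g * rng.standard_normal(M)
y = sigma_g * rng.standard_normal((M, N - 1))
y[:, 0] += r_g
r = np.linalg.norm(y, axis=1)

# Repair by projection onto the cone boundary for infeasible offspring,
# using q = (xi / (xi + 1)) * (x + r / sqrt(xi)) as in Equation (3.21).
infeasible = r > x / np.sqrt(xi)
q_vals = np.where(infeasible, xi / (xi + 1.0) * (x + r / np.sqrt(xi)), x)

q = 1.0
P_Q_empirical = float(np.mean(q_vals <= q))
```

Such an empirical estimate of PQ(q) serves as a reference point for the closed-form approximations derived below.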

The function p1;1(x, r) := p1;1(x, r | x(g), r(g), σ(g)) denotes the joint probability density function of a single descendant before projection. It is derived from Algorithm 1. Lines 7 and 8 outline the way a single descendant before projection is generated. The ES is at the parental state (x(g), r(g), σ(g))^T. First, σ(g) is mutated. This occurs with the conditional density (log-normal mutation operator)

pσ(σ | σ(g)) = (1/(√(2π) τ)) (1/σ) exp[−(1/2) (ln(σ/σ(g))/τ)²]. (3.28)

The σ value obtained serves as the mutation strength in mutating x and r. This occurs with the conditional probability density px,r(x, r | x(g), r(g), σ). The expression px,r(x, r | x(g), r(g), σ)

[Figure 3.1 shows the (x, r)-plane with the cone boundary r = x/√ξ, the value q on the x axis, and the projection line r = −√ξ x + q(√ξ + 1/√ξ).]

Figure 3.1: Visualization of the integration for the calculation of PQ(q) = Pr[Q ≤ q]. Note that Pr[Q ≤ q] is the area of the horizontally hatched region. The integration can be performed by first computing the area of the rectangle containing the horizontally hatched and doubly hatched regions. Then, subtracting from that the area of the doubly hatched region results in the horizontally hatched region.

denotes the joint probability density function of offspring before projection. The parameter σ occurs with conditional probability pσ(σ | σ(g)) dσ. The conditional probability density of a single descendant is derived by integrating over all possible values of σ:

p1;1(x, r | x(g), r(g), σ(g)) = ∫_{σ=0}^{σ=∞} px,r(x, r | x(g), r(g), σ) pσ(σ | σ(g)) dσ. (3.29)

Insertion of Equation (3.29) into Equation (3.26) or Equation (3.27) results in the exact PQ(q) function. The resulting expression is difficult to deal with analytically. Numerical integration and approximation are two possibilities to proceed further. Because closed-form solutions are arguably preferable, the latter option is followed.

3.1.2.1.2 The Approximate PQ(q) and pQ(q) Functions In the investigations of the following sections, τ is assumed to be small. By the properties of the log-normal distribution this leads to a small variance of the possible mutated σ values σl of the offspring. This property can be seen by considering 1/τ → ∞ and σ ≪ 1/τ. Note that, with these assumptions, Equation (3.28) tends to ∞ for σ = σ(g) and it tends to 0 for every other argument. This is also shown in Figure 3.2. Those plots show the log-normal density function for different σ(g) and τ values. The σl values therefore tend to the expected value of the above defined log-normally distributed mutation strengths. This expected value approaches σ(g) for small τ. This allows one to assume σl ≈ σ(g), which simplifies the analysis because the σ mutation can be ignored. Together with further assumptions, this makes closed-form results of the theoretical analysis possible. More formally, this can be shown by an expansion analogously to [5, Section 7.3.2.4]. Consequently, the following probability density for a single descendant before projection is used in the further


analysis:

p1;1(x, r | x(g), r(g), σ(g)) ≈ px,r(x, r | x(g), r(g), σ(g)) = px(x) pr(r) (3.30)

for sufficiently small τ. The last equality in Equation (3.30) follows by statistical independence of the mutations in parameter space.
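The small-τ argument above can be illustrated numerically: for σl = σ(g) e^{τz} with z ∼ N(0, 1), the standard deviation of the mutated mutation strengths is approximately σ(g) τ for small τ, so the σ mutation becomes negligible. A short sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_g = 1.0

# sigma_l = sigma_g * exp(tau * z), z ~ N(0, 1), is log-normally distributed;
# for small tau its standard deviation is approximately sigma_g * tau.
spreads = {}
for tau in (0.1, 0.01, 0.001):
    sigma_l = sigma_g * np.exp(tau * rng.standard_normal(100_000))
    spreads[tau] = float(sigma_l.std())
```

The spread shrinks proportionally to τ, which is exactly why σl ≈ σ(g) is a useful simplification.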

From Line 8 of Algorithm 1 the probability density functions px(x) and pr(r) follow.

3.1.2.1.2.1 The Offspring Density Before Projection in x Direction The offspring density of an offspring in x direction before projection follows directly from the offspring generation in Algorithm 1 (Line 8). As every component of the offspring's parameter vector x is independent and identically distributed (i.i.d.) according to a normal distribution, it follows together with the assumption σl ≈ σ(g) that

px(x) ≈ (1/σ(g)) φ((x − x(g))/σ(g)) = (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] (3.31)

holds. φ(x) denotes the probability density function of the standard normal distribution.

3.1.2.1.2.2 The Offspring Density Before Projection in r Direction and its Normal Approximation The offspring density of an offspring in r direction before projection can be derived from Algorithm 1 (Line 8). In Section 2.1 it has already been explained that an individual can be uniquely described by its distance x from 0 in direction of the cone axis and the distance r from the cone axis. In addition, a simplifying assumption can be made due to the problem's symmetry. Because the problem's dimensions 2, …, N represent an (N − 1)-dimensional sphere, the coordinate system can be rotated without loss of generality. For a particular individual x the coordinate system is rotated such that the first coordinate axis points in the direction of the vector (x1, 0, …, 0)^T and the second coordinate axis points in the direction of the vector (0, x2, …, xN)^T. This rotation of the coordinate system allows one to conveniently represent an individual. Note that the length in the direction of the second axis is exactly the individual's distance from the cone axis. With this observation, it immediately follows that an individual in the (x, r)^T-space, (x1, r = √(Σ_{k=2}^{N} x_k²))^T, is (x1, r, 0, …, 0)^T in the rotated coordinate system. The offspring's distance from the cone's axis, r = √(Σ_{k=2}^{N} x_k²), is distributed according to a non-central χ distribution ([7]) with N − 1 degrees of freedom ((positive) square root of a sum of N − 1 squared i.i.d. normal variables). As, under the assumption σl ≈ σ(g), x2 = r(g) + σ(g) z2 ∼ N(r(g), σ(g)²) and xk = σ(g) zk ∼ N(0, σ(g)²) for k ∈ {3, …, N}, it is the (positive) square root of a sum of squared normally distributed random variables. Therefore, (r/σ(g))² ∼ χ²_{N−1, (r(g)/σ(g))²} and consequently r ∼ σ(g) χ_{N−1, r(g)/σ(g)}. The expressions χ²_{df,nc} and χ_{df,nc} denote the χ² and χ distributions, respectively, with df degrees of freedom and non-centrality parameter nc. The cumulative distribution function and the probability density function of this scaled distribution can be expressed using the corresponding functions of the non-central χ distribution. Assume an arbitrary random variable S with probability density function pS(r) and cumulative distribution function PS(r). The goal is to express pcS using pS and PcS using PS for some


[Figure 3.2: subplots of pσ(σ | σ(g)) over σ for σ(g) ∈ {0.1, 1, 2} and τ ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.15}.]

Figure 3.2: Log-normal density function for different parameters. The line in every subplot has been generated by evaluating Equation (3.28) for 10000 linearly spaced values of σ in the interval [max(0, σ(g) − 1), σ(g) + 1] for the particular value σ(g).


constant c ∈ R. For achieving this,

Pr[cS ≤ r] = Pr[S ≤ r/c] = PS(r/c) (3.32)

and

d/dr [PS(r/c)] = (1/c) pS(r/c) (3.33)

are derived. Applying this to the particular case under consideration, it follows that

P_{σ(g)χ_{df,nc}}(x) = P_{χ_{df,nc}}(x/σ(g)) (3.34)

and

p_{σ(g)χ_{df,nc}}(x) = (1/σ(g)) p_{χ_{df,nc}}(x/σ(g)) (3.35)

hold. Hence,

pr(r) = (1/σ(g)) p_{χ_{N−1, r(g)/σ(g)}}(r/σ(g)) (3.36)

follows, where p_{χ_{N−1, r(g)/σ(g)}} denotes the probability density function of the non-central χ distribution. In order to get tractable expressions in the analysis steps that follow, pr(r) is approximated by a normal distribution. From the offspring generation it follows that the distance from the cone's axis of the offspring is

r = √((r(g) + σ(g) z2)² + Σ_{k=3}^{N} (σ(g) zk)²) = √(r(g)² + 2σ(g) r(g) z2 + σ(g)² z2² + σ(g)² Σ_{k=3}^{N} zk²). (3.37)
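The distributional claim r ∼ σ(g) χ_{N−1, r(g)/σ(g)} can be checked via the second moment: a non-central χ² variable with df degrees of freedom and non-centrality nc has mean df + nc, so E[r²] = σ(g)²(N − 1) + r(g)². A Monte Carlo sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
N, r_g, sigma_g = 40, 1.0, 0.1

# x2 ~ N(r_g, sigma_g^2) and xk ~ N(0, sigma_g^2) for k = 3..N;
# r is the Euclidean norm of these N-1 components.
M = 100_000
y = sigma_g * rng.standard_normal((M, N - 1))
y[:, 0] += r_g
r = np.linalg.norm(y, axis=1)

# E[r^2] = sigma_g^2 * ((N - 1) + (r_g / sigma_g)^2) = sigma_g^2 (N - 1) + r_g^2.
second_moment = float(np.mean(r**2))
expected = sigma_g**2 * (N - 1) + r_g**2
```

The empirical second moment matches the non-central χ prediction to within Monte Carlo noise.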

Now, 2σ(g) r(g) z2 + σ(g)² z2² and σ(g)² Σ_{k=3}^{N} zk² are replaced with normally distributed expressions with the mean values and standard deviations of the corresponding expressions. With the moments E[zk] = 0 and E[zk²] = 1 of the i.i.d. standard normal variables zk, the expected values

E[2σ(g) r(g) z2 + σ(g)² z2²] = σ(g)² (3.38)

and

E[σ(g)² Σ_{k=3}^{N} zk²] = σ(g)² (N − 2) (3.39)

follow. The variances are computed as

Var[2σ(g) r(g) z2 + σ(g)² z2²] = E[(2σ(g) r(g) z2 + σ(g)² z2²)²] − E[2σ(g) r(g) z2 + σ(g)² z2²]²
= E[4σ(g)² r(g)² z2² + 4σ(g)³ r(g) z2³ + σ(g)⁴ z2⁴] − σ(g)⁴
= 4σ(g)² r(g)² + 3σ(g)⁴ − σ(g)⁴
= 4σ(g)² r(g)² + 2σ(g)⁴ (3.40)


and

Var[σ(g)² Σ_{i=3}^{N} zi²] = σ(g)⁴ Σ_{i=3}^{N} Var[zi²] = σ(g)⁴ Σ_{i=3}^{N} (E[zi⁴] − E[zi²]²) = σ(g)⁴ Σ_{i=3}^{N} 2 = 2σ(g)⁴ (N − 2) (3.41)

where the moments E[zi²] = 1, E[zi³] = 0, and E[zi⁴] = 3 of the i.i.d. standard normal variables zi were used. With these results, the normal approximation follows as

r ≈ √(r(g)² + N(σ(g)², 4σ(g)² r(g)² + 2σ(g)⁴) + N(σ(g)² (N − 2), 2σ(g)⁴ (N − 2)))
= √(r(g)² + N(σ(g)² + σ(g)² (N − 2), 4σ(g)² r(g)² + 2σ(g)⁴ + 2σ(g)⁴ (N − 2)))
= √(r(g)² + σ(g)² (N − 1) + √(4σ(g)² r(g)² + 2σ(g)⁴ (N − 1)) N(0, 1)). (3.42)

Substitution of the normalized quantity σ(g)* = σ(g) N / r(g) yields

r ≈ √(r(g)² + (σ(g)*² r(g)² / N²)(N − 1) + √(4 σ(g)*² r(g)⁴ / N² + 2 (σ(g)*⁴ r(g)⁴ / N⁴)(N − 1)) N(0, 1)) (3.43)

= r(g) √(1 + (σ(g)*²/N)(1 − 1/N) + (2σ(g)*/N) √(1 + (σ(g)*²/(2N))(1 − 1/N)) N(0, 1)) (3.44)

= r(g) √(1 + (σ(g)*²/N)(1 − 1/N)) √(1 + [(2σ(g)*/N) √(1 + (σ(g)*²/(2N))(1 − 1/N)) / (1 + (σ(g)*²/N)(1 − 1/N))] N(0, 1)). (3.45)

As the expression [(2σ(g)*/N) √(1 + (σ(g)*²/(2N))(1 − 1/N)) / (1 + (σ(g)*²/N)(1 − 1/N))] N(0, 1) → 0 with N → ∞ (σ(g)* < ∞), a further asymptotically simplified expression can be obtained by Taylor expansion of the square root at 0 and cutoff after the linear term:

r ≈ r(g) √(1 + (σ(g)*²/N)(1 − 1/N)) [1 + (σ(g)*/N) √(1 + (σ(g)*²/(2N))(1 − 1/N)) / (1 + (σ(g)*²/N)(1 − 1/N)) N(0, 1)]

= r(g) √(1 + (σ(g)*²/N)(1 − 1/N)) + (r(g) σ(g)*/N) √[(1 + (σ(g)*²/(2N))(1 − 1/N)) / (1 + (σ(g)*²/N)(1 − 1/N))] N(0, 1), (3.46)

where the first summand is denoted by r̄ and the factor multiplying N(0, 1) is denoted by σr.


Consequently, the mean of the asymptotic normal approximation of r is r̄ and its standard deviation is σr:

pr(r) ≈ (1/σr) φ((r − r̄)/σr) = (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²]. (3.47)
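The quantities r̄ and σr from Equation (3.46) can be compared directly with the empirical mean and standard deviation of r. A sketch with illustrative parameters (the agreement improves with growing N):

```python
import numpy as np

rng = np.random.default_rng(7)
N, r_g, sigma_g = 100, 1.0, 0.1

# Empirical distribution of the offspring distance r from the cone axis.
M = 50_000
y = sigma_g * rng.standard_normal((M, N - 1))
y[:, 0] += r_g
r = np.linalg.norm(y, axis=1)

# r-bar and sigma_r from Equation (3.46) with sigma* = sigma_g * N / r_g.
s = sigma_g * N / r_g
r_bar = r_g * np.sqrt(1.0 + s**2 / N * (1.0 - 1.0 / N))
sigma_r = (r_g * s / N) * np.sqrt((1.0 + s**2 / (2.0 * N) * (1.0 - 1.0 / N))
                                  / (1.0 + s**2 / N * (1.0 - 1.0 / N)))
```

For these values, both the empirical mean and the empirical standard deviation of r land within roughly a percent of r̄ and σr, illustrating the quality of the normal approximation (3.47).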

Figure 3.3 shows plots of the exact r probability density function (Equation (3.36), solid line) in comparison to the normal approximation of the r probability density function (Equation (3.47), crosses) for different parameter values of interest. One can observe that for N = 10 and N = 40, the crosses are slightly off the solid line in a number of cases. For N = 200 and N = 1000, the solid line is approximated well. Now that expressions for px(x) and pr(r) have been derived, the exact PQ expression can be treated further to arrive at closed-form PQ approximations. In the following, asymptotic lower and upper bounds are derived. Use of the approximations in Equations (3.30), (3.31), and (3.47), and subsequent insertion into Equation (3.26) yield

PQ(q) ≈ ∫_{x=−∞}^{x=q} ∫_{r=0}^{r=∞} px(x) pr(r) dr dx − ∫_{r=q/√ξ}^{r=∞} ∫_{x=−(1/√ξ)r+(1+1/ξ)q}^{x=q} px(x) pr(r) dx dr (3.48)

= ∫_{x=−∞}^{x=q} px(x) [∫_{r=0}^{r=∞} pr(r) dr] dx − ∫_{r=q/√ξ}^{r=∞} [∫_{x=−(1/√ξ)r+(1+1/ξ)q}^{x=q} px(x) dx] pr(r) dr, (3.49)

where the inner r integral in the first summand equals 1.

≈ ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] dx
− ∫_{r=q/√ξ}^{r=∞} {∫_{x=−(1/√ξ)r+(1+1/ξ)q}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] dx} × (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr. (3.50)

[Figure 3.3: subplots comparing the exact pr(r) with its normal approximation for N ∈ {10, 40, 200, 1000}, x(g) = 1, r(g) = 1, and σ(g) ∈ {0.01, 0.1, 1, 2, 5, 10}.]

Figure 3.3: Normal approximation of the r probability density function (Equation (3.47)) in comparison to the exact r probability density function (Equation (3.36)) for different parameter values. The solid line in every figure has been generated by evaluating Equation (3.36) for 1000 linearly spaced values of r in the interval [max(0, r(g) − 100σr), r(g) + 100σr]. The parameters indicated in the title of a particular subfigure have been kept constant. The crosses show the results of evaluating Equation (3.47) for the same values of r and the same constant parameters.

Expressing the integrals where possible using the cumulative distribution function of the standard normal distribution Φ(·), Equation (3.50) can be turned into

PQ(q) ≈ Φ((q − x(g))/σ(g))
− ∫_{r=q/√ξ}^{r=∞} [Φ((q − x(g))/σ(g)) − Φ((−(1/√ξ)r + (1 + 1/ξ)q − x(g))/σ(g))] × (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr (3.51)

= Φ((q − x(g))/σ(g)) − ∫_{r=q/√ξ}^{r=∞} Φ((q − x(g))/σ(g)) (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr
+ ∫_{r=q/√ξ}^{r=∞} Φ((−(1/√ξ)r + (1 + 1/ξ)q − x(g))/σ(g)) × (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr. (3.52)

Equation (3.52) can further be simplified and subsequently lower bounded to yield

PQ(q) ≈ Φ((q − x(g))/σ(g)) − Φ((q − x(g))/σ(g)) [1 − Φ((q/√ξ − r̄)/σr)]
+ ∫_{r=q/√ξ}^{r=∞} Φ((−(1/√ξ)r + (1 + 1/ξ)q − x(g))/σ(g)) × (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr (3.53)

= Φ((q − x(g))/σ(g)) Φ((q/√ξ − r̄)/σr)
+ ∫_{r=q/√ξ}^{r=∞} Φ((−(1/√ξ)r + (1 + 1/ξ)q − x(g))/σ(g)) × (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr (3.54)

≥ Φ((q − x(g))/σ(g)) Φ((q/√ξ − r̄)/σr). (3.55)
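For concrete parameter values, the relation between Equation (3.54) and the lower bound (3.55) can be checked by numerical quadrature. The values below are illustrative, with r̄ and σr as defined in Equation (3.46):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Illustrative parameters (hypothetical choices for demonstration).
N, xi = 100, 10.0
x_g, sigma_g = 1.0, 0.01
r_g = x_g / np.sqrt(xi)          # parent on the cone boundary

# r-bar and sigma_r according to Equation (3.46).
s = sigma_g * N / r_g
r_bar = r_g * np.sqrt(1.0 + s**2 / N * (1.0 - 1.0 / N))
sigma_r = (r_g * s / N) * np.sqrt((1.0 + s**2 / (2.0 * N) * (1.0 - 1.0 / N))
                                  / (1.0 + s**2 / N * (1.0 - 1.0 / N)))

q = x_g                           # evaluation point

# Lower bound, Equation (3.55).
lower = norm.cdf((q - x_g) / sigma_g) * norm.cdf((q / np.sqrt(xi) - r_bar) / sigma_r)

# Right-hand side of Equation (3.54): lower bound plus the non-negative integral.
def integrand(r):
    return (norm.cdf((-(1.0 / np.sqrt(xi)) * r + (1.0 + 1.0 / xi) * q - x_g) / sigma_g)
            * norm.pdf(r, loc=r_bar, scale=sigma_r))

tail, _ = quad(integrand, q / np.sqrt(xi), r_bar + 12.0 * sigma_r)
full = lower + tail
```

Since the added integral is non-negative, `lower` can never exceed `full`, which mirrors the argument given in the text.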

Because Φ(·) and (1/σr) φ((r − r̄)/σr) = (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] are non-negative functions, the lower bound in Equation (3.55) can be concluded: the second summand in Equation (3.54) is non-negative.

To arrive at an upper bound, the integration order in the subtrahend of Equation (3.26) can be reversed, yielding Equation (3.27). Analogously to the derivation of the lower bound, use of Equations (3.30), (3.31), (3.47), and (3.50), and insertion into Equation (3.27) (different order of integration in the subtrahend) yield

PQ(q) ≈ ∫_{x=−∞}^{x=q} ∫_{r=0}^{r=∞} px(x) pr(r) dr dx − ∫_{x=−∞}^{x=q} ∫_{r=−√ξx+(√ξ+1/√ξ)q}^{r=∞} px(x) pr(r) dr dx (3.56)

= ∫_{x=−∞}^{x=q} px(x) [∫_{r=0}^{r=∞} pr(r) dr] dx − ∫_{x=−∞}^{x=q} px(x) [∫_{r=−√ξx+(√ξ+1/√ξ)q}^{r=∞} pr(r) dr] dx (3.57)

≈ ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] dx
− ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × {∫_{r=−√ξx+(√ξ+1/√ξ)q}^{r=∞} (1/(√(2π) σr)) exp[−(1/2) ((r − r̄)/σr)²] dr} dx, (3.58)

where the inner r integral in the first summand of Equation (3.57) equals 1.

Expressing the integrals where possible using the cumulative distribution function of the standard normal distribution Φ(·), Equation (3.58) can be turned into

PQ(q) ≈ Φ((q − x(g))/σ(g))
− ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × [1 − Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr)] dx (3.59)

= Φ((q − x(g))/σ(g)) − Φ((q − x(g))/σ(g))
+ ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx. (3.60)


Equation (3.60) can further be simplified and then upper bounded yielding

PQ(q) ≈ ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx (3.61)

≤ ∫_{x=−∞}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx (3.62)

= Φ([(1 + 1/ξ)q − x(g) − r̄/√ξ] / √(σ(g)² + σr²/ξ)). (3.63)

The step from Equation (3.61) to Equation (3.62) is justified because the expression inside the integral is non-negative. Therefore, an increase in the upper limit of the integral results in an increase of the value of the integral. While the integral in Equation (3.61) has an upper limit of q and a closed-form solution is not apparent, the integral in Equation (3.62) has an upper limit of ∞ and can be solved analytically. To this end, the identity

∫_{−∞}^{∞} e^{−(1/2)t²} Φ(at + b) dt = √(2π) Φ(b/√(1 + a²)), (3.64)

which has been derived and proven in [5, Equation A.10], is used. The substitution t := (x − x(g))/σ(g), which implies dx = σ(g) dt, yields for Equation (3.62)

∫_{x=−∞}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx
= ∫_{t=−∞}^{t=∞} (1/√(2π)) e^{−(1/2)t²} Φ((−√ξ (σ(g) t + x(g)) + (√ξ + 1/√ξ)q − r̄)/σr) dt. (3.65)

In this form, Equation (3.64) can directly be used for Equation (3.65) with

b = (−√ξ x(g) + (√ξ + 1/√ξ)q − r̄)/σr (3.66)

and

a = −√ξ σ(g)/σr. (3.67)

Insertion of Equation (3.66) and Equation (3.67) into Equation (3.64) results, after simplification, in Equation (3.63). Consequently, PQ(q) can be bounded in the asymptotic case considered as

Φ((q − x(g))/σ(g)) Φ((q/√ξ − r̄)/σr) ≤ PQ(q) ≤ Φ([(1 + 1/ξ)q − x(g) − r̄/√ξ] / √(σ(g)² + σr²/ξ)). (3.68)
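The integral identity (3.64) that underlies the upper bound can be verified numerically for a few (a, b) pairs:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sqrt2pi = np.sqrt(2.0 * np.pi)
for a, b in [(0.5, -1.0), (-2.0, 0.3), (3.0, 2.0)]:
    # Left-hand side of (3.64); the integrand is negligible beyond |t| = 30.
    lhs, _ = quad(lambda t: np.exp(-0.5 * t * t) * norm.cdf(a * t + b), -30.0, 30.0)
    # Right-hand side of (3.64).
    rhs = sqrt2pi * norm.cdf(b / np.sqrt(1.0 + a * a))
    assert abs(lhs - rhs) < 1e-7
```

The identity reflects the fact that E[Φ(aZ + b)] = Φ(b/√(1 + a²)) for a standard normal Z.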


3.1.2.1.2.3 The Derived PQ(q) Upper Bound as an Approximation for PQ(q) The goal is to find a good approximation for PQ(q) for the asymptotic cases. As it turns out, the upper bound (Equation (3.63)) is a good approximation under certain assumptions. To see this, the error introduced from Equation (3.61) to Equation (3.62) is further analyzed. This error can be estimated from

∫_{x=−∞}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx (3.69)

= ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx (3.70)

+ ∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx (3.71)

where the second summand represents the error. Analyzing the error term further assuming ξ → ∞ results in

∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−√ξ x + (√ξ + 1/√ξ)q − r̄)/σr) dx (3.72)

= ∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−x + (1 + 1/ξ)q − r̄/√ξ)/(σr/√ξ)) dx (3.73)

≃ ∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ((−x + q − r̄/√ξ)/(σr/√ξ)) dx (3.74)

= ∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ(((q − x)√ξ − r̄)/σr) dx (3.75)

where 1/ξ → 0 has been used in the step from Equation (3.73) to Equation (3.74). As Φ(·) is a monotonically increasing function and ((q − x)√ξ − r̄)/σr ≤ ((q − q)√ξ − r̄)/σr for x ≥ q, Φ(((q − x)√ξ − r̄)/σr) ≤ Φ(((q − q)√ξ − r̄)/σr) holds for x ≥ q. Hence, Equation (3.75) can be bounded as

∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ(((q − x)√ξ − r̄)/σr) dx (3.76)

≤ ∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ(((q − q)√ξ − r̄)/σr) dx. (3.77)


Assuming N → ∞ together with Equation (3.46), r̄ ≃ r(g) √(1 + σ(g)*²/N) can be concluded. Substituting this and the quantity σr = r(g) σr*/N into Equation (3.77) results in

∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ(((q − q)√ξ − r̄)/σr) dx (3.78)

≃ ∫_{x=q}^{x=∞} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] Φ(−N r(g) √(1 + σ(g)*²/N) / (σr* r(g))) dx (3.79)

= [1 − Φ((q − x(g))/σ(g))] Φ(−(N/σr*) √(1 + σ(g)*²/N)), (3.80)

where the last factor tends to 0 for N/σr* → ∞.

Consequently, it can be concluded from Equation (3.80) that for sufficiently large ξ and sufficiently large N, the error introduced from Equation (3.61) to Equation (3.62) vanishes. Therefore, the upper bound in Equation (3.68) is an approximation for PQ(q) for sufficiently large ξ and sufficiently large N.

3.1.2.1.2.4 The PQ(q) Approximation in the Case that the Probability of Feasible Offspring Tends to 1 By taking a close look at Equation (3.61), one can see that the expression can be further simplified under certain conditions. Normalization of σr and multiplication of the argument of Φ(·) by (1/√ξ)/(1/√ξ) in Equation (3.61) results in

PQ(q) ≈ ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × Φ((−x + (1 + 1/ξ)q − r̄/√ξ)/(σr* r(g)/(N√ξ))) dx (3.81)

= ∫_{x=−∞}^{x=q} (1/(√(2π) σ(g))) exp[−(1/2) ((x − x(g))/σ(g))²] × Φ((N√ξ/σr*)(−x/r(g) + (1 + 1/ξ)q/r(g) − r̄/(r(g)√ξ))) dx. (3.82)


Assuming N√ξ → ∞, one observes that

Φ((N√ξ/σr*)(−x/r(g) + (1 + 1/ξ)q/r(g) − r̄/(r(g)√ξ))) ≈ 1 ⟺ (1 + 1/ξ)(q/r(g)) − x/r(g) − r̄/(r(g)√ξ) > 0 (3.83)

⟺ x < (1 + 1/ξ)q − r̄/√ξ (3.84)

⟺ q < (1 + 1/ξ)q − r̄/√ξ (3.85)

⟺ 0 < q/ξ − r̄/√ξ (3.86)

⟺ r̄/√ξ < q/ξ (3.87)

⟺ r̄√ξ/q < 1. (3.88)

From Equation (3.84) to Equation (3.85), x ≤ q, which follows from the limits of the integration, has been used. The resulting condition (Equation (3.88)) means that the simplification can be used if the probability of generating feasible offspring is high. This can be seen in the following way. The probability for a particular value q to be feasible is given by

Pr[√ξ r ≤ |q|] = ∫_{r=0}^{r=q/√ξ} pr(r) dr ≈ Φ((q/√ξ − r̄)/σr) = Φ(N(q/√ξ − r̄)/(σr* r(g))). (3.89)

It is the integration from the cone axis up to the cone boundary at the given value q. For N → ∞, it follows that

Φ(N(q/√ξ − r̄)/(σr* r(g))) ≈ 1 ⟺ (q/√ξ − r̄)/(σr* r(g)) > 0 (3.90)

⟺ q/√ξ − r̄ > 0 (3.91)

⟺ q/√ξ > r̄ (3.92)

⟺ q > r̄√ξ (3.93)

⟺ r̄√ξ/q < 1. (3.94)

Under this condition, Equation (3.81) can be simplified in the asymptotic case N → ∞ to

PQ(q) ≈ Φ((q − x(g))/σ(g)) for q > r̄√ξ (3.95)

with

pQ(q) = d/dq PQ(q) ≈ (1/(√(2π) σ(g))) e^{−(1/2)((q − x(g))/σ(g))²} for q > r̄√ξ. (3.96)

3.1.2.1.2.5 The PQ(q) Approximation Taking the upper bound from Equation (3.68) for the case that the probability of generating feasible offspring tends to 0, and Equation (3.95) for the other case, allows for the definition of the approximated PQ(q) function

PQ(q) ≈ PQfeas(q) := Φ((q − x(g))/σ(g)) for q > r̄√ξ, (3.97)

PQ(q) ≈ PQinfeas(q) := Φ([(1 + 1/ξ)q − x(g) − r̄/√ξ] / √(σ(g)² + σr²/ξ)) otherwise. (3.98)

Taking the derivative with respect to q yields

pQ(q) ≈ pQfeas(q) = (1/(√(2π) σ(g))) e^{−(1/2)((q − x(g))/σ(g))²} for q > r̄√ξ, (3.99)

pQ(q) ≈ pQinfeas(q) = ((1 + 1/ξ)/√(σ(g)² + σr²/ξ)) × (1/√(2π)) exp[−(1/2) ([(1 + 1/ξ)q − x(g) − r̄/√ξ] / √(σ(g)² + σr²/ξ))²] otherwise. (3.100)

They represent approximations for the cases that the offspring generation step leads to feasible offspring with high probability and that the offspring generation step leads to infeasible offspring with high probability, respectively. Because those are disjoint events in the limit case, they complement each other. Figure 3.4 visualizes the bounds and approximations for PQ(q). While the parameters indicated in the subtitle are kept constant for every particular subplot, q varies. The solid line has been generated by numerical integration of Equation (3.27) together with Equations (3.30), (3.31), and (3.36). As described in the derivation of Equation (3.30), the solid line is therefore exact for τ → 0. The function has been evaluated for 100 different values of q in the interval [max(0, x(g) − 10σ(g)), x(g) + 10σ(g)]. The parameters indicated in the title of every subplot have been kept constant. The positions of the plus markers on the y-axis have been generated by numerical integration of Equation (3.27) together with Equations (3.30), (3.31), and (3.47) for the same values of q. The boxes with a point and the crosses indicate the results of the derived lower bound (Equation (3.55)) and upper bound (Equation (3.63)) of PQ(q), respectively. The stars show the results of the derived simplification for the case that the probability of feasible offspring tends to 1. That is, the stars show the evaluation of Equation (3.95) for varying q and constant parameters as indicated in the subtitle of the particular subplot. The vertical black line indicates the value r̄√ξ. If it is not shown in a subplot, it is smaller than the smallest q value that is shown. The subplots in the right column show the situation where the parental individual is on the cone boundary. The subplots in the left column show the situation where the parental individual is inside the cone. Consequently, the plots show the approximation quality for the cases that have been considered. The subplots in the left column represent the feasible case. In those plots one can see that the normal approximation for the feasible case indeed is a good approximation for PQ(q). One can see that with increasing ξ the upper bound approximates PQ(q) with increasing precision. It can be seen that the upper bound as a PQ(q) approximation is in particular better than the feasible-case approximation for q < r̄√ξ.

Now, the expected value defined in Equation (3.17) can be approximated for both cases. Theyare denoted as E[q1;λfeas] and E[q1;λinfeas], respectively. Consequently, approximate expressionsfor the progress rate follow with insertion into Equation (3.15) for both cases.
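Before entering the formal derivation, the order-statistic machinery behind Equations (3.15) and (3.17) can be sanity-checked numerically. The sketch below (not part of the report; all parameter values are illustrative assumptions) compares the integral λ∫ q pQ(q)[1 − PQ(q)]^(λ−1) dq against a direct simulation of the smallest of λ samples, using the feasible-case normal model Q ~ N(x, σ²):

```python
import math
import random

# Hedged sketch: cross-check of the order-statistic expectation
#   E[q_{1;lambda}] = lambda * Int q * p_Q(q) * [1 - P_Q(q)]^(lambda-1) dq
# against direct simulation, under the feasible-case normal model Q ~ N(x, sigma^2).
# The concrete parameter values are illustrative choices, not the report's.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def expected_best_q(x, sigma, lam, n_steps=20000):
    """Trapezoidal integration of lambda * q * p_Q(q) * [1 - P_Q(q)]^(lam-1)."""
    lo, hi = x - 10.0 * sigma, x + 10.0 * sigma
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        q = lo + i * h
        t = (q - x) / sigma
        density = math.exp(-0.5 * t * t) / (math.sqrt(2.0 * math.pi) * sigma)
        f = lam * q * density * (1.0 - std_normal_cdf(t)) ** (lam - 1)
        total += f if 0 < i < n_steps else 0.5 * f
    return total * h

random.seed(1)
x, sigma, lam = 1.0, 0.1, 20
reps = 20000
mc_estimate = sum(min(random.gauss(x, sigma) for _ in range(lam))
                  for _ in range(reps)) / reps
integral_value = expected_best_q(x, sigma, lam)
```

Both estimates agree closely, and both lie below x, reflecting that selecting the smallest q pulls the expectation of the best offspring below the parental value.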


[Figure 3.4 appears here: eighteen subplots of PQ(q) over q, arranged as 3 × 2 panels for each of N ∈ {10, 40, 400}; every panel fixes x(g) = 1 and the values of r(g), σ(g), and ξ given in its title, and shows the curves "exact", "normal approx. for r", "normal approx. for r (lower bound)", "normal approx. for r (upper bound)", and "normal approx. for r (feasible case)".]

Figure 3.4: Comparison of PQ(q) approximations and bounds. The solid line has been generated by numerical integration of Equation (3.27) together with Equations (3.30), (3.31), and (3.36). As described in the derivation of Equation (3.30), the solid line is therefore exact for τ → 0. The function has been evaluated for 100 different values of q in the interval [max(0, x(g) − 10σ(g)), x(g) + 10σ(g)]. The parameters indicated in the title of every subplot have been kept constant. The positions of the plus markers on the y-axis have been generated by numerical integration of Equation (3.27) together with Equations (3.30), (3.31), and (3.47) for the same values of q. The boxes with a point and the crosses indicate the results of the derived lower bound (Equation (3.55)) and upper bound (Equation (3.63)) of PQ(q), respectively. The stars show the results of the derived simplification for the case that the probability of feasible offspring tends to 1, i.e., the evaluation of Equation (3.95) for varying q and the constant parameters indicated in the title of the particular subplot. The vertical black line indicates the value r√ξ; if it is not shown in a subplot, it is smaller than the smallest q value shown. The subplots in the right column show the situation where the parental individual is on the cone boundary. The subplots in the left column show the situation where the parental individual is inside the cone.


3.1.2.1.2.6 The Approximate x Progress Rate in the case that the probability of feasible offspring tends to 1

Insertion of Equations (3.97) and (3.99) into Equation (3.17) yields for the feasible case

E[q1;λfeas] ≈ λ∫ q=∞

q=0q pQfeas(q)[1− PQfeas(q)]

λ−1 dq (3.101)

= λ

∫ q=∞

q=r√ξq

1√2πσ(g)

e− 1

2

(q−x(g)

σ(g)

)2 [1− Φ

(q − x(g)

σ(g)

)]λ−1

dq. (3.102)

The substitution

\frac{q - x^{(g)}}{\sigma^{(g)}} := -t \quad (3.103)

is used. It follows that

q = -t\sigma^{(g)} + x^{(g)} \quad (3.104)

and

{\rm d}q = -\sigma^{(g)}\,{\rm d}t. \quad (3.105)

Using normalized quantities, t can be expressed as

t = \frac{-N\left(q - x^{(g)}\right)}{\sigma^{(g)*} r^{(g)}}. \quad (3.106)

Assuming N → ∞ yields for the upper bound t_u = −∞. For the lower bound it follows that

t_l = \frac{-N\left(r\sqrt{\xi} - x^{(g)}\right)}{\sigma^{(g)*} r^{(g)}} \quad (3.107)

= -\frac{N}{\sigma^{(g)*}} \left(\frac{r\sqrt{\xi}}{r^{(g)}} - \frac{x^{(g)}}{r^{(g)}}\right). \quad (3.108)

Because r ≃ r^{(g)}\sqrt{1 + \sigma^{(g)*2}/N} (Equation (3.46)), for N → ∞ and σ^{(g)∗2} ≪ N, r√ξ ≃ r^{(g)}√ξ ≤ x^{(g)} follows. The last inequality follows from the fact that the parental individual is feasible (ensured by projection). Hence, it follows that for N → ∞

t_l = \frac{N}{\sigma^{(g)*}} \underbrace{\left(\frac{x^{(g)}}{r^{(g)}} - \frac{r\sqrt{\xi}}{r^{(g)}}\right)}_{\geq 0} \quad (3.109)

\simeq \infty \quad (3.110)

holds. Applying the substitution results in

E[q_{1;\lambda,{\rm feas}}] \approx -\lambda \int_{t=\infty}^{t=-\infty} \left(x^{(g)} - t\sigma^{(g)}\right) \frac{\sigma^{(g)}}{\sqrt{2\pi}\,\sigma^{(g)}}\, e^{-\frac{1}{2}t^{2}} \big[\underbrace{1 - \Phi(-t)}_{=\Phi(t)}\big]^{\lambda-1} {\rm d}t \quad (3.111)

= \lambda \int_{t=-\infty}^{t=\infty} \left(x^{(g)} - t\sigma^{(g)}\right) \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}}\, \Phi(t)^{\lambda-1}\, {\rm d}t \quad (3.112)

= x^{(g)} \frac{\lambda}{\sqrt{2\pi}} \int_{t=-\infty}^{t=\infty} e^{-\frac{1}{2}t^{2}}\, \Phi(t)^{\lambda-1}\, {\rm d}t - \sigma^{(g)} \frac{\lambda}{\sqrt{2\pi}} \int_{t=-\infty}^{t=\infty} t\, e^{-\frac{1}{2}t^{2}}\, \Phi(t)^{\lambda-1}\, {\rm d}t \quad (3.113)

= x^{(g)} - \sigma^{(g)} c_{1,\lambda}. \quad (3.114)


In the step from Equation (3.111) to Equation (3.112), the fact that negating an integral can be expressed by exchanging its lower and upper bounds, together with the symmetry of the standard normal distribution, has been used. In Equation (3.113), the integrals contain the probability density function of the order statistic of standard normals. The first integral is an integration over all possible values and hence equals 1. The second integral is its expected value. It has been introduced in [10] as the so-called progress coefficient c_{1,λ}; it is also defined in [5, Equation 3.100]. Consequently, the asymptotic normalized x progress rate for the case of feasible offspring reads

\varphi_{x,{\rm feas}}^{*} = \frac{N\left(x^{(g)} - E[q_{1;\lambda,{\rm feas}}]\right)}{x^{(g)}} \quad (3.116)

= \frac{N\left(x^{(g)} - x^{(g)} + \sigma^{(g)} c_{1,\lambda}\right)}{x^{(g)}} \quad (3.117)

= \frac{N}{x^{(g)}}\, \sigma^{(g)} c_{1,\lambda} \quad (3.118)

= \frac{N}{x^{(g)}}\, \sigma^{(g)*} \frac{r^{(g)}}{N}\, c_{1,\lambda} \quad (3.119)

= \frac{r^{(g)}}{x^{(g)}}\, \sigma^{(g)*} c_{1,\lambda}. \quad (3.120)
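The progress coefficient c_{1,λ} appearing above can be evaluated numerically from its defining integral. The following sketch (an illustrative helper, not the report's code) does so with a plain trapezoidal rule:

```python
import math

# Hedged sketch: numerical evaluation of the progress coefficient
#   c_{1,lambda} = lambda / sqrt(2*pi) * Int t * exp(-t^2/2) * Phi(t)^(lambda-1) dt,
# i.e. the expectation of the largest of lambda standard normal variates.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def c_1_lambda(lam, lo=-10.0, hi=10.0, n_steps=40000):
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        t = lo + i * h
        f = t * math.exp(-0.5 * t * t) * std_normal_cdf(t) ** (lam - 1)
        total += f if 0 < i < n_steps else 0.5 * f
    return lam / math.sqrt(2.0 * math.pi) * total * h
```

For λ = 2 the coefficient has the closed form 1/√π ≈ 0.5642, which the integration reproduces; c_{1,λ} grows slowly (roughly logarithmically) with λ.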

3.1.2.1.2.7 The Approximate x Progress Rate in the case that the probability of feasible offspring tends to 0

Analogously to the feasible case, insertion of Equations (3.98) and (3.100) into Equation (3.17) yields for the other case

E[q1;λinfeas] ≈ λ∫ q=∞

q=0q pQinfeas(q)[1− PQinfeas(q)]

λ−1 dq (3.121)

= λ

∫ q=∞

q=0q

(1 + 1/ξ)√σ(g)2

+ σ2r/ξ

1√2π

exp×

−1

2

(1 + 1/ξ)q − x(g) − r/√ξ√σ(g)2

+ σ2r/ξ

1− Φ

(1 + 1/ξ)q − x(g) − r/√ξ√σ(g)2

+ σ2r/ξ

λ−1

dq.

(3.122)

The substitution

\frac{(1 + 1/\xi)q - x^{(g)} - r/\sqrt{\xi}}{\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}} := -t \quad (3.123)

is used. It follows that

q = \frac{1}{(1 + 1/\xi)} \left(-\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}\; t + x^{(g)} + r/\sqrt{\xi}\right) \quad (3.124)

and

{\rm d}q = -\frac{\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}}{(1 + 1/\xi)}\, {\rm d}t. \quad (3.125)


Using the normalized σ^{(g)∗}, σ_r ≃ σ^{(g)} for N → ∞, and σ^{(g)∗} ≪ N (derived from Equation (3.46)), t can be expressed as

t = -\frac{(1 + 1/\xi)q - x^{(g)} - r/\sqrt{\xi}}{\sqrt{\frac{\sigma^{(g)*2} r^{(g)2}}{N^{2}} + \frac{\sigma^{(g)*2} r^{(g)2}}{N^{2}\xi}}} \quad (3.126)

= -\frac{(1 + 1/\xi)q - x^{(g)} - r/\sqrt{\xi}}{\frac{1}{N\sqrt{\xi}}\sqrt{\xi\,\sigma^{(g)*2} r^{(g)2} + \sigma^{(g)*2} r^{(g)2}}} \quad (3.127)

= -N\sqrt{\xi}\; \frac{(1 + 1/\xi)q - x^{(g)} - r/\sqrt{\xi}}{\sqrt{\xi\,\sigma^{(g)*2} r^{(g)2} + \sigma^{(g)*2} r^{(g)2}}} \quad (3.128)

= -N\sqrt{\xi} \left[\frac{(1 + 1/\xi)q - x^{(g)} - r/\sqrt{\xi}}{\sigma^{(g)*} r^{(g)} \sqrt{\xi + 1}}\right]. \quad (3.129)

The lower bound in the transformed integral therefore follows, assuming ξ ≫ 1 and N → ∞, using ∞ > r ≃ r^{(g)} ≥ 0, and knowing that 0 ≤ x^{(g)} < ∞, as

t_l = -N\sqrt{\xi} \left[\frac{(1 + 1/\xi)\cdot 0 - x^{(g)} - r/\sqrt{\xi}}{\sigma^{(g)*} r^{(g)} \sqrt{\xi + 1}}\right] \quad (3.130)

= -N\sqrt{\xi} \left[\frac{-x^{(g)} - r/\sqrt{\xi}}{\sigma^{(g)*} r^{(g)} \sqrt{\xi + 1}}\right] \quad (3.131)

\simeq -N \underbrace{\left[\frac{-x^{(g)} - r/\sqrt{\xi}}{\sigma^{(g)*} r^{(g)}}\right]}_{\leq 0} \quad (3.132)

\simeq \infty. \quad (3.133)

Similarly, the upper bound follows with the same assumptions as

t_u = \lim_{q\to\infty} \left[-N \left(\frac{q - x^{(g)} - r/\sqrt{\xi}}{\sigma^{(g)*} r^{(g)}}\right)\right] \quad (3.134)

\simeq -\infty. \quad (3.135)


Actually applying the substitution to Equation (3.122) leads to

E[q_{1;\lambda,{\rm infeas}}] \approx -\frac{\lambda}{\sqrt{2\pi}} \left(\frac{1}{1 + 1/\xi}\right) \int_{t=\infty}^{t=-\infty} \left(-\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}\; t + x^{(g)} + r/\sqrt{\xi}\right) e^{-\frac{1}{2}t^{2}} \left[1 - \Phi(-t)\right]^{\lambda-1} {\rm d}t \quad (3.136)

= \frac{\lambda}{\sqrt{2\pi}} \left(\frac{1}{1 + 1/\xi}\right) \int_{t=-\infty}^{t=\infty} \left(-\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}\; t + x^{(g)} + r/\sqrt{\xi}\right) e^{-\frac{1}{2}t^{2}} \left[1 - \Phi(-t)\right]^{\lambda-1} {\rm d}t \quad (3.137)

= \frac{\lambda}{\sqrt{2\pi}} \left(\frac{1}{1 + 1/\xi}\right) \int_{t=-\infty}^{t=\infty} \left(-\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}\; t + x^{(g)} + r/\sqrt{\xi}\right) e^{-\frac{1}{2}t^{2}}\, \Phi(t)^{\lambda-1}\, {\rm d}t \quad (3.138)

= -\frac{\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}}{1 + 1/\xi} \underbrace{\frac{\lambda}{\sqrt{2\pi}} \int_{t=-\infty}^{t=\infty} t\, e^{-\frac{1}{2}t^{2}}\, \Phi(t)^{\lambda-1}\, {\rm d}t}_{=c_{1,\lambda}} + \frac{x^{(g)} + r/\sqrt{\xi}}{1 + 1/\xi} \underbrace{\frac{\lambda}{\sqrt{2\pi}} \int_{t=-\infty}^{t=\infty} e^{-\frac{1}{2}t^{2}}\, \Phi(t)^{\lambda-1}\, {\rm d}t}_{=1} \quad (3.139)

= -\frac{\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}}{1 + 1/\xi}\, c_{1,\lambda} + \frac{x^{(g)} + r/\sqrt{\xi}}{1 + 1/\xi} \quad (3.140)

= \frac{\xi}{1 + \xi} \left(x^{(g)} + r/\sqrt{\xi}\right) - \frac{\xi}{1 + \xi} \left(\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}\right) c_{1,\lambda}. \quad (3.141)

In the step from Equation (3.136) to Equation (3.137), the fact that negating an integral can be expressed by exchanging its lower and upper bounds has been used. The step from Equation (3.137) to Equation (3.138) follows from the symmetry of the standard normal distribution. In Equation (3.139), the integrals in the first and second summands contain the probability density function of the order statistic of standard normals. The second integral is an integration over all possible values and hence equals 1. The first integral is its expected value, the progress coefficient c_{1,λ} introduced in [10] and also defined in [5, Equation 3.100]. Now, the normalized x progress rate (Equation (3.7)) for the second case can be expressed using Equation (3.141):

\varphi_{x,{\rm infeas}}^{*} = \frac{N\left(x^{(g)} - E[q_{1;\lambda,{\rm infeas}}]\right)}{x^{(g)}} \quad (3.142)

= \frac{N}{x^{(g)}} \left(\frac{1+\xi}{1+\xi}\, x^{(g)} - E[q_{1;\lambda,{\rm infeas}}]\right) \quad (3.143)

\approx \frac{N}{x^{(g)}} \left[\frac{x^{(g)} + \xi x^{(g)} - \xi x^{(g)}}{1+\xi} - \frac{\sqrt{\xi}\, r}{1+\xi} + \frac{\xi}{1+\xi} \left(\sqrt{\sigma^{(g)2} + \sigma_r^{2}/\xi}\right) c_{1,\lambda}\right] \quad (3.145)

= \frac{N}{1+\xi} \left[1 - \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}} \sqrt{1 + \frac{\sigma^{(g)*2}}{N}\left(1 - \frac{1}{N}\right)} + \xi\, \frac{r^{(g)}}{x^{(g)}} \sqrt{\frac{\sigma^{(g)*2}}{N^{2}} + \frac{1}{\xi}\, \frac{\sigma^{(g)*2}}{N^{2}}\, \frac{1 + \frac{\sigma^{(g)*2}}{2N}\left(1 - \frac{1}{N}\right)}{1 + \frac{\sigma^{(g)*2}}{N}\left(1 - \frac{1}{N}\right)}}\; c_{1,\lambda}\right] \quad (3.146)

\simeq \frac{N}{1+\xi} \left[1 - \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}} \sqrt{1 + \frac{\sigma^{(g)*2}}{N}} + \xi\, \frac{r^{(g)}}{x^{(g)}}\, \frac{\sigma^{(g)*}}{N} \sqrt{1 + \frac{1}{\xi}\, \frac{1 + \frac{\sigma^{(g)*2}}{2N}}{1 + \frac{\sigma^{(g)*2}}{N}}}\; c_{1,\lambda}\right] \quad (3.147)

= \frac{N}{1+\xi} \left[1 - \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}} \sqrt{1 + \frac{\sigma^{(g)*2}}{N}} + \sqrt{\xi}\, \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}}\, \frac{\sigma^{(g)*}}{N} \sqrt{1 + \frac{1}{\xi}\, \frac{1 + \frac{\sigma^{(g)*2}}{2N}}{1 + \frac{\sigma^{(g)*2}}{N}}}\; c_{1,\lambda}\right] \quad (3.148)

= \frac{N}{1+\xi} \left[1 - \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}} \sqrt{1 + \frac{\sigma^{(g)*2}}{N}}\right] + \frac{\sqrt{\xi}}{1+\xi}\, \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}}\, \sigma^{(g)*} c_{1,\lambda} \sqrt{1 + \frac{1}{\xi}\, \frac{1 + \frac{\sigma^{(g)*2}}{2N}}{1 + \frac{\sigma^{(g)*2}}{N}}}. \quad (3.149)

From Equation (3.145) to Equation (3.146), σ_r of the normal approximation (Equation (3.46)) together with normalization has been used. From Equation (3.146) to Equation (3.147), 1/N has been neglected compared to 1 as N → ∞.
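The key probabilistic step behind Equation (3.141), namely that the best of λ offspring drawn from a normal model N(μ, s²) has expectation μ − s·c_{1,λ}, can be verified by direct simulation. The sketch below uses the infeasible-case mean and standard deviation; all concrete parameter values are illustrative assumptions:

```python
import math
import random

# Hedged sketch: under the infeasible-case normal approximation, Q is normal
# with mean  mu = (x + r/sqrt(xi)) / (1 + 1/xi)  and standard deviation
# s = sqrt(sigma**2 + sigma_r**2 / xi) / (1 + 1/xi), so the reasoning behind
# Equation (3.140) predicts E[best of lambda] = mu - s * c_{1,lambda}.
# Parameter values are illustrative choices.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def c_1_lambda(lam, lo=-10.0, hi=10.0, n_steps=40000):
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        t = lo + i * h
        f = t * math.exp(-0.5 * t * t) * std_normal_cdf(t) ** (lam - 1)
        total += f if 0 < i < n_steps else 0.5 * f
    return lam / math.sqrt(2.0 * math.pi) * total * h

random.seed(2)
x, r, sigma, sigma_r, xi, lam = 1.0, 1.0, 0.05, 0.05, 4.0, 10
mu = (x + r / math.sqrt(xi)) / (1.0 + 1.0 / xi)
s = math.sqrt(sigma ** 2 + sigma_r ** 2 / xi) / (1.0 + 1.0 / xi)
reps = 20000
mc_best = sum(min(random.gauss(mu, s) for _ in range(lam))
              for _ in range(reps)) / reps
predicted = mu - s * c_1_lambda(lam)
```

The simulated mean of the best offspring and the closed-form prediction agree to within sampling error.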

3.1.2.1.2.8 The Approximate x Progress Rate - Combination Using the Best Offspring's Feasibility Probability

In order to obtain an approximate combined value ϕ∗x, the expected values of both cases are weighted by their probability of occurrence. Those weighted values are then added, yielding

\varphi_x^{*} \approx P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\, \varphi_{x,{\rm feas}}^{*} + \left[1 - P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right] \varphi_{x,{\rm infeas}}^{*}. \quad (3.150)

The simplified version is applicable if the probability of generating feasible offspring tends to 1 in the asymptotic case. In the opposite case, the result derived from the PQ(q) upper bound is used. For the approximate combined value, P_feas(x^{(g)}, r^{(g)}, σ^{(g)}) is considered to denote the probability that the best offspring after projection has been feasible before projection. An offspring with values (x, r)^T is feasible if the condition r ≤ x/√ξ holds. For a particular value q = x, this probability is denoted by

\Pr\!\left[\sqrt{\xi}\, r \leq |q|\right] = \int_{r=0}^{q/\sqrt{\xi}} p_r(r)\, {\rm d}r \approx \Phi\!\left(\frac{q/\sqrt{\xi} - r}{\sigma_r}\right). \quad (3.151)

It is computed by integrating from the cone axis up to the constraint boundary at the given q value. This particular q is the best among λ with probability density p_{1;\lambda}(q) = \lambda\, p_Q(q) \left[1 - P_Q(q)\right]^{\lambda-1}. Integration over all possible values of q yields

P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)}) := \lambda \int_{q=0}^{\infty} \Pr\!\left[\sqrt{\xi}\, r \leq q\right] p_Q(q) \left[1 - P_Q(q)\right]^{\lambda-1} {\rm d}q \quad (3.152)

\approx \lambda \int_{q=0}^{\infty} \Phi\!\left(\frac{q/\sqrt{\xi} - r}{\sigma_r}\right) p_Q(q) \left[1 - P_Q(q)\right]^{\lambda-1} {\rm d}q. \quad (3.153)

The expression in Equation (3.153) denotes the probability that the best of the λ projected values was immediately feasible. The substitution t := P_Q(q), which implies dt = p_Q(q) dq and q = P_Q^{-1}(t), results in

\lambda \int_{q=0}^{\infty} \Phi\!\left(\frac{q/\sqrt{\xi} - r}{\sigma_r}\right) p_Q(q) \left[1 - P_Q(q)\right]^{\lambda-1} {\rm d}q \quad (3.154)

= \lambda \int_{t=0}^{1} \Phi\!\left(\frac{P_Q^{-1}(t)/\sqrt{\xi} - r}{\sigma_r}\right) (1 - t)^{\lambda-1}\, {\rm d}t. \quad (3.155)
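The inverse-CDF substitution above can be checked numerically. In the sketch below (illustrative values and a stand-in weighting function g, not the report's quantities), both sides of the change of variables are computed for a normal P_Q:

```python
import math

# Hedged sketch: numerical check of the inverse-CDF substitution t := P_Q(q)
# behind Equations (3.154)-(3.155), with a normal P_Q and an arbitrary smooth
# weighting function g.  All concrete values are illustrative assumptions.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def std_normal_quantile(p, lo=-12.0, hi=12.0):
    for _ in range(60):                      # plain bisection
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def trapezoid(f, lo, hi, n_steps=20000):
    h = (hi - lo) / n_steps
    return h * (0.5 * (f(lo) + f(hi))
                + sum(f(lo + i * h) for i in range(1, n_steps)))

mu, sd, lam = 1.0, 0.1, 5
g = lambda q: math.sin(q) + 2.0              # arbitrary smooth test function

def q_space_integrand(q):
    t = (q - mu) / sd
    p_q = math.exp(-0.5 * t * t) / (math.sqrt(2.0 * math.pi) * sd)
    return lam * g(q) * p_q * (1.0 - std_normal_cdf(t)) ** (lam - 1)

lhs = trapezoid(q_space_integrand, mu - 8.0 * sd, mu + 8.0 * sd)
rhs = trapezoid(lambda t: lam * g(mu + sd * std_normal_quantile(t))
                * (1.0 - t) ** (lam - 1), 1e-9, 1.0 - 1e-9)
```

Both integrals agree, confirming that the q-space and t-space forms are interchangeable.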

Considering both cases (Equations (3.97) and (3.98)) for P_Q, Equation (3.155) can be written as

\lambda \int_{t=0}^{1} \Phi\!\left(\frac{P_Q^{-1}(t)/\sqrt{\xi} - r}{\sigma_r}\right)(1 - t)^{\lambda-1}\,{\rm d}t = \lambda \int_{t=P_Q(r\sqrt{\xi})}^{1} \Phi\!\left(\frac{P_{Q{\rm feas}}^{-1}(t)/\sqrt{\xi} - r}{\sigma_r}\right)(1 - t)^{\lambda-1}\,{\rm d}t + \lambda \int_{t=0}^{P_Q(r\sqrt{\xi})} \Phi\!\left(\frac{P_{Q{\rm infeas}}^{-1}(t)/\sqrt{\xi} - r}{\sigma_r}\right)(1 - t)^{\lambda-1}\,{\rm d}t. \quad (3.156)

Because r ≃ r^{(g)} for N → ∞, two cases can be distinguished in Equation (3.156). First, for sufficiently small ξ, the second addend becomes small. Second, for sufficiently large ξ, the first addend becomes small. Consequently, assuming sufficiently large ξ, taking Equation (3.98) for P_Q(q) and σ_r ≃ σ^{(g)} for N → ∞ yields

\lambda \int_{t=0}^{1} \Phi\!\left(\frac{P_Q^{-1}(t)/\sqrt{\xi} - r}{\sigma_r}\right)(1 - t)^{\lambda-1}\,{\rm d}t \quad (3.157)

= \lambda \int_{t=0}^{1} \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{\sigma^{(g)}}{\sqrt{1+\xi}}\,\Phi^{-1}(t) + \frac{x^{(g)} + r/\sqrt{\xi}}{\sqrt{\xi} + 1/\sqrt{\xi}} - r\right)\right](1 - t)^{\lambda-1}\,{\rm d}t \quad (3.158)

= \lambda \int_{t=0}^{1} \Phi\!\left[\frac{\Phi^{-1}(t)}{\sqrt{1+\xi}} + \frac{x^{(g)}}{\sigma^{(g)}\left(\sqrt{\xi} + \frac{1}{\sqrt{\xi}}\right)} + \frac{r}{\sigma^{(g)}}\,\frac{\frac{1}{\sqrt{\xi}} - \sqrt{\xi} - \frac{1}{\sqrt{\xi}}}{\sqrt{\xi} + \frac{1}{\sqrt{\xi}}}\right](1 - t)^{\lambda-1}\,{\rm d}t \quad (3.159)

= \lambda \int_{t=0}^{1} \Phi\!\left[\frac{\Phi^{-1}(t)}{\sqrt{1+\xi}} + \frac{1}{\sigma^{(g)}}\,\frac{\frac{x^{(g)}}{\sqrt{\xi}} - r}{1 + \frac{1}{\xi}}\right](1 - t)^{\lambda-1}\,{\rm d}t \quad (3.160)

\simeq \Phi\!\left[\frac{\Phi^{-1}\!\left(\frac{1}{1+\lambda}\right)}{\sqrt{1+\xi}} + \frac{1}{\sigma^{(g)}}\,\frac{\frac{x^{(g)}}{\sqrt{\xi}} - r}{1 + \frac{1}{\xi}}\right] \quad (3.161)

= \Phi\!\left[\frac{1}{\sqrt{1+\xi}}\,\Phi^{-1}\!\left(\frac{1}{1+\lambda}\right) + \frac{1}{(1 + 1/\xi)}\,\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right] \quad (3.162)

\simeq \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right]. \quad (3.163)


From the step in Equation (3.160) to Equation (3.161), the fact that f(t) = λ(1 − t)^{λ−1} is the probability density function of the Beta(1, λ) distribution has been used. The Beta(α, β) distribution is well-studied (see for example [7]). Its mean reads α/(α + β) and its variance αβ/((α + β + 1)(α + β)²). Hence, the mean in this particular case is 1/(1 + λ) and the variance λ/((λ + 2)(λ + 1)²). This means that for large λ the variance is of order O(1/λ) and thus tends to 0. Therefore, as an approximation under the assumption of sufficiently large λ, the random variable can be replaced by its expected value in Equation (3.161). More formally, this can be shown by a Taylor expansion in a similar way as in [5, Appendix D.1.1]. For ξ → ∞, this can further be simplified, yielding the expression in Equation (3.163). Analogously to Equations (3.157) to (3.163), assuming sufficiently small ξ and thus taking Equation (3.97) for P_Q(q) and σ_r ≃ σ^{(g)} for N → ∞ yields

\lambda \int_{t=0}^{1} \Phi\!\left(\frac{P_Q^{-1}(t)/\sqrt{\xi} - r}{\sigma_r}\right)(1 - t)^{\lambda-1}\,{\rm d}t \quad (3.164)

= \lambda \int_{t=0}^{1} \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{\Phi^{-1}(t)\,\sigma^{(g)} + x^{(g)}}{\sqrt{\xi}} - r\right)\right](1 - t)^{\lambda-1}\,{\rm d}t \quad (3.165)

= \lambda \int_{t=0}^{1} \Phi\!\left[\frac{\Phi^{-1}(t)}{\sqrt{\xi}} + \frac{x^{(g)}}{\sigma^{(g)}\sqrt{\xi}} - \frac{r}{\sigma^{(g)}}\right](1 - t)^{\lambda-1}\,{\rm d}t \quad (3.166)

\simeq \Phi\!\left[\frac{\Phi^{-1}\!\left(\frac{1}{1+\lambda}\right)}{\sqrt{\xi}} + \frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right] \quad (3.167)

\simeq \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right]. \quad (3.168)
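The replacement of the Beta(1, λ) random variable by its mean, used in the steps to Equations (3.161) and (3.167), can be illustrated numerically. The sketch below (the function G is an illustrative stand-in for the Φ(·) integrand; all values are arbitrary assumptions) verifies the Beta(1, λ) moments and the quality of the mean replacement for large λ:

```python
import math

# Hedged sketch: the Beta(1, lambda) density f(t) = lambda * (1 - t)**(lambda - 1)
# concentrates around its mean 1/(1 + lambda) for large lambda, which justifies
# replacing the random variable by its mean.  G is an illustrative stand-in
# for the Phi(...) integrand; the values of lam, xi, b are arbitrary.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def std_normal_quantile(p, lo=-12.0, hi=12.0):
    for _ in range(60):                      # plain bisection
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def trapezoid(f, lo, hi, n_steps):
    h = (hi - lo) / n_steps
    return h * (0.5 * (f(lo) + f(hi))
                + sum(f(lo + i * h) for i in range(1, n_steps)))

lam = 200
beta_pdf = lambda t: lam * (1.0 - t) ** (lam - 1)
beta_mean = trapezoid(lambda t: t * beta_pdf(t), 0.0, 1.0, 100000)
beta_var = trapezoid(lambda t: (t - beta_mean) ** 2 * beta_pdf(t), 0.0, 1.0, 100000)

xi, b = 4.0, 0.3
G = lambda t: std_normal_cdf(std_normal_quantile(t) / math.sqrt(1.0 + xi) + b)
exact = trapezoid(lambda t: G(t) * beta_pdf(t), 1e-12, 1.0 - 1e-12, 20000)
mean_replaced = G(1.0 / (1.0 + lam))
```

The computed moments match 1/(1 + λ) and λ/((λ + 2)(λ + 1)²), and the mean-replaced value approximates the exact expectation closely for this large λ.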

Equations (3.163) and (3.168) lead to an interesting result. For σ_r ≃ σ^{(g)}, the probability of an offspring being feasible can be asymptotically approximated by the single offspring feasibility probability, i.e.,

P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)}) \simeq \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right]. \quad (3.169)
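Before its formal derivation below, this feasibility probability can be sanity-checked by direct sampling under the stated normal approximations. The sketch (illustrative parameter values, not the report's) draws offspring (x_o, r_o) and counts how often the cone constraint is met:

```python
import math
import random

# Hedged sketch (illustrative parameters): the single-offspring feasibility
# probability.  An offspring (x_o, r_o) with x_o ~ N(x, sigma^2) and
# r_o ~ N(r, sigma^2) (sigma_r ~ sigma for N -> infinity) is feasible
# iff r_o <= x_o / sqrt(xi).

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

random.seed(3)
x, r, sigma, xi = 1.0, 0.3, 0.2, 4.0
trials = 200000
feasible = sum(random.gauss(r, sigma) <= random.gauss(x, sigma) / math.sqrt(xi)
               for _ in range(trials))
mc_prob = feasible / trials
# Closed-form value before the xi -> infinity simplification:
predicted = std_normal_cdf((x / math.sqrt(xi) - r)
                           / (sigma * math.sqrt(1.0 + 1.0 / xi)))
```

The sampled frequency matches the Φ expression; dropping the √(1 + 1/ξ) factor, as done for ξ → ∞, recovers the simpler form of Equation (3.169).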

This single offspring feasibility probability can be derived as follows. As already explained, an offspring with values (x, r)^T is feasible if the condition r ≤ x/√ξ holds. For a particular value q = x, this probability is denoted by

\Pr\!\left[\sqrt{\xi}\, r \leq |q|\right] = \int_{r=0}^{q/\sqrt{\xi}} p_r(r)\,{\rm d}r \approx \Phi\!\left(\frac{q/\sqrt{\xi} - r}{\sigma_r}\right) \simeq \Phi\!\left(\frac{q/\sqrt{\xi} - r}{\sigma^{(g)}}\right). \quad (3.170)

The last step results from Equation (3.46) (σ_r ≈ σ^{(g)} for N → ∞). Integrating Equation (3.170) over all possible values of q results in

P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)}) \simeq \int_{q=-\infty}^{\infty} \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{q}{\sqrt{\xi}} - r\right)\right] p_x(q)\,{\rm d}q \quad (3.171)

with p_x(q) = \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}} \exp\!\left[-\frac{1}{2}\left(\frac{q - x^{(g)}}{\sigma^{(g)}}\right)^{2}\right] from Equation (3.31). The substitution t := (q − x^{(g)})/σ^{(g)} is used. It implies q = σ^{(g)} t + x^{(g)} and dq = σ^{(g)} dt. With normalization it follows further that t = N(q − x^{(g)})/(σ^{(g)∗} r^{(g)}). This implies t_l = −∞ and t_u = ∞. Application of the substitution yields

\int_{q=0}^{\infty} \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{q}{\sqrt{\xi}} - r\right)\right] \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}} \exp\!\left[-\frac{1}{2}\left(\frac{q - x^{(g)}}{\sigma^{(g)}}\right)^{2}\right] {\rm d}q \quad (3.172)

= \int_{t=-\infty}^{\infty} \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{\sigma^{(g)} t + x^{(g)}}{\sqrt{\xi}} - r\right)\right] \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}}\,{\rm d}t. \quad (3.173)

Now, Equation (3.64) can be applied with a = \frac{1}{\sqrt{\xi}} and b = \frac{x^{(g)}}{\sqrt{\xi}\,\sigma^{(g)}} - \frac{r}{\sigma^{(g)}}, resulting in

\int_{t=-\infty}^{\infty} \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{\sigma^{(g)} t + x^{(g)}}{\sqrt{\xi}} - r\right)\right] \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}}\,{\rm d}t \quad (3.174)

= \Phi\!\left[\frac{1}{\sigma^{(g)} \sqrt{1 + \frac{1}{\xi}}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right] \simeq \Phi\!\left[\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right]. \quad (3.175)

The last step was derived under the assumption ξ → ∞.

A more exact approximation in comparison to Equation (3.169) can be obtained by making use of the influence of ξ in Equation (3.156). The first addends in the arguments of Φ(·) in Equations (3.162) and (3.167) deviate from each other significantly only for small ξ. Therefore, as an ad hoc approximation for the feasibility probability (discarding fewer terms), Equations (3.162) and (3.167) can, for example, be combined by the error function erf(ξ), yielding

P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)}) \approx \operatorname{erf}(\xi)\, \Phi\!\left[\frac{1}{\sqrt{1+\xi}}\,\Phi^{-1}\!\left(\frac{1}{1+\lambda}\right) + \frac{1}{(1 + 1/\xi)}\,\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right] + \left(1 - \operatorname{erf}(\xi)\right) \Phi\!\left[\frac{\Phi^{-1}\!\left(\frac{1}{1+\lambda}\right)}{\sqrt{\xi}} + \frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}} - r\right)\right]. \quad (3.176)

The error function features a smooth transition from 0 towards 1 as its argument grows from 0. Insertion of Equations (3.120) and (3.149) into Equation (3.150) finally results in the approximate normalized x progress rate

\varphi_x^{*} \approx P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\frac{r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda} + \left(1 - P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\left[\frac{N}{1+\xi}\left(1 - \frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}}\sqrt{1 + \frac{\sigma^{(g)*2}}{N}}\right) + \frac{\sqrt{\xi}}{1+\xi}\,\frac{\sqrt{\xi}\, r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda}\sqrt{1 + \frac{1}{\xi}\,\frac{1 + \frac{\sigma^{(g)*2}}{2N}}{1 + \frac{\sigma^{(g)*2}}{N}}}\,\right]. \quad (3.177)

Under the assumption N → ∞, Equation (3.177) can be simplified further. By Taylor expansion and cutoff after the linear term,

\sqrt{1 + \frac{\sigma^{(g)*2}}{N}} \simeq 1 + \frac{\sigma^{(g)*2}}{2N} \quad (3.178)


follows. Additionally,

\frac{1 + \frac{\sigma^{(g)*2}}{2N}}{1 + \frac{\sigma^{(g)*2}}{N}} \simeq 1 \quad (3.179)

holds. Insertion of the asymptotic expressions in Equations (3.178) and (3.179) into Equation (3.177) yields

\varphi_x^{*} \approx P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\frac{r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda} + \left(1 - P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\left[\frac{N}{1+\xi}\left(1 - \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\left(1 + \frac{\sigma^{(g)*2}}{2N}\right)\right) + \frac{\sqrt{\xi}}{1+\xi}\,\frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda}\sqrt{1 + \frac{1}{\xi}}\,\right] \quad (3.180)

= P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\frac{r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda} + \left(1 - P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\left[\frac{N}{1+\xi}\left(1 - \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\right) - \frac{N\sqrt{\xi}\,r^{(g)}\sigma^{(g)*2}}{(1+\xi)\,2N\,x^{(g)}} + \frac{\sqrt{\xi}}{1+\xi}\,\frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda}\sqrt{\frac{\xi+1}{\xi}}\,\right] \quad (3.181)

= P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\frac{r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda} + \left(1 - P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\left[\frac{N}{1+\xi}\left(1 - \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\right) - \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\,\frac{\sigma^{(g)*2}}{2(1+\xi)} + \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda}\sqrt{\frac{\xi(1+\xi)}{\xi(1+\xi)^{2}}}\,\right] \quad (3.182)

= P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\frac{r^{(g)}}{x^{(g)}}\,\sigma^{(g)*} c_{1,\lambda} + \left(1 - P_{\rm feas}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\left[\frac{N}{1+\xi}\left(1 - \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\right) + \frac{\sqrt{\xi}\,r^{(g)}}{x^{(g)}}\,\frac{\sigma^{(g)*}}{\sqrt{1+\xi}}\left(c_{1,\lambda} - \frac{\sigma^{(g)*}}{2\sqrt{1+\xi}}\right)\right]. \quad (3.183)
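A minimal transcription of this closed-form result into code can look as follows. This is a hedged sketch (an illustrative implementation combining Equation (3.183) with the asymptotic feasibility probability of Equation (3.169), not the authors' code); `sigma = sigma_star * r / N` de-normalizes the mutation strength:

```python
import math

# Hedged sketch: Equation (3.183) for the normalized x progress rate,
# combined with the asymptotic feasibility probability of Equation (3.169).
# Illustrative transcription; parameter handling is an assumption.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def c_1_lambda(lam, lo=-10.0, hi=10.0, n_steps=40000):
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        t = lo + i * h
        f = t * math.exp(-0.5 * t * t) * std_normal_cdf(t) ** (lam - 1)
        total += f if 0 < i < n_steps else 0.5 * f
    return lam / math.sqrt(2.0 * math.pi) * total * h

def progress_rate_x(x, r, sigma_star, N, xi, lam):
    sigma = sigma_star * r / N
    c = c_1_lambda(lam)
    p_feas = std_normal_cdf((x / math.sqrt(xi) - r) / sigma)   # Eq. (3.169)
    phi_feas = (r / x) * sigma_star * c                        # Eq. (3.120)
    phi_infeas = (N / (1.0 + xi) * (1.0 - math.sqrt(xi) * r / x)
                  + (math.sqrt(xi) * r / x) * sigma_star / math.sqrt(1.0 + xi)
                  * (c - sigma_star / (2.0 * math.sqrt(1.0 + xi))))
    return p_feas * phi_feas + (1.0 - p_feas) * phi_infeas
```

Deep inside the cone (r(g) ≪ x(g)/√ξ), the feasibility probability saturates at 1 and the expression collapses to the feasible-case rate of Equation (3.120).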

Figure 3.5 shows plots comparing the derived closed-form approximation for ϕ∗x with Monte Carlo simulations and numerical integration results. The subplots show the normalized x progress rate for varying normalized mutation strengths σ(g)∗. The parameters in the title are kept constant for every particular subplot. The top row in every set of 3 × 2 plots shows the case that the parental individual is near the cone axis (note that here the feasible case dominates). The bottom row in every set of 3 × 2 plots shows the case that the parental individual is in the vicinity of the cone boundary (note that here the infeasible case dominates). The middle row shows a case in between. The solid line has been generated by Monte Carlo simulation: the generational loop has been run 10^5 times for a fixed parental individual and constant parameters, and the solid line shows the mean of the normalized x progress over those 10^5 iterations. The crosses have been computed by numerical integration (using the normal approximation for r). For this computation, Equations (3.15) and (3.17) have been numerically computed and subsequently normalized using Equations (3.27), (3.30), (3.31), and (3.47). The boxes with a point have been calculated by evaluating Equation (3.177) with Equation (3.176). The pluses have been calculated by evaluating Equation (3.177) with Equation (3.169). The vertical black line indicates the value of the normalized mutation strength in the steady state for the given parameters; it has been calculated using Equation (3.298) (see Section 3.1.3.2 for steady state investigations). As one can see, the approximation quality of ϕ∗x in the vicinity of the steady state is rather good. Therefore, it can be used to investigate the dynamical behavior of the ES in the vicinity of the steady state.

3.1.2.2 Derivation of the r Progress Rate

From the definition of the progress rate (Equation (3.6)) and the pseudo-code of the ES (Algorithm 1, Lines 5 and 20) it follows that

\varphi_r(x^{(g)}, r^{(g)}, \sigma^{(g)}) = r^{(g)} - E[r^{(g+1)} \,|\, x^{(g)}, r^{(g)}, \sigma^{(g)}] \quad (3.184)

= r^{(g)} - E[q_{r,1;\lambda} \,|\, x^{(g)}, r^{(g)}, \sigma^{(g)}]. \quad (3.185)

This means that the expectation of the r value of the best offspring after projection is needed for the derivation of the progress rate in r direction:

E[q_{r,1;\lambda} \,|\, x^{(g)}, r^{(g)}, \sigma^{(g)}] := E[q_{r,1;\lambda}] = \int_{q_r=0}^{\infty} q_r\, p_{q_{r,1;\lambda}}(q_r)\, {\rm d}q_r. \quad (3.186)

Equation (3.186) follows directly from the definition of expectation, where

p_{q_{r,1;\lambda}}(q_r) := p_{q_{r,1;\lambda}}(q_r \,|\, x^{(g)}, r^{(g)}, \sigma^{(g)}) \quad (3.187)

indicates the probability density function of the best offspring's q_r value (the offspring with the smallest q value). This can be derived with a similar order statistic argument as for the x progress case in Section 3.1.2.1. Consider an arbitrary offspring with values (x, r)^T before projection and (q, q_r)^T after projection. The q_r value is the best among the λ values if its corresponding q value is the smallest among the λ offspring's q values. This follows from the definition of the objective function (Equation (2.1)). One arbitrarily selected mutation out of the total λ mutations has joint probability density p_{Q,Q_r}(q, q_r) := p_{Q,Q_r}(q, q_r \,|\, x^{(g)}, r^{(g)}, \sigma^{(g)}) for its projected values q and q_r. For this particular q value to be the smallest, all other λ − 1 must have larger q values. Because they are statistically independent, this results in the probability [1 − P_Q(q)]^{λ−1} of all other mutations being larger than q, where P_Q(q) denotes the cumulative distribution function of the random variable Q of the q values, introduced in Section 3.1.2.1. The density for the mutation considered is thus p_{Q,Q_r}(q, q_r)[1 − P_Q(q)]^{λ−1}. Since there are λ possibilities for the best q value, one obtains λ p_{Q,Q_r}(q, q_r)[1 − P_Q(q)]^{λ−1}. Integration over all possible values of q yields

p_{q_{r,1;\lambda}}(q_r) = \lambda \int_{q=0}^{\infty} p_{Q,Q_r}(q, q_r) \left[1 - P_Q(q)\right]^{\lambda-1} {\rm d}q. \quad (3.188)

Insertion of Equation (3.188) into Equation (3.186) results in

E[q_{r,1;\lambda}] = \int_{q_r=0}^{\infty} q_r\, \lambda \int_{q=0}^{\infty} p_{Q,Q_r}(q, q_r) \left[1 - P_Q(q)\right]^{\lambda-1} {\rm d}q\, {\rm d}q_r \quad (3.189)

= \lambda \int_{q=0}^{\infty} \underbrace{\left[\int_{q_r=0}^{\infty} q_r\, p_{Q,Q_r}(q, q_r)\, {\rm d}q_r\right]}_{=:I(q)} \left[1 - P_Q(q)\right]^{\lambda-1} {\rm d}q. \quad (3.190)
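The structure of Equation (3.190) can be checked on a synthetic joint density. For a bivariate normal (Q, Q_r) the inner integral I(q) collapses to E[Q_r | Q = q]·p_Q(q), so both sides can be computed; this sketch uses illustrative values, not the report's actual joint density:

```python
import math
import random

# Hedged sketch: Equation (3.190)-style identity checked on a synthetic
# bivariate normal (Q, Q_r) with correlation rho, where
#   I(q) = Int q_r * p_{Q,Q_r}(q, q_r) dq_r = E[Q_r | Q = q] * p_Q(q)
# and E[Q_r | Q = q] = mu_r + rho * (sd_r / sd_q) * (q - mu_q).
# All parameter values are illustrative.

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

mu_q, mu_r, sd_q, sd_r, rho, lam = 1.0, 0.5, 0.1, 0.05, 0.6, 10

def integrand(q):
    t = (q - mu_q) / sd_q
    cond_mean = mu_r + rho * sd_r / sd_q * (q - mu_q)  # E[Q_r | Q = q]
    p_q = math.exp(-0.5 * t * t) / (math.sqrt(2.0 * math.pi) * sd_q)
    return lam * cond_mean * p_q * (1.0 - std_normal_cdf(t)) ** (lam - 1)

n_steps = 20000
lo, hi = mu_q - 8.0 * sd_q, mu_q + 8.0 * sd_q
h = (hi - lo) / n_steps
analytic = h * sum((0.5 if i in (0, n_steps) else 1.0) * integrand(lo + i * h)
                   for i in range(n_steps + 1))

random.seed(4)
reps = 40000
acc = 0.0
for _ in range(reps):
    best_q, best_qr = float("inf"), 0.0
    for _ in range(lam):
        z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
        q = mu_q + sd_q * z1
        qr = mu_r + sd_r * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        if q < best_q:
            best_q, best_qr = q, qr
    acc += best_qr
mc_value = acc / reps
```

The simulated expectation of the q_r value belonging to the smallest q matches the one-dimensional integral, mirroring the collapse of the double integral in Equation (3.190).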

Page 50: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 50

0

0.05

0.1

0.15

0.2

0.25

0 2 4 6 8 10 12 14

σ(g)∗

N = 40, ξ = 0.01, x(g) = 1, r(g) = 0.01, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 2 4 6 8 10 12 14 16 18 20

σ(g)∗

N = 40, ξ = 1, x(g) = 1, r(g) = 0.01, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)

0

10

20

30

40

50

60

70

80

0 2 4 6 8 10 12 14

σ(g)∗

N = 40, ξ = 0.01, x(g) = 1, r(g) = 5.005, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)0

1

2

3

4

5

6

7

8

9

0 2 4 6 8 10 12 14 16 18 20

σ(g)∗

N = 40, ξ = 1, x(g) = 1, r(g) = 0.505, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)

−35−30−25−20−15−10−50

5

10

0 2 4 6 8 10 12 14

σ(g)∗

N = 40, ξ = 0.01, x(g) = 1, r(g) = 10, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)−20

−15

−10

−5

0

5

0 2 4 6 8 10 12 14 16 18 20

σ(g)∗

N = 40, ξ = 1, x(g) = 1, r(g) = 1, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)

Continued on the next page.

Page 51: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 51

0

0.2

0.4

0.6

0.8

1

1.2

0 10 20 30 40 50 60 70

σ(g)∗

N = 40, ξ = 10, x(g) = 1, r(g) = 0.01, λ = 20, τ = 1/√2N

ϕ∗x

x progress Monte Carlo sim.x progress with r normal approx.x progress closed-form approx.

x progress closed-form approx. (asym. feas. prob.)0

0.05

0.1

0.15

0.2

0.25

0 2 4 6 8 10 12

σ(g)∗

N = 400, ξ = 0.01, x(g) = 1, r(g) = 0.01, λ = 20, τ = 1/√2N

ϕ∗x

[Figure 3.5, remaining panels: normalized x progress rate ϕ∗x over the normalized mutation strength σ(g)∗ for x(g) = 1, λ = 20, τ = 1/√(2N), with (N, ξ, r(g)) ∈ {(40, 10, 0.163114), (400, 0.01, 5.005), (40, 10, 0.316228), (400, 0.01, 10), (400, 1, 0.01), (400, 10, 0.01), (400, 1, 0.505), (400, 10, 0.163114), (400, 1, 1), (400, 10, 0.316228)}. Each panel compares the Monte Carlo simulation, the numerical integration with the r normal approximation, and the two closed-form approximations (with and without the asymptotic feasibility probability).]

Figure 3.5: The subplots show the normalized x progress rate for varying normalized mutation strengths σ(g)∗. The parameters in the title are kept constant for every particular subplot. The top row in every set of 3 × 2 plots shows the case that the parental individual is near the cone axis (note that here the feasible case dominates). The bottom row in every set of 3 × 2 plots shows the case that the parental individual is in the vicinity of the cone boundary (note that here the infeasible case dominates). And the middle row shows a case in between. The solid line has been generated by Monte Carlo simulation. For this, the generational loop has been run 10^5 times for a fixed parental individual and constant parameters. The solid line shows the mean of the normalized x progress over those 10^5 iterations. The crosses have been computed by numerical integration (using the normal approximation for r). For this computation, Equations (3.15) and (3.17) have been numerically computed and subsequently normalized using Equations (3.27), (3.30), (3.31), and (3.47). The boxes with a point have been calculated by evaluating Equation (3.177) with Equation (3.176). The pluses have been calculated by evaluating Equation (3.177) with Equation (3.169). The vertical black line indicates the value of the normalized mutation strength in the steady state for the given parameters. It has been calculated using Equation (3.298) (see Section 3.1.3.2 for steady state investigations).


The integral I(q) in Equation (3.190) can be expressed in terms of the values before the projection. Because the value qr is an individual's r value after projection, it only takes values in the interval [0, x/√ξ] for x ∈ ℝ, x ≥ 0. All values outside this interval are projected onto the cone boundary. The integral in Equation (3.190) is therefore split into multiple parts. Considering an infinitesimally small dq,

$$
I(q)\,\mathrm{d}q = \int_{r=0}^{r=q/\sqrt{\xi}} r\,p_{1;1}(q,r)\,\mathrm{d}r\,\mathrm{d}q
+ \frac{q}{\sqrt{\xi}}\underbrace{\left(p_Q(q)\,\mathrm{d}q - \int_{r=0}^{r=q/\sqrt{\xi}} p_{1;1}(q,r)\,\mathrm{d}r\,\mathrm{d}q\right)}_{=\,\mathrm{d}P_{\mathrm{line}}}
\tag{3.191}
$$

is derived. The first summand in Equation (3.191) corresponds to the part inside the cone (r ≤ q/√ξ). There, no projection occurs. Consequently, this part is equal to the first summand in I(q). The second summand corresponds to the projected part. Everything on the projection line resulting in particular (q, qr)^T values is projected to qr = q/√ξ. It can be expressed in terms of Q. The probability for an offspring to fall onto the infinitesimally small area at q around the projection line equals dPline. The first term in dPline corresponds to the probability that the projected x value equals q. It includes the feasible case. The subtrahend subtracts the probability for an offspring to fall onto the infinitesimally small area at q around the line from the cone axis to the cone boundary. As described in Section 3.1.2.1, p1;1(x, r) denotes the joint probability density of an offspring's (x, r)^T values before projection. The same argument leading to Equation (3.30) is used. That is, assuming a sufficiently small value for τ, the log-normally distributed offspring mutation strength σl tends to the parental mutation strength σ(g). Under this assumption that σl ≈ σ(g), Equation (3.191) can be simplified, yielding

$$
I(q)\,\mathrm{d}q = \int_{r=0}^{r=q/\sqrt{\xi}} r\,p_x(q)\,p_r(r)\,\mathrm{d}r\,\mathrm{d}q
+ \frac{q}{\sqrt{\xi}}\left(p_Q(q)\,\mathrm{d}q - \int_{r=0}^{r=q/\sqrt{\xi}} p_x(q)\,p_r(r)\,\mathrm{d}r\,\mathrm{d}q\right)
\tag{3.192}
$$

$$
I(q)\,\mathrm{d}q = p_x(q)\,\mathrm{d}q\int_{r=0}^{r=q/\sqrt{\xi}} r\,p_r(r)\,\mathrm{d}r
+ \frac{q}{\sqrt{\xi}}\left(p_Q(q)\,\mathrm{d}q - p_x(q)\,\mathrm{d}q\int_{r=0}^{r=q/\sqrt{\xi}} p_r(r)\,\mathrm{d}r\right)
\tag{3.193}
$$

$$
I(q) = p_x(q)\int_{r=0}^{r=q/\sqrt{\xi}} r\,p_r(r)\,\mathrm{d}r
+ \frac{q}{\sqrt{\xi}}\left(p_Q(q) - p_x(q)\int_{r=0}^{r=q/\sqrt{\xi}} p_r(r)\,\mathrm{d}r\right).
\tag{3.194}
$$


Insertion of Equations (3.31) and (3.47) into Equation (3.194) yields

$$
\begin{aligned}
I(q) \approx{}& \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}
\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]
\int_{r=0}^{r=q/\sqrt{\xi}} r\,\frac{1}{\sqrt{2\pi}\,\sigma_r}
\exp\left[-\frac{1}{2}\left(\frac{r-\bar{r}}{\sigma_r}\right)^2\right]\mathrm{d}r\\
&+ \frac{q}{\sqrt{\xi}}\left(p_Q(q) - \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}
\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]
\int_{r=0}^{r=q/\sqrt{\xi}}\frac{1}{\sqrt{2\pi}\,\sigma_r}
\exp\left[-\frac{1}{2}\left(\frac{r-\bar{r}}{\sigma_r}\right)^2\right]\mathrm{d}r\right).
\end{aligned}
\tag{3.195}
$$

The integral

$$
\int_{r=0}^{r=q/\sqrt{\xi}} r\,\frac{1}{\sqrt{2\pi}\,\sigma_r}\exp\left[-\frac{1}{2}\left(\frac{r-\bar{r}}{\sigma_r}\right)^2\right]\mathrm{d}r
\tag{3.196}
$$

can be solved by substituting t := (r − r̄)/σr, which implies dr = σr dt and r = σr t + r̄. The upper bound tu = (q/√ξ − r̄)/σr and the lower bound tl = −r̄/σr follow. The lower bound tl tends to −∞ for N → ∞ because normalization yields tl = −N r̄/(σ∗r r(g)). The application of the substitution results in

$$
\begin{aligned}
\int_{t=-\infty}^{t=\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}}(\sigma_r t + \bar{r})\,\frac{1}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{1}{2}t^2}\,\mathrm{d}t
&= \sigma_r\,\frac{1}{\sqrt{2\pi}}\int_{t=-\infty}^{t=\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}} t\,\mathrm{e}^{-\frac{1}{2}t^2}\,\mathrm{d}t
+ \bar{r}\int_{t=-\infty}^{t=\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}}\frac{1}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{1}{2}t^2}\,\mathrm{d}t
&& (3.197)\\
&= -\sigma_r\,\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2\right]
+ \bar{r}\,\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right).
&& (3.198)
\end{aligned}
$$

In the last step, the identity

$$
\int_{t=-\infty}^{t=x} t\,\mathrm{e}^{-\frac{1}{2}t^2}\,\mathrm{d}t = -\mathrm{e}^{-\frac{1}{2}x^2}
\tag{3.199}
$$


from [5, Equation A.16] and the cumulative distribution function of the standard normal distribution have been used. Insertion of Equation (3.198) into Equation (3.195) results in

$$
\begin{aligned}
I(q) \approx{}& \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]
\left\{-\frac{\sigma_r}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2\right]
+\bar{r}\,\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)\right\}\\
&+\frac{q}{\sqrt{\xi}}\left(p_Q(q)-\frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]
\int_{r=0}^{r=q/\sqrt{\xi}}\frac{1}{\sqrt{2\pi}\,\sigma_r}\exp\left[-\frac{1}{2}\left(\frac{r-\bar{r}}{\sigma_r}\right)^2\right]\mathrm{d}r\right) && (3.200)\\
={}& \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]
\left\{-\frac{\sigma_r}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2\right]
+\bar{r}\,\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)\right\}\\
&+\frac{q}{\sqrt{\xi}}\,p_Q(q)-\frac{q}{\sqrt{\xi}}\,\frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right) && (3.201)\\
={}& -\frac{\sigma_r}{2\pi\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]\exp\left[-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2\right]
+\frac{\bar{r}}{\sqrt{2\pi}\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)\\
&+\frac{q}{\sqrt{\xi}}\,p_Q(q)-\frac{q}{\sqrt{\xi}}\,\frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\exp\left[-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right]\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right) && (3.202)\\
={}& \frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2}
\left[\left(\bar{r}-\frac{q}{\sqrt{\xi}}\right)\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)
-\frac{\sigma_r}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2}\right]
+\frac{q}{\sqrt{\xi}}\,p_Q(q). && (3.203)
\end{aligned}
$$
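As a sanity check, the truncated Gaussian moment that produced Equation (3.198), and hence the bracket in Equation (3.203), can be integrated numerically and compared against the closed form. The parameter values below (σr, r̄, and the upper bound u) are arbitrary illustration values, not taken from the report; this is a minimal sketch, not part of the derivation.

```python
import math

def phi(t):  # standard normal density
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):  # standard normal CDF, built from math.erf
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# arbitrary illustration values: u stands for (q/sqrt(xi) - r_bar)/sigma_r
sigma_r, r_bar, u = 0.3, 1.2, 0.7

# numerical integration of (sigma_r*t + r_bar)*phi(t) over (-inf, u]
lo, n = -12.0, 200000
h = (u - lo) / n
lhs = sum((sigma_r * (lo + i * h) + r_bar) * phi(lo + i * h)
          for i in range(n + 1)) * h

# closed form from Eq. (3.198): -sigma_r*phi(u) + r_bar*Phi(u)
rhs = -sigma_r * phi(u) + r_bar * Phi(u)
assert abs(lhs - rhs) < 1e-4
```

The agreement confirms that the only ingredients of the closed form are the identity (3.199) and the standard normal CDF.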

3.1.2.2.1 The Approximate r Progress Rate in the case that the probability of feasible offspring tends to 0 Insertion of Equation (3.203) into Equation (3.190) results in


the expected r value after projection of the best offspring for the infeasible case

$$
\begin{aligned}
\mathrm{E}[q_{r\,1;\lambda\,\mathrm{infeas}}] \approx{}& \lambda\int_{q=0}^{q=\infty}
\left\{\frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2}
\left[\left(\bar{r}-\frac{q}{\sqrt{\xi}}\right)\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)
-\frac{\sigma_r}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2}\right]
+\frac{q}{\sqrt{\xi}}\,p_Q(q)\right\}[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q && (3.204)\\
={}& \lambda\int_{q=0}^{q=\infty}\frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2}
\left[\left(\bar{r}-\frac{q}{\sqrt{\xi}}\right)\Phi\!\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)
-\frac{\sigma_r}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q/\sqrt{\xi}-\bar{r}}{\sigma_r}\right)^2}\right][1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\\
&+\frac{1}{\sqrt{\xi}}\underbrace{\lambda\int_{q=0}^{q=\infty} q\,p_Q(q)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q}_{=\mathrm{E}[q_{1;\lambda}]} && (3.205)\\
\approx{}& \frac{1}{\sqrt{\xi}}\,\mathrm{E}[q_{1;\lambda}]. && (3.206)
\end{aligned}
$$

In the last step, the remaining integral is assumed to be negligible in the asymptotic case. Similar to Section 3.1.2.1 for the x progress rate, the normalized r progress rate can now be formulated. Using Equation (3.141) and σr ≃ σ(g) for N → ∞ to get from Equation (3.212) to Equation (3.213), it reads

$$
\begin{aligned}
\varphi^{*}_{r\,\mathrm{infeas}} &= N\,\frac{r^{(g)}-\mathrm{E}[q_{r\,1;\lambda\,\mathrm{infeas}}]}{r^{(g)}} && (3.207)\\
&\approx N\,\frac{r^{(g)}-\frac{1}{\sqrt{\xi}}\,\mathrm{E}[q_{1;\lambda\,\mathrm{infeas}}]}{r^{(g)}} && (3.208)\\
&= N\left(1-\frac{1}{\sqrt{\xi}\,r^{(g)}}\,\mathrm{E}[q_{1;\lambda\,\mathrm{infeas}}]\right) && (3.209)\\
&= N\left(1-\frac{1}{\sqrt{\xi}\,r^{(g)}}\left[\frac{\xi}{1+\xi}\left(x^{(g)}+\bar{r}/\sqrt{\xi}\right)-\frac{\xi}{1+\xi}\left(\sqrt{\sigma^{(g)2}+\sigma_r^2/\xi}\right)c_{1,\lambda}\right]\right) && (3.210)\\
&= N\left(1-\frac{\xi}{\sqrt{\xi}\,(1+\xi)\,r^{(g)}}\left[x^{(g)}+\bar{r}/\sqrt{\xi}-\left(\sqrt{\sigma^{(g)2}+\sigma_r^2/\xi}\right)c_{1,\lambda}\right]\right) && (3.211)\\
&= N\left(1-\frac{\sqrt{\xi}}{1+\xi}\left[\frac{x^{(g)}}{r^{(g)}}+\frac{\bar{r}}{r^{(g)}\sqrt{\xi}}-\frac{\sqrt{\sigma^{(g)2}+\sigma_r^2/\xi}}{r^{(g)}}\,c_{1,\lambda}\right]\right) && (3.212)\\
&\simeq N\left(1-\frac{\sqrt{\xi}}{1+\xi}\left[\frac{x^{(g)}}{r^{(g)}}+\frac{r^{(g)}\sqrt{1+\sigma^{(g)*2}/N}}{r^{(g)}\sqrt{\xi}}-\left(\frac{\sigma^{(g)}}{r^{(g)}}\sqrt{1+1/\xi}\right)c_{1,\lambda}\right]\right) && (3.213)\\
&= N\left(1-\frac{\sqrt{\xi}}{1+\xi}\left[\frac{x^{(g)}}{r^{(g)}}+\frac{\sqrt{1+\sigma^{(g)*2}/N}}{\sqrt{\xi}}-\left(\frac{\sigma^{(g)}}{r^{(g)}}\sqrt{1+1/\xi}\right)c_{1,\lambda}\right]\right). && (3.214)
\end{aligned}
$$
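Equation (3.214) is cheap to evaluate once the progress coefficient c1,λ is available. The sketch below estimates c1,λ by simulation (as the expectation of the largest of λ standard normal variates) and plugs it into Equation (3.214); all concrete numbers are arbitrary illustration values, not taken from the report.

```python
import math, random

def c_1_lambda(lam, samples=50000, seed=1):
    """Monte Carlo estimate of the progress coefficient c_{1,lambda}:
    the expected value of the largest of lam i.i.d. standard normals."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(lam))
               for _ in range(samples)) / samples

def phi_r_star_infeas(N, xi, x, r, sigma_star, lam):
    """Normalized r progress rate in the infeasible-dominated case, Eq. (3.214)."""
    sigma = sigma_star * r / N  # de-normalize the mutation strength
    bracket = (x / r
               + math.sqrt(1.0 + sigma_star ** 2 / N) / math.sqrt(xi)
               - (sigma / r) * math.sqrt(1.0 + 1.0 / xi) * c_1_lambda(lam))
    return N * (1.0 - math.sqrt(xi) / (1.0 + xi) * bracket)

print(phi_r_star_infeas(N=400, xi=1.0, x=1.0, r=1.0, sigma_star=2.0, lam=20))
```

For λ = 20 the simulated c1,λ lands near the tabulated value ≈ 1.87, so the helper can be cross-checked against the plotted curves in Figure 3.6.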

3.1.2.2.2 The Approximate r Progress Rate in the case that the probability of feasible offspring tends to 1 If the offspring is feasible almost surely in the asymptotic N → ∞ case, Equation (3.193) can be simplified further. The complete probability mass lies inside the cone. Consequently, the second summand vanishes. Additionally, the integral in the first summand yields the expected r value r̄ because the bounds indicate the integration over the whole feasible region for the given area dq. In the feasible case, I(q) therefore reads

$$
I_{\mathrm{feas}}(q) = p_x(q)\,\bar{r}.
\tag{3.215}
$$

By insertion of Ifeas(q) as I(q) into Equation (3.190), use of PQfeas from Equation (3.97), and use of px(x) from Equation (3.31), one obtains

$$
\begin{aligned}
\mathrm{E}[q_{r\,1;\lambda\,\mathrm{feas}}] &\approx \bar{r}\,\underbrace{\lambda\int_{q=0}^{q=\infty}\frac{1}{\sqrt{2\pi}\,\sigma^{(g)}}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2}\left[1-\Phi\!\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)\right]^{\lambda-1}\mathrm{d}q}_{=1} && (3.216)\\
&= \bar{r}. && (3.217)
\end{aligned}
$$

Similar to Section 3.1.2.1 for the x progress rate, the normalized r progress rate can now be formulated. It reads

$$
\begin{aligned}
\varphi^{*}_{r\,\mathrm{feas}} &= N\,\frac{r^{(g)}-\mathrm{E}[q_{r\,1;\lambda\,\mathrm{feas}}]}{r^{(g)}} && (3.218)\\
&\approx N\,\frac{r^{(g)}-\bar{r}}{r^{(g)}} && (3.219)\\
&\simeq N\,\frac{r^{(g)}-r^{(g)}\sqrt{1+\sigma^{(g)*2}/N}}{r^{(g)}} && (3.220)\\
&= N\left(1-\sqrt{1+\frac{\sigma^{(g)*2}}{N}}\right) && (3.221)\\
&\simeq N-N-\frac{N\sigma^{(g)*2}}{2N} && (3.222)\\
&= -\frac{\sigma^{(g)*2}}{2}. && (3.223)
\end{aligned}
$$

In the step from Equation (3.221) to Equation (3.222), Taylor expansion of the square root and cutoff after the linear term (with subsequent distribution of N over the terms in the parentheses) have been performed.
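The quality of that Taylor step can be checked numerically: for growing N, the exact expression of Equation (3.221) approaches the limit −σ(g)∗²/2 of Equation (3.223). A minimal sketch:

```python
import math

def phi_r_feas_exact(N, sigma_star):
    # Eq. (3.221): before the Taylor expansion of the square root
    return N * (1.0 - math.sqrt(1.0 + sigma_star ** 2 / N))

def phi_r_feas_limit(sigma_star):
    # Eq. (3.223): the N -> infinity limit
    return -sigma_star ** 2 / 2.0

# the deviation shrinks like O(1/N)
for N in (10 ** 2, 10 ** 4, 10 ** 6):
    print(N, phi_r_feas_exact(N, 2.0), phi_r_feas_limit(2.0))
```

For σ(g)∗ = 2 the deviation is already below 10⁻³ at N = 10⁶, consistent with the O(1/N) remainder of the expansion.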

3.1.2.2.3 The Approximate r Progress Rate - Combination Using the Best Offspring's Feasibility Probability Analogously to Equation (3.150), both cases are combined into

$$
\varphi^{*}_{r} \approx P_{\mathrm{feas}}(x^{(g)},r^{(g)},\sigma^{(g)})\,\varphi^{*}_{r\,\mathrm{feas}} + \left[1-P_{\mathrm{feas}}(x^{(g)},r^{(g)},\sigma^{(g)})\right]\varphi^{*}_{r\,\mathrm{infeas}}.
\tag{3.224}
$$


Insertion of Equations (3.214) and (3.223) into Equation (3.224) finally results in the approximate normalized r progress rate

$$
\begin{aligned}
\varphi^{*}_{r} \approx{}& \left(1-P_{\mathrm{feas}}(x^{(g)},r^{(g)},\sigma^{(g)})\right)
N\left(1-\frac{\sqrt{\xi}}{1+\xi}\left[\frac{x^{(g)}}{r^{(g)}}+\frac{\sqrt{1+\sigma^{(g)*2}/N}}{\sqrt{\xi}}-\left(\frac{\sigma^{(g)}}{r^{(g)}}\sqrt{1+1/\xi}\right)c_{1,\lambda}\right]\right)\\
&- P_{\mathrm{feas}}(x^{(g)},r^{(g)},\sigma^{(g)})\,\frac{\sigma^{(g)*2}}{2}.
\end{aligned}
\tag{3.225}
$$
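Equation (3.225) can be wrapped into a small helper. Here Pfeas and c1,λ are treated as plain inputs (the report obtains the former from its feasibility-probability approximations and the latter from [5]); this sketch only reproduces the combination formula, with illustration values for the coefficient.

```python
import math

def phi_r_star(N, xi, x, r, sigma_star, c1lam, p_feas):
    """Approximate normalized r progress rate, Eq. (3.225).
    p_feas is the best offspring's feasibility probability, passed in
    as a free parameter rather than computed here."""
    sigma = sigma_star * r / N
    infeas = N * (1.0 - math.sqrt(xi) / (1.0 + xi) * (
        x / r
        + math.sqrt(1.0 + sigma_star ** 2 / N) / math.sqrt(xi)
        - (sigma / r) * math.sqrt(1.0 + 1.0 / xi) * c1lam))
    feas = -sigma_star ** 2 / 2.0
    return (1.0 - p_feas) * infeas + p_feas * feas

# with p_feas = 1 the expression reduces to the feasible rate, Eq. (3.223)
assert phi_r_star(400, 1.0, 1.0, 1.0, 2.0, 1.87, 1.0) == -2.0
```

The two limiting cases p_feas → 1 and p_feas → 0 recover Equations (3.223) and (3.214), mirroring the top and bottom rows of the plots discussed next.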

Figure 3.6 shows plots comparing the derived closed-form approximation for ϕ∗r with Monte Carlo simulations and numerical integration results. The subplots show the normalized r progress rate for varying normalized mutation strengths σ(g)∗. The parameters in the title are kept constant for every particular subplot. The top row in every set of 3 × 2 plots shows the case that the parental individual is near the cone axis (note that here the feasible case dominates). The bottom row in every set of 3 × 2 plots shows the case that the parental individual is in the vicinity of the cone boundary (note that here the infeasible case dominates). And the middle row shows a case in between. The solid line has been generated by Monte Carlo simulation. For this, the generational loop has been run 10^5 times for a fixed parental individual and constant parameters. The solid line shows the mean of the normalized r progress over those 10^5 iterations. The crosses have been computed by numerical integration (using the normal approximation for r). For this computation, Equations (3.185) and (3.188) have been numerically computed and subsequently normalized using Equations (3.27), (3.30), (3.31), and (3.47). The boxes with a point have been calculated by evaluating Equation (3.224) with Equations (3.176), (3.208), and (3.223). The pluses have been calculated by evaluating Equation (3.224) with Equations (3.169), (3.208), and (3.223). The vertical black line indicates the value of the normalized mutation strength in the steady state for the given parameters. It has been calculated using Equation (3.298) (see Section 3.1.3.2 for steady state investigations).

3.1.2.3 Derivation of the SAR

From the definition of the SAR (Equation (3.10)) and the pseudo-code of the ES (Algorithm 1, Line 18) it follows that

$$
\begin{aligned}
\psi(x^{(g)},r^{(g)},\sigma^{(g)}) &= \mathrm{E}\left[\left.\frac{\sigma^{(g+1)}-\sigma^{(g)}}{\sigma^{(g)}}\,\right|\,x^{(g)},r^{(g)},\sigma^{(g)}\right] && (3.226)\\
&= \mathrm{E}\left[\left.\frac{\sigma_{1;\lambda}-\sigma^{(g)}}{\sigma^{(g)}}\,\right|\,x^{(g)},r^{(g)},\sigma^{(g)}\right] && (3.227)\\
&= \int_{\sigma=0}^{\sigma=\infty}\left(\frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\right)p_{\sigma_{1;\lambda}}(\sigma)\,\mathrm{d}\sigma && (3.228)
\end{aligned}
$$

where pσ1;λ(σ) := pσ1;λ(σ | x(g), r(g), σ(g)) denotes the probability density function of the best offspring's mutation strength. Note that σ1;λ is not obtained by direct selection. It is the σ value of the individual with the best objective function value that is selected among the λ offspring. This probability density function can be derived as follows. A random sample σ from the conditional distribution density

$$
p_\sigma(\sigma\,|\,\sigma^{(g)}) = \frac{1}{\sqrt{2\pi}\,\tau}\,\frac{1}{\sigma}\exp\left[-\frac{1}{2}\left(\frac{\ln(\sigma/\sigma^{(g)})}{\tau}\right)^2\right]
\tag{3.229}
$$

[Figure 3.6 panels: normalized r progress rate ϕ∗r over the normalized mutation strength σ(g)∗ for x(g) = 1, λ = 20, τ = 1/√(2N), with N ∈ {40, 400} and ξ ∈ {0.01, 1, 10}, each combined with r(g) values near the cone axis, in between, and near the cone boundary (r(g) ∈ {0.01, 0.163114, 0.316228, 0.505, 1, 5.005, 10}). Each panel compares the Monte Carlo simulation, the numerical integration with the r normal approximation, and the two closed-form approximations (with and without the asymptotic feasibility probability).]

Figure 3.6: The subplots show the normalized r progress rate for varying normalized mutation strengths σ(g)∗. The parameters in the title are kept constant for every particular subplot. The top row in every set of 3 × 2 plots shows the case that the parental individual is near the cone axis (note that here the feasible case dominates). The bottom row in every set of 3 × 2 plots shows the case that the parental individual is in the vicinity of the cone boundary (note that here the infeasible case dominates). And the middle row shows a case in between. The solid line has been generated by Monte Carlo simulation. For this, the generational loop has been run 10^5 times for a fixed parental individual and constant parameters. The solid line shows the mean of the normalized r progress over those 10^5 iterations. The crosses have been computed by numerical integration (using the normal approximation for r). For this computation, Equations (3.185) and (3.188) have been numerically computed and subsequently normalized using Equations (3.27), (3.30), (3.31), and (3.47). The boxes with a point have been calculated by evaluating Equation (3.224) with Equations (3.176), (3.208), and (3.223). The pluses have been calculated by evaluating Equation (3.224) with Equations (3.169), (3.208), and (3.223). The vertical black line indicates the value of the normalized mutation strength in the steady state for the given parameters. It has been calculated using Equation (3.298) (see Section 3.1.3.2 for steady state investigations).


is considered. In order for this σ to be selected, the corresponding offspring's q value has to be the best value among all the λ offspring. This is the case if its q value is smaller than the smallest q value of the remaining (λ − 1) trials. This latter case is denoted by q1;λ−1. The corresponding probability is written as Pr[q < q1;λ−1 | σ]. Hence, the density of σ of the best offspring reads

$$
p_{\sigma_{1;\lambda}}(\sigma) = p_\sigma(\sigma\,|\,\sigma^{(g)})\,\Pr[q < q_{1;\lambda-1}\,|\,\sigma].
\tag{3.230}
$$
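Equation (3.229) above is the log-normal density that arises from the mutation rule σ = σ(g) e^{τ N(0,1)}. A quick simulation check of its first moment (the log-normal mean σ(g) e^{τ²/2}), with arbitrary illustration values:

```python
import math, random

rng = random.Random(7)
sigma_g, tau = 0.5, 1.0 / math.sqrt(2 * 40)   # tau = 1/sqrt(2N) with N = 40

# sample offspring mutation strengths sigma_l = sigma_g * exp(tau * Z)
samples = [sigma_g * math.exp(tau * rng.gauss(0.0, 1.0)) for _ in range(200000)]
mean = sum(samples) / len(samples)

# log-normal mean: sigma_g * exp(tau^2 / 2)
assert abs(mean - sigma_g * math.exp(tau ** 2 / 2)) < 1e-3
```

For small τ the mean is only slightly above σ(g), which is why the small-τ Taylor arguments used below are accurate.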

The probability Pr[q < q1;λ−1 | σ] can be derived as follows. The projected values ql are considered for λ offspring in one generation. The projected ql density for a given mutation strength is given by pQ(ql | x(g), r(g), σ). Considering that the k-th generated descendant has the best projected value among all λ offspring means q = qk, and all the remaining offspring must have larger projected values. More formally, this results in the condition ql > qk = q for some k ∈ {1, . . . , λ} and for all l ∈ {1, . . . , λ} \ {k}. For a single individual, this is the case with probability

$$
\Pr[q < q_l] = 1 - \Pr[q_l \le q] = 1 - P_Q(q).
\tag{3.231}
$$

As all (λ − 1) individuals are created independently, the condition must hold for all of them. Consequently, the probability that the individual with the particular value q is the best one in all (λ − 1) comparisons becomes [1 − PQ(q)]^(λ−1). With this, one arrives at the probability density that a given q value is the best, pQ(q | x(g), r(g), σ)[1 − PQ(q)]^(λ−1). As there are λ different possibilities for the one offspring being the best among λ descendants, it must be multiplied by λ. Integrating over all possible q values, the result reads

$$
\Pr[q < q_{1;\lambda-1}\,|\,\sigma] = \lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q.
\tag{3.232}
$$

Insertion of this result into Equation (3.230) finally yields

$$
p_{\sigma_{1;\lambda}}(\sigma) = p_\sigma(\sigma\,|\,\sigma^{(g)})\,\lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q.
\tag{3.233}
$$
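A useful consistency property of Equation (3.233): when pQ(·|σ) coincides with the marginal density that PQ belongs to, the q-integral times λ equals 1, so the best offspring's σ density integrates to one. This normalization (it reappears as the underbraced term in Equation (3.242)) can be checked numerically with a standard normal density standing in for pQ:

```python
import math

lam = 20
phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)   # density p
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))            # CDF P

# numerical integration of lam * p(q) * [1 - P(q)]^(lam-1) over the real line
lo, hi, n = -10.0, 10.0, 100000
h = (hi - lo) / n
total = sum(lam * phi(lo + i * h) * (1 - Phi(lo + i * h)) ** (lam - 1)
            for i in range(n + 1)) * h
assert abs(total - 1.0) < 1e-6
```

The integrand is exactly the density of the minimum of λ i.i.d. draws, which is why the integral is one regardless of λ.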

This result can now be inserted into Equation (3.228) resulting in

$$
\psi(x^{(g)},r^{(g)},\sigma^{(g)}) = \int_{\sigma=0}^{\sigma=\infty}\left(\frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\right)p_\sigma(\sigma\,|\,\sigma^{(g)})\,\lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\,\mathrm{d}\sigma.
\tag{3.234}
$$

Due to difficulties in analytically solving Equation (3.234), an approximate solution is derived. To this end, the approach used in [5, Section 7.3.2.4] is followed. For this, Equation (3.234) is written as

$$
\begin{aligned}
\psi(x^{(g)},r^{(g)},\sigma^{(g)}) &= \int_{\sigma=0}^{\sigma=\infty} f(\sigma)\,p_\sigma(\sigma\,|\,\sigma^{(g)})\,\mathrm{d}\sigma && (3.235)\\
&= \int_{\sigma=0}^{\sigma=\infty} f(\sigma)\,\frac{1}{\sqrt{2\pi}\,\tau}\,\frac{1}{\sigma}\exp\left[-\frac{1}{2}\left(\frac{\ln(\sigma/\sigma^{(g)})}{\tau}\right)^2\right]\mathrm{d}\sigma && (3.236)
\end{aligned}
$$

with

$$
f(\sigma) = \left(\frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\right)\lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q.
\tag{3.237}
$$


Now, τ is assumed to be small. Therefore, Equation (3.236) can be expanded into a Taylor series at τ = 0. Further, as τ is small, the probability mass of the log-normally distributed values is concentrated around σ(g). Therefore, f(σ) can be expanded into a Taylor series at σ = σ(g). After further calculation (it is referred to [5, Section 7.3.2.4] for all the details) one obtains

$$
\psi(x^{(g)},r^{(g)},\sigma^{(g)}) = f(\sigma^{(g)}) + \frac{\tau^2}{2}\,\sigma^{(g)}\left.\frac{\partial f}{\partial\sigma}\right|_{\sigma=\sigma^{(g)}} + \frac{\tau^2}{2}\,\sigma^{(g)2}\left.\frac{\partial^2 f}{\partial\sigma^2}\right|_{\sigma=\sigma^{(g)}} + O(\tau^4).
\tag{3.238}
$$

To proceed further, the expressions f(σ(g)), ∂f/∂σ|σ=σ(g), and ∂²f/∂σ²|σ=σ(g) need to be evaluated. Because f(σ(g)) = ((σ(g) − σ(g))/σ(g)) × · · · = 0,

$$
f(\sigma^{(g)}) = 0
\tag{3.239}
$$

immediately follows. For ∂f/∂σ|σ=σ(g), one derives with the product rule

$$
\begin{aligned}
\frac{\partial f}{\partial\sigma} ={}& \frac{1}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\\
&+ \frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q.
\end{aligned}
\tag{3.240}
$$

Therefore, it follows that

$$
\begin{aligned}
\left.\frac{\partial f}{\partial\sigma}\right|_{\sigma=\sigma^{(g)}} ={}& \frac{1}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\,\bigg|_{\sigma=\sigma^{(g)}}\\
&+ \left.\frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\right|_{\sigma=\sigma^{(g)}}\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\,\bigg|_{\sigma=\sigma^{(g)}} && (3.241)\\
={}& \frac{1}{\sigma^{(g)}}\underbrace{\lambda\int_{q=0}^{q=\infty} p_Q(q\,|\,x^{(g)},r^{(g)},\sigma^{(g)})\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q}_{=1}\\
&+ \underbrace{\frac{\sigma^{(g)}-\sigma^{(g)}}{\sigma^{(g)}}}_{=0}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\,\bigg|_{\sigma=\sigma^{(g)}} && (3.242)\\
={}& \frac{1}{\sigma^{(g)}}. && (3.243)
\end{aligned}
$$

Computing the derivative with respect to σ of Equation (3.240), using the product rule for the second summand, results in

$$
\begin{aligned}
\frac{\partial^2 f}{\partial\sigma^2} ={}& \frac{1}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\\
&+ \frac{1}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\\
&+ \frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial^2}{\partial\sigma^2}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q && (3.244)\\
={}& \frac{2}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\\
&+ \frac{\sigma-\sigma^{(g)}}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial^2}{\partial\sigma^2}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q. && (3.245)
\end{aligned}
$$


This implies that

$$
\left.\frac{\partial^2 f}{\partial\sigma^2}\right|_{\sigma=\sigma^{(g)}} = \frac{2}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{\partial}{\partial\sigma}p_Q(q\,|\,x^{(g)},r^{(g)},\sigma)\,[1-P_Q(q)]^{\lambda-1}\,\mathrm{d}q\,\bigg|_{\sigma=\sigma^{(g)}}.
\tag{3.246}
$$

Hence, ∂/∂σ pQ(q | x(g), r(g), σ) has to be derived next. To this end, Equations (3.97) and (3.98) are revisited. The infeasible case is treated first, followed by the feasible case.

3.1.2.3.1 The Approximate SAR in the case that the probability of feasible offspring tends to 0 For the SAR approximation in the case that the probability of feasible offspring tends to 0, Equation (3.98) is simplified further using √(1 + x) ≃ 1 + x/2 + O(x²),

$$
\bar{r} \simeq r^{(g)}\sqrt{1+\frac{\sigma^{(g)2}N}{r^{(g)2}}} \simeq r^{(g)}\left(1+\frac{\sigma^{(g)2}N}{2r^{(g)2}}\right)
\tag{3.247}
$$

and, for N → ∞,

$$
\sigma^{(g)} \approx \sigma_r
\tag{3.248}
$$

yielding

$$
P_{Q\,\mathrm{infeas}}(q\,|\,x^{(g)},r^{(g)},\sigma) \approx \Phi\!\left(\frac{\left(1+\frac{1}{\xi}\right)q - x^{(g)} - \frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^2 N}{2r^{(g)2}}\right)}{\sigma\sqrt{1+\frac{1}{\xi}}}\right)
\tag{3.249}
$$

and

$$
\begin{aligned}
p_{Q\,\mathrm{infeas}}(q\,|\,x^{(g)},r^{(g)},\sigma) &&& (3.250)\\
&= \frac{\mathrm{d}}{\mathrm{d}q}\,P_{Q\,\mathrm{infeas}}(q) && (3.251)\\
&\approx \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma}\exp\left[-\frac{1}{2}\left(\frac{\left(1+\frac{1}{\xi}\right)q - x^{(g)} - \frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^2 N}{2r^{(g)2}}\right)}{\sigma\sqrt{1+\frac{1}{\xi}}}\right)^2\right]. && (3.252)
\end{aligned}
$$


Taking the derivative of Equation (3.252) with respect to σ, one obtains (writing, for compactness,

$$
F(q,\sigma) := \frac{\left(1+\frac{1}{\xi}\right)q - x^{(g)} - \frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^2 N}{2r^{(g)2}}\right)}{\sigma\sqrt{1+\frac{1}{\xi}}}
$$

for the argument of Φ in Equation (3.249))

$$
\begin{aligned}
\frac{\partial}{\partial\sigma}p_{Q\,\mathrm{infeas}}(q\,|\,x^{(g)},r^{(g)},\sigma)
={}& -\frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma^2}\,\mathrm{e}^{-\frac{1}{2}F(q,\sigma)^2}\\
&+ \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma}\,\mathrm{e}^{-\frac{1}{2}F(q,\sigma)^2}
\left(-\frac{2}{2}\right)F(q,\sigma)
\left(-\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}}{\sigma^2\sqrt{1+\frac{1}{\xi}}}
-\frac{r^{(g)}}{\sqrt{\xi}}\,\frac{N}{2r^{(g)2}\sqrt{1+\frac{1}{\xi}}}\right) && (3.253)\\
={}& \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma^2}
\left[-1+\left(\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}}{\sigma\sqrt{1+\frac{1}{\xi}}}
+\frac{\sigma N}{2\sqrt{\xi}\,r^{(g)}\sqrt{1+\frac{1}{\xi}}}\right)F(q,\sigma)\right]\mathrm{e}^{-\frac{1}{2}F(q,\sigma)^2}. && (3.254)
\end{aligned}
$$

Addition of the term σN/(2√ξ r(g)√(1 + 1/ξ)) − σN/(2√ξ r(g)√(1 + 1/ξ)), which is equal to 0, to the expression inside the inner parentheses yields further

$$
\begin{aligned}
\frac{\partial}{\partial\sigma}p_{Q\,\mathrm{infeas}}(q\,|\,x^{(g)},r^{(g)},\sigma)
={}& \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma^2}
\left[-1+\left(\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}}{\sigma\sqrt{1+\frac{1}{\xi}}}
+\frac{\sigma N}{2\sqrt{\xi}\,r^{(g)}\sqrt{1+\frac{1}{\xi}}}
+\frac{\sigma N}{2\sqrt{\xi}\,r^{(g)}\sqrt{1+\frac{1}{\xi}}}
-\frac{\sigma N}{2\sqrt{\xi}\,r^{(g)}\sqrt{1+\frac{1}{\xi}}}\right)F(q,\sigma)\right]\mathrm{e}^{-\frac{1}{2}F(q,\sigma)^2} && (3.255)\\
={}& \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma^2}
\left[-1+\left(F(q,\sigma)+\frac{\sigma N}{\sqrt{\xi}\,r^{(g)}\sqrt{1+\frac{1}{\xi}}}\right)F(q,\sigma)\right]\mathrm{e}^{-\frac{1}{2}F(q,\sigma)^2}. && (3.256)
\end{aligned}
$$

Insertion of this result and Equation (3.249) into Equation (3.246) results in

$$
\begin{aligned}
\left.\frac{\partial^2 f_{\mathrm{infeas}}}{\partial\sigma^2}\right|_{\sigma=\sigma^{(g)}}
={}& \frac{2}{\sigma^{(g)}}\,\lambda\int_{q=0}^{q=\infty}\frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1+\frac{1}{\xi}}}{\sigma^{(g)2}}
\Bigg\{-1+\Bigg(\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)2}N}{2r^{(g)2}}\right)}{\sigma^{(g)}\sqrt{1+\frac{1}{\xi}}}
+\frac{\sigma^{(g)}N}{\sqrt{\xi}\,r^{(g)}\sqrt{1+\frac{1}{\xi}}}\Bigg)\\
&\times\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)2}N}{2r^{(g)2}}\right)}{\sigma^{(g)}\sqrt{1+\frac{1}{\xi}}}\Bigg\}
\exp\left[-\frac{1}{2}\left(\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)2}N}{2r^{(g)2}}\right)}{\sigma^{(g)}\sqrt{1+\frac{1}{\xi}}}\right)^2\right]\\
&\times\left[1-\Phi\!\left(\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)2}N}{2r^{(g)2}}\right)}{\sigma^{(g)}\sqrt{1+\frac{1}{\xi}}}\right)\right]^{\lambda-1}\mathrm{d}q.
\end{aligned}
\tag{3.257}
$$


For solving this integral,

$$
-t := \frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)2}N}{2r^{(g)2}}\right)}{\sigma^{(g)}\sqrt{1+\frac{1}{\xi}}}
\tag{3.258}
$$

is substituted. It further implies

$$
\frac{\mathrm{d}t}{\mathrm{d}q} = -\frac{1+1/\xi}{\sigma^{(g)}\sqrt{1+1/\xi}}
\tag{3.259}
$$

$$
\mathrm{d}q = -\frac{\sigma^{(g)}}{\sqrt{1+1/\xi}}\,\mathrm{d}t.
\tag{3.260}
$$

Expressing t with the normalized σ(g) = r(g)σ(g)∗/N one obtains

$$
\begin{aligned}
t &= -N\,\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)*2}r^{(g)2}N}{2r^{(g)2}N^2}\right)}{\sigma^{(g)*}r^{(g)}\sqrt{1+\frac{1}{\xi}}} && (3.261)\\
&= -N\,\frac{\left(1+\frac{1}{\xi}\right)q-x^{(g)}-\frac{r^{(g)}}{\sqrt{\xi}}\left(1+\frac{\sigma^{(g)*2}}{2N}\right)}{\sigma^{(g)*}r^{(g)}\sqrt{1+\frac{1}{\xi}}}. && (3.262)
\end{aligned}
$$

For N → ∞, the integration bounds after substitution follow as tl = ∞ and tu = −∞. The resulting integral reads

$$
\begin{aligned}
\left.\frac{\partial^2 f_{\mathrm{infeas}}}{\partial\sigma^2}\right|_{\sigma=\sigma^{(g)}} &&& (3.263)\\
&= -\frac{2}{\sigma^{(g)2}}\,\frac{\lambda}{\sqrt{2\pi}}\int_{t=\infty}^{t=-\infty}
\left(-1+t^2-\frac{\sigma^{(g)}N}{\sqrt{1+\frac{1}{\xi}}\,\sqrt{\xi}\,r^{(g)}}\,t\right)
\mathrm{e}^{-\frac{1}{2}t^2}\,[\underbrace{1-\Phi(-t)}_{=\Phi(t)}]^{\lambda-1}\,\mathrm{d}t && (3.264)\\
&= \frac{2}{\sigma^{(g)2}}\,\frac{\lambda}{\sqrt{2\pi}}\int_{t=-\infty}^{t=\infty}
\left(-1+t^2-\frac{\sigma^{(g)}N}{\sqrt{1+\frac{1}{\xi}}\,\sqrt{\xi}\,r^{(g)}}\,t\right)
\mathrm{e}^{-\frac{1}{2}t^2}\,[\Phi(t)]^{\lambda-1}\,\mathrm{d}t && (3.265)\\
&= \frac{2}{\sigma^{(g)2}}\,\frac{\lambda}{\sqrt{2\pi}}\int_{t=-\infty}^{t=\infty} t^2\,\mathrm{e}^{-\frac{1}{2}t^2}\,[\Phi(t)]^{\lambda-1}\,\mathrm{d}t
- \frac{2}{\sigma^{(g)2}}\,\frac{\lambda}{\sqrt{2\pi}}\int_{t=-\infty}^{t=\infty}\mathrm{e}^{-\frac{1}{2}t^2}\,[\Phi(t)]^{\lambda-1}\,\mathrm{d}t\\
&\quad- \frac{2}{\sigma^{(g)2}}\,\frac{\sigma^{(g)}N}{\sqrt{1+\frac{1}{\xi}}\,\sqrt{\xi}\,r^{(g)}}\,\frac{\lambda}{\sqrt{2\pi}}\int_{t=-\infty}^{t=\infty} t\,\mathrm{e}^{-\frac{1}{2}t^2}\,[\Phi(t)]^{\lambda-1}\,\mathrm{d}t && (3.266)\\
&= \frac{2}{\sigma^{(g)2}}\left(d^{(2)}_{1,\lambda}-1\right)
- \frac{2}{\sigma^{(g)2}}\,\frac{\sigma^{(g)}N}{\sqrt{1+\frac{1}{\xi}}\,\sqrt{\xi}\,r^{(g)}}\,d^{(1)}_{1,\lambda}. && (3.267)
\end{aligned}
$$

In the step from Equation (3.264) to Equation (3.265), the fact that negating an integral is equivalent to exchanging the upper and lower bounds and the identity 1 − Φ(−t) = Φ(t) have been used. In the step from Equation (3.266) to Equation (3.267), the higher-order progress coefficients defined in [5, Equation 4.41] have been used to express the integrals. It holds that


$d^{(1)}_{1,\lambda} = c_{1,\lambda}$. With this, the approximate SAR expression can be formulated for the infeasible case. Insertion of Equations (3.239), (3.243), and (3.267) into Equation (3.238) yields

$$
\begin{aligned}
\psi_{\mathrm{infeas}} &\approx 0 + \frac{\tau^2}{2} + \frac{\tau^2}{2}\,\sigma^{(g)2}\left[\frac{2}{\sigma^{(g)2}}\left(d^{(2)}_{1,\lambda}-1\right)-\frac{2}{\sigma^{(g)2}}\,\frac{\sigma^{(g)}N}{\sqrt{1+\frac{1}{\xi}}\,\sqrt{\xi}\,r^{(g)}}\,c_{1,\lambda}\right] && (3.268)\\
&= \tau^2\left[\left(d^{(2)}_{1,\lambda}-\frac{1}{2}\right)-\frac{c_{1,\lambda}\,\sigma^{(g)*}}{\sqrt{1+\xi}}\right]. && (3.269)
\end{aligned}
$$
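The coefficients entering Equation (3.269) have a direct probabilistic meaning: d^(2)_{1,λ} as defined in [5, Equation 4.41] equals the second moment of the largest of λ standard normal variates, and d^(1)_{1,λ} = c_{1,λ} is its mean. A numerical cross-check of the integral definition against order statistics, for λ = 20:

```python
import math, random

lam = 20
phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))

# d^(2)_{1,lam} via its integral definition [5, Eq. 4.41]
lo, hi, n = -10.0, 10.0, 100000
h = (hi - lo) / n
d2 = sum(lam * (lo + i * h) ** 2 * phi(lo + i * h) * Phi(lo + i * h) ** (lam - 1)
         for i in range(n + 1)) * h

# the same quantity as the second moment of the largest of lam normals
rng = random.Random(11)
mc = sum(max(rng.gauss(0.0, 1.0) for _ in range(lam)) ** 2
         for _ in range(200000)) / 200000

assert abs(d2 - mc) < 0.05
```

Since d^(2)_{1,λ} > 1/2 for λ ≥ 2, the first bracket in Equation (3.269) is positive and the SAR only becomes negative through the c_{1,λ}σ(g)∗ term, i.e., for sufficiently large normalized mutation strength.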

3.1.2.3.2 The Approximate SAR in the case that the probability of feasible offspring tends to 1 Now, the case that the probability of feasible offspring tends to 1 is considered in Equation (3.246). Note that Equations (3.239) and (3.243) are the same for both cases, the infeasible case and the feasible case. To treat Equation (3.246) further for the feasible case,

\[
\partial_\sigma\, p_{Q_{\mathrm{feas}}}(q \,|\, x^{(g)}, r^{(g)}, \sigma) = \partial_\sigma\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma}\right)^2} \qquad (3.270)
\]

has to be derived. With the use of the product rule one obtains

\begin{align}
\partial_\sigma\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma}\right)^2}
&= -\frac{1}{\sqrt{2\pi}\,\sigma^2}\, e^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma}\right)^2}
+ \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma}\right)^2}
\left(-\frac{2}{2}\right)\left(\frac{q-x^{(g)}}{\sigma}\right)\left(-\frac{1}{\sigma^2}\right)\left(q-x^{(g)}\right) \tag{3.271}\\
&= \frac{1}{\sqrt{2\pi}\,\sigma^2}\left[-1 + \left(\frac{q-x^{(g)}}{\sigma}\right)^2\right] e^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma}\right)^2}. \tag{3.272}
\end{align}
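This product-rule computation can also be verified symbolically; a small sketch, assuming SymPy is available:

```python
import sympy as sp

q, x = sp.symbols('q x', real=True)
sigma = sp.symbols('sigma', positive=True)

# Normal density of q with mean x and standard deviation sigma
pdf = 1 / (sp.sqrt(2 * sp.pi) * sigma) * sp.exp(-((q - x) / sigma)**2 / 2)

# Left-hand side: derivative of the density with respect to sigma
lhs = sp.diff(pdf, sigma)

# Right-hand side: the claimed closed form of Equation (3.272)
rhs = 1 / (sp.sqrt(2 * sp.pi) * sigma**2) * (-1 + ((q - x) / sigma)**2) \
      * sp.exp(-((q - x) / sigma)**2 / 2)

assert sp.simplify(lhs - rhs) == 0  # the two expressions agree
```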

Insertion of this result and Equation (3.98) into Equation (3.246) results in

\begin{align}
\left.\frac{\partial^2 f_{\mathrm{feas}}}{\partial \sigma^2}\right|_{\sigma=\sigma^{(g)}}
&= \frac{2}{\sigma^{(g)}}\,\lambda \int_{q=0}^{q=\infty} \frac{1}{\sqrt{2\pi}\,\sigma^{(g)2}}\left[-1 + \left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2\right] e^{-\frac{1}{2}\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)^2} \nonumber\\
&\qquad\times \left[1 - \Phi\!\left(\frac{q-x^{(g)}}{\sigma^{(g)}}\right)\right]^{\lambda-1} \,\mathrm{d}q. \tag{3.273}
\end{align}

The substitution $-t := \frac{q - x^{(g)}}{\sigma^{(g)}}$, which implies dq = −σ^{(g)} dt, and therefore for N → ∞ the lower bound t_l = ∞ and the upper bound t_u = −∞, yields

\begin{align}
\left.\frac{\partial^2 f_{\mathrm{feas}}}{\partial \sigma^2}\right|_{\sigma=\sigma^{(g)}}
&= -\frac{2}{\sigma^{(g)2}} \frac{\lambda}{\sqrt{2\pi}} \int_{t=\infty}^{t=-\infty} \left[-1 + t^2\right] e^{-\frac{1}{2}t^2} \left[1 - \Phi(-t)\right]^{\lambda-1} \,\mathrm{d}t \tag{3.274}\\
&= \frac{2}{\sigma^{(g)2}} \frac{\lambda}{\sqrt{2\pi}} \int_{t=-\infty}^{t=\infty} \left[-1 + t^2\right] e^{-\frac{1}{2}t^2} \left[\Phi(t)\right]^{\lambda-1} \,\mathrm{d}t \tag{3.275}\\
&= \frac{2}{\sigma^{(g)2}} \left(d^{(2)}_{1,\lambda} - 1\right). \tag{3.276}
\end{align}


With this, the approximate SAR can be expressed for the feasible case. Insertion of Equations (3.239), (3.243), and (3.276) into Equation (3.238) yields

\begin{align}
\psi_{\mathrm{feas}} &\approx 0 + \frac{\tau^2}{2} + \frac{\tau^2}{2}\sigma^{(g)2}\,\frac{2}{\sigma^{(g)2}}\left(d^{(2)}_{1,\lambda} - 1\right) \tag{3.277}\\
&= \tau^2\left(d^{(2)}_{1,\lambda} - \frac{1}{2}\right). \tag{3.278}
\end{align}

3.1.2.3.3 The Approximate SAR - Combination Using the Best Offspring's Feasibility Probability  Similarly to Equation (3.150), both cases are combined into

\[
\psi \approx P_{\mathrm{feas}}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\psi_{\mathrm{feas}} + \left[1 - P_{\mathrm{feas}}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right]\psi_{\mathrm{infeas}}. \qquad (3.279)
\]

Insertion of Equations (3.269) and (3.278) into Equation (3.279) finally results in the approximate SAR

\begin{align}
\psi &\approx \left(1 - P_{\mathrm{feas}}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\tau^2\left[\left(d^{(2)}_{1,\lambda} - \frac{1}{2}\right) - \frac{c_{1,\lambda}\,\sigma^{(g)*}}{\sqrt{1+\xi}}\right] \nonumber\\
&\qquad + P_{\mathrm{feas}}(x^{(g)}, r^{(g)}, \sigma^{(g)})\,\tau^2\left(d^{(2)}_{1,\lambda} - \frac{1}{2}\right) \tag{3.280}\\
&= \tau^2\left[\left(d^{(2)}_{1,\lambda} - \frac{1}{2}\right) - \left(1 - P_{\mathrm{feas}}(x^{(g)}, r^{(g)}, \sigma^{(g)})\right)\frac{c_{1,\lambda}\,\sigma^{(g)*}}{\sqrt{1+\xi}}\right]. \tag{3.281}
\end{align}
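The resulting approximation is straightforward to evaluate. A minimal sketch, assuming NumPy and SciPy are available; the names `sar_approx` and `d_1lam` are illustrative, and the feasibility probability is passed in as a precomputed number (e.g., from Equation (3.169) or (3.176)):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def d_1lam(k, lam):
    """Progress coefficient d^(k)_{1,lam}; d^(1)_{1,lam} equals c_{1,lam}."""
    f = lambda t: t**k * np.exp(-0.5 * t * t) * norm.cdf(t)**(lam - 1)
    return lam / np.sqrt(2.0 * np.pi) * quad(f, -np.inf, np.inf)[0]

def sar_approx(sigma_star, xi, lam, tau, p_feas):
    """Closed-form SAR approximation of Equation (3.281)."""
    c_1lam = d_1lam(1, lam)    # c_{1,lam}
    d2_1lam = d_1lam(2, lam)   # d^(2)_{1,lam}
    return tau**2 * ((d2_1lam - 0.5)
                     - (1.0 - p_feas) * c_1lam * sigma_star / np.sqrt(1.0 + xi))
```

For `p_feas` equal to 1 the correction term vanishes and the expression reduces to ψ_feas of Equation (3.278); for `p_feas` equal to 0 it reduces to ψ_infeas of Equation (3.269).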

Figure 3.7 shows plots comparing the derived closed-form approximation for ψ to Monte Carlo simulations and numerical integration results. The subplots show the SAR for varying mutation strengths σ^{(g)}. The parameters in the title are kept constant for every particular subplot. The top row in every set of 3 × 2 plots shows the case that the parental individual is near the cone axis (note that here the feasible case dominates). The bottom row in every set of 3 × 2 plots shows the case that the parental individual is in the vicinity of the cone boundary (note that here the infeasible case dominates). And the middle row shows a case in between. The solid line has been generated by Monte Carlo simulation. For this, the generational loop has been run 10^5 times for a fixed parental individual and constant parameters. The solid line shows the mean of the SAR over those 10^5 iterations. The crosses have been computed by numerical integration (using the normal approximation for r). For this computation, Equation (3.228) has been numerically computed using Equations (3.27), (3.30), (3.31), (3.47), and (3.229). The boxes with a point have been calculated by evaluating Equation (3.280) with Equation (3.176). The pluses have been calculated by evaluating Equation (3.280) with Equation (3.169). The vertical black line indicates the value of the normalized mutation strength in the steady state for the given parameters. It has been calculated using Equation (3.298) (see Section 3.1.3.2 for steady state investigations). Again, the approximation quality in the vicinity of the steady state is acceptable. However, due to the linear approximation used, ψ must necessarily deviate for sufficiently large σ*.

As it turns out (see Section 3.1.3.2), the ES in the steady state moves along the cone boundary. Therefore, for the case near the cone axis, the mutation strength gets increased for all the values considered in the first row of the plots.

[Figure 3.7 (pages 70–72): plot panels showing ψ over σ^{(g)*} for N ∈ {40, 400} and ξ ∈ {0.01, 1, 10}, with x^{(g)} = 1, λ = 20, τ = 1/√(2N), and r^{(g)} chosen near the cone axis, in between, and near the cone boundary. Each panel compares the SAR Monte Carlo simulation, the numerical integration with the r normal approximation, and the two closed-form approximations. The plotted data are not reproducible in text form; see the caption for the panel layout.]

Figure 3.7: The subplots show the SAR for varying mutation strengths σ^{(g)}. The parameters in the title are kept constant for every particular subplot. The top row in every set of 3 × 2 plots shows the case that the parental individual is near the cone axis (note that here the feasible case dominates). The bottom row in every set of 3 × 2 plots shows the case that the parental individual is in the vicinity of the cone boundary (note that here the infeasible case dominates). And the middle row shows a case in between. The solid line has been generated by Monte Carlo simulation. For this, the generational loop has been run 10^5 times for a fixed parental individual and constant parameters. The solid line shows the mean of the SAR over those 10^5 iterations. The crosses have been computed by numerical integration (using the normal approximation for r). For this computation, Equation (3.228) has been numerically computed using Equations (3.27), (3.30), (3.31), (3.47), and (3.229). The boxes with a point have been calculated by evaluating Equation (3.280) with Equation (3.176). The pluses have been calculated by evaluating Equation (3.280) with Equation (3.169). The vertical black line indicates the value of the normalized mutation strength in the steady state for the given parameters. It has been calculated using Equation (3.298) (see Section 3.1.3.2 for steady state investigations).


3.1.3 The Evolution Dynamics in the Deterministic Approximation

3.1.3.1 The Evolution Equations

As already explained in Section 3.1.1, the evolution equations (Equations (3.2) to (3.4)) are difficult to treat analytically. Therefore, approximations are usually used. One such approximation is to ignore the stochastic fluctuations and only use the mean terms. This results in Equations (3.11) to (3.13). Expressing the x and r evolution equations using the respective normalized variants, the deterministic evolution equations read

\begin{align}
x^{(g+1)} &= x^{(g)} - x^{(g)}\frac{\varphi_x^{(g)*}}{N} = x^{(g)}\left(1 - \frac{\varphi_x^{(g)*}}{N}\right) \tag{3.282}\\
r^{(g+1)} &= r^{(g)} - r^{(g)}\frac{\varphi_r^{(g)*}}{N} = r^{(g)}\left(1 - \frac{\varphi_r^{(g)*}}{N}\right) \tag{3.283}\\
\sigma^{(g+1)} &= \sigma^{(g)} + \sigma^{(g)}\psi^{(g)} = \sigma^{(g)}\left(1 + \psi^{(g)}\right). \tag{3.284}
\end{align}

Equations (3.282) to (3.284) represent the so-called mean value iterative system. It can be iterated starting with initial x^{(0)}, r^{(0)}, and σ^{(0)} values to predict the actual values of real ES runs.
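The iteration of the mean value iterative system can be sketched as follows. The callables `phi_x_star`, `phi_r_star`, and `sar` are placeholders for the derived one-generation approximations (illustrative names, not from the report):

```python
def iterate_mean_value_system(x0, r0, sigma0, N, generations,
                              phi_x_star, phi_r_star, sar):
    """Deterministic iteration of Equations (3.282) to (3.284).

    phi_x_star, phi_r_star, and sar are callables (x, r, sigma) -> float
    that supply the normalized progress rates and the SAR approximation."""
    x, r, sigma = x0, r0, sigma0
    history = [(x, r, sigma)]
    for _ in range(generations):
        # All three updates use the values of generation g simultaneously.
        x_next = x * (1.0 - phi_x_star(x, r, sigma) / N)
        r_next = r * (1.0 - phi_r_star(x, r, sigma) / N)
        sigma_next = sigma * (1.0 + sar(x, r, sigma))
        # An infeasible (x, r) would be projected back onto the cone
        # boundary here (the repair step is omitted in this sketch).
        x, r, sigma = x_next, r_next, sigma_next
        history.append((x, r, sigma))
    return history
```

With constant callables, x decays geometrically as x^{(0)}(1 − φ*/N)^g, which provides a quick sanity check of the implementation.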

3.1.3.2 The ES in the Stationary State

Figure 3.8 shows the mean value dynamics of the (1, λ)-σSA-ES applied to the conically constrained problem with different parameters as indicated in the title of the subplots. The plots are organized into three rows and two columns. The first two rows show the x (first row, first column), r (first row, second column), σ (second row, first column), and σ* (second row, second column) dynamics. The third row shows x and r converted into each other by √ξ. The third row shows that after some initial phase, the ES transitions into a stationary state. In this steady state, the ES moves along the cone boundary. This becomes clear in the plots because the equation for the cone boundary is r = x/√ξ or equivalently x = r√ξ. This supports the assumption in Equation (3.206). In the first two rows, the solid line has been generated by averaging 100 real runs of the ES. The dashed line has been determined by iterating the mean value iterative system (Equations (3.282) to (3.284)) with Monte Carlo simulations for φ_x^{(g)*}, φ_r^{(g)*}, and ψ^{(g)}. The dashed-dotted and dotted lines have been computed by iterating the mean value iterative system (Equations (3.282) to (3.284)) with the derived approximations in Equation (3.177), Equation (3.224) (with Equations (3.208) and (3.223)), and Equation (3.280) for φ_x^{(g)*}, φ_r^{(g)*}, and ψ^{(g)}, respectively. The difference between the dashed-dotted and the dotted line is the feasibility probability approximation used: Equation (3.176) has been used for the dashed-dotted line and Equation (3.169) for the dotted line. Note that it is possible that in a generation g, the iteration of the mean value iterative system yields an infeasible (x^{(g)}, r^{(g)})^T. In such cases, the particular (x^{(g)}, r^{(g)})^T has been projected back and the projected values have been used in the further iterations.
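The back-projection step mentioned above can be sketched as an orthogonal projection onto the boundary ray r = x/√ξ in the reduced (x, r) space. This is an assumption of the sketch (the report defines its repair operator in the full search space), and the function name is illustrative:

```python
import math

def project_to_cone(x, r, xi):
    """Orthogonal projection of (x, r) onto the feasible cone r <= x / sqrt(xi)
    in the reduced (x, r) space (a sketch of the back-projection step)."""
    if r <= x / math.sqrt(xi):
        return x, r                      # already feasible, keep as-is
    # Boundary ray: (x, r) = s * (1, 1/sqrt(xi)) with s >= 0; the orthogonal
    # projection parameter follows from the dot product with the ray direction.
    s = (x + r / math.sqrt(xi)) / (1.0 + 1.0 / xi)
    s = max(s, 0.0)                      # clip to the apex of the cone
    return s, s / math.sqrt(xi)
```

The projected point lies exactly on the boundary, and the displacement is perpendicular to the ray, so it is the closest feasible point in the (x, r) plane.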

The plots show that the approximation deviates from the real dynamics for small N (e.g., N = 40). The deviations from the real runs get smaller with larger N (e.g., N = 400). More detailed investigations have led to the conclusion that the SAR function ψ is mainly responsible for the deviations. The intuition for this came from taking a closer look at Figures 3.5 to 3.7. Regarding the critical values around the steady state mutation strength, one can see that the ψ approximation deviates most (compared to φ_x and φ_r) from the exact numerical integration.


The deviations are particularly noticeable for the parameter configuration N = 40 and ξ = 10. To investigate this further, the mean value iterative system has been iterated with the approximations for φ_x and φ_r, while ψ was replaced by Monte Carlo simulations. The resulting approximate dynamics show smaller deviations from the real runs and from the iteration by Monte Carlo simulation.

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−5

10−4

10−3

10−2

10−1

100

101

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x real runr real run ×√ξ

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 75: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 75

10−45

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−45

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−45

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−5

10−4

10−3

10−2

10−1

100

101

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x real runr real run ×√ξ

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 76: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 76

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−4

10−3

10−2

10−1

100

101

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x real runr real run ×√ξ

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 77: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 77

10−40

10−30

10−20

10−10

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−40

10−30

10−20

10−10

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−50

10−40

10−30

10−20

10−10

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−4

10−3

10−2

10−1

100

101

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x real runr real run ×√ξ

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 78: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 78

10−50

10−40

10−30

10−20

10−10

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−50

10−40

10−30

10−20

10−10

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−50

10−40

10−30

10−20

10−10

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−3

10−2

10−1

100

101

102

103

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x real runr real run ×√ξ 10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 79: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 79

10−100

10−80

10−60

10−40

10−20

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−100

10−80

10−60

10−40

10−20

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−100

10−80

10−60

10−40

10−20

100

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−3

10−2

10−1

100

101

102

103

104

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x real runr real run ×√ξ 10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 500 1000 1500 2000 2500 3000 3500 4000

g

N = 40, ξ = 10, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 80: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 80

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−4

10−3

10−2

10−1

100

101

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x real runr real run ×√ξ

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 81: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 81

10−45

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−45

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−45

10−40

10−35

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−4

10−3

10−2

10−1

100

101

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

x real runr real run ×√ξ 10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 0.01, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 20, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 82: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 82

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−30

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−3

10−2

10−1

100

101

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

σ∗

real runiterated by Monte Carlo sim.

iterated by approx.iterated by approx. (asym. feas. prob.)

10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

x real runr real run ×√ξ 10−25

10−20

10−15

10−10

10−5

100

0 5000 10000 15000 20000 25000 30000 35000 40000

g

N = 400, ξ = 1, x(0) = 100, r(0) = x(0)

10√ξ, σ(0) = 0.0001, λ = 10, τ = 1√

2N

r real runx real run /

√ξ

Continued below.

Page 83: Technical Report: Analysis of the (1; -Self-Adaptation ......Technical Report: Analysis of the (1; )-˙-Self-Adaptation Evolution Strategy with Repair by Projection Applied to a Conically

CHAPTER 3. ANALYSIS 83

[Figure 3.8, continued: x, r, σ, σ∗ dynamics and x/r-conversion panels for N = 400, ξ = 1, x^{(0)} = 100, r^{(0)} = x^{(0)}/(10√ξ), σ^{(0)} = 0.0001, λ = 20, τ = 1/√(2N); legend: real run, iterated by Monte Carlo sim., iterated by approx., iterated by approx. (asym. feas. prob.).]

Continued below.


[Figure 3.8, continued: x, r, σ, σ∗ dynamics and x/r-conversion panels for N = 400, ξ = 10, x^{(0)} = 100, r^{(0)} = x^{(0)}/(10√ξ), σ^{(0)} = 0.0001, λ = 10, τ = 1/√(2N); legend: real run, iterated by Monte Carlo sim., iterated by approx., iterated by approx. (asym. feas. prob.).]

Continued below.


[Figure 3.8, continued: x, r, σ, σ∗ dynamics and x/r-conversion panels for N = 400, ξ = 10, x^{(0)} = 100, r^{(0)} = x^{(0)}/(10√ξ), σ^{(0)} = 0.0001, λ = 20, τ = 1/√(2N); legend: real run, iterated by Monte Carlo sim., iterated by approx., iterated by approx. (asym. feas. prob.).]

Figure 3.8: Mean value dynamics closed-form approximation and real-run comparison of the (1, λ)-σSA-ES with repair by projection applied to the conically constrained problem. The plots are organized into three rows and two columns per parameter setting. The first two rows show the x (first row, first column), r (first row, second column), σ (second row, first column), and σ∗ (second row, second column) dynamics. The third row shows x and r converted into each other by √ξ. The third row shows that after some initial phase, the ES transitions into a stationary state. In this steady state, the ES moves along the cone boundary. This becomes clear in the plots because the equation for the cone boundary is r = x/√ξ, or equivalently x = r√ξ. This supports the assumption in Equation (3.206). In the first two rows, the solid line has been generated by averaging 100 real runs of the ES. The dashed line has been determined by iterating the mean value iterative system (Equations (3.282) to (3.284)) with Monte Carlo simulations for ϕ_x^{(g)∗}, ϕ_r^{(g)∗}, and ψ^{(g)}. The dashed-dotted and dotted lines have been computed by iterating the mean value iterative system (Equations (3.282) to (3.284)) with the derived approximations in Equation (3.177), Equation (3.224) (with Equations (3.208) and (3.223)), and Equation (3.280) for ϕ_x^{(g)∗}, ϕ_r^{(g)∗}, and ψ^{(g)}, respectively. The difference between the dashed-dotted and dotted lines is the feasibility probability approximation: Equation (3.176) has been used for the dashed-dotted line and Equation (3.169) for the dotted line. Note that it is possible that in a generation g, the iteration of the mean value iterative system yields an infeasible (x^{(g)}, r^{(g)})^T. In such cases, the particular (x^{(g)}, r^{(g)})^T has been projected back and the projected values have been used in the further iterations.
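The deterministic iteration described above can be sketched as a simple recursion. The following is a minimal illustration, in which the callables phi_x_star, phi_r_star, and psi are hypothetical placeholders standing in for the report's one-generation approximations of ϕ_x^{(g)∗}, ϕ_r^{(g)∗}, and ψ^{(g)} (the exact expressions of Equations (3.282) to (3.284) are only paraphrased here):

```python
def iterate_mean_value_system(x0, r0, sigma0, N, G, phi_x_star, phi_r_star, psi):
    """Deterministically iterate the mean value evolution equations G times.

    phi_x_star, phi_r_star, psi are callables taking the normalized mutation
    strength sigma* = sigma * N / r; they are placeholders for the report's
    one-generation approximations, not the exact expressions.
    """
    x, r, sigma = float(x0), float(r0), float(sigma0)
    for _ in range(G):
        s = sigma * N / r                 # normalized mutation strength sigma*
        x *= 1.0 - phi_x_star(s) / N      # x^(g+1) = x^(g) (1 - phi_x*/N)
        r *= 1.0 - phi_r_star(s) / N      # r^(g+1) = r^(g) (1 - phi_r*/N)
        sigma *= 1.0 + psi(s)             # sigma^(g+1) = sigma^(g) (1 + psi)
    return x, r, sigma

# Toy check: with constant progress rate phi and psi = -phi/N, the steady
# state condition phi_r*/N = -psi holds by construction, so sigma* = sigma*N/r
# stays at its initial value while x and r shrink geometrically.
N, phi = 400, 1.0
x, r, sigma = iterate_mean_value_system(
    100.0, 10.0, 1e-4, N, 1000,
    phi_x_star=lambda s: phi, phi_r_star=lambda s: phi, psi=lambda s: -phi / N)
```

With these placeholder rates, x and r decrease by the factor (1 − ϕ/N) per generation while σ∗ remains constant, mirroring the stationary behavior visible in Figure 3.8.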

For constant exogenous parameters, the state of the (1, λ)-σSA-ES on the conically constrained problem is completely described by (x^{(g)}, r^{(g)}, σ^{(g)})^T. The stationary state (also called steady state) is the state obtained for sufficiently large g, i.e., on large time scales. A correctly working ES is expected to move steadily towards the optimizer. Consequently, for sufficiently large g, the normalized mutation strength is expected to be constant. In other words, the steady state normalized mutation strength σ_ss^∗ is expected to be constant. More formally, this reads

σ_ss^∗ := lim_{g→∞} σ^{(g)∗}. (3.285)

This means that for sufficiently large g, σ^{(g)∗} = σ^{(g+1)∗} = σ_ss^∗ should hold. In order to proceed further, Equation (3.284) is expressed in normalized terms

r^{(g+1)} σ^{(g+1)∗}/N = (r^{(g)} σ^{(g)∗}/N) (1 + ψ^{(g)}). (3.286)

Requiring that σ^{(g)∗} = σ^{(g+1)∗}, one obtains further

r^{(g+1)} = r^{(g)} (1 + ψ^{(g)}). (3.287)

Insertion of Equation (3.283) results in

r^{(g)} (1 − ϕ_r^{(g)∗}/N) = r^{(g)} (1 + ψ^{(g)}) (3.288)

1 − ϕ_r^{(g)∗}/N = 1 + ψ^{(g)} (3.289)

ϕ_r^{(g)∗}/N = −ψ^{(g)}. (3.290)

For the infeasible case in the vicinity of the cone boundary, i.e., if P_feas ≈ 0 and x^{(g)}/(√ξ r^{(g)}) ≈ 1, one obtains using Equations (3.142) and (3.206)

ϕ_r^∗ = N (1 − (1/r^{(g)}) E[q_{r 1;λ}]) (3.291)

≈ N (1 − (1/r^{(g)}) (1/√ξ) E[q_{1;λ}]) (3.292)

= N (1 − (x^{(g)}/(√ξ r^{(g)})) (1 − ϕ_x^∗/N)) (3.293)

≈ N (1 − (1 − ϕ_x^∗/N)) (3.294)

= ϕ_x^∗. (3.295)

Considering this case further (P_feas ≈ 0 and x^{(g)}/(√ξ r^{(g)}) ≈ 1), use of Equation (3.149) and Equation (3.280) together with Equation (3.290) and Equation (3.295) results, after simplification for the asymptotic case N → ∞, in

(1/(1+ξ)) [1 − sqrt(1 + σ^{(g)∗2}/N)] + (√ξ/(N(1+ξ))) σ^{(g)∗} c_{1,λ} sqrt(1 + 1/ξ) = −τ² [(d^{(2)}_{1,λ} − 1/2) − c_{1,λ} σ^{(g)∗}/√(1+ξ)] (3.296)

(1 − sqrt(1 + σ^{(g)∗2}/N))/(1+ξ) + σ^{(g)∗} c_{1,λ}/(N √(1+ξ)) = −τ² (d^{(2)}_{1,λ} − 1/2) + τ² c_{1,λ} σ^{(g)∗}/√(1+ξ). (3.297)


This leads to a quadratic equation. Solving it is not particularly difficult but lengthy (use of a Computer Algebra System (CAS) can be advantageous). Hence, only the solution is given:

σ_ss^∗ ≈ [c_{1,λ} √(1+ξ) (1 − Nτ²) (1 + (d^{(2)}_{1,λ} − 1/2)(1+ξ)τ²) + sqrt((1+ξ) (c_{1,λ}² (1 − Nτ²)² + Nτ² (d^{(2)}_{1,λ} − 1/2) (2 + (d^{(2)}_{1,λ} − 1/2)(1+ξ)τ²)))] / (1 − ((1+ξ)/N) c_{1,λ}² (1 − Nτ²)²). (3.298)

Since σ∗ ≥ 0 holds, only the positive root has been chosen.
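As a sanity check, Equation (3.298) can be probed numerically: inserting its value into Equation (3.297) should yield a vanishing residual. The sketch below uses illustrative stand-in values for c_{1,λ} and d^{(2)}_{1,λ} (assumptions for this sketch, not the exact progress coefficients):

```python
import math

# Illustrative parameter values (assumptions, not exact progress coefficients):
N, xi, c, d2 = 400.0, 1.0, 1.5388, 2.70
tau2 = 1.0 / (2.0 * N)          # tau = 1/sqrt(2N), hence tau^2 = 1/(2N)
d = d2 - 0.5                    # shorthand for d^(2)_{1,lambda} - 1/2
A = 1.0 + xi

# Steady state normalized mutation strength, Equation (3.298):
num = (c * math.sqrt(A) * (1.0 - N * tau2) * (1.0 + d * A * tau2)
       + math.sqrt(A * (c**2 * (1.0 - N * tau2)**2
                        + N * tau2 * d * (2.0 + d * A * tau2))))
den = 1.0 - (A / N) * c**2 * (1.0 - N * tau2)**2
s = num / den

# Residual of Equation (3.297); it should vanish for the exact root:
lhs = (1.0 - math.sqrt(1.0 + s**2 / N)) / A + s * c / (N * math.sqrt(A))
rhs = -tau2 * d + tau2 * c * s / math.sqrt(A)
residual = lhs - rhs
```

For these values the residual is at machine-precision level, confirming that Equation (3.298) is indeed a root of Equation (3.297).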

Aiming for a simpler form, additionally assuming N ≫ σ^{(g)∗2} allows simplifying Equation (3.149) (using a Taylor expansion with cutoff after the linear term for the first square root and assuming (1 + σ^{(g)∗2}/(2N)) / (1 + σ^{(g)∗2}/N) ≈ 1) to

ϕ_x^∗ ≃ (N/(1+ξ)) (1 − (1 + σ^{(g)∗2}/(2N))) + (√ξ/(1+ξ)) σ^{(g)∗} c_{1,λ} sqrt(1 + 1/ξ) (3.299)

= c_{1,λ} σ^{(g)∗}/√(1+ξ) − σ^{(g)∗2}/(2(1+ξ)) (3.300)

= c_{1,λ} (σ^{(g)∗}/√(1+ξ)) − (1/2) (σ^{(g)∗}/√(1+ξ))². (3.301)
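The accuracy of this Taylor simplification can be checked numerically. The sketch below compares the simplified form of Equation (3.300) with a square-root form analogous to Equation (3.149), using an illustrative stand-in value for c_{1,λ} (an assumption, not the exact progress coefficient); for N ≫ σ^{(g)∗2} the two agree closely:

```python
import math

# Illustrative values (assumptions for this sketch):
N, xi, c = 400.0, 1.0, 1.54
A = 1.0 + xi
s = 3.0                                   # normalized mutation strength sigma*

# Square-root form (before the Taylor cutoff, analogous to Equation (3.149)):
phi_full = (N / A) * (1.0 - math.sqrt(1.0 + s**2 / N)) + c * s / math.sqrt(A)
# Simplified form after the Taylor cutoff, Equation (3.300):
phi_simple = c * s / math.sqrt(A) - s**2 / (2.0 * A)

rel_err = abs(phi_full - phi_simple) / abs(phi_full)
```

For N = 400 and σ∗ = 3, the relative deviation is on the order of one percent, supporting the N ≫ σ^{(g)∗2} simplification.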

Inserting this into the derived steady state condition Equation (3.290) yields, using Equations (3.281) and (3.301),

ϕ_x^∗ = −Nψ (3.302)

c_{1,λ} (σ_ss^∗/√(1+ξ)) − (1/2) (σ_ss^∗/√(1+ξ))² = −Nτ² [(d^{(2)}_{1,λ} − 1/2) − c_{1,λ} σ_ss^∗/√(1+ξ)] (3.303)

c_{1,λ} σ_ss^∗ − (1/2) σ_ss^{∗2}/√(1+ξ) = −Nτ² (d^{(2)}_{1,λ} − 1/2) √(1+ξ) + Nτ² c_{1,λ} σ_ss^∗ (3.304)

(1/2) σ_ss^{∗2}/√(1+ξ) + σ_ss^∗ (Nτ² c_{1,λ} − c_{1,λ}) = Nτ² (d^{(2)}_{1,λ} − 1/2) √(1+ξ). (3.305)

Solving this quadratic equation results in

σ_ss^∗ ≈ [c_{1,λ} (1 − Nτ²) + sqrt(c_{1,λ}² (1 − Nτ²)² + Nτ² (2 d^{(2)}_{1,λ} − 1))] √(1+ξ). (3.306)

Here, only the positive root has been chosen since σ∗ ≥ 0 holds. This is a remarkable result: one recognizes the equations for the sphere model (see [5, p. 301, Equation (7.171)]), i.e.,

σ_ss^∗ = √(1+ξ) σ_ss,sphere^∗. (3.307)

Additionally, insertion of σ_ss^∗ into ϕ_x^∗ yields

ϕ_x,ss^∗ = ϕ_ss,sphere^∗. (3.308)
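A quick numeric cross-check (again with illustrative stand-in values for c_{1,λ} and d^{(2)}_{1,λ}, which are assumptions of this sketch) confirms that Equation (3.306) satisfies the steady state condition of Equation (3.303), and that iterating σ∗ under the simplified one-generation rules converges to it:

```python
import math

# Illustrative values (assumptions, not the exact progress coefficients):
N, xi, c, d2 = 400.0, 1.0, 1.5388, 2.70
tau2 = 1.0 / (2.0 * N)
A = 1.0 + xi
d = d2 - 0.5

# Closed-form steady state, Equation (3.306):
s_ss = (c * (1.0 - N * tau2)
        + math.sqrt(c**2 * (1.0 - N * tau2)**2 + N * tau2 * (2.0 * d2 - 1.0))
        ) * math.sqrt(A)

# Residual of the steady state condition, Equation (3.303):
u = s_ss / math.sqrt(A)
residual = (c * u - 0.5 * u**2) - (-N * tau2 * (d - c * u))

# Fixed-point iteration of sigma* using Equation (3.300) for phi_r* ~ phi_x*
# and the psi form appearing on the right-hand side of Equation (3.303):
s = 0.5                                   # arbitrary start value
for _ in range(20000):
    phi_r = c * s / math.sqrt(A) - s**2 / (2.0 * A)
    psi = tau2 * (d - c * s / math.sqrt(A))
    s *= (1.0 + psi) / (1.0 - phi_r / N)  # sigma* update per generation
```

The iteration settles at the closed-form value of Equation (3.306), illustrating that the derived steady state is an attracting fixed point of the simplified mean value dynamics.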


Therefore, in the steady state, the (1, λ)-σSA-ES with repair by projection applied to the conically constrained problem approaches the optimizer at the same rate as if a sphere were to be optimized, irrespective of the value of ξ. That is, due to the definition of ϕ∗, one observes x^{(g)} and r^{(g)} dynamics proportional to exp(−(ϕ∗/N) g) in the steady state. This implies linear convergence order with a convergence rate of ϕ∗/N. This result can be interpreted in the sense that the optimal repair has transformed the conical constraint into a sphere model.

Figure 3.9 shows plots of the steady state computations. The derived closed-form approximation has been compared to real ES runs.

For the first three rows of plots, the solid and dotted lines have been determined by computing the normalized steady state mutation strength σ_ss^∗ using Equation (3.298) and Equation (3.306), respectively, for different values of ξ. The results for ϕ_x^∗ have been determined by using the computed steady state σ_ss^∗ values with Equations (3.176) and (3.177) for the solid line and with Equations (3.169) and (3.177) for the dotted line. Similarly, the results for ϕ_r^∗ have been determined by using the computed steady state σ_ss^∗ values with Equations (3.176), (3.208), (3.223), and (3.224) for the solid line and with Equations (3.169), (3.208), (3.223), and (3.224) for the dotted line. The pluses have been determined by computing the averages of the particular values in real ES runs. For this, the ES was run for 400N generations and the particular values have been averaged over these generations. This procedure was repeated 100 times and those 100 averages have been averaged again to compute the steady state values indicated by the pluses. For comparison purposes, the fourth row of plots shows the normalized steady state progress rates combined in one plot.

For N = 40, the approximation for σ_ss^∗ is good for small values of ξ. For larger values of ξ, the deviations of the approximation from the real runs increase. For N = 400, the approximation for σ_ss^∗ comes close to the results of the real runs for the values of ξ under consideration. The deviations of the approximations for the normalized steady state progress rates from the real runs are particularly large for small values of ξ. These deviations can be explained by the assumption of sufficiently large ξ in Equation (3.68) and Equations (3.69) to (3.80). Because for large N the offspring feasibility probability in Equation (3.169) tends to 0, the expressions for the infeasible case are prevalent in the cone boundary case. As the ES moves along the cone boundary in the steady state (see Figure 3.8), the steady state progress rate expressions are the ones from the infeasible case.
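The claimed linear convergence order can be illustrated directly: iterating the per-generation recursion x^{(g+1)} = x^{(g)}(1 − ϕ∗/N) with a fixed normalized progress rate closely tracks exp(−(ϕ∗/N) g). The sketch below (ϕ∗ = 1 is an arbitrary illustrative value, an assumption of this sketch) also estimates the number of generations needed to gain ten orders of magnitude:

```python
import math

N, phi_star = 400.0, 1.0          # phi_star: illustrative steady state rate
x0 = 100.0

# Generations needed to reduce x by ten orders of magnitude at rate phi*/N:
g_target = int(round(N * math.log(1e10) / phi_star))

x = x0
for _ in range(g_target):
    x *= 1.0 - phi_star / N        # per-generation mean value recursion

# The geometric decay is well approximated by the exponential law:
predicted = x0 * math.exp(-phi_star * g_target / N)
rel_dev = abs(x - predicted) / predicted
```

For N = 400 this gives roughly 9200 generations for a 10^{-10} reduction, and the iterated value deviates from the exponential prediction by only a few percent, consistent with the log-linear dynamics in Figure 3.8.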


[Figure 3.9, first page: ϕ_x,ss^∗ (first row), ϕ_r,ss^∗ (second row), and σ_ss^∗ (third row) over ξ ∈ [0, 10] for N = 40 with λ = 10 (left column) and λ = 20 (right column), τ = 1/√(2N); legend: steady state approx., steady state approx. (sphere), steady state from real run. Fourth row: ϕ_x,ss^∗ and ϕ_r,ss^∗ combined in one plot.]

Continued on the next page.


[Figure 3.9, second page: ϕ_x,ss^∗ (first row), ϕ_r,ss^∗ (second row), and σ_ss^∗ (third row) over ξ ∈ [0, 10] for N = 400 with λ = 10 (left column) and λ = 20 (right column), τ = 1/√(2N); legend: steady state approx., steady state approx. (sphere), steady state from real run. Fourth row: ϕ_x,ss^∗ and ϕ_r,ss^∗ combined in one plot.]

Figure 3.9: Steady state closed-form approximation and real-run comparison of the (1, λ)-σSA-ES with repair by projection applied to the conically constrained problem. For the first three rows of plots, the solid and dotted lines have been determined by computing the normalized steady state mutation strength σ_ss^∗ using Equation (3.298) and Equation (3.306), respectively, for different values of ξ. The results for ϕ_x^∗ have been determined by using the computed steady state σ_ss^∗ values with Equations (3.176) and (3.177) for the solid line and with Equations (3.169) and (3.177) for the dotted line. Similarly, the results for ϕ_r^∗ have been determined by using the computed steady state σ_ss^∗ values with Equations (3.176), (3.208), (3.223), and (3.224) for the solid line and with Equations (3.169), (3.208), (3.223), and (3.224) for the dotted line. The pluses have been determined by computing the averages of the particular values in real ES runs. For this, the ES was run for 400N generations and the particular values have been averaged over these generations. This procedure was repeated 100 times and those 100 averages have been averaged again to compute the steady state values indicated by the pluses. For comparison purposes, the fourth row of plots shows the normalized steady state progress rates combined in one plot.


Chapter 4

Conclusion

An optimization problem with a conically shaped feasibility region has been presented. A (1, λ)-σSA-ES that can be applied to the presented problem has then been described. Whereas prior work dealt with resampling, the repair approach in this work is based on projection, i.e., the minimal repair principle has been applied. The derivation of closed-form approximations has been presented in two steps. First, approximations for the one-generation behavior of the algorithm have been derived. Second, those approximate expressions have been used in derived deterministic evolution equations. As a result, closed-form approximations have been obtained for predicting the evolution dynamics of the ES. Plots comparing the derived approximations to simulations have been presented to show the quality of the approximations.

A remarkable result worth emphasizing is that the optimal repair by projection results in a linear convergence order of the evolution process. That is, the conically constrained linear optimization problem has been turned into a spherical unconstrained problem.

The analysis of a multi-recombinative ES applied to the presented problem is a topic for future work. Moreover, it is of interest to analyze cumulative step size adaptation as the mutation strength control mechanism.



Bibliography

[1] B. Arnold, N. Balakrishnan, and H. Nagaraja. A First Course in Order Statistics. Wiley Series in Probability and Statistics. Wiley, 1992.

[2] D. V. Arnold. Analysis of a repair mechanism for the (1, λ)-ES applied to a simple constrained problem. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 853–860. ACM, 2011.

[3] D. V. Arnold. On the behaviour of the (1, λ)-ES for a simple constrained problem. In Proceedings of the 11th Workshop on Foundations of Genetic Algorithms, pages 15–24. ACM, 2011.

[4] D. V. Arnold. On the behaviour of the (1, λ)-ES for a conically constrained problem. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pages 423–430. ACM, 2013.

[5] H.-G. Beyer. The Theory of Evolution Strategies. Natural Computing Series. Springer, 2001.

[6] M. Fisz. Wahrscheinlichkeitsrechnung und mathematische Statistik. Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1971.

[7] C. Forbes, M. Evans, N. Hastings, and B. Peacock. Statistical Distributions. Wiley, 2011.

[8] M. Hellwig and D. V. Arnold. Comparison of constraint-handling mechanisms for the (1, λ)-ES on a simple constrained problem. Evolutionary Computation, 24(1):1–23, 2016.

[9] S. Meyer-Nieberg. Self-Adaptation in Evolution Strategies. PhD thesis, Dortmund University of Technology, December 2007.

[10] I. Rechenberg. The Evolution Strategy. A Mathematical Model of Darwinian Evolution. Springer, Berlin, Heidelberg, 1984.
