Classical Optimization Ideas in Evolutionary Computation
Qingfu Zhang Department of Computer Science
City University of Hong Kong Hong Kong
Shantou University, 03/2019
CityU
CS@CityU
- U.S. News and World Report University Rankings 2019, Computer Science: 11th
- Quacquarelli Symonds (QS) World University Rankings 2018, Computer Science & Information Systems: 50th
- ShanghaiRanking's Global Ranking of Academic Subjects 2018, Computer Science & Engineering: 38th
This talk will provide three examples:
- In single-objective optimization, genetic algorithms and ant colony algorithms can be regarded as gradient methods on the space of probability models.
- In multiobjective optimization, aggregation methods from traditional optimization can be used to design efficient multiobjective evolutionary algorithms.
- Regularity properties can be used to design efficient multiobjective optimization algorithms.
Example 1: Genetic Algorithms/Ant colony optimization vs gradient methods
Single-Objective Optimization Problem
  min f(x)
  subject to x ∈ D
Classic Optimization: Gradient-based Methods
- Newton's method: consider the second-order approximation of the function,
  f(x + d) ≈ f(x) + ∇f(x)^T d + (1/2) d^T H(x) d, where H is the Hessian.
- Newton direction: d = -H(x)^{-1} ∇f(x).
- Key issue in traditional optimization: how to approximate the Newton direction without H.
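In code, one Newton iteration is just a linear solve; a minimal sketch on a toy quadratic (all names here are illustrative):

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton iteration: x_new = x - H(x)^{-1} * grad f(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

# Toy example: f(x) = 0.5 x^T A x - b^T x, with gradient A x - b and Hessian A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = newton_step(np.zeros(2), lambda x: A @ x - b, lambda x: A)
# For a quadratic, a single Newton step lands exactly on the minimizer A^{-1} b.
```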
Evolutionary Computation
- One of the three areas in Computational Intelligence (CI).
- One of the major areas in metaheuristics.
- Most evolutionary algorithms are for optimization.
- Mainly inspired by nature at the early stage:
  - Genetic Algorithm (GA), 1975, by J. H. Holland (1929-2015) et al.
  - Ant Colony Optimization (ACO), 1992, by Marco Dorigo (1961-).
  - ...
Genetic Algorithm = Population-Based Iterative Search Method
- Population: a set of candidate solutions.
- Selection: select the fittest solutions to be parents for the next generation.
- Crossover: mix two parent solutions to produce new solutions.
- Mutation: modify a parent solution to produce a new solution.
One iteration in a simple EA
[Fig: population at generation t -> (selection) parent set -> (crossover, mutation) -> population at generation t+1]
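The iteration above can be sketched as a minimal binary GA; the operators and parameter values below are illustrative, not a specific published algorithm:

```python
import random

def ga_step(pop, fitness, pc=0.9, pm=0.05):
    """One generation: selection -> crossover -> mutation (binary strings)."""
    n = len(pop)
    def tournament():
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    children = []
    while len(children) < n:
        p1, p2 = tournament(), tournament()          # selection
        if random.random() < pc:                     # uniform crossover
            child = [random.choice(pair) for pair in zip(p1, p2)]
        else:
            child = list(p1)
        child = [1 - g if random.random() < pm else g for g in child]  # bit-flip mutation
        children.append(child)
    return children

# OneMax toy problem: maximize the number of 1-bits.
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(50):
    pop = ga_step(pop, sum)
best = max(sum(ind) for ind in pop)
```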
Ant Colony Optimization
- Behavior of ants:
[Fig: ants travelling between NEST and FOOD]
- Ants leave pheromone as they move.
- At a fork, ants choose the route with the heavier pheromone in a probabilistic manner.
Ant Algorithm for TSP
- TSP: a person wants to visit N cities and then return to his starting place, visiting each city once and only once. In which order should he visit the cities so as to minimize the total distance travelled?
- Ant algorithm:
  - K (say, K = 5 or 100) artificial ants each build a tour using a probabilistic rule determined by pheromone information and other information.
  - Ants leave/update pheromone on the routes just travelled.
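A sketch of the two ingredients above, tour construction and pheromone update; the matrix representation and the parameter values (alpha, beta, rho) are illustrative defaults, not the talk's settings:

```python
import random

def ant_tour(dist, pheromone, alpha=1.0, beta=2.0):
    """One artificial ant builds a TSP tour city by city.
    Choice weight = pheromone^alpha * (1/distance)^beta."""
    n = len(dist)
    tour = [0]
    unvisited = list(range(1, n))
    while unvisited:
        i = tour[-1]
        w = [pheromone[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in unvisited]
        nxt = random.choices(unvisited, weights=w)[0]
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def deposit(pheromone, tour, dist, rho=0.1):
    """Evaporate everywhere, then lay pheromone inversely proportional to tour length."""
    n = len(pheromone)
    length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
    for i in range(n):
        for j in range(n):
            pheromone[i][j] *= (1 - rho)
    for k in range(n):
        i, j = tour[k], tour[(k + 1) % n]
        pheromone[i][j] += 1.0 / length
        pheromone[j][i] += 1.0 / length

random.seed(1)
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
pheromone = [[1.0] * 4 for _ in range(4)]
tour = ant_tour(dist, pheromone)
deposit(pheromone, tour, dist)
```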
Why we like/dislike EC methods
- Easy to understand; low manpower needed.
- No gradient information needed; wide applicability.
- Parallel by nature; solutions can be evaluated in parallel.
- No solid theory; difficult to analyze.
- Poor performance if not well designed.
- They mimic a metaphor of nature.
Estimation of Distribution Algorithm (1995, H Mühlenbein et al)
- At each iteration, it maintains a probability model on the search space.
- New solutions are sampled from the model.
- The objective value of each solution is evaluated, and the model is then updated.
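A minimal sketch of this loop, assuming the simplest univariate model (UMDA-style marginal probabilities on bit strings); names and parameter values are illustrative:

```python
import random

def umda(fitness, length=20, pop_size=50, n_best=10, iters=60):
    """Univariate EDA: the model is the vector of marginals p[i] = P(x_i = 1)."""
    p = [0.5] * length
    for _ in range(iters):
        # Sample a population from the current model.
        pop = [[1 if random.random() < p[i] else 0 for i in range(length)]
               for _ in range(pop_size)]
        # Evaluate, select the fittest, and refit the model to them.
        pop.sort(key=fitness, reverse=True)
        best = pop[:n_best]
        p = [min(0.95, max(0.05, sum(x[i] for x in best) / n_best))
             for i in range(length)]
    return p

random.seed(0)
p = umda(sum)  # OneMax: fitness = number of 1-bits
```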
A Genetic Algorithm can be modelled as an EDA
- Uniform crossover = sampling from a probability model.

  Parent population:
    x1 x2 x3 x4
    0  0  1  1
    1  0  0  1
    0  0  1  1
    1  0  1  0
    1  0  0  1

  Induced marginal model:
    P(x1=1) = 3/5, P(x2=1) = 0, P(x3=1) = 3/5, P(x4=1) = 4/5

  Sampled offspring: 1 0 1 1 and 0 0 1 1
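The table above can be reproduced directly: computing the marginals of the parent set and sampling from them plays the role of uniform crossover over the whole population.

```python
import random

parents = [
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
# Marginal model: p[i] = fraction of parents with x_i = 1.
p = [sum(row[i] for row in parents) / len(parents) for i in range(4)]

def sample(p):
    """Sampling this model = uniform crossover over the whole parent set."""
    return [1 if random.random() < pi else 0 for pi in p]
```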
Ant Algorithm is an EDA
- Pheromone values define a probability model.
Model-based Search
- Evolution Strategies (Rechenberg, 1971)
  - CMA-ES (Hansen, 2001), NES (Wierstra et al., 2007)
- Cross-Entropy method (Rubinstein, 1999)
- Probability Collectives (Wolpert, 2001)
- IGO: Information Geometric Optimization (Ollivier et al., 2016)
Gradient methods on Expectation
Gradient Descent on the Expectation
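The slide's formulas are not in this transcript; a standard formulation is J(θ) = E_{x~p_θ}[f(x)] with ∇J(θ) = E[f(x) ∇_θ log p_θ(x)] (the log-likelihood trick). A minimal Monte Carlo sketch for a one-bit Bernoulli model, where J(θ) = θ so the true gradient is exactly 1:

```python
import random

def grad_estimate(theta, f, n=100_000):
    """Monte Carlo estimate of grad J(theta) via the log-likelihood trick,
    for x ~ Bernoulli(theta)."""
    total = 0.0
    for _ in range(n):
        x = 1 if random.random() < theta else 0
        score = (x - theta) / (theta * (1.0 - theta))  # d/dtheta log p_theta(x)
        total += f(x) * score
    return total / n

random.seed(0)
g = grad_estimate(0.3, lambda x: x)  # J(theta) = theta, so g should be close to 1
```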
Natural Gradient
- Plain gradient descent:
  - The gradient is not invariant under reparameterization; it does not consider the metric of the parameter space.
  - The updated parameters may no longer be feasible, e.g. the covariance matrix may lose positive-definiteness.
- The natural gradient provides the steepest direction on the space of probability models (Amari, 1999):
  - It is invariant under reparameterization.
  - It takes the metric of the parameter space into account.
- The steepest direction on the manifold (set) of probability distributions is given by the natural gradient: ∇̃J(θ) = F(θ)^{-1} ∇J(θ).
- F is the Fisher information matrix; it takes second-order information into account.
- The natural gradient is invariant under reparameterization.
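For a Bernoulli(θ) model the Fisher information is F(θ) = 1/(θ(1-θ)), so the natural gradient step is easy to write down; a toy sketch showing how it automatically shrinks near the boundary, helping θ stay feasible:

```python
# Bernoulli(theta): Fisher information F(theta) = 1 / (theta * (1 - theta)).
# The natural gradient rescales the plain gradient by F^{-1} = theta * (1 - theta),
# so steps shrink near the boundary of (0, 1) where F is large.
def natural_gradient_step(theta, grad, lr=0.1):
    fisher_inv = theta * (1.0 - theta)
    return theta + lr * grad * fisher_inv

theta_near_boundary = natural_gradient_step(0.9, 1.0)  # small step: F is large here
theta_middle = natural_gradient_step(0.5, 1.0)         # larger step in the middle
```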
- Evolutionary algorithms can be modelled as estimation of distribution algorithms.
- The search directions of EAs in the space of probability models correspond to (natural) gradient directions.
- Various classical optimization techniques can be used to design evolutionary algorithms.
- Borrowing ideas from the L-BFGS method, we have designed an EDA for large-scale optimization problems.
- With m = 2, Rm-ES costs about 1/1000 of the running time of CMA-ES.
Fig: Ellipsoid (left) and Rosenbrock (right) functions; objective value versus running time in seconds, n = 1000.
Example 2: MOEA/D, a bridge between traditional aggregation methods and evolutionary algorithms
Multiobjective Optimization Problem (MOP)
  min F(x) = (f1(x), f2(x), ..., fm(x))
  subject to x ∈ D

  where D is the search space and F(x) is the objective vector.
Why multiobjective?
- By nature, many real-life problems have multiple objectives.
- Decision makers (DMs) or modellers don't know how to combine them into one.
- DMs want to know the trade-off relationship among the objectives.
Pareto Optimal Solutions = Best Trade-off Candidates
- x is Pareto optimal iff no other solution dominates it.
- A (rational) decision maker doesn't like non-Pareto-optimal solutions.
- Pareto set (PS) = the set of all Pareto optimal solutions in the x-space.
- Pareto front (PF) = the image of the PS in the F-space.
[Fig: the PS in the search space maps onto the PF on the boundary of the feasible objective region]
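The dominance definition above translates directly into code (minimization assumed; function names are illustrative):

```python
def dominates(a, b):
    """a dominates b (minimization): a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_optimal(points):
    """The Pareto front of a finite set: the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 5), (2, 3), (4, 4), (3, 2), (5, 1)]
front = pareto_optimal(pts)  # (4, 4) is dominated by (2, 3) and drops out
```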
Basic Strategies in MCDM
- A priori: aggregate the objectives into one first, e.g. f(x) = 0.3 f1(x) + 0.7 f2(x), then optimize it to obtain the final solution.
- A posteriori: generate a number of Pareto optimal solutions first; the DM then picks the final solution.
- Interactive: the DM and the optimizer interact, refining preferences step by step.
[Figs: objective space F(D) with f1, f2 axes, showing the single final solution (a priori), a set of Pareto optimal solutions and the chosen final solution (a posteriori), and stepwise preference exploration (interactive)]
How to approximate the Pareto front?
Three major classes of multiobjective evolutionary algorithms
- Pareto dominance based
- Performance indicator based
- Decomposition based (Zhang & Li, 2007)
MOEA/D = Decomposition + Collaboration
- Decomposition (from traditional optimization):
  - Decompose the task of approximating the PF into N subtasks. Each subproblem can be single-objective or multiobjective.
- Collaboration (from EC):
  - N agents (procedures) are used, one per subproblem.
  - The N subproblems are related to one another, so the N agents can solve them in a collaborative manner.
Problem Decomposition
- Weighted sum approach:

  min g^ws(x | λ) = λ1 f1(x) + λ2 f2(x),
  where λ1 + λ2 = 1 and λ1, λ2 ≥ 0.

- For example, N = 11 weight vectors λ^1 = (1, 0), λ^2 = (0.9, 0.1), ..., λ^10 = (0.1, 0.9), λ^11 = (0, 1) give N single-objective subproblems:

  min g^ws(x | λ^1)  = 1.0 f1(x) + 0.0 f2(x)
  min g^ws(x | λ^2)  = 0.9 f1(x) + 0.1 f2(x)
  ...
  min g^ws(x | λ^10) = 0.1 f1(x) + 0.9 f2(x)
  min g^ws(x | λ^11) = 0.0 f1(x) + 1.0 f2(x)

- The optimum of each subproblem is a Pareto optimal solution; together the N optima approximate the PF.
- It works for convex PFs. There are many other aggregation methods.
[Fig: the N weight vectors in (f1, f2)-space; each subproblem's optimal solution lies on the PF]
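A sketch of the decomposition above: the aggregation function plus an evenly spread set of weight vectors (names are illustrative):

```python
def g_ws(x, lam, objectives):
    """Weighted-sum aggregation: g^ws(x | lam) = sum_i lam_i * f_i(x)."""
    return sum(l * f(x) for l, f in zip(lam, objectives))

# N evenly spread weight vectors (1,0), (0.9,0.1), ..., (0,1): each one turns
# the bi-objective problem into a single-objective subproblem.
N = 11
weights = [(1 - k / (N - 1), k / (N - 1)) for k in range(N)]
```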
- At each generation, each agent does the following:
  1. Mating selection: obtain the current solutions of some neighbours (collaboration).
  2. Reproduction: generate a new solution by applying reproduction operators to its own solution and the borrowed solutions.
  3. Replacement:
     a. Replace its old solution with the new one if the new one is better for its own subobjective.
     b. Pass the new solution on to some of its neighbours; each of them replaces its old solution with the new one if the new one is better for that neighbour's subobjective.
[Fig: the N subproblems min g(x, λ^1), min g(x, λ^2), ..., min g(x, λ^N) arranged along the PF in (f1, f2)-space; decomposition, neighbourhood, memory]
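The three steps can be sketched as one MOEA/D generation; the real-coded reproduction operator here is a deliberately crude placeholder, and the aggregation g is passed in (e.g. a weighted sum):

```python
import random

def moead_generation(pop, weights, neighbors, objectives, g):
    """One MOEA/D generation over N agents (toy real-coded operators)."""
    for i in range(len(pop)):
        # 1. Mating selection: borrow solutions from the neighbourhood.
        a, b = random.sample(neighbors[i], 2)
        # 2. Reproduction: crude blend crossover + Gaussian mutation (placeholder).
        child = [(x + y) / 2 + random.gauss(0.0, 0.1) for x, y in zip(pop[a], pop[b])]
        fc = [f(child) for f in objectives]
        # 3. Replacement: the child replaces any neighbour it beats on that
        #    neighbour's own subproblem g(F(x) | weight_j).
        for j in neighbors[i]:
            if g(fc, weights[j]) < g([f(pop[j]) for f in objectives], weights[j]):
                pop[j] = child
    return pop

random.seed(0)
objectives = [lambda x: x[0] ** 2, lambda x: (x[0] - 1) ** 2]
g_ws = lambda fv, w: w[0] * fv[0] + w[1] * fv[1]          # weighted-sum aggregation
weights = [(1 - k / 4, k / 4) for k in range(5)]
neighbors = [[max(0, i - 1), i, min(4, i + 1)] for i in range(5)]
pop = [[random.uniform(-2.0, 2.0)] for _ in range(5)]
for _ in range(30):
    pop = moead_generation(pop, weights, neighbors, objectives, g_ws)
```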
Resources
- MOEA/D web page: https://sites.google.com/view/moead
- Survey papers:
Example 3: Regularity-Based Multiobjective Optimization
Motivations
- Regularity of continuous MOPs: under certain conditions, the PS (PF) is an (m-1)-dimensional piecewise continuous manifold in the decision (objective) space, where m is the number of objectives.
- This property has been ignored by MOEA researchers.
- There is very little research on the shape of PSs.
[Fig: the Pareto set (PS) in (x1, x2)-space maps under F onto the Pareto front (PF) in (f1, f2)-space within F(D)]

How can we deal with a continuous MOP if its PS is an (m-1)-D piecewise continuous manifold?
- Suppose we use the following commonly used framework:

  population -> reproduction operators -> new solutions -> competition/replacement -> selection
  (selection has received lots of work; reproduction very little)

- Why do commonly used genetic operators not work well for complicated PSs?
  - When two parents are in the PS, their offspring may not be close to the PS. [Fig: PS in decision space]
  - The PS is not an equilibrium.
- So we resort to EDAs (modelling and sampling).
Basic Idea

[Fig: population -> modelling & sampling -> new solutions -> competition/replacement -> selection]

- In the case of 2 objectives:
  - The PS is a 1-D curve.
  - If the algorithm works well, the principal (central) curve of the population could be an approximation of the PS. [Fig: population points scattered around the PS]
- Each point in the current population is regarded as a sample of ξ = ζ + ε, where ζ is uniformly distributed on a 1-D curve C and ε is an n-D Gaussian noise.
- This model is different from other models in EDAs. How to model C and ε?
43
Modelling: How to model C and ?
We assume that: o The central curve C consists of
several line segments. This assumption makes C computable.
q How to model Co Divide the population into several
clusters by local PCA. o Compute the central line of each
cluster. q How to model
o the deviation of the points in each cluster to its central line.
: point in the current population.
: central curve: C
simplification
The number of clusters needs to be preset.
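The per-cluster modelling step can be sketched with ordinary PCA on a single cluster (a simplification of local PCA; names are illustrative):

```python
import numpy as np

def principal_segment(points):
    """Fit one cluster's central line: mean + first principal direction,
    extent from the projections, noise level from the residuals."""
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    # First principal direction via eigen-decomposition of the covariance.
    cov = np.cov((X - mean).T)
    vals, vecs = np.linalg.eigh(cov)
    d = vecs[:, -1]                      # direction of largest variance
    t = (X - mean) @ d                   # projections onto the central line
    resid = (X - mean) - np.outer(t, d)  # deviation from the central line
    sigma = float(np.sqrt((resid ** 2).sum() / X.shape[0]))
    return mean, d, float(t.min()), float(t.max()), sigma

mean, d, t_min, t_max, sigma = principal_segment([(0, 0), (1, 2), (2, 4), (3, 6)])
# The points lie exactly on the line y = 2x, so sigma is ~0 and d is parallel to (1, 2).
```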
Sampling
- The number of new solutions sampled around segment C_i is proportional to its length:

  #(new solutions around C_i) = N * L(C_i) / (L(C_1) + L(C_2) + L(C_3)),

  where L(C_i) is the length of C_i.
- Sampling around C_i:
  - Uniformly randomly pick a point x1 on C_i.
  - Draw x2 ~ N(0, σ_i I).
  - Set x = x1 + x2.
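A sketch of the sampling rule above, with each segment represented by a start point, unit direction, length, and noise level (the segment-extension detail of the full method is omitted here):

```python
import random

def sample_from_segments(segments, n_total):
    """Allocate samples to segments in proportion to their lengths, then sample
    x = (uniform point on the segment) + Gaussian noise.
    Rounding may make the total differ slightly from n_total."""
    total_len = sum(s["length"] for s in segments)
    new_points = []
    for s in segments:
        k = round(n_total * s["length"] / total_len)
        for _ in range(k):
            t = random.uniform(0.0, s["length"])
            x1 = [a + t * d for a, d in zip(s["start"], s["direction"])]
            new_points.append([xi + random.gauss(0.0, s["sigma"]) for xi in x1])
    return new_points

segments = [{"start": [0.0, 0.0], "direction": [1.0, 0.0], "length": 2.0, "sigma": 0.0}]
new_pts = sample_from_segments(segments, 10)
```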
Selection
- We use non-dominated sorting (Deb et al.) in selection.
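Non-dominated sorting can be sketched by repeatedly peeling off the current non-dominated front (a simple quadratic-time version, not Deb et al.'s fast bookkeeping):

```python
def nondominated_sort(points):
    """Peel off successive non-dominated fronts (minimization).
    Front 0 is the non-dominated front of the whole set."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

fronts = nondominated_sort([(1, 3), (3, 1), (2, 2), (4, 4)])
```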
Experiments: Test Problems
- Test instances with 2 objectives: modified versions of ZDT.
  - PS: x2 = ... = xn = x1, 0 <= x1 <= 1.
    Variable transformation used: x'1 = x1, x'i = xi - x1 (i = 2, ..., n).
  - PS: x2 = ... = xn = x1^2, 0 <= x1 <= 1.
    Transformation: x'1 = x1, x'i = xi - x1^2 (i = 2, ..., n).
- Test instances with 3 objectives: modified versions of DTLZ with variable transformations.
  - DTLZ1 variant with PS: x3 = ... = xn = x1*x2, 0 <= x1, x2 <= 1.
  - DTLZ2 variant with PS: x3 = ... = xn = x1^2*x2^2, 0 <= x1, x2 <= 1.
Experiments: Parameter Settings
- Number of decision variables = 30.
- Pop_Size = 100 for two objectives and 200 for three objectives, for all algorithms.
- Number of clusters in local PCA = 5.
- All statistics are based on 20 independent runs.
- Algorithms in comparison:
  - PCX-NSGA-II (new-solution generator: PCX)
  - GDE3 (new-solution generator: DE)
  - MIDEA (new-solution generator: an ordinary EDA)
- All algorithms use the same selection as RM-MEDA; the only difference is how new solutions are generated.
Experiments: Results
- Test instance: PF convex.
- RM-MEDA and GDE3 are better than the other two. Why? If all the parents are in the PS:
  - In RM-MEDA and GDE3, the offspring will be in (or very close to) the PS.
  - In the other two, the offspring will not.
- Test instance whose optimal solutions are not uniformly distributed.
- RM-MEDA > GDE3. Why? If all the parents are in the PS but not uniformly distributed:
  - In RM-MEDA, the offspring will be uniformly distributed along the PF.
  - In GDE3, the offspring will not be uniformly distributed.
- Another test instance.
- RM-MEDA > the others. Why? If all the parents are in the PF:
  - In RM-MEDA, the offspring will be very close to the PF.
  - In all the other algorithms, the offspring will be far away from the PF.
- A further test instance.
- RM-MEDA > the others. Why? If all the parents are in the PF:
  - In RM-MEDA, the offspring will be very close to the PF.
  - In all the other algorithms, the offspring will be far away from the PF.
Conclusions on RM-MEDA
- Based on the regularity property.
- The new-solution generator is problem-specific for continuous MOPs.
- Very promising experimental results.
- Recent work: how to approximate the PS in the x-space.
- Downsides: local PCA is a little complicated. A simple yet efficient version of RM-MEDA is under development.
- Future work: how to obtain a mathematical description of the PS.
Conclusions
- Some evolutionary algorithms can be regarded as gradient methods in a transformed space.
- Ideas and techniques from traditional optimization can be used to design efficient evolutionary algorithms.
- There are not so many fundamental principles and problem-solving strategies.