Classical Optimization Ideas in Evolutionary Computation
Qingfu Zhang Department of Computer Science
City University of Hong Kong Hong Kong
Shantou University, 03/2019
CityU
CS@CityU
- U.S. News and World Report University Rankings 2019, Computer Science: 11th
- Quacquarelli Symonds (QS) World University Rankings 2018, Computer Science & Information Systems: 50th
- ShanghaiRanking's Global Ranking of Academic Subjects 2018, Computer Science & Engineering: 38th
This talk will provide three examples:
- In single-objective optimization, genetic algorithms and ant colony algorithms can be regarded as gradient methods on the space of probability models.
- In multiobjective optimization, aggregation methods from traditional optimization can be used to design efficient multiobjective evolutionary algorithms.
- Regularity properties can be used to design efficient multiobjective optimization algorithms.
Example 1: Genetic Algorithms/Ant colony optimization vs gradient methods
Single-Objective Optimization Problem
  min f(x)
  subject to x ∈ D
Classic Optimization: Gradient-based Methods
- Newton's method: consider the second-order approximation of the function,
  f(x + d) ≈ f(x) + ∇f(x)^T d + (1/2) d^T H(x) d, where H is the Hessian.
- Newton direction: d = -H(x)^{-1} ∇f(x).
- Key issue in traditional optimization: how to approximate the Newton direction without H.
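In code, one Newton iteration is just a linear solve; a minimal sketch on a toy quadratic (all names here are illustrative):

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton iteration: x_new = x - H(x)^{-1} * grad f(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

# Toy example: f(x) = 0.5 x^T A x - b^T x, with gradient A x - b and Hessian A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = newton_step(np.zeros(2), lambda x: A @ x - b, lambda x: A)
# For a quadratic, a single Newton step lands exactly on the minimizer A^{-1} b.
```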
Evolutionary Computation
- One of the three areas in Computational Intelligence (CI).
- One of the major areas in metaheuristics.
- Most evolutionary algorithms are for optimization.
- Mainly inspired by nature at the early stage:
  - Genetic Algorithm (GA), 1975, by J. H. Holland (1929-2015) et al.
  - Ant Colony Optimization (ACO), 1992, by Marco Dorigo (1961-).
  - ...
Genetic Algorithm = Population-Based Iterative Search Method
- Population: a set of candidate solutions.
- Selection: select the fittest solutions to be parents for the next generation.
- Crossover: mix two parent solutions to produce new solutions.
- Mutation: modify a parent solution to produce a new solution.
One iteration in a simple EA
[Fig: population at generation t -> (selection) parent set -> (crossover, mutation) -> population at generation t+1]
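The iteration above can be sketched as a minimal binary GA; the operators and parameter values below are illustrative, not a specific published algorithm:

```python
import random

def ga_step(pop, fitness, pc=0.9, pm=0.05):
    """One generation: selection -> crossover -> mutation (binary strings)."""
    n = len(pop)
    def tournament():
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    children = []
    while len(children) < n:
        p1, p2 = tournament(), tournament()          # selection
        if random.random() < pc:                     # uniform crossover
            child = [random.choice(pair) for pair in zip(p1, p2)]
        else:
            child = list(p1)
        child = [1 - g if random.random() < pm else g for g in child]  # bit-flip mutation
        children.append(child)
    return children

# OneMax toy problem: maximize the number of 1-bits.
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(50):
    pop = ga_step(pop, sum)
best = max(sum(ind) for ind in pop)
```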
Ant Colony Optimization
- Behavior of ants:
[Fig: ants travelling between NEST and FOOD]
- Ants leave pheromone as they move.
- At a fork, ants choose the route with the heavier pheromone in a probabilistic manner.
Ant Algorithm for TSP
- TSP: a person wants to visit N cities and then return to his starting place, visiting each city once and only once. In which order should he visit the cities so as to minimize the total distance travelled?
- Ant algorithm:
  - K (say, K = 5 or 100) artificial ants each build a tour using a probabilistic rule determined by pheromone information and other information.
  - Ants leave/update pheromone on the routes just travelled.
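A sketch of the two ingredients above, tour construction and pheromone update; the matrix representation and the parameter values (alpha, beta, rho) are illustrative defaults, not the talk's settings:

```python
import random

def ant_tour(dist, pheromone, alpha=1.0, beta=2.0):
    """One artificial ant builds a TSP tour city by city.
    Choice weight = pheromone^alpha * (1/distance)^beta."""
    n = len(dist)
    tour = [0]
    unvisited = list(range(1, n))
    while unvisited:
        i = tour[-1]
        w = [pheromone[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in unvisited]
        nxt = random.choices(unvisited, weights=w)[0]
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def deposit(pheromone, tour, dist, rho=0.1):
    """Evaporate everywhere, then lay pheromone inversely proportional to tour length."""
    n = len(pheromone)
    length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
    for i in range(n):
        for j in range(n):
            pheromone[i][j] *= (1 - rho)
    for k in range(n):
        i, j = tour[k], tour[(k + 1) % n]
        pheromone[i][j] += 1.0 / length
        pheromone[j][i] += 1.0 / length

random.seed(1)
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
pheromone = [[1.0] * 4 for _ in range(4)]
tour = ant_tour(dist, pheromone)
deposit(pheromone, tour, dist)
```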
Why we like/dislike EC methods
- Easy to understand; low manpower needed.
- No gradient information needed; wide applicability.
- Parallel by nature; solutions can be evaluated in parallel.
- No solid theory; difficult to analyze.
- Poor performance if not well designed.
- They mimic a metaphor of nature.
Estimation of Distribution Algorithm (1995, H Mühlenbein et al)
- At each iteration, it maintains a probability model on the search space.
- New solutions are sampled from the model.
- The objective value of each solution is evaluated, and the model is then updated.
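A minimal sketch of this loop, assuming the simplest univariate model (UMDA-style marginal probabilities on bit strings); names and parameter values are illustrative:

```python
import random

def umda(fitness, length=20, pop_size=50, n_best=10, iters=60):
    """Univariate EDA: the model is the vector of marginals p[i] = P(x_i = 1)."""
    p = [0.5] * length
    for _ in range(iters):
        # Sample a population from the current model.
        pop = [[1 if random.random() < p[i] else 0 for i in range(length)]
               for _ in range(pop_size)]
        # Evaluate, select the fittest, and refit the model to them.
        pop.sort(key=fitness, reverse=True)
        best = pop[:n_best]
        p = [min(0.95, max(0.05, sum(x[i] for x in best) / n_best))
             for i in range(length)]
    return p

random.seed(0)
p = umda(sum)  # OneMax: fitness = number of 1-bits
```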
A Genetic Algorithm can be modelled as an EDA
- Uniform crossover = sampling from a probability model.

  Parent population:
    x1 x2 x3 x4
    0  0  1  1
    1  0  0  1
    0  0  1  1
    1  0  1  0
    1  0  0  1

  Induced marginal model:
    P(x1=1) = 3/5, P(x2=1) = 0, P(x3=1) = 3/5, P(x4=1) = 4/5

  Sampled offspring: 1 0 1 1 and 0 0 1 1
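The table above can be reproduced directly: computing the marginals of the parent set and sampling from them plays the role of uniform crossover over the whole population.

```python
import random

parents = [
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
# Marginal model: p[i] = fraction of parents with x_i = 1.
p = [sum(row[i] for row in parents) / len(parents) for i in range(4)]

def sample(p):
    """Sampling this model = uniform crossover over the whole parent set."""
    return [1 if random.random() < pi else 0 for pi in p]
```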
Ant Algorithm is an EDA
- Pheromone values define a probability model.
Model-based Search
- Evolution Strategies (Rechenberg, 1971)
  - CMA-ES (Hansen, 2001), NES (Wierstra et al., 2007)
- Cross-Entropy method (Rubinstein, 1999)
- Probability Collectives (Wolpert, 2001)
- IGO: Information Geometric Optimization (Ollivier et al., 2016)
Gradient methods on Expectation
Gradient Descent on the Expectation
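The slide's formulas are not in this transcript; a standard formulation is J(θ) = E_{x~p_θ}[f(x)] with ∇J(θ) = E[f(x) ∇_θ log p_θ(x)] (the log-likelihood trick). A minimal Monte Carlo sketch for a one-bit Bernoulli model, where J(θ) = θ so the true gradient is exactly 1:

```python
import random

def grad_estimate(theta, f, n=100_000):
    """Monte Carlo estimate of grad J(theta) via the log-likelihood trick,
    for x ~ Bernoulli(theta)."""
    total = 0.0
    for _ in range(n):
        x = 1 if random.random() < theta else 0
        score = (x - theta) / (theta * (1.0 - theta))  # d/dtheta log p_theta(x)
        total += f(x) * score
    return total / n

random.seed(0)
g = grad_estimate(0.3, lambda x: x)  # J(theta) = theta, so g should be close to 1
```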
Natural Gradient
- Plain gradient descent:
  - The gradient is not invariant under reparameterization; it does not consider the metric of the parameter space.
  - The updated parameters may no longer be feasible, e.g. the covariance matrix may lose positive-definiteness.
- The natural gradient provides the steepest direction on the space of probability models (Amari, 1999):
  - It is invariant under reparameterization.
  - It takes the metric of the parameter space into account.
- The steepest direction on the manifold (set) of probability distributions is given by the natural gradient: ∇̃J(θ) = F(θ)^{-1} ∇J(θ).
- F is the Fisher information matrix; it takes second-order information into account.
- The natural gradient is invariant under reparameterization.
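For a Bernoulli(θ) model the Fisher information is F(θ) = 1/(θ(1-θ)), so the natural gradient step is easy to write down; a toy sketch showing how it automatically shrinks near the boundary, helping θ stay feasible:

```python
# Bernoulli(theta): Fisher information F(theta) = 1 / (theta * (1 - theta)).
# The natural gradient rescales the plain gradient by F^{-1} = theta * (1 - theta),
# so steps shrink near the boundary of (0, 1) where F is large.
def natural_gradient_step(theta, grad, lr=0.1):
    fisher_inv = theta * (1.0 - theta)
    return theta + lr * grad * fisher_inv

theta_near_boundary = natural_gradient_step(0.9, 1.0)  # small step: F is large here
theta_middle = natural_gradient_step(0.5, 1.0)         # larger step in the middle
```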
- Evolutionary algorithms can be modelled as estimation of distribution algorithms.
- The search directions of EAs in the space of probability models correspond to (natural) gradient directions.
- Various classical optimization techniques can be used to design evolutionary algorithms.
- Borrowing ideas from the L-BFGS method, we have designed an EDA for large-scale optimization problems.
- With m = 2, Rm-ES costs about 1/1000 of the running time of CMA-ES.
Fig: Ellipsoid (left) and Rosenbrock (right) functions; objective value versus running time in seconds, n = 1000.
Example 2: MOEA/D, a bridge between traditional aggregation methods and evolutionary algorithms
Multiobjective Optimization Problem (MOP)
  min F(x) = (f1(x), f2(x), ..., fm(x))
  subject to x ∈ D

  where D is the search space and F(x) is the objective vector.
Why multiobjective?
- By nature, many real-life problems have multiple objectives.
- Decision makers (DMs) or modellers don't know how to combine them into one.
- DMs want to know the trade-off relationship among the objectives.
Pareto Optimal Solutions = Best Trade-off Candidates
- x is Pareto optimal iff no other solution dominates it.
- A (rational) decision maker doesn't like non-Pareto-optimal solutions.
- Pareto set (PS) = the set of all Pareto optimal solutions in the x-space.
- Pareto front (PF) = the image of the PS in the F-space.
[Fig: the PS in the search space maps onto the PF on the boundary of the feasible objective region]
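The dominance definition above translates directly into code (minimization assumed; function names are illustrative):

```python
def dominates(a, b):
    """a dominates b (minimization): a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_optimal(points):
    """The Pareto front of a finite set: the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 5), (2, 3), (4, 4), (3, 2), (5, 1)]
front = pareto_optimal(pts)  # (4, 4) is dominated by (2, 3) and drops out
```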
Basic Strategies in MCDM
- A priori: aggregate the objectives into one first, e.g. f(x) = 0.3 f1(x) + 0.7 f2(x), then optimize it to obtain the final solution.
- A posteriori: generate a number of Pareto optimal solutions first; the DM then picks the final solution.
- Interactive: the DM and the optimizer interact, refining preferences step by step.
[Figs: objective space F(D) with f1, f2 axes, showing the single final solution (a priori), a set of Pareto optimal solutions and the chosen final solution (a posteriori), and stepwise preference exploration (interactive)]
How to approximate the Pareto front?
Three major classes of multiobjective evolutionary algorithms
- Pareto dominance based
- Performance indicator based
- Decomposition based (Zhang & Li, 2007)
MOEA/D = Decomposition + Collaboration
- Decomposition (from traditional optimization):
  - Decompose the task of approximating the PF into N subtasks. Each subproblem can be single-objective or multiobjective.
- Collaboration (from EC):
  - N agents (procedures) are used, one per subproblem.
  - The N subproblems are related to one another, so the N agents can solve them in a collaborative manner.
Problem Decomposition
- Weighted sum approach:

  min g^ws(x | λ) = λ1 f1(x) + λ2 f2(x),
  where λ1 + λ2 = 1 and λ1, λ2 ≥ 0.

- For example, N = 11 weight vectors λ^1 = (1, 0), λ^2 = (0.9, 0.1), ..., λ^10 = (0.1, 0.9), λ^11 = (0, 1) give N single-objective subproblems:

  min g^ws(x | λ^1)  = 1.0 f1(x) + 0.0 f2(x)
  min g^ws(x | λ^2)  = 0.9 f1(x) + 0.1 f2(x)
  ...
  min g^ws(x | λ^10) = 0.1 f1(x) + 0.9 f2(x)
  min g^ws(x | λ^11) = 0.0 f1(x) + 1.0 f2(x)

- The optimum of each subproblem is a Pareto optimal solution; together the N optima approximate the PF.
- It works for convex PFs. There are many other aggregation methods.
[Fig: the N weight vectors in (f1, f2)-space; each subproblem's optimal solution lies on the PF]
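A sketch of the decomposition above: the aggregation function plus an evenly spread set of weight vectors (names are illustrative):

```python
def g_ws(x, lam, objectives):
    """Weighted-sum aggregation: g^ws(x | lam) = sum_i lam_i * f_i(x)."""
    return sum(l * f(x) for l, f in zip(lam, objectives))

# N evenly spread weight vectors (1,0), (0.9,0.1), ..., (0,1): each one turns
# the bi-objective problem into a single-objective subproblem.
N = 11
weights = [(1 - k / (N - 1), k / (N - 1)) for k in range(N)]
```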
- At each generation, each agent does the following:
  1. Mating selection: obtain the current solutions of some neighbours (collaboration).
  2. Reproduction: generate a new solution by applying reproduction operators to its own solution and the borrowed solutions.
  3. Replacement:
     a. Replace its old solution with the new one if the new one is better for its own subobjective.
     b. Pass the new solution on to some of its neighbours; each of them replaces its old solution with the new one if the new one is better for that neighbour's subobjective.
[Fig: the N subproblems min g(x, λ^1), min g(x, λ^2), ..., min g(x, λ^N) arranged along the PF in (f1, f2)-space; decomposition, neighbourhood, memory]
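The three steps can be sketched as one MOEA/D generation; the real-coded reproduction operator here is a deliberately crude placeholder, and the aggregation g is passed in (e.g. a weighted sum):

```python
import random

def moead_generation(pop, weights, neighbors, objectives, g):
    """One MOEA/D generation over N agents (toy real-coded operators)."""
    for i in range(len(pop)):
        # 1. Mating selection: borrow solutions from the neighbourhood.
        a, b = random.sample(neighbors[i], 2)
        # 2. Reproduction: crude blend crossover + Gaussian mutation (placeholder).
        child = [(x + y) / 2 + random.gauss(0.0, 0.1) for x, y in zip(pop[a], pop[b])]
        fc = [f(child) for f in objectives]
        # 3. Replacement: the child replaces any neighbour it beats on that
        #    neighbour's own subproblem g(F(x) | weight_j).
        for j in neighbors[i]:
            if g(fc, weights[j]) < g([f(pop[j]) for f in objectives], weights[j]):
                pop[j] = child
    return pop

random.seed(0)
objectives = [lambda x: x[0] ** 2, lambda x: (x[0] - 1) ** 2]
g_ws = lambda fv, w: w[0] * fv[0] + w[1] * fv[1]          # weighted-sum aggregation
weights = [(1 - k / 4, k / 4) for k in range(5)]
neighbors = [[max(0, i - 1), i, min(4, i + 1)] for i in range(5)]
pop = [[random.uniform(-2.0, 2.0)] for _ in range(5)]
for _ in range(30):
    pop = moead_generation(pop, weights, neighbors, objectives, g_ws)
```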
Resources
- MOEA/D web page: https://sites.google.com/view/moead
- Survey papers:
Example 3: Regularity-Based Multiobjective Optimization
Motivations
- Regularity of continuous MOPs: under certain conditions, the PS (PF) is an (m-1)-dimensional piecewise continuous manifold in the decision (objective) space, where m is the number of objectives.
- This property has been ignored by MOEA researchers.
- There is very little research on the shape of PSs.
[Fig: the Pareto set (PS) in (x1, x2)-space maps under F onto the Pareto front (PF) in (f1, f2)-space within F(D)]

How can we deal with a continuous MOP if its PS is an (m-1)-D piecewise continuous manifold?
- Suppose we use the following commonly used framework:

  population -> reproduction operators -> new solutions -> competition/replacement -> selection
  (selection has received lots of work; reproduction very little)

- Why do commonly used genetic operators not work well for complicated PSs?
  - When two parents are in the PS, their offspring may not be close to the PS. [Fig: PS in decision space]
  - The PS is not an equilibrium.
- So we resort to EDAs (modelling and sampling).
Basic Idea

[Fig: population -> modelling & sampling -> new solutions -> competition/replacement -> selection]

- In the case of 2 objectives:
  - The PS is a 1-D curve.
  - If the algorithm works well, the principal (central) curve of the population could be an approximation of the PS. [Fig: population points scattered around the PS]
- Each point in the current population is regarded as a sample of ξ = ζ + ε, where ζ is uniformly distributed on a 1-D curve C and ε is an n-D Gaussian noise.
- This model is different from other models in EDAs. How to model C and ε?
43
Modelling: How to model C and ?
We assume that: o The central curve C consists of
several line segments. This assumption makes C computable.
q How to model Co Divide the population into several
clusters by local PCA. o Compute the central line of each
cluster. q How to model
o the deviation of the points in each cluster to its central line.
: point in the current population.
: central curve: C
simplification
The number of clusters needs to be preset.
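The per-cluster modelling step can be sketched with ordinary PCA on a single cluster (a simplification of local PCA; names are illustrative):

```python
import numpy as np

def principal_segment(points):
    """Fit one cluster's central line: mean + first principal direction,
    extent from the projections, noise level from the residuals."""
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    # First principal direction via eigen-decomposition of the covariance.
    cov = np.cov((X - mean).T)
    vals, vecs = np.linalg.eigh(cov)
    d = vecs[:, -1]                      # direction of largest variance
    t = (X - mean) @ d                   # projections onto the central line
    resid = (X - mean) - np.outer(t, d)  # deviation from the central line
    sigma = float(np.sqrt((resid ** 2).sum() / X.shape[0]))
    return mean, d, float(t.min()), float(t.max()), sigma

mean, d, t_min, t_max, sigma = principal_segment([(0, 0), (1, 2), (2, 4), (3, 6)])
# The points lie exactly on the line y = 2x, so sigma is ~0 and d is parallel to (1, 2).
```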
Sampling
- The number of new solutions sampled around segment C_i is proportional to its length:

  #(new solutions around C_i) = N * L(C_i) / (L(C_1) + L(C_2) + L(C_3)),

  where L(C_i) is the length of C_i.
- Sampling around C_i:
  - Uniformly randomly pick a point x1 on C_i.
  - Draw x2 ~ N(0, σ_i I).
  - Set x = x1 + x2.
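A sketch of the sampling rule above, with each segment represented by a start point, unit direction, length, and noise level (the segment-extension detail of the full method is omitted here):

```python
import random

def sample_from_segments(segments, n_total):
    """Allocate samples to segments in proportion to their lengths, then sample
    x = (uniform point on the segment) + Gaussian noise.
    Rounding may make the total differ slightly from n_total."""
    total_len = sum(s["length"] for s in segments)
    new_points = []
    for s in segments:
        k = round(n_total * s["length"] / total_len)
        for _ in range(k):
            t = random.uniform(0.0, s["length"])
            x1 = [a + t * d for a, d in zip(s["start"], s["direction"])]
            new_points.append([xi + random.gauss(0.0, s["sigma"]) for xi in x1])
    return new_points

segments = [{"start": [0.0, 0.0], "direction": [1.0, 0.0], "length": 2.0, "sigma": 0.0}]
new_pts = sample_from_segments(segments, 10)
```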
Selection
- We use non-dominated sorting (Deb et al.) in selection.
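Non-dominated sorting can be sketched by repeatedly peeling off the current non-dominated front (a simple quadratic-time version, not Deb et al.'s fast bookkeeping):

```python
def nondominated_sort(points):
    """Peel off successive non-dominated fronts (minimization).
    Front 0 is the non-dominated front of the whole set."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

fronts = nondominated_sort([(1, 3), (3, 1), (2, 2), (4, 4)])
```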
Experiments: Test Problems
- Test instances with 2 objectives: modified versions of ZDT.
  - PS: x2 = ... = xn = x1, 0 <= x1 <= 1.
    Variable transformation used: x'1 = x1, x'i = xi - x1 (i = 2, ..., n).
  - PS: x2 = ... = xn = x1^2, 0 <= x1 <= 1.
    Transformation: x'1 = x1, x'i = xi - x1^2 (i = 2, ..., n).
- Test instances with 3 objectives: modified versions of DTLZ with variable transformations.
  - DTLZ1 variant with PS: x3 = ... = xn = x1*x2, 0 <= x1, x2 <= 1.
  - DTLZ2 variant with PS: x3 = ... = xn = x1^2*x2^2, 0 <= x1, x2 <= 1.
Experiments: Parameter Settings
- Number of decision variables = 30.
- Pop_Size = 100 for two objectives and 200 for three objectives, for all algorithms.
- Number of clusters in local PCA = 5.
- All statistics are based on 20 independent runs.
- Algorithms in comparison:
  - PCX-NSGA-II (new-solution generator: PCX)
  - GDE3 (new-solution generator: DE)
  - MIDEA (new-solution generator: an ordinary EDA)
- All algorithms use the same selection as RM-MEDA; the only difference is how new solutions are generated.
Experiments: Results
- Test instance: PF convex.
- RM-MEDA and GDE3 are better than the other two. Why? If all the parents are in the PS:
  - In RM-MEDA and GDE3, the offspring will be in (or very close to) the PS.
  - In the other two, the offspring will not.
- Test instance whose optimal solutions are not uniformly distributed.
- RM-MEDA > GDE3. Why? If all the parents are in the PS but not uniformly distributed:
  - In RM-MEDA, the offspring will be uniformly distributed along the PF.
  - In GDE3, the offspring will not be uniformly distributed.
- Another test instance.
- RM-MEDA > the others. Why? If all the parents are in the PF:
  - In RM-MEDA, the offspring will be very close to the PF.
  - In all the other algorithms, the offspring will be far away from the PF.
- A further test instance.
- RM-MEDA > the others. Why? If all the parents are in the PF:
  - In RM-MEDA, the offspring will be very close to the PF.
  - In all the other algorithms, the offspring will be far away from the PF.
Conclusions on RM-MEDA
- Based on the regularity property.
- The new-solution generator is problem-specific for continuous MOPs.
- Very promising experimental results.
- Recent work: how to approximate the PS in the x-space.
- Downsides: local PCA is a little complicated. A simple yet efficient version of RM-MEDA is under development.
- Future work: how to obtain a mathematical description of the PS.
Conclusions
- Some evolutionary algorithms can be regarded as gradient methods in a transformed space.
- Ideas and techniques from traditional optimization can be used to design efficient evolutionary algorithms.
- There are not so many fundamental principles and problem-solving strategies.