Modeling with IRENE
Integrated R-code for Engineered Neural Evolution
Trevor Grant and Olcay Akman
Department of Mathematics
Illinois State University
Overview
Neural Evolution
  What is a Neural Network?
  Using genetic algorithms to find optimal parameters to nonlinear functions
  Neural evolution
Special Population Attributes
  Jump Connections
  User defined libraries and learning functions
  Mutating learning functions
Engineered Genetic Algorithms
Starting out simple
We begin by modeling the data with a simple linear model.
We then look at the sum of the squared residuals (SSR). A value is assigned to the model based on this SSR.
[Figure: linear model diagram. Inputs (X1, X2, …, Xn) are combined with coefficients β0, β1, β2, β3 to give the Output (Y).]
Example: 1974 Statistics regarding Income
Income: per capita income (1974)
Life Exp: life expectancy in years (1969–71)
Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
HS Grad: percent high-school graduates (1970)
Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital or large city
[Figure: the same linear model diagram (coefficients β0 to β3, inputs, output), labeled with Income, Life Expectancy, Murder Rate, HS Grad %, and Frost.]
Sum of Squared Residuals
[Figure: scatter plot with axes Height and Age and a fitted straight line.]
A linear model is estimated which minimizes the sum of squared residuals (SSR). The residuals are the distances between the estimates and the actual data points.
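As a minimal sketch of what "minimizes the SSR" means for a one-input linear model, the Python below fits a line by ordinary least squares and computes its SSR. The function names and the age/height numbers are made up for illustration; they are not from the presentation or from IRENE.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def ssr(x, y, b0, b1):
    """Sum of squared residuals of the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical age/height pairs, for illustration only.
age = [2, 4, 6, 8, 10]
height = [86, 102, 115, 128, 138]

b0, b1 = fit_line(age, height)
value = ssr(age, height, b0, b1)  # the "value" assigned to this model
```

Any other intercept or slope gives a larger SSR on the same data, which is the sense in which the fitted line is optimal.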
Relationship
Linear
  Traditionally we estimate linear relationships.
Nonlinear
  The true relationship may be (and often is) nonlinear.
  Sometimes we know the relationship and can use nonlinear regression methods such as:
    Neural networks
    Nonlinear least squares
  Sometimes we don’t know the functional form of the relationship. IRENE explores functional forms while estimating parameters.
Sum of Squared Residuals
[Figure: scatter plot with axes Height and Age and a fitted curve.]
A nonlinear model reduces the sum of squared residuals and better models the actual data.
What’s in a node?
A node contains a learning function. The learning function takes the input and parameters and converts them to output.
This is repeated for each observation.
Each model has its own unique set of α. The fitted values of the output are functions of
Observation   h1
1             .1108
2             1.524
3             .529
4             1.011
…             …
n             1.752
After this is complete a linear model is estimated: the output is regressed on the values of the nodes in the last layer. The sum of the squared residuals is assigned as the model’s value.
The linear model estimated
The sum of the squared residuals of the model (SSR) is referred to as the value of the model. We want a model that minimizes the sum of squared residuals (its value).
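The node-then-regression step can be sketched as follows, assuming a network with a single sigmoid node whose parameters are α = (a0, a1). The parameterization, the function names, and the data are assumptions made for illustration, and Python is used here rather than IRENE's R.

```python
import math


def sigmoid_node(x, alpha):
    """Learning function of one node: turns an input into the node value h."""
    a0, a1 = alpha
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))


def model_value(alpha, xs, ys):
    """Compute h for every observation, regress y on h, return the SSR."""
    h = [sigmoid_node(x, alpha) for x in xs]
    n = len(h)
    mean_h = sum(h) / n
    mean_y = sum(ys) / n
    shh = sum((hi - mean_h) ** 2 for hi in h)
    b1 = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, ys)) / shh
    b0 = mean_y - b1 * mean_h
    return sum((yi - (b0 + b1 * hi)) ** 2 for hi, yi in zip(h, ys))


# Synthetic data generated from a known sigmoid, so the "true" alpha is (-2, 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * sigmoid_node(x, (-2.0, 1.0)) for x in xs]

good = model_value((-2.0, 1.0), xs, ys)   # essentially zero
other = model_value((0.0, 0.1), xs, ys)   # a worse set of parameters
```

Only α is searched by the genetic algorithm in this sketch; the final linear coefficients are re-estimated by least squares each time a value is computed.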
Linear model estimated in a more complex neural network
[Figure: network with first-layer nodes h11, h12, h13 and final-layer nodes h21, h22.]
NOTE: h11, h12, h13 are not included in the final linear model. Only the nodes in the final layer are included in the linear model
Optimizing Parameters with Genetic Algorithms
Step 1: A population of models is created, each with randomly assigned parameters.
Step 2: Models ‘mate’ in the hope of creating ‘children’ models with better value (lower SSR).
From now on we will refer to each unique set of parameters in a model as a creature. A collection of creatures, models with identical topology but different parameters, is referred to as a species.
Copy this model 200 times; each copy has randomly assigned parameter values.
Each individual collection of parameters is referred to as a creature. The collection of creatures for a given topology (arrangement of layers and nodes) is referred to as a species.
[Figure labels: Creature, Species]
Species
A species has a unique arrangement of nodes, layers and learning functions. Even though these creatures have the same arrangement of layers and nodes, they have a different learning function and so they are different species
[Figure: network with a Sigmoid Learning Function ≠ network with an Exponential Learning Function]
Each creature then has its own computed value (SSR) and an assigned ID number; these are saved in a table.
Model ID   Sum Squared Resid. (SSR)
001        41,240
002        215,635
003        3,612
Two creatures are selected with probability weighted according to model fitness.
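One way to sketch fitness-weighted selection is shown below. The presentation does not say how fitness is turned into a probability, so weighting each creature by 1/SSR is an assumption, as are the function and variable names.

```python
import random


def select_parents(ssr_by_id, rng):
    """Pick two distinct creatures; a lower SSR means a higher chance of selection.

    Weighting by 1 / SSR is an assumed rule, not taken from the slides.
    """
    ids = list(ssr_by_id)
    weights = [1.0 / ssr_by_id[i] for i in ids]
    first = rng.choices(ids, weights=weights, k=1)[0]
    rest = [i for i in ids if i != first]
    rest_weights = [1.0 / ssr_by_id[i] for i in rest]
    second = rng.choices(rest, weights=rest_weights, k=1)[0]
    return first, second


# The SSR table from the slide.
table = {"001": 41240, "002": 215635, "003": 3612}

rng = random.Random(0)
counts = {i: 0 for i in table}
for _trial in range(2000):
    first, second = select_parents(table, rng)
    counts[first] += 1  # how often each creature is drawn first
```

Under this rule creature 003, with the lowest SSR, is drawn first far more often than 001, and 002 is drawn least often.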
Two methods of mating
Average: each parameter in the child’s DNA is the average of the corresponding parameters in the mother’s and father’s DNA.
Crossover: a ‘cut point’ is randomly determined; every parameter before the cut point is inherited from the father, and every parameter after the cut point is inherited from the mother.
DNA is selected from the two creatures chosen to mate.
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Average Method
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Child:
  α11 = (3.613 + 2.512)/2 = 3.0625
  α12 = (26.252 + .105)/2 = 13.1785
  α13 = (-25.12 + 51.25)/2 = 13.065
  α14 = (104.4 - 15.2)/2 = 44.6
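The average method reduces to an element-wise mean of the two parameter sequences. A minimal sketch, using the father and mother values from the slide (the function name is illustrative):

```python
def mate_average(father, mother):
    """Child parameter i is the mean of the parents' parameter i."""
    return [(f + m) / 2 for f, m in zip(father, mother)]


father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]

child = mate_average(father, mother)
```

Up to floating-point rounding, `child` matches the values worked out on the slide.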
Crossover Method
A random number between one and the length of the parameter sequence is chosen.
This is the ‘cut point’. The child inherits parameters from the father before this point, from the mother after.
Crossover Method: Cut point at position two
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Child: α11 = 3.613, α12 = 26.252 (from the father); α13 = 51.25, α14 = -15.2 (from the mother)
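Crossover can be sketched with list slicing. The function name and the optional `cut` argument are illustrative; a cut point equal to the full length, which the "between one and the length" rule allows, simply copies the father.

```python
import random


def mate_crossover(father, mother, cut=None, rng=random):
    """Take the first `cut` parameters from the father, the rest from the mother."""
    if cut is None:
        # Random integer from 1 to the length of the parameter sequence, inclusive.
        cut = rng.randint(1, len(father))
    return father[:cut] + mother[cut:]


father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]

child = mate_crossover(father, mother, cut=2)
```

With the cut point at position two, the child takes α11 and α12 from the father and α13 and α14 from the mother, as on the slide.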
The least fit creatures are killed to make room for the new children.
Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289
003        215,635
Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289

Child (model structure): α11 = 3.0625, α12 = 13.1785, α13 = 13.065, α14 = 44.6
The children are assigned new ID numbers and their value (SSR) is computed.
Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289
004        6,755
This process repeats several times. The table changes as follows:

Model ID   Sum Squared Resid. (SSR)
005        4,242
002        3,289
004        6,755

Model ID   Sum Squared Resid. (SSR)
005        4,242
002        3,289
007        3,111

Model ID   Sum Squared Resid. (SSR)
008        4,841
002        3,289
007        3,111
Eventually there is convergence at an optimum (either local or global).
Model ID   Sum Squared Resid. (SSR)
239        2,015
159        2,015
412        2,015
At convergence we kill all the extra creatures in the species (to free up memory)
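The whole cycle for one species (select, mate, score the child, kill the least fit) can be sketched as a loop. This is a sketch under stated assumptions, not IRENE's implementation: `value` is a stand-in for the SSR of a real model, selection is weighted by 1/value, the two mating methods are chosen with equal probability, and all names and constants are made up.

```python
import random


def value(params):
    """Stand-in for a model's SSR: squared distance from a made-up target."""
    target = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


def evolve(pop_size=20, generations=200, seed=1):
    rng = random.Random(seed)
    # One species: identical structure, randomly assigned parameters.
    species = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
    start_best = min(value(c) for c in species)
    for _ in range(generations):
        # Fitter creatures (lower value) are more likely to be chosen.
        # Note: choices() samples with replacement, so a creature may mate with itself.
        weights = [1.0 / (value(c) + 1e-12) for c in species]
        father, mother = rng.choices(species, weights=weights, k=2)
        if rng.random() < 0.5:
            child = [(f + m) / 2 for f, m in zip(father, mother)]  # average
        else:
            cut = rng.randint(1, len(father))                      # crossover
            child = father[:cut] + mother[cut:]
        species.append(child)
        species.remove(max(species, key=value))  # kill the least fit creature
    end_best = min(value(c) for c in species)
    return start_best, end_best, len(species)


start_best, end_best, size = evolve()
```

Because only the worst creature is removed each round, the best value in the species can never get worse, and the species size stays constant.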
What is neural evolution?
Neural evolution: simultaneously explore new topologies while optimizing existing topologies.
New species are born out of old species.
Who lives? Who dies?
After each generation a roster of all creatures is created and ordered according to value.
Species ID   Creature ID   Value (SSR)
003          043           12123
003          021           12552
002          231           13241
003          054           15125
001          152           20150
005          024           25124
003          122           35102
002          105           53039
…            …             …
001          412           124310151
Who lives? Who dies?
If at least one creature of a species is in the top 60%* of the list of all creatures, the species survives. Otherwise the entire species is eradicated.
*60% is arbitrary. We can set that to other proportions. We’ll talk about this more in engineered genetic algorithms.
Example:
[Figure: roster of ten creatures listed by species (2, 2, 3, 3, 2, 2, 3, 1, 1, 3), with the cutoff drawn at 60%.]
Survivors: Species 2 and Species 3. No creature of Species 1 is among the top 60%.
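The survival rule can be sketched as a filter over the ranked roster. The species order below follows the example; the creature labels and SSR values are hypothetical placeholders chosen only to produce that order.

```python
def surviving_species(roster, cutoff=0.6):
    """Return the species with at least one creature in the top `cutoff` share.

    `roster` holds (species_id, creature_id, ssr) tuples; lower SSR ranks higher.
    """
    ranked = sorted(roster, key=lambda row: row[2])
    n_top = max(1, round(len(ranked) * cutoff))
    return {species for species, _creature, _ssr in ranked[:n_top]}


# Species order from the example; creature labels and SSR values are made up.
roster = [
    (2, "a", 10), (2, "b", 11), (3, "c", 12), (3, "d", 13), (2, "e", 14),
    (2, "f", 15), (3, "g", 16), (1, "h", 17), (1, "i", 18), (3, "j", 19),
]

survivors = surviving_species(roster)
```

Here the top six creatures belong to Species 2 and 3 only, so Species 1 is eradicated. Lowering or raising `cutoff` changes how aggressively species are culled.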
Special Population Attributes
Jump connections, user defined libraries and learning functions, and mutating functional forms.
Jump Connections
With jump connections, the output is regressed on all nodes and the inputs.
In a standard neural network, the output is regressed only on the nodes in the final layer.
Collinearity
If jump connections are used and the learning function is linear, then the final linear model will have perfect collinearity. (The computer won’t be able to estimate the final model. This is bad, so a failsafe is built in to prevent it from happening.)
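To illustrate why a linear learning function breaks the final regression when jump connections are used, the sketch below builds the three regressors (intercept, input x, one node) and computes the determinant of X'X. The node parameters and data are made up, and this is only a demonstration of the problem, not the failsafe that IRENE uses.

```python
import math


def gram_det(columns):
    """Determinant of X'X for a design matrix given as exactly three columns."""
    g = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    (a, b, c), (d, e, f), (g0, h, i) = g
    return a * (e * i - f * h) - b * (d * i - f * g0) + c * (d * h - e * g0)


x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)

# A node with a linear learning function is an exact linear combination of
# the intercept and x, so the three regressors are perfectly collinear.
linear_node = [0.5 + 2.0 * xi for xi in x]
# A sigmoid node is not a linear function of x.
sigmoid_node = [1.0 / (1.0 + math.exp(-xi)) for xi in x]

det_linear = gram_det([ones, x, linear_node])    # zero: X'X is singular
det_sigmoid = gram_det([ones, x, sigmoid_node])  # positive: model is estimable
```

A zero determinant means X'X cannot be inverted, so the least-squares coefficients of the final model are not uniquely defined.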
Libraries of Learning Functions
Each time a node is created a learning function is randomly selected from the library.
Function 1: Exponential
Function 2: Sigmoid
Function 3: Logit
Function 4: Step Function
…
User Defined Functions
Suppose theory dictates that a particular nonlinear relationship possibly exists. For example, consider the Michaelis-Menten model of enzyme kinetics.
The researcher can add this functional form to the library to be selected as a possible learning function for nodes.
The standard library contains common functional forms, however certain cases may require special functional forms which can be added by the researcher as needed.
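A library of learning functions can be sketched as a mapping from names to callables, with a user-defined entry added alongside the standard ones. The dictionary layout, the two-parameter forms of the standard functions, and all names are assumptions for illustration. The Michaelis-Menten rate law itself is the standard one, v = Vmax·x / (Km + x).

```python
import math
import random

# Hypothetical standard library: each function takes an input x and parameters a.
STANDARD_LIBRARY = {
    "exponential": lambda x, a: math.exp(a[0] + a[1] * x),
    "sigmoid": lambda x, a: 1.0 / (1.0 + math.exp(-(a[0] + a[1] * x))),
}


def michaelis_menten(x, a):
    """Michaelis-Menten rate: Vmax * x / (Km + x), with a = (Vmax, Km)."""
    vmax, km = a
    return vmax * x / (km + x)


# The researcher extends the library with the theory-driven functional form.
user_library = dict(STANDARD_LIBRARY, michaelis_menten=michaelis_menten)


def new_node(library, rng):
    """Randomly pick a learning function for a newly created node."""
    name = rng.choice(sorted(library))
    return name, library[name]


name, func = new_node(user_library, random.Random(0))
half_rate = michaelis_menten(2.0, (10.0, 2.0))  # at x = Km the rate is Vmax / 2
```

Keeping the library as plain data means a new functional form becomes available to every newly created node without changing the selection code.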
Mutating learning functions
Function 3: Sigmoid
Function 5: Exponential
New Function: Composite
The researcher can also choose to allow for mutating learning functions.
New composite learning function
Population attributes
Max creatures in species
Library
Allow functional mutations
Maximum layers
Maximum nodes
Mutation rates
Allow jump connections
etc.
Determining population attributes
How many generations should a population be allowed to run?
Should Jump connections be allowed?
What portion of the roster should be the cut off point for determining species survival?
Should function mutations be allowed?
And other settable attributes…
Populations can be represented with DNA too
Attribute                Population 1   Population 2
Max creatures            200            150
Library                  StdLib         UsrDef
Maximum Layers           3              4
Maximum Generations      5000           7000
Allow Jump Connections   YES            NO
Engineered Genetic Algorithms
Engineered Genetic Algorithms refers to using genetic algorithms to find the optimal
population settings for a neural evolution algorithm.
Evaluating Populations
Creatures are evaluated on how well they fit the training data. Creatures that minimize SSE in the training data set are considered most fit.
Populations are evaluated on how well they predict out of sample. The best creature the population is able to produce is evaluated in the validation data set and its SSE is computed. The population that produces the creature that minimizes SSE in the validation data set is considered most fit.
The Champion
Recall: each species comprises several creatures. The champion is the optimal creature of the optimal species in the population, i.e. the creature that best minimizes SSE in the entire population.
Out of Sample Evaluation
[Figure: Real Data plotted against the Pop1, Pop2, and Pop3 predictions on the Validation Data Set.]
In this example, Pop3 performs best and Pop1 performs worst.
Pop3 and Pop2 are most likely to be selected for mating.
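Ranking populations by their champion's out-of-sample error can be sketched as below. Each champion is represented as a plain prediction function; the three functions and the validation data are invented so that the ranking mirrors the example (Pop3 best, Pop1 worst).

```python
def sse(predict, data):
    """Sum of squared prediction errors of `predict` over (x, y) pairs."""
    return sum((y - predict(x)) ** 2 for x, y in data)


def rank_populations(champions, validation):
    """Order population names from best to worst champion on validation data."""
    return sorted(champions, key=lambda name: sse(champions[name], validation))


# Hypothetical validation set (y = 2x) and hypothetical champions.
validation = [(float(x), 2.0 * x) for x in range(7)]
champions = {
    "Pop1": lambda x: 0.0,            # ignores the input entirely
    "Pop2": lambda x: 1.5 * x,        # right shape, wrong slope
    "Pop3": lambda x: 2.0 * x + 0.1,  # close to the real data
}

ranking = rank_populations(champions, validation)
```

The ranking, rather than the raw SSE, is what feeds the selection of populations for mating in this sketch.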
Population parameters come in two varieties
Numerical (continuous)
  Examples:
    Max Layers (3)
    Initial Species Population (300)
  Mating may be either crossover or averaging
  Need to round if averaging
Switches
  Examples:
    Allow Jump Connections (TRUE/FALSE)
    Mating Rule (Average / Crossover / Both)
  Mating must be crossover, with a higher probability of selecting the father’s (higher value model’s) traits.
Population mating
Attribute                    Father (Pop3)   Mother (Pop2)   Child (Pop4)
Jump Connections             YES             NO              YES
Initial Species Population   300             150             225
Max Layers                   3               2               3
Max nodes per layer          7               4               4
Mutation Rate                .15             .05             .10
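Mating two populations can be sketched by treating each population's settings as a dictionary. The rule below is an assumption pieced together from the description: switches are inherited by crossover with a bias toward the father, while each numerical attribute is either averaged (and rounded if it is an integer) or inherited from one parent. The bias `p_father = 0.7`, the 50/50 choice between averaging and crossover, and all names are illustrative.

```python
import random


def mate_populations(father, mother, rng, p_father=0.7):
    """Build a child population's settings from two parent populations."""
    child = {}
    for key, f_val in father.items():
        m_val = mother[key]
        if isinstance(f_val, bool):
            # Switch: crossover only, favoring the father's setting.
            child[key] = f_val if rng.random() < p_father else m_val
        elif rng.random() < 0.5:
            # Numerical: average, rounding when the attribute is an integer.
            avg = (f_val + m_val) / 2
            child[key] = round(avg) if isinstance(f_val, int) else avg
        else:
            # Numerical: crossover, again favoring the father.
            child[key] = f_val if rng.random() < p_father else m_val
    return child


father = {"jump_connections": True, "initial_species_population": 300,
          "max_layers": 3, "max_nodes_per_layer": 7, "mutation_rate": 0.15}
mother = {"jump_connections": False, "initial_species_population": 150,
          "max_layers": 2, "max_nodes_per_layer": 4, "mutation_rate": 0.05}

child = mate_populations(father, mother, random.Random(0))
```

Note that Python's `round` rounds halves to the nearest even integer, so an averaged Max Layers of 2.5 becomes 2; a real implementation would need to choose its own rounding rule.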
Recall Previous Example:
[Figure: roster of ten creatures listed by species (4, 2, 4, 2, 4, 4, 2, 3, 2, 3), with the cutoff drawn at 60%.]
Survivors: Species 4 and Species 2. No creature of Species 3 is among the top 60%.
The optimal creature of the now extinct Species 3 is saved.
Population 1: Museum of “Natural” History
This creature is saved in the Museum.
The optimal creature of each species is also saved as it goes extinct.
When the population has completed its specified number of generations, the optimal creature from each remaining species is also saved to the museum.
Why have a Museum?
Neural networks may ‘overfit’ the training data. A good predictive model may go extinct.
Evaluate the models in the museum on the validation data set to make sure we didn’t miss a good predictive model.
The End… (not of the slide show, don’t get up yet)
At a specified ‘end’ of the algorithm, all creatures from all museums are collected into a master list.
Each creature in the list is evaluated on the validation data set.
Creature Value
2412
3516
12302
98415120
5123
11023
6123
191241
The best model on the validation data set is chosen as the candidate. If it passes a second round of validation on a second validation data set, it is selected. If it doesn’t pass the second round of validation, the next best model is tried.
[Figure: the same list of creature values checked against the Second Validation Data Set. SUCCESS!]
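The final selection can be sketched as a ranked search through the museum. The presentation does not define what "passing" the second round means, so the threshold `max_sse` is an assumption, as are the model names and the tiny data sets.

```python
def sse(model, data):
    """Sum of squared prediction errors of `model` over (x, y) pairs."""
    return sum((y - model(x)) ** 2 for x, y in data)


def select_final(museum, validation, second_validation, max_sse):
    """Try museum models from best to worst on the first validation set and
    return the first whose SSE on the second set is at most `max_sse`
    (an assumed pass criterion). Returns None if no model passes."""
    ranked = sorted(museum, key=lambda name: sse(museum[name], validation))
    for name in ranked:
        if sse(museum[name], second_validation) <= max_sse:
            return name
    return None


# Hypothetical museum: one model memorizes the first set, one generalizes.
museum = {
    "memorizer": lambda x: {0: 0.0, 1: 2.0, 2: 4.0}.get(x, 0.0),
    "steady": lambda x: 2.0 * x + 0.1,
}
validation = [(0, 0.0), (1, 2.0), (2, 4.0)]
second_validation = [(3, 6.0), (4, 8.0)]

winner = select_final(museum, validation, second_validation, max_sse=1.0)
```

The memorizer ranks first on the first validation set but fails the second round, so the next best model is selected instead.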