5/27/2010

GENE EXPRESSION PROGRAMMING

Dr. Sundaram Suresh

School of Computer Engineering

Nanyang Technological University

Singapore

Textbook

Candida Ferreira, Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, Angra do Heroismo, Portugal, 2002.

Weblink

http://www.gene-expression-programming.com/

www.gepsoft.com

GEP code can be found in

http://jgep.sourceforge.net/

http://www.gene-expression-programming.com/Downloads.asp

Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010

What is Gene Expression Programming?

GEP is also an evolution-based algorithm. Gene expression programming, developed by Candida Ferreira, incorporates both the simple, linear chromosomes of fixed length used in GAs and the ramified structures of different sizes and shapes used in GP.

Genes – each gene codes for a smaller program or sub-expression tree.

GEP chromosomes are designed to allow the creation of multiple genes.

It is worth emphasizing that GEP is the only genetic algorithm with multiple genes.

GEP

GA – does not represent the I/O relationship mathematically.

GP – complexity in the genetic operators and increase in tree length due to genetic operations.

GEP – is a combination of the GA string representation and the GP mathematical expression.

GEP uses the genetic operators of GA to change the tree length. But here, the length of the string remains the same. GEP is based on the human genome…

GEP Representation

We want to represent the arithmetic expression

Chromosome made of genes

Function set arguments (n): the maximum number of arguments taken by any function in the function set

Gene – head and tail

Head length (h) is specified for a given problem

Tail length (t) is calculated from the head length and n:

t = h(n-1)+1
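Where t = h(n-1)+1 comes from: in the worst case every head position holds a function with n arguments, which opens h·n argument slots; h-1 of those slots are taken by the other head functions, so h·n-(h-1) = h(n-1)+1 terminals must be available in the tail. A minimal Python sketch of the formula (the function names are illustrative, not taken from the textbook or from jgep):

```python
def tail_length(h: int, n: int) -> int:
    """Tail length t = h(n-1)+1 for head length h and maximum arity n."""
    return h * (n - 1) + 1


def gene_length(h: int, n: int) -> int:
    """A gene is its head followed by its tail."""
    return h + tail_length(h, n)


# Worst case: h functions of arity n open h*n argument slots; h-1 of those
# slots hold the other head functions, so h*n - (h-1) terminals are needed.
for h, n in [(4, 2), (10, 2), (3, 2)]:
    print(f"h={h}, n={n}: t={tail_length(h, n)}, gene length={gene_length(h, n)}")
```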

GEP Representation

Arithmetic expression – Gene Equivalent – K-expression

Three Gene Representation

0 1 2 3 4 5 6 7
Q * + - a b c d
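A minimal Python sketch of how the K-expression above is translated, assuming the standard Karva reading order (left to right, filling the tree level by level). The `FUNCS` table and the helper names are illustrative assumptions; only Q, *, + and - are supported.

```python
import math
from collections import deque

# Function set F with arities; Q = square root (one argument).
FUNCS = {
    "Q": (1, math.sqrt),
    "*": (2, lambda x, y: x * y),
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
}


def arity(sym):
    return FUNCS[sym][0] if sym in FUNCS else 0  # terminals take no arguments


def build_tree(kexpr):
    """Assign children breadth-first, reading the K-expression left to right."""
    children = [[] for _ in kexpr]
    queue, nxt = deque([0]), 1
    while queue:
        i = queue.popleft()
        for _ in range(arity(kexpr[i])):
            children[i].append(nxt)
            queue.append(nxt)
            nxt += 1
    return children


def to_infix(kexpr):
    children = build_tree(kexpr)

    def show(i):
        kids = [show(c) for c in children[i]]
        if not kids:
            return kexpr[i]
        if len(kids) == 1:
            return f"sqrt({kids[0]})"  # Q is the only one-argument function here
        return f"({kids[0]} {kexpr[i]} {kids[1]})"

    return show(0)


def evaluate(kexpr, env):
    children = build_tree(kexpr)

    def ev(i):
        if kexpr[i] not in FUNCS:
            return env[kexpr[i]]  # terminal: look up its value
        return FUNCS[kexpr[i]][1](*(ev(c) for c in children[i]))

    return ev(0)


print(to_infix("Q*+-abcd"))                                    # sqrt(((a + b) * (c - d)))
print(evaluate("Q*+-abcd", {"a": 1, "b": 3, "c": 5, "d": 1}))  # 4.0
```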

Tree Construction of Three Gene

Biological processes in a gene

The main operations that occur in a genome are:

Genome replication

Genome restructuring

Transcription

Translation and post-translation modifications

Replication

Replication of DNA molecules.

Each strand acts as a template for a new, complementary strand.

When copying is complete, there will be two daughter DNA molecules, each identical in sequence to the mother molecule.

Genome Restructuring

This operation modifies the gene structure.

It introduces genetic diversity.

As in GA and GP, in GEP populations of individuals (computer programs) evolve by developing new abilities and becoming better adapted to the environment due to the genetic modifications accumulated over a certain number of generations.

Mutation

Recombination

Transposition

Gene duplication

Solution Representation

As in GP, GEP chromosomes (solutions) are represented using a function set and a terminal set.

In GEP, chromosomes are represented using genes.

Genes – head and tail

Heads are coded using the function and terminal sets.

Tails are coded using only the terminal set.

Let F be the function set, F = {*,+,-,Q}, where Q is the square root.

Let T be the terminal set, T = {a,b,c,d}.
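A sketch of generating a random gene from F and T under the head/tail rule. Drawing symbols uniformly and the seed are assumptions made for illustration; the essential point is that tail positions never receive a function, so the tail can always close the tree.

```python
import random

F = {"*": 2, "+": 2, "-": 2, "Q": 1}  # function set with arities (Q = square root)
T = ["a", "b", "c", "d"]              # terminal set


def random_gene(h, rng=random):
    """Draw a random gene: head symbols from F and T, tail symbols from T only."""
    n = max(F.values())               # maximum arity, here 2
    t = h * (n - 1) + 1               # tail length
    head = [rng.choice(list(F) + T) for _ in range(h)]
    tail = [rng.choice(T) for _ in range(t)]
    return "".join(head + tail)


rng = random.Random(2010)
for _ in range(3):
    print(random_gene(4, rng))  # always 4 + 5 = 9 symbols
```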

Solution Representation

We want to represent the arithmetic expression

Chromosome made of genes

Maximum number of arguments for the elements in the function set (n)

Gene – head and tail

Head length (h) is specified for a given problem

Tail length is calculated from the head length and n: t = h(n-1)+1

Arithmetic expression – Gene Equivalent – K-expression

0 1 2 3 4 5 6 7
Q * + - a b c d

Example

K-expression

Equivalent tree

Representation…

The structural organization of GEP genes is better understood in terms of open reading frames (ORFs).

In biology, an ORF, or coding sequence of a gene, begins with the “start” codon, continues with the amino acid codons, and ends at a termination codon.

However, a gene is more than the respective ORF, with sequences upstream from the start codon and sequences downstream from the stop codon.

Although in GEP the start site is always the first position of a gene, the termination point does not always coincide with the last position of a gene.

It is common for GEP genes to have non-coding regions downstream from the termination point.
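Because a K-expression is read level by level, the ORF is always a prefix of the gene, so its termination point can be found by counting open argument slots. A minimal sketch; the gene `Q*-abcdab` is a hypothetical example (h = 4, n = 2), not one from the slides:

```python
ARITY = {"Q": 1, "*": 2, "/": 2, "+": 2, "-": 2}  # anything else is a terminal


def orf_end(gene):
    """Return the index of the last symbol used by the expression tree."""
    open_slots = 1                           # the root still has to be filled
    for i, sym in enumerate(gene):
        open_slots += ARITY.get(sym, 0) - 1  # fill one slot, open `arity` new ones
        if open_slots == 0:
            return i
    raise ValueError("gene does not encode a complete tree")


gene = "Q*-abcdab"
end = orf_end(gene)
print(end, gene[:end + 1], gene[end + 1:])  # 5 Q*-abc dab
```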

Use of non-coding region

These non-coding regions are, in fact, the essence of GEP and evolvability, because they allow the modification of the genome using any genetic operator without restrictions, always producing syntactically correct programs without the need for a complicated editing process or highly constrained ways of implementing genetic operators.

Indeed, this is the paramount difference between GEP and previous GP implementations, with or without linear genomes.

Example with Head and Tail

Function set: {Q,+,-,*,/}; terminal set: {a,b}

Maximum number of function arguments n = 2; head h = 10; tail t = 10(2-1)+1 = 11

The K-expression; the boldface symbols represent the tail.

Here the ORF ends at position 10, whereas the gene ends at position 20.

The ORF is the phenotype representation.

Use of Non-Coding Region

MultiGene Representation

We saw the single-gene representation.

Now we discuss the multigene representation of the chromosome.

The number of genes in a chromosome can be greater than one.

For each problem, the number of genes and the head length are fixed beforehand.

Three Gene Representation

Three genes; n = 2, h = 4, so t = 4(2-1)+1 = 5 and each gene is 9 symbols long

K-expressions for Gene1, Gene2 and Gene3

Position ‘0’ is the start of the gene and position ‘8’ is the end of the gene.

The ORF ending of each gene can be calculated after tree construction.
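A sketch of splitting a multigene chromosome into fixed-length genes and locating each ORF end. The three genes are hypothetical (the slide's own genes are in a figure); they follow n = 2, h = 4, so each gene has 9 positions, 0 to 8.

```python
ARITY = {"Q": 1, "*": 2, "+": 2, "-": 2}  # anything else is a terminal


def orf_end(gene):
    open_slots = 1
    for i, sym in enumerate(gene):
        open_slots += ARITY.get(sym, 0) - 1
        if open_slots == 0:
            return i
    raise ValueError("gene does not encode a complete tree")


def split_genes(chromosome, h, n):
    """Cut a chromosome into genes of length h + h(n-1) + 1."""
    length = h + h * (n - 1) + 1
    if len(chromosome) % length:
        raise ValueError("chromosome length is not a multiple of the gene length")
    return [chromosome[i:i + length] for i in range(0, len(chromosome), length)]


# Hypothetical three-gene chromosome with n = 2, h = 4 (9 symbols per gene).
chromosome = "Q*-abcdab" + "+a*bcadcb" + "a+Q-bcdaa"
for k, gene in enumerate(split_genes(chromosome, h=4, n=2), start=1):
    print(f"Gene{k}: {gene}  ORF ends at position {orf_end(gene)}")
```

Gene3 starts with a terminal, so its ORF is a single symbol and positions 1 to 8 are non-coding.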

Three Gene Representation

Tree Construction

Translation

The process of converting the K-expressions into a tree (ET) and reducing it to mathematical form is called translation. GEP chromosomes are composed of one or more ORFs, and obviously the encoded individuals have different degrees of complexity.

The simplest individuals are encoded in a single gene, and the “organism” is, in this case, the product of a single gene: an ET.

In other cases, the organism is a multi-subunit ET, in which the different sub-ETs are linked together by a particular function.

In other cases, the organism emerges from the spatial organization of different sub-ETs (e.g., in planning and problems with multiple outputs).

In yet other cases, the organism results from the interactions of conventional sub-ETs with different domains (e.g., neural networks). However, in all cases, the whole organism is encoded in a linear genome.
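A sketch of translating a multi-subunit ET, assuming the sub-ETs are linked with addition as in the “+ link” example. The chromosome, the variable values and the left-to-right linking order are assumptions for illustration.

```python
import math
from collections import deque

FUNCS = {
    "Q": (1, math.sqrt),
    "*": (2, lambda x, y: x * y),
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
}


def eval_gene(gene, env):
    """Translate one gene (K-expression) into its sub-ET and evaluate it."""
    children = [[] for _ in gene]
    queue, nxt = deque([0]), 1
    while queue:                         # breadth-first assignment of arguments
        i = queue.popleft()
        for _ in range(FUNCS[gene[i]][0] if gene[i] in FUNCS else 0):
            children[i].append(nxt)
            queue.append(nxt)
            nxt += 1

    def ev(i):
        if gene[i] not in FUNCS:
            return env[gene[i]]          # terminal: look up its value
        return FUNCS[gene[i]][1](*(ev(c) for c in children[i]))

    return ev(0)


def eval_chromosome(genes, env, link=lambda x, y: x + y):
    """Link the sub-ETs with a two-argument linking function (addition by default)."""
    result = eval_gene(genes[0], env)
    for gene in genes[1:]:
        result = link(result, eval_gene(gene, env))
    return result


genes = ["Q*-abcdab", "+a*bcadcb", "a+Q-bcdaa"]  # hypothetical chromosome
env = {"a": 2, "b": 3, "c": 1, "d": 4}
print(eval_chromosome(genes, env))  # sqrt((b-c)*a) + (a + b*c) + a = 2.0 + 5 + 2 = 9.0
```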

Example with + Link

Analysis

Can we represent the final tree with a K-expression?

What is the equivalent K-expression?

Issues?

It is difficult to use the genetic operators to evolve such a single gene because of the smaller number of tail positions.

Multigene representation – faster convergence

Example 2 for Non-coding region in multi-gene

Chromosome has two genes

Head = 3 and tail = 4 (n = 2, so t = 3(2-1)+1 = 4)

Operators: N – NOT, O – OR

Fig a) represents the K-expression

Fig b) the first operator is the connecting (linking) operator

In gene1 – OOcacab – the last two characters ‘ab’ belong to the non-coding region

In gene2 – NNNbbcb – the last three characters ‘bcb’ belong to the non-coding region

Figure: Gene1 and Gene2
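The non-coding regions stated above can be checked by counting open argument slots, with N taking one argument and O two. A minimal sketch (the helper name is illustrative):

```python
ARITY = {"N": 1, "O": 2}  # N = NOT (one argument), O = OR (two arguments)


def orf_end(gene):
    """Index of the last symbol used by the tree; later symbols are non-coding."""
    open_slots = 1
    for i, sym in enumerate(gene):
        open_slots += ARITY.get(sym, 0) - 1
        if open_slots == 0:
            return i
    raise ValueError("gene does not encode a complete tree")


for gene in ("OOcacab", "NNNbbcb"):
    end = orf_end(gene)
    print(f"{gene}: coding {gene[:end + 1]!r}, non-coding {gene[end + 1:]!r}")
# OOcacab: coding 'OOcac', non-coding 'ab'
# NNNbbcb: coding 'NNNb', non-coding 'bcb'
```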

Example 3

Example 4

Points to Remember

The type of linking function, as well as the number of genes and the length of each gene, are chosen a priori for each problem.

So we can always start by using a single-gene chromosome, gradually increasing the length of the head; if it becomes very large, we can increase the number of genes and, of course, choose a function to link them.

Depending on the problem, another linking function might be more appropriate.

The idea, of course, is to find a good solution, and GEP provides the means of finding one.

Mutation

Mutations can occur anywhere in the chromosome. However, the structural organization of the chromosomes must remain intact.

In the heads, any symbol can change into another (function or terminal); in the tails, terminals can only change into terminals.

This way, the structural organization of chromosomes is maintained, and all the new individuals produced by mutation are structurally correct programs.
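A minimal sketch of point mutation that respects the head/tail rule. The per-position mutation rate, the uniform symbol choice and the example gene are assumptions for illustration:

```python
import random

FUNCS = ["Q", "*", "+", "-"]
TERMS = ["a", "b", "c", "d"]


def mutate(gene, h, rate, rng=random):
    """Point mutation that respects the head/tail rule.

    Each position mutates with probability `rate`. Head positions may become
    any function or terminal; tail positions only another terminal. The new
    symbol may happen to equal the old one.
    """
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = FUNCS + TERMS if i < h else TERMS
            out[i] = rng.choice(pool)
    return "".join(out)


rng = random.Random(1)
mother = "Q*-abcdab"  # hypothetical gene, h = 4
print(mutate(mother, h=4, rate=0.2, rng=rng))
```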

Mutation – Mother Genome

K-expression – Equation 3.5

Equivalent ET (expression tree)

Daughter Genome

Mutation…

Neutral Mutation

A mutation that occurs in the non-coding region is called a neutral mutation. It does not change the ET: mother and daughter have the same ET.

The ORF ends at position 7 of the head.

Suppose a mutation occurs at tail position 9.

Change ‘a’ to ‘b’.

The ‘phenotype’ of the daughter genome is the same as that of the mother.
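A sketch showing why a mutation in the non-coding region is neutral: the ET is fully determined by the ORF, so comparing ORFs is enough. The gene is hypothetical (its ORF ends at position 5 rather than 7):

```python
ARITY = {"Q": 1, "*": 2, "+": 2, "-": 2}  # anything else is a terminal


def orf(gene):
    """Return the coding part (ORF) of a gene; it alone determines the ET."""
    open_slots = 1
    for i, sym in enumerate(gene):
        open_slots += ARITY.get(sym, 0) - 1
        if open_slots == 0:
            return gene[:i + 1]
    raise ValueError("gene does not encode a complete tree")


def point_mutation(gene, pos, sym):
    return gene[:pos] + sym + gene[pos + 1:]


mother = "Q*-abcdab"                      # ORF = positions 0-5, "dab" is non-coding
neutral = point_mutation(mother, 7, "b")  # 'a' -> 'b' inside the non-coding region
coding = point_mutation(mother, 4, "d")   # 'b' -> 'd' inside the ORF
print(orf(mother), orf(neutral), orf(coding))  # Q*-abc Q*-abc Q*-adc
```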

Comments on mutation

If a function is mutated into a terminal or vice versa, or a function of one argument is mutated into a function of two arguments or vice versa, the ET is modified drastically.

Even though the tree size changes, the chromosome length stays fixed, so this happens without increasing the computational complexity.

It is worth noticing that in GEP there are no constraints on either the kind or the number of mutations in a chromosome: in all cases the newly created individuals are syntactically correct programs.

A) Transposition…

Any sequence in the genome might become an IS element; therefore, these elements are randomly selected throughout the chromosome.

A copy of the transposon (the IS element) is made and inserted at any position in the head of a gene, except at the start position.

Typically, an IS transposition rate (pis) of 0.1 and a set of three IS elements of different length are used.

The transposition operator randomly chooses the chromosome, the start of the IS element, the target site, and the length of the transposon.

A) Transposition…

Suppose that the sequence “bba” in gene 2 (positions 12 through 14) was chosen to be an IS element, and the target site was bond 6 in gene 1 (between positions 5 and 6).

Then, a cut is made in bond 6 and the block “bba” is copied into the site of insertion.

During transposition, the sequence upstream from the insertion site stays unchanged, whereas the sequence downstream from the copied IS element loses, at the end of the head, as many symbols as the length of the IS element (in this case the sequence “a*b” was deleted).

Figure: mother and daughter chromosomes
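A sketch of IS transposition on a chromosome string, following the description above: the copy is inserted before a head position other than the root, and the head is then cut back to h symbols. The function, its parameters and the two-gene chromosome are hypothetical; positions are 0-based, so inserting before position 6 corresponds to bond 6 in the example.

```python
def is_transpose(chromosome, gene_len, h, src, length, target_gene, target):
    """IS transposition on a chromosome string (a sketch).

    Copies chromosome[src:src+length] and inserts it before head position
    `target` of gene `target_gene` (the root, position 0, is not allowed).
    The head keeps its length h, so the last `length` head symbols are lost;
    the tail and all other genes are unchanged.
    """
    if not 1 <= target < h:
        raise ValueError("the insertion site must be in the head, after the root")
    element = chromosome[src:src + length]
    start = target_gene * gene_len
    head = chromosome[start:start + h]
    rest = chromosome[start + h:]  # this gene's tail plus all later genes
    new_head = (head[:target] + element + head[target:])[:h]
    return chromosome[:start] + new_head + rest


# Hypothetical two-gene chromosome, h = 5, n = 2 -> t = 6, genes of 11 symbols.
mother = "+*-Qabcdabc" + "Q+abbacdbca"
daughter = is_transpose(mother, gene_len=11, h=5, src=12, length=2,
                        target_gene=0, target=2)
print(mother)
print(daughter)  # +*+a-bcdabcQ+abbacdbca ("Qa" dropped from the end of gene 1's head)
```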

Root…

During root transposition, the whole head shifts to accommodate the RIS element, losing, at the same time, the last symbols of the head (as many as the length of the RIS element).

As with IS elements, the tail of the gene subjected to transposition and all nearby genes stay unchanged.

Note, again, that the newly created programs are syntactically correct because the structural organization of the chromosome is maintained.

The modifications caused by root transposition are extremely radical, because the root itself is modified.

In nature, if a transposable element is inserted at the beginning of the coding sequence of a gene, it radically changes the encoded protein.

Like mutation and IS transposition, root insertion has a tremendous transforming power and is excellent for creating genetic variation.
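A sketch of root transposition on a single gene. The rule for choosing the RIS element (scan the head from a random start until a function is found, and leave the gene unchanged if none is found) is an assumption here, chosen so that the new root is always a function; the gene is hypothetical.

```python
FUNCTIONS = {"Q", "*", "+", "-"}


def ris_transpose(gene, h, start, length):
    """Root transposition (RIS) on one gene, a sketch.

    Scan the head downstream from `start` until a function is found; the RIS
    element of `length` symbols begins there. A copy is placed at the root,
    the head shifts right and keeps only h symbols, and the tail is unchanged.
    If no function is found, the gene is returned unchanged.
    """
    head, tail = gene[:h], gene[h:]
    for i in range(start, h):
        if head[i] in FUNCTIONS:
            element = gene[i:i + length]
            return (element + head)[:h] + tail
    return gene


mother = "Q*-abcdab"                                  # hypothetical gene, h = 4
print(ris_transpose(mother, h=4, start=1, length=2))  # *-Q*bcdab
print(ris_transpose(mother, h=4, start=3, length=2))  # Q*-abcdab (no function left to find)
```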

C) Gene Transposition

In gene transposition, an entire gene functions as a transposon and transposes itself to the beginning of the chromosome.

In contrast to the other forms of transposition, in gene transposition the transposon (the gene) is deleted in the place of origin.

This way, the length of the chromosome is maintained.

The chromosome to undergo gene transposition is randomly chosen, and one of its genes (except the first, obviously) is randomly chosen to transpose.
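A sketch of gene transposition on a chromosome string whose genes all have the same length; the chromosome is hypothetical:

```python
def gene_transpose(chromosome, gene_len, k):
    """Move gene k (k >= 1) to the start of the chromosome and delete it at its origin."""
    genes = [chromosome[i:i + gene_len] for i in range(0, len(chromosome), gene_len)]
    if not 1 <= k < len(genes):
        raise ValueError("choose a gene other than the first")
    moved = genes.pop(k)
    return "".join([moved] + genes)


mother = "Q*-abcdab" + "+a*bcadcb" + "a+Q-bcdaa"  # hypothetical 3-gene chromosome
print(gene_transpose(mother, gene_len=9, k=2))     # a+Q-bcdaaQ*-abcdab+a*bcadcb
```

With a commutative linking function such as addition, reordering the genes leaves the encoded value unchanged; with a non-commutative one such as subtraction, the expression changes.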

Gene…

Recombination operator

Three types of recombination operator:

One-point

Two-point

Gene

Two parents are randomly chosen and paired to exchange genetic material between them.
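A sketch of one-point recombination, assuming the usual GEP form in which everything downstream of a single random point is swapped. Since both parents share the same head/tail layout, each child keeps head-legal symbols in the head and terminals in the tail. The parents and the `point` argument are illustrative:

```python
import random


def one_point(p1, p2, rng=random, point=None):
    """One-point recombination: swap everything downstream of a random point."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    if point is None:
        point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


mother, father = "Q*-abcdab", "+a*bcadcb"  # hypothetical parents, h = 4
print(one_point(mother, father, point=4))  # ('Q*-acadcb', '+a*bbcdab')
```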

One-point

Two-point

In two-point recombination, the chromosomes are paired and the two points of recombination are randomly chosen.

The material between the recombination points is afterwards exchanged between the two chromosomes, forming two new daughter chromosomes.
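A sketch of two-point recombination on chromosome strings. The `points` argument is a hypothetical hook for fixing the two recombination points in an example; otherwise they are drawn at random:

```python
import random


def two_point(p1, p2, rng=random, points=None):
    """Two-point recombination: exchange the material between two random points."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    a, b = points if points is not None else sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


mother, father = "Q*-abcdab", "+a*bcadcb"       # hypothetical parents
print(two_point(mother, father, points=(2, 5)))  # ('Q**bccdab', '+a-abadcb')
```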

Two-point…

Gene Recombination

Observation
