28
U-curve Search for Biological States Characterization and Genetic Network Design Marcelo Ris Universidade de São Paulo – Instituto de Matemática e Estatística Junior Barrera – Universidade de São Paulo – Instituto de Matemática e Estatística Helena Brentani - Hospital do Câncer, Fundação Antônio Prudente

presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

U-curve Search for Biological StatesCharacterization and Genetic Network Design

Marcelo Ris – Universidade de São Paulo – Instituto de Matemática e EstatísticaJunior Barrera – Universidade de São Paulo – Instituto de Matemática e EstatísticaHelena Brentani - Hospital do Câncer, Fundação Antônio Prudente

Page 2: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

Outline

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 3: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 4: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Biological ProblemsP1. Biological states characterization

P2. Genetic Network Design

• Gene expression dataP1. States samples

P2. Time-course samples

• Mathematical approach– Feature Selection Problem

– State of the art: heuristic optimizations

– U-curve algorithm

Page 5: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 6: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

][

][

][

][2

1

=

tx

tx

tx

tx

n

M

])}[],[(,]),1[],1[(]),0[],0[{( mymxyxyx K

},,2,1{ nA K⊂

Feature Selection

},,1{][

{][ 21

cKty

rrrrRtx ili

K

K

=∈

∈=∈ R },,,,

KRA →||:ψ

),( YXP

Page 7: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

∑−∈

=

→−

}1,0,1{

1)(

]1,0[}1,0,1{:

y

yP

P

Y Distribution

)''()'(

)'()(

)(log)()(}1,0,1{

YHYH

YHYH

yPyPYHy

=

>

−= ∑−∈

Y Entropy

0)|()(),( ≥−= XYHYHYXI

Mutual Information

-1 0 1

P(Y)

Y

-1 0 1

P(Y’)

Y’

-1 0 1

P(Y’’)

Y’’

Page 8: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

)|(log)|()()]|([ |

}1,0,1{ }1,0,1{

| xyPxyPxPXYHE XY

x y

XYX∑ ∑−∈ −∈

=

Mean Conditional Entropy

Mean Mutual Information

)]|([)()],([ XYHEYHYXIE −=

)|(log)|()()]|([ |

}1,0,1{ }1,0,1{

| xyPxyPxPXYHE XY

x y

XYX

))))

∑ ∑−∈ −∈

=

Estimation

Estimation

)]|([)()],([ XYHEYHYXIE)))

−=

Page 9: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Problem– find the subset A that optimizes the cost

function

– Ex: mean conditional entropy minimization (cost function)

– Exponential

• Search Space– Complete boolean lattice of order n– Each node represents a possible candidate

A– Cost function: estimated for each node

– Find the node with the minimum cost

Page 10: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

Boolean Lattice of order 44-element chain is emphasized

• Heuristics: SFS, SFFS– Incremental– Does not search all the candidates space

– Could not obtain the “best” result• Ex: 2 elements alone turns the result worse, but

together improves it a lot

Page 11: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 12: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• U-curve property of Ê[H(Y|X)]

Ë[H(Y|X)]

|A|

– Why ?

– Estimation composed by:• Real measure – decreases from H(Y) to the real value

E[H(Y|X)]• Estimation error – increases as more attributes are

added to X

– For a fixed number of samples

– For any chain of the search space

– Ê[H(Y|X)] forms an U-curve

Page 13: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Features of the algorithm– Branch-and-Bound: go through the whole space

without having to visit all the candidates– Stochastic– Some definitions:

• U-cost Boolean Lattice

• Local minimum

• Exhausted minimum

• Global minimum

Page 14: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Search space characterized by:– Upper Bound List

– Lower Bound List

• An element is reachable if there is a chain from an upper or lower list element

• At each step:– Select with some

probability a beginning list

– Select an aleatoryelement from this list

– Build a chain iteratively:

• Inserts to the chain an aleatory reachable adjacent to the last one

• Stop, when the cost of the last element is greater than the last one 0000

0100

0110

1110

10

7

9

6

Prune

Procedure

Page 15: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Additional Procedures– Minimum exhausting

• Avoid more than one visit to the same candidate• Using a stack

– Pruning elements from an element E• Upper bound list – remove elements U’s that contain E, and

inserts elemets reachable from U that not contain E• Lower bound list – remove elements L’s that are contained

in E, and inserts elemets reachable from L that is not contained in E

Page 16: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 17: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

][

][

][

][2

1

=

tx

tx

tx

tx

n

M

])}[],[(,]),1[],1[(]),0[],0[{( mymxyxyx K

},,2,1{ nA K⊂

U-curve algorithm

},,1{][

},,,,{][ 21

cKty

rrrrRtx ili

K

K

=∈

∈=∈ R

KRA →||:ψ

),( YXP

Quantized

Microarray

Quantized

Values

Biological

States

Page 18: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 19: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Dynamical Systems– State: vector x– Transition function Ф

– x[t+1] = Ф(x[t])

• Stochastic Process– Stochastic transition function

• Next State – aleatory vector realization

– Ex: Markov Chain (πX|Y , π0 )• Time-discrete, finite-size vector, finite domain

• Aleatory state sequence

||||||||3|||2|||1

3|||3|33|23|1

2|||2|32|22|1

1|||1|31|21|1

|

||

3

2

1

0

=

=

nnnnn

n

n

n

n

RRRRR

R

R

R

XY

R pppp

pppp

pppp

pppp

p

p

p

p

L

MOMMM

L

L

L

M

ππ

Page 20: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

space-sub on this of projection theis e wher

, :assuch , ,dimension

of space-sub a is there,, e.

,1| is there},,..,1{

e is, that tic,determinis-almost d.

),|(,,

is, that t,independenlly condiciona é c.

, ,0 b.

,on independs s,homogeneou is a.

:axioms following with the),(Chain Markov -

,

||

|

|

1|

|

|

||

0|

,

xx

ppnjj

NiRx

pRrnNi

Rx

xyppRyx

Ryxp

tp

xyxy

n

xry

n

XY

n

i ixy

n

XY

n

xy

xyXY

XY

ii

i

=<<

∈∀∈∀

≈∈=∈

∈∀

=∈∀

∈∀>

=

=∏π

π

π

ππ

• Probabilistic Genetic Networks - PGN

Page 21: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

,,,

||||||||||||

3|3|3|3|

2|2|2|2|

1|1|1|1|

|

|||

321

321

321

321

21

=

nl

nnn

l

l

l

i

n

RrRrRrRr

rrrr

rrrr

rrrr

XX

XXXXXX

pppp

pppp

pppp

pppp

P

PPP

L

MOMMM

L

L

L

K

3|33|33|23|1

3|33|33|23|1

2|32|32|22|1

1|31|31|21|1

|

=

nnnnn

n

n

n

pppp

pppp

pppp

pppp

XY

L

MOMMM

L

L

L

π

• Markov Chain

• Probabilistic Genetic Networks - PGN

Almost Deterministic

Page 22: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

Expression (Gene 1)

time

Expression (Gene 2)

time

Expression (Gene 3)

time

. . . . . .

. . . . . .

. . . . . .Expression (Gene n)

time

]1[x ]2[x ]3[x ]4[x ]5[x ]6[x ]7[x ]9[x ]10[x ]11[x ]12[x ]13[x ]1[ −mx ][mx....

ExpressionMeasurement

Techniques

Time-Course Gene Expression Data

Page 23: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

][

][

][

][2

1

=

tx

tx

tx

tx

n

M

])}[],[(,]),1[],1[(]),0[],0[{( mymxyxyx jjj K

},,2,1{ nAj K⊂

U-curve algorithm

]1[][

},,,,{][ 21

+=

∈=∈

txty

rrrrRtx

jj

ili RK

KR jA

j →||

njYXP j ,,1),,( K=

Quantized

Microarray

at t

Quantized

Values

Gene j quantized

expression at t+1

Page 24: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Introduction

• Feature selection problem

• U-curve search algorithm

• Characterization of biological states

• Genetic network design

• Application

Page 25: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

• Application to design a estrogen regulated network– 21 target genes estrogen regulated

– Time-course gene expression (Ed Liu et al.)• MCF-7 cells treated with estrogen• 16 microarray experiments in 24 hours:

– each hour: 8 first hours

– each 2 hours: 16 last hours

– Quantization (Barrera et al.)• 3 levels {-1, 0, 1}

– For each target gene• 15x21 matrix

– 20 first columns – gene expressions on the first 15 experiments

– Last column – gene target expression on the last15 experiments

– The algorithm returns between the 20 genes the subset that best predict the target

Page 26: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006
Page 27: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

THBS1

PGR

CXCL12

STC2

SERPINE1

NRIP1

FZD8

CRABP2

BMP7

IL6ST

REA

EPHA4

CD7

JAK1

CTSD

JDP1

CCNG2

ABCA3

GATA3

NMAIGFBP4

Page 28: presentation - vision.ime.usp.brvision.ime.usp.br/~jb/lectures/genetic networks/presentation.pdf · Title: Microsoft PowerPoint - presentation.ppt Author: jb Created Date: 3/1/2006

Thanks!!!