
A Study on Measuring Distance between Two Trees

Advisor: Prof. 阮夙姿
Presenter: 林陳輝



CSIE, National Chi Nan University 2

Outline
• Introduction
  - Problem definition
• Related work
  - The metric and algorithms
• Mixture distance
  - Basic algorithm
  - The modified algorithm
• Mixture-matching distance
• Conclusions and Future work


Introduction
• Evolutionary tree
• Comparing trees
• Comparing trees is not easy

- Figure: Phylogenetic tree, Wikipedia


Mixture tree

[Figure: a mixture tree drawn with axes labeled taxa and time.]

S.-C. Chen and B. G. Lindsay, “Building Mixture Trees from Binary Sequence Data,” Biometrika, 2006.


Problem definition

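[Figure: an example tree. The root v1 has time 11; its children v2 and v3 have times 9 and 8; the next level v4, v5 (under v2) and v6, v7 (under v3) have times 1, 3, 5 and 7; the leaves A to H hang below v4 to v7.]

• Each leaf is associated with a taxon.
• Every internal node carries a time parameter.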
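To make the definition concrete, here is a minimal Python sketch. The encoding is our own assumption (a leaf is a taxon name, an internal node is a tuple `(time, left, right)`), and so is the placement of the leaves A to H under v4 to v7. The hypothetical helper `lca_times` tabulates, for every pair of leaves, the time parameter of their lowest common ancestor, which is the quantity the later distance definitions compare.

```python
from typing import Union

# Hypothetical encoding (not taken from the slides): a leaf is a taxon name,
# an internal node is a tuple (time, left_subtree, right_subtree).
Tree = Union[str, tuple]

def lca_times(tree: Tree) -> dict:
    """Map every unordered leaf pair {x, y} to p_T(x, y), the time of their LCA."""
    table = {}

    def visit(node: Tree) -> list:
        if isinstance(node, str):          # a leaf: a single taxon
            return [node]
        time, left, right = node
        left_taxa, right_taxa = visit(left), visit(right)
        # Each pair split between the two children has its LCA at this node.
        for x in left_taxa:
            for y in right_taxa:
                table[frozenset((x, y))] = time
        return left_taxa + right_taxa

    visit(tree)
    return table

# The example tree; the leaf placement under v4..v7 is an assumption.
example = (11, (9, (1, "A", "B"), (3, "C", "D")),
               (8, (5, "E", "F"), (7, "G", "H")))
p = lca_times(example)
print(p[frozenset(("A", "B"))], p[frozenset(("A", "C"))], p[frozenset(("A", "H"))])  # 1 9 11
```

With 8 leaves the table holds C(8, 2) = 28 entries, one per pair.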


Related work: Path difference metric

$d_p(T_1, T_2) = \lVert d(T_1) - d(T_2) \rVert_2$

where d(T_i) is the vector of path lengths between all pairs of leaves of T_i.

M. A. Steel and D. Penny, “Distributions of Tree Comparison Metrics – Some New Results,” Syst. Biol. 42(2):126-141, 1993.
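A small sketch of the path difference metric, assuming leaves are strings, internal nodes are plain tuples of subtrees (no time parameters are needed here), and path length means the number of edges. `pair_path_lengths` and `path_difference` are illustrative names, not taken from the cited paper.

```python
import math

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
def pair_path_lengths(tree):
    """Return {frozenset({x, y}): number of edges on the path between x and y}."""
    lengths = {}

    def visit(node):
        # Returns {leaf: number of edges from `node` down to that leaf}.
        if isinstance(node, str):
            return {node: 0}
        below = [visit(child) for child in node]
        for i in range(len(below)):
            for j in range(i + 1, len(below)):
                for x, dx in below[i].items():
                    for y, dy in below[j].items():
                        # x and y meet at `node`: one extra edge on each side.
                        lengths[frozenset((x, y))] = dx + dy + 2
        return {leaf: d + 1 for child_map in below for leaf, d in child_map.items()}

    visit(tree)
    return lengths

def path_difference(t1, t2):
    """d_p(T1, T2): Euclidean norm of d(T1) - d(T2) over all leaf pairs."""
    d1, d2 = pair_path_lengths(t1), pair_path_lengths(t2)
    if d1.keys() != d2.keys():
        raise ValueError("the trees must have the same leaf set")
    return math.sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))

balanced = (("A", "B"), ("C", "D"))
caterpillar = ("A", ("B", ("C", "D")))
print(path_difference(balanced, caterpillar))  # sqrt(3), about 1.732
```

The pairs AB, BC and BD each differ by one edge between the two shapes, hence the square root of 3.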


Related work: Nodal metric

$\text{Distance} = \sum_{x,y} \lvert D_{T_1}(x, y) - D_{T_2}(x, y) \rvert$, for leaves x, y,

where D_T(x, y) is the path length (nodal distance) between leaves x and y in T.

• In full binary trees, the complexity is O(n³).
• In complete binary trees, the complexity is O(n² log n).

John Bluis and Dong-Guk Shin, “Nodal Distance Algorithm: Calculating a Phylogenetic Tree Comparison Metric,” Proc. of the 3rd IEEE Symposium on BioInformatics and BioEngineering, pp. 87-94, 2003.
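The nodal metric sums absolute rather than squared differences. A sketch under an assumed tuple encoding, measuring D_T(x, y) in edges; counting the nodes on each path instead shifts both D_T1 and D_T2 by the same constant, so the sum does not change. The function names are hypothetical.

```python
from itertools import combinations

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
def root_paths(tree, prefix=()):
    """Map each leaf to the child indices on the way from the root down to it."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (index,)))
    return paths

def path_length(paths, x, y):
    """Edges between leaves x and y: both depths minus twice the LCA depth."""
    px, py = paths[x], paths[y]
    shared = 0  # length of the common prefix = depth of the LCA
    while shared < min(len(px), len(py)) and px[shared] == py[shared]:
        shared += 1
    return len(px) + len(py) - 2 * shared

def nodal_distance(t1, t2):
    """Sum over leaf pairs of |D_T1(x, y) - D_T2(x, y)|."""
    p1, p2 = root_paths(t1), root_paths(t2)
    if p1.keys() != p2.keys():
        raise ValueError("the trees must have the same leaf set")
    return sum(abs(path_length(p1, x, y) - path_length(p2, x, y))
               for x, y in combinations(sorted(p1), 2))

print(nodal_distance((("A", "B"), ("C", "D")), ("A", ("B", ("C", "D")))))  # 3
```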


Related work
• Matching distance: P. W. Diaconis and S. P. Holmes, “Matchings and Phylogenetic Trees,” Proc. Natl. Acad. Sci. USA, vol. 95, no. 25, pp. 14600-14602, 1998.
• An algorithm for the matching distance: G. Valiente, “A Fast Algorithmic Technique for Comparing Large Phylogenetic Trees,” SPIRE, pp. 370-375, 2005.


Matching Representation

[Figure: a rooted binary tree with leaves labeled 1 to 6 and internal nodes labeled 7 to 11, the root being 11.]

{1,2} {5,6} {3,7} {4,8} {9,10}


Matching distance

T1: {1,2} {5,6} {3,7} {4,8} {9,10}
T2: {1,3} {4,6} {2,7} {5,8} {9,10}

The distance is 2.

[Figure: T1 and T2 drawn as rooted binary trees with leaves 1 to 6 and internal nodes 7 to 11.]
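The distance of 2 can be checked by overlaying the two matchings: their union splits into alternating cycles, and (following Diaconis and Holmes) the number of transpositions needed is the number of pairs minus the number of cycles. Here the cycles are (1 2 7 3), (4 8 5 6) and (9 10), so 5 - 3 = 2. A minimal sketch, assuming matchings are given as lists of pairs; the function names are our own.

```python
def partners(matching):
    """Map every point to the point it is paired with."""
    mate = {}
    for a, b in matching:
        mate[a], mate[b] = b, a
    return mate

def matching_distance(m1, m2):
    """(#pairs) - (#cycles in the union of the two perfect matchings)."""
    mate1, mate2 = partners(m1), partners(m2)
    if mate1.keys() != mate2.keys():
        raise ValueError("the matchings must cover the same points")
    seen, cycles = set(), 0
    for start in mate1:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between edges of m1 and m2 until the cycle closes.
        while x not in seen:
            y = mate1[x]
            seen.update((x, y))
            x = mate2[y]
    return len(m1) - cycles

m1 = [(1, 2), (5, 6), (3, 7), (4, 8), (9, 10)]
m2 = [(1, 3), (4, 6), (2, 7), (5, 8), (9, 10)]
print(matching_distance(m1, m2))  # 2
```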


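Mixture distance and algorithms

Definition: $p_{T_i}(x, y)$ is the time parameter of the LCA of leaves x and y in T_i, and

$\text{Distance} = \sum_{x,y} \lvert p_{T_1}(x, y) - p_{T_2}(x, y) \rvert$, over all pairs of leaves x, y.

[Figure: two trees on leaves A, B, C, D with internal nodes v1, v2, v3. In the first, v1, v2, v3 have times 9, 1, 3; in the second, 9, 2, 3.]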


Distance conditions

• The distance from an object to itself is zero.
• The distance from A to B is the same as the distance from B to A.
• The triangle inequality holds.

- J. Felsenstein, Inferring phylogenies. Sunderland, MA: Sinauer Associates, 2004.
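Because the mixture distance is a sum of absolute differences between two vectors of LCA times, it inherits these three conditions from the L1 norm. The helper below only spot-checks them on a finite set of objects, which is a sanity test rather than a proof; the three small LCA-time tables are hypothetical examples, and the function names are our own.

```python
from itertools import product

def satisfies_distance_conditions(dist, objects, tol=1e-9):
    """Spot-check the three conditions on every pair/triple of `objects`."""
    for a in objects:
        if abs(dist(a, a)) > tol:                          # d(a, a) = 0
            return False
    for a, b in product(objects, repeat=2):
        if abs(dist(a, b) - dist(b, a)) > tol:             # symmetry
            return False
    for a, b, c in product(objects, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:     # triangle inequality
            return False
    return True

def l1(p, q):
    """Sum of absolute differences between two LCA-time tables."""
    return sum(abs(p[pair] - q[pair]) for pair in p)

# Hypothetical three-leaf trees, given directly by their LCA-time tables.
tables = [
    {"AB": 2, "AC": 5, "BC": 5},   # ((A, B):2, C):5
    {"AB": 4, "AC": 4, "BC": 1},   # (A, (B, C):1):4
    {"AB": 6, "AC": 3, "BC": 6},   # ((A, C):3, B):6
]
print(satisfies_distance_conditions(l1, tables))                         # True
print(satisfies_distance_conditions(lambda p, q: l1(p, q) + 1, tables))  # False
```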


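Algorithm

• A direct computation examines all C(n, 2) pairs of leaves.
• Algorithmic idea: grouping.

Example (full binary trees):
[Figure: two full binary trees on leaves A, B, C, D. In one, the root v1 (time 9) has children v2 (time 1, leaves A and B) and v3 (time 3, leaves C and D). In the other, the root v1 (time 8) has leaf A and child v2 (time 4); v2 has leaf B and child v3 (time 1, leaves C and D).]

AB: |8 - 1| = 7
AC: |8 - 9| = 1
AD: |8 - 9| = 1
BC: |4 - 9| = 5
BD: |4 - 9| = 5
CD: |1 - 3| = 2

Distance = 21

$\text{Distance} = \sum_{x,y} \lvert p_{T_1}(x, y) - p_{T_2}(x, y) \rvert$, for leaves x, y.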
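The direct approach behind the C(n, 2) remark can be sketched by looking up the LCA time of every pair of leaves. This is an illustrative brute-force version, not the grouping algorithm; the `(time, left, right)` tuple encoding and the function names are assumptions. On the example above it reproduces the total of 21.

```python
from itertools import combinations

# Assumed encoding: a leaf is a taxon name, an internal node is (time, left, right).
def leaf_set(node):
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaf_set(left) | leaf_set(right)

def lca_time(tree, x, y):
    """Walk down from the root while x and y stay in the same child subtree."""
    node = tree
    while True:
        time, left, right = node
        for child in (left, right):
            if not isinstance(child, str) and {x, y} <= leaf_set(child):
                node = child
                break
        else:
            return time  # x and y are separated here, so this node is their LCA

def mixture_distance(t1, t2):
    """Sum of |p_T1(x, y) - p_T2(x, y)| over all C(n, 2) leaf pairs."""
    taxa = leaf_set(t1)
    if taxa != leaf_set(t2):
        raise ValueError("the trees must have the same leaf set")
    return sum(abs(lca_time(t1, x, y) - lca_time(t2, x, y))
               for x, y in combinations(sorted(taxa), 2))

T1 = (9, (1, "A", "B"), (3, "C", "D"))   # ((A, B):1, (C, D):3):9
T2 = (8, "A", (4, "B", (1, "C", "D")))   # (A, (B, (C, D):1):4):8
print(mixture_distance(T1, T2))  # 21
```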


Algorithm

[Figure: the running example. T1 has leaves A to H, and its internal nodes v1 to v7 have times 9, 7, 8, 2, 3, 4, 5. T2 has the same shape with times 9, 6, 8, 1, 3, 4, 5 and the leaves A to H in a different order.]


[Figure: for node v1 of T1, the leaves in its left subtree are colored red and those in its right subtree green; every node of T2 is annotated with its red and green counts (for example Red:1 Green:1 and Red:2 Green:2).]

Each term multiplies a time difference by the number of red-green leaf pairs whose LCA in T2 is the given node:

|p_T1(v1) - p_T2(v6)| × (1 × 1 + 0 × 0) = |9 - 4| × 1 = 5
|p_T1(v1) - p_T2(v7)| × (0 × 0 + 1 × 1) = |9 - 5| × 1 = 4
|p_T1(v1) - p_T2(v3)| × (1 × 1 + 1 × 1) = |9 - 8| × 2 = 2


[Figure: the same step for node v2 of T1 (time 7); only the leaves under v2 are colored, and the nodes of T2 carry counts such as Red:0 Green:2, Red:2 Green:0 and Red:2 Green:2.]

|p_T1(v2) - p_T2(v2)| × (2 × 0 + 0 × 0) = |7 - 6| × 0 = 0
|p_T1(v2) - p_T2(v3)| × (0 × 1 + 0 × 1) = |7 - 8| × 0 = 0
|p_T1(v2) - p_T2(v1)| × (2 × 2 + 0 × 0) = |7 - 9| × 4 = 8


Complexity analysis

• For every internal node of T1, coloring all leaves takes O(n).
• Counting the distance in T2 takes O(n).
• T1 has n - 1 internal nodes, so the time complexity is O(n²).


The modified algorithm

• Speed up the basic algorithm.
• The basic algorithm handles too much empty color information: many nodes of T2 receive no colored leaf at all.


[Figure: the coloring example for node v2 of T1 repeated, with the nodes of T2 that contain no colored leaf (Red:0 Green:0) highlighted as empty color information.]


[Figure: T2 (internal times 9, 6, 8, 1, 3, 4, 5) next to a reduced tree that keeps only the leaves A, B, C, D and the internal nodes v1, v3, v4 (times 9, 8, 1).]
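The reduced tree keeps only the chosen leaves and the branching nodes between them. One standard way to build such an induced subtree, which we assume is close in spirit to the reconstruction used here, sorts the chosen leaves in depth-first order, adds the LCAs of consecutive leaves, and links each node to its LCA with its predecessor in that order. The sketch uses a naive parent-walking LCA for brevity; the node numbering (1 to 15 in postorder) and all names are hypothetical.

```python
def preprocess(children, root):
    """Parent, depth and preorder (entry) index of every node, iteratively."""
    parent, depth, order = {root: None}, {root: 0}, {}
    stack = [root]
    while stack:
        v = stack.pop()
        order[v] = len(order)
        for c in reversed(children.get(v, [])):   # reversed: visit left child first
            parent[c], depth[c] = v, depth[v] + 1
            stack.append(c)
    return parent, depth, order

def naive_lca(u, v, parent, depth):
    # Stand-in for a constant-time LCA structure: walk both nodes upward.
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u

def induced_subtree(leaves, children, root):
    """Return {child: parent} for the subtree induced by `leaves`."""
    parent, depth, order = preprocess(children, root)

    def lca(a, b):
        return naive_lca(a, b, parent, depth)

    keys = sorted(leaves, key=order.get)
    # Adding LCAs of neighbours in depth-first order closes the set under LCA.
    nodes = set(keys) | {lca(a, b) for a, b in zip(keys, keys[1:])}
    nodes = sorted(nodes, key=order.get)
    # In that order, each node's parent is its LCA with the preceding node.
    return {b: lca(a, b) for a, b in zip(nodes, nodes[1:])}

# A complete binary tree with its 15 nodes numbered in postorder.
children = {15: [7, 14], 7: [3, 6], 3: [1, 2], 6: [4, 5],
            14: [10, 13], 10: [8, 9], 13: [11, 12]}
print(induced_subtree([11, 1, 12, 2], children, 15))
# {3: 15, 1: 3, 2: 3, 13: 15, 11: 13, 12: 13}
```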


The modified algorithm
• Finding the LCA in constant time after O(n) preprocessing: M. A. Bender and M. Farach-Colton, “The LCA Problem Revisited,” Proc. LATIN, 2000.
• The 2-way merge problem: R. C. T. Lee, S. S. Tseng, R. C. Chang and Y. T. Tsai, Introduction to the Design and Analysis of Algorithms. McGraw-Hill Education, 2005.
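For reference, a compact LCA structure. It uses an Euler tour and a sparse table, which answers queries in O(1) but needs O(n log n) preprocessing; Bender and Farach-Colton's O(n) bound additionally relies on a block decomposition of the ±1 RMQ that is omitted here. The class name and the children-dictionary input format are our own.

```python
class EulerTourLCA:
    """O(1) LCA queries after O(n log n) preprocessing (Euler tour + sparse table).
    A simpler stand-in for the O(n) scheme of Bender and Farach-Colton."""

    def __init__(self, children, root):
        self.children = children
        self.euler, self.depth, self.first = [], [], {}
        self._tour(root, 0)
        m = len(self.euler)
        # table[k][i] = position of the shallowest entry in euler[i : i + 2**k]
        self.table = [list(range(m))]
        k = 1
        while (1 << k) <= m:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._shallower(prev[i], prev[i + half])
                               for i in range(m - (1 << k) + 1)])
            k += 1

    def _tour(self, node, d):
        # Record the node on entry and again after returning from each child.
        self.first.setdefault(node, len(self.euler))
        self.euler.append(node)
        self.depth.append(d)
        for child in self.children.get(node, ()):
            self._tour(child, d + 1)
            self.euler.append(node)
            self.depth.append(d)

    def _shallower(self, i, j):
        return i if self.depth[i] <= self.depth[j] else j

    def query(self, u, v):
        lo, hi = sorted((self.first[u], self.first[v]))
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping power-of-two windows cover [lo, hi].
        best = self._shallower(self.table[k][lo], self.table[k][hi - (1 << k) + 1])
        return self.euler[best]

children = {15: [7, 14], 7: [3, 6], 3: [1, 2], 6: [4, 5],
            14: [10, 13], 10: [8, 9], 13: [11, 12]}
lca = EulerTourLCA(children, 15)
print(lca.query(2, 11), lca.query(4, 1), lca.query(8, 12))  # 15 7 14
```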


[Figure: the two example trees, with internal times 9, 7, 8, 2, 3, 4, 5 and 9, 6, 8, 1, 3, 4, 5. The nodes of T1 are numbered 1 to 15 in postorder, so its leaves receive 1, 2, 4, 5, 8, 9, 11, 12, and the leaves of T2 are labeled with the same numbers.]


[Figure: T2 with each leaf labeled by its T1 number; the leaf pairs under the lowest internal nodes of T2 are {1, 2}, {11, 12}, {5, 8} and {4, 9}.]

For v4: |1 - 2| × (1 × 1 + 0 × 0) = 1


Sorted leaf lists of T2, combined bottom-up by 2-way merges:

1, 2 | 11, 12 | 5, 8 | 4, 9
1, 2, 11, 12 | 4, 5, 8, 9
1, 2, 4, 5, 8, 9, 11, 12

|9 - 7| × (2 × 2 + 0 × 0) = 8

[Figure: the subtree of T1 reconstructed for the leaves 1, 2, 11, 12: node 15 (time 9) with children 3 (leaves 1, 2) and 13 (leaves 11, 12).]
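The bottom-up merging above can be sketched as a post-order pass over T2 that combines the children's sorted lists with a linear 2-way merge. As a simplifying assumption, the leaves of T2 are given directly by their T1 numbers, and the nesting below is chosen to reproduce the lists on the slide; the function names are hypothetical.

```python
def merge(a, b):
    """Linear-time 2-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]

def sorted_leaf_lists(tree, lists):
    """Post-order over T2 (nested 2-tuples whose leaves are T1 numbers): returns
    the sorted leaf numbers below `tree` and appends each node's list to `lists`."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    merged = merge(sorted_leaf_lists(left, lists), sorted_leaf_lists(right, lists))
    lists.append(merged)
    return merged

lists = []
root = sorted_leaf_lists((((1, 2), (11, 12)), ((5, 8), (4, 9))), lists)
print(root)  # [1, 2, 4, 5, 8, 9, 11, 12]
```

Every node's list costs time linear in its size, so each level of a complete binary tree costs O(n) in total.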


Complexity analysis

• Reconstructing a subtree of T1 takes linear time.
• Counting the distance in a reconstructed subtree with m leaves takes O(m).
• The height of a complete binary tree is O(log n).
• The total complexity is therefore O(n log n) for complete binary trees.
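Assuming, as the bullets state, that an internal node with m leaves below it costs O(m), and taking n to be a power of two so that depth d of a complete binary tree holds 2^d internal nodes with n/2^d leaves each, the per-level costs add up as follows.

```latex
\sum_{d=0}^{\log_2 n - 1} \underbrace{2^{d}}_{\text{nodes at depth } d} \cdot O\!\left(\frac{n}{2^{d}}\right)
  = \sum_{d=0}^{\log_2 n - 1} O(n) = O(n \log n)
```

For a full binary tree shaped like a caterpillar the subtree sizes are n, n - 1, ..., 2, so the same sum is O(n²), which matches the full-binary-tree entry for the mixture distance in the conclusions.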


Mixture-matching distance

$\text{Distance} = 1 - P_{T_n} / P_{T_m} + i$, for $P_{T_n} \le P_{T_m}$ and $n, m \in \{1, 2\}$,

where i is the matching distance between T1 and T2, and $P_{T_m}$ denotes the product of all time parameters in T_m.
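Since the formula only needs the matching distance i and the two products, it can be evaluated exactly with rational arithmetic. A minimal sketch, assuming positive integer time parameters; `mixture_matching_distance` is a hypothetical name. With i = 2 and the time parameters of the eight-leaf example, it yields 18/7, about 2.571.

```python
from fractions import Fraction
from math import prod

def mixture_matching_distance(i, times1, times2):
    """1 - P_Tn / P_Tm + i with P_Tn <= P_Tm, where P_T is the product of all
    time parameters of T and i is the matching distance between the trees."""
    p1, p2 = prod(times1), prod(times2)
    smaller, larger = min(p1, p2), max(p1, p2)
    return 1 - Fraction(smaller, larger) + i

d = mixture_matching_distance(2, [9, 7, 8, 2, 3, 4, 5], [9, 6, 8, 1, 3, 4, 5])
print(d, round(float(d), 3))  # 18/7 2.571
```

Because the time term lies in [0, 1) and i is an integer, the integer part of the result recovers i.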


[Figure: the two example trees (internal times 9, 7, 8, 2, 3, 4, 5 and 9, 6, 8, 1, 3, 4, 5), each with its nodes numbered 1 to 15 for the matching representation.]

T1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14}
T2: {1, 2} {3, 6} {4, 5} {7, 8} {9, 12} {10, 11} {13, 14}

P_T1 = 60480 (= 9 × 7 × 8 × 2 × 3 × 4 × 5), P_T2 = 25920 (= 9 × 6 × 8 × 1 × 3 × 4 × 5)

Distance = 1 - (25920 / 60480) + 2 ≈ 2.571


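[Figure: a scale for the distance. 0 means the trees are the same; a value below 1 means no leaves are arranged differently; each additional unit i corresponds to i transpositions between the matchings.]

Distance = 1 - (25920 / 60480) + 2 ≈ 2.571

The time complexity is O(n).

$\text{Distance} = 1 - P_{T_n} / P_{T_m} + i$, for $P_{T_n} \le P_{T_m}$ and $n, m \in \{1, 2\}$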


Conclusions

Metric                      Considers                      Full binary tree   Complete binary tree
Path difference metric      Structure                      N/A                N/A
Nodal distance              Structure                      O(n³)              O(n² log n)
Mixture distance            Structure and time parameter   O(n²)              O(n log n)
Matching distance           Structure                      O(n)               O(n)
Mixture-matching distance   Structure and time parameter   O(n)               O(n)


Future work
• Improve the time complexity.
• Extend to k-ary trees.
• Add mutation points.


Thank you for listening.