
Genome Informatics 16(1): 13–21 (2005)

Relationship between Segmental Duplications and Repeat Sequences in Human Chromosome 7

Hiroo Murakami, Sachiyo Aburatani, Katsuhisa Horimoto

Laboratory of Biostatistics, Institute of Medical Science, University of Tokyo, Shirokane-dai 4-6-1, Minato-ku, Tokyo 108-8639, Japan

Abstract

Various types of repeat sequences are abundant in genomic sequences, and they are associated with biological phenomena at distinct levels. In particular, comparative analyses of whole-genome-sized sequence data have revealed that repeat sequences cause segmental duplications, which are a type of chromosomal structural rearrangement. In this study, we analyzed the relationships between segmental duplications and repeat sequences in human chromosome 7. For this purpose, three methods for detecting repeat sequences were applied to the genomic sequence of human chromosome 7: RepeatMasker for dispersed repeats, TRF for tandem repeats, and STEPSTONE for inter-spread repeats. Plotting the detected repeat sequences against their locations on the chromosome showed that, as a macroscopic feature of their distributions, all three types of repeats are concentrated around the regions of segmental duplications. Furthermore, the latter two types of repeats were classified by period, and the distribution bias of the detected repeat sequences between the segmental duplication regions and the other regions was tested statistically. The period distributions of both repeat types were biased at the 5% significance level by the χ² test, and the repeats with long periods, about 130 bp and more than 400 bp, were identified by the normalized residual test as contributing to this bias at the 5% significance level. The mechanism of segmental duplications is discussed based on these results.

Keywords: chromosome rearrangement, segmental duplication, dispersed repeat, tandem repeat, inter-spread repeat

1 Introduction

It is well known that mammalian genomic DNA contains many large, low-copy duplicated sequences. A segmental duplication is a segment of genomic DNA present in nearly identical copies, typically ranging in size from 1 kb to 200 kb [1]. Segmental duplication is one of the common types of chromosomal structural rearrangement, and it may contribute to gene and genome evolution [2, 5]. Furthermore, segmental duplications play an important role in gene expression phenotypes and genetic disease [1].

Recently, comparative analyses of whole-genome-sized sequence data have revealed that repeat sequences cause segmental duplications [1, 4]. A model of the molecular mechanism of segmental duplication has been proposed based on the recombination properties of Alu, a type of dispersed repeat, in addition to the physical properties of the genomic sequence [10]. However, it is still unclear which types of repeat sequences are actually involved in segmental duplication [9]. A comprehensive analysis of the relationship between repeat sequences and segmental duplications is therefore needed.

In this study, we analyzed the relationships between segmental duplications and repeat sequences in human chromosome 7, which has the most segmental duplications among the human chromosomes [6]. For this purpose, repeat sequences were detected by three tools, and the distributions of the detected repeat sequences were compared with the chromosomal regions in which segmental duplications were observed. In particular, we focused on the period of the repeat sequences to reveal the features of the repeat sequences that are related to the segmental duplications.

2 Materials and Methods

2.1 Data

In this study, we analyzed the human chromosome 7 genomic sequence (153,794,793 bp; GenBank accession BL000002, version 020621) [6]. The segmental duplication regions in chromosome 7 were defined as sequences with more than 90% similarity over more than 1 kb, and their locations were identified from the supplemental figure [6, 12]; the figure was converted to a high-resolution bitmap in which one dot corresponds to 29,289 bp of the genomic sequence. The segmental duplications were classified into two types: intra-chromosomal segmental duplications, which are duplications within the same chromosome, and inter-chromosomal segmental duplications, which are duplications between different chromosomes [6].

The number of intra-chromosomal segmental duplications was 47, with an average length of 7.47 dots, and the number of inter-chromosomal segmental duplications was 20, with an average length of 7.35 dots. In this study, a repeat is defined as related to a segmental duplication if it is located within the segmental duplication region or within 1 dot of either of its ends.
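To make the resolution concrete: at one dot per 29,289 bp, the 153,794,793 bp sequence spans 5,251 dots (153,794,793 / 29,289 ≈ 5,250.9, rounded up), which is the total used in the density calculations of Section 3.1. The sketch below converts base-pair coordinates to dot indices and applies the 1-dot margin rule. The 0-based dot indexing, the region list in the usage and the function names are hypothetical illustrations, not code or data from this study.

```python
import math

BP_PER_DOT = 29_289           # resolution of the digitized supplemental figure
CHR7_LENGTH_BP = 153_794_793  # length of the analyzed chromosome 7 sequence

def bp_to_dot(position_bp: int) -> int:
    """Map a 1-based base-pair coordinate to a 0-based dot index (assumed convention)."""
    return (position_bp - 1) // BP_PER_DOT

def total_dots(length_bp: int = CHR7_LENGTH_BP) -> int:
    """Number of dots needed to cover the whole sequence."""
    return math.ceil(length_bp / BP_PER_DOT)

def is_related(position_bp: int, regions: list[tuple[int, int]], margin: int = 1) -> bool:
    """True if the repeat lies inside any duplication region (inclusive dot
    intervals) or within `margin` dots of either end of one."""
    dot = bp_to_dot(position_bp)
    return any(start - margin <= dot <= end + margin for start, end in regions)

# Hypothetical region covering dots 100-107; a repeat in dot 99 counts, one in dot 98 does not.
print(total_dots(), is_related(99 * BP_PER_DOT + 1, [(100, 107)]), is_related(98 * BP_PER_DOT + 1, [(100, 107)]))
```

At this resolution, the average intra-chromosomal duplication length of 7.47 dots corresponds to roughly 7.47 × 29,289 ≈ 219 kb.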

2.2 Methods for Detecting Repeat Sequences

Three methods were used to detect the repeat sequences: RepeatMasker [11], Tandem Repeats Finder (TRF, released in 2002) [3], and STEPSTONE (version 1.07) [7]. RepeatMasker detects dispersed repeats by similarity searches against Repbase, a database of consensus repeat patterns. TRF is one of the most commonly used programs for detecting tandem repeats. STEPSTONE detects inter-spread repeats, which are periodic repeat sequences composed of a repeated consensus sequence and a non-consensus spacer region. Thus, three types of repeat sequences, dispersed, tandem, and inter-spread repeats, were investigated in the present study.

3 Results

First, we explored the macroscopic relationship between the segmental duplications and the repeat sequences by plotting the locations and periods of the repeats against the locations of the segmental duplications over the entire human chromosome 7. Then, the characteristics of the repeat sequences related to the segmental duplications were examined by statistical tests in terms of the repeat period.

3.1 Densities of Repeat Sequences Detected by RepeatMasker, TRF and STEPSTONE

To investigate the relationship between the segmental duplications and the repeat sequences, we first plotted the locations of the repeat sequences detected by the three methods over the entire chromosome, together with the segmental duplication regions. To view the macroscopic features of the distribution of the detected repeat sequences, the density of the repeat sequences was calculated as the number of repeats within each 1-dot window of the chromosome.
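The Figure 1 caption states that the plotted densities are normalized with the mean and standard deviation of the per-dot repeat counts. A minimal sketch of that normalization follows; it assumes each repeat is assigned to the dot containing its start coordinate and uses the population standard deviation, neither of which is stated in the text, and the function names are hypothetical.

```python
import statistics

def per_dot_counts(starts_bp: list[int], n_dots: int, bp_per_dot: int = 29_289) -> list[int]:
    """Count repeats per dot, assigning each repeat to the dot holding its 1-based start."""
    counts = [0] * n_dots
    for start in starts_bp:
        counts[(start - 1) // bp_per_dot] += 1
    return counts

def normalized_density(counts: list[int]) -> list[float]:
    """z-score each per-dot count: (count - mean) / standard deviation."""
    mean = statistics.fmean(counts)
    sd = statistics.pstdev(counts)
    if sd == 0:
        return [0.0] * len(counts)  # flat profile: every dot sits exactly at the mean
    return [(c - mean) / sd for c in counts]

counts = per_dot_counts([1, 29_290, 29_291, 58_579], n_dots=3)
print(counts, normalized_density(counts))
```

With real data, `starts_bp` would hold the start coordinates of every repeat reported by one tool, and `n_dots` would be 5,251.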

The densities of the three types of repeat sequences are shown schematically in Figure 1. The total number of dispersed repeats was 264,407 (upper plot in Figure 1). The number of dispersed repeats located around the intra-chromosomal segmental duplication regions in the plot was 19,645, and that around the inter-chromosomal duplication regions was 5,435. In the figure, the density of dispersed repeats is relatively higher near and within the regions of intra- and inter-segmental duplications than in the other regions. Indeed, in the intra-segmental regions, the density of dispersed repeats per dot is 55.95 (= 19,645/(47 × 7.47)), while the density in the remaining regions is 49.95 (= (264,407 − 19,645)/(5,251 − 47 × 7.47)), where 5,251 is the total number of dots covering the chromosome. As an exception, the density is low in the large inter-segmental duplication region in the middle of the chromosome. Consistent with this, the density in the inter-segmental regions, 36.97 (= 5,435/(7.35 × 20)), is lower than that in the remaining regions, 50.74 (= (264,407 − 5,435)/(5,251 − 7.35 × 20)).

To identify the relationship between the segmental duplications and the tandem and inter-spread repeat sequences, we plotted the densities of the repeat sequences detected by TRF and STEPSTONE (middle and lower plots in Figure 1) after removing the dispersed repeats detected by RepeatMasker. The total number of repeat sequences detected by TRF was 6,749; 599 were located around the intra-segmental duplication regions, and 275 around the inter-segmental duplication regions. The total number of repeats detected by STEPSTONE was 37,190; 2,517 were located around the intra-segmental duplication regions, and 972 around the inter-segmental duplication regions. Because the locations of the segmental duplications were estimated from the supplemental figure [6, 12] in the present study, the densities of these two types of repeats in the remaining regions, after removal of the dispersed repeats detected by RepeatMasker, could not be estimated precisely. By a rough calculation, the densities of repeats detected by TRF are 1.71 (= 599/(47 × 7.47)) and 1.87 (= 275/(7.35 × 20)) in the intra- and inter-segmental duplication regions, while those in the remaining regions are 1.26 (= (6,749 − 599)/(5,251 − 47 × 7.47)) and 1.27 (= (6,749 − 275)/(5,251 − 7.35 × 20)). For the repeats detected by STEPSTONE, the densities in the segmental duplication regions are almost equal to those in the remaining regions: 7.17 (= 2,517/(47 × 7.47)) and 6.61 (= 972/(7.35 × 20)) in the intra- and inter-segmental duplication regions, and 7.08 (= (37,190 − 2,517)/(5,251 − 47 × 7.47)) and 7.10 (= (37,190 − 972)/(5,251 − 7.35 × 20)) in the remaining regions. As a result, the density of repeats detected by TRF is somewhat higher in the segmental duplication regions, whereas the density of repeats detected by STEPSTONE is only marginally higher in the intra-segmental regions and slightly lower in the inter-segmental regions.
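All the densities quoted in this subsection follow one formula: the repeats around the regions divided by (number of regions × mean length in dots), compared with the remaining repeats divided by the remaining dots out of 5,251. The short sketch below recomputes each value from the counts given in the text; the function and constant names are ours, not the authors'.

```python
TOTAL_DOTS = 5_251
INTRA = (47, 7.47)  # number of intra-chromosomal duplications, mean length in dots
INTER = (20, 7.35)  # number of inter-chromosomal duplications, mean length in dots

def densities(total: int, near: int, regions: tuple[int, float]) -> tuple[float, float]:
    """Repeats per dot inside the duplication regions and in the rest of the chromosome."""
    n_regions, mean_len = regions
    region_dots = n_regions * mean_len
    inside = near / region_dots
    outside = (total - near) / (TOTAL_DOTS - region_dots)
    return round(inside, 2), round(outside, 2)

# Counts reported in the text: (total, around intra, around inter)
counts = {
    "RepeatMasker": (264_407, 19_645, 5_435),
    "TRF": (6_749, 599, 275),
    "STEPSTONE": (37_190, 2_517, 972),
}
for tool, (total, intra, inter) in counts.items():
    print(tool, "intra:", densities(total, intra, INTRA), "inter:", densities(total, inter, INTER))
```

Each printed pair is (inside, outside) and matches the rounded figures in the text, for example (55.95, 49.95) for dispersed repeats around the intra-chromosomal duplications.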

The fluctuation of the densities is large for the repeats detected by RepeatMasker and STEPSTONE, and small for those detected by TRF. Despite this difference in the degree of fluctuation, the macroscopic shape of the density over the entire chromosome is similar for the repeats detected by TRF and by STEPSTONE. In addition, the densities at both ends of the chromosome are high because of the telomeric regions, where repeats were frequently found in a previous study [3].

3.2 Periods of Repeat Sequences Detected by TRF and STEPSTONE

To examine the relationship between segmental duplications and repeat sequences beyond repeat density, we also plotted the period of each detected repeat along the chromosome, as shown in Figure 2. Note that the periods in the figure are those of the repeats remaining after removal of the dispersed repeats detected by RepeatMasker, because the periods of dispersed repeats detected by RepeatMasker are already well characterized in the database [11]. In any case, this plot is useful for intuitively examining the relationship between the periods and the segmental duplications over the entire chromosome.

The repeats detected by STEPSTONE cover a more diverse range of periods, especially long periods, than those detected by TRF. A remarkable feature of both period distributions is that relatively long periods appear frequently in the segmental duplications. In particular, these long periods are found not in the middle of the segmental duplication regions but at their boundaries. For example, in the region around 55 Mbp (the large red band in the figure), long periods are concentrated around the boundaries of the segmental duplication regions. In the following subsection, we test these intuitive features of Figure 2 statistically.

Figure 1: Correspondence between segmental duplication regions and detected repeat sequences. The horizontal axis is the location on the genomic sequence, and the vertical axis is the density of the repeat sequences detected by RepeatMasker (upper), TRF (middle) and STEPSTONE (lower). The density is normalized with the average and the standard deviation of the numbers of repeats detected within 1 dot. The vertical lines through the graph indicate the locations of segmental duplication regions: intra-chromosomal duplication regions (blue) and inter-chromosomal duplication regions (red).

Figure 2: Correspondence between the periods of the repeat sequences detected by TRF (upper) and STEPSTONE (lower) and the locations of segmental duplication regions in human chromosome 7. In the plots, the horizontal axis is the location on the genomic sequence, and the vertical axis is the period. The period values range from 1 bp (bottom) to 500 bp (top); periods longer than 500 bp are included in the bar for the 500 bp period. The vertical lines through the graph indicate the locations of the segmental duplication regions: intra-chromosomal duplication regions (blue) and inter-chromosomal duplication regions (red).

3.3 Statistical Tests for the Periods of Repeats in Segmental Duplication Regions

To further analyze the relationship between the repeat periods and the segmental duplications, we list the periods of the repeat sequences detected by TRF and STEPSTONE in Table 1. All repeats found over the entire chromosome were classified by period, and the repeats found in the intra- and inter-segmental duplication regions were then counted separately for each period class.

Table 1 shows that STEPSTONE detects more repeat sequences than TRF in almost all period classes. This is partly because STEPSTONE detects both tandem and inter-spread repeats, and partly because STEPSTONE requires only local sequence similarity within the repeat sequence, rather than similarity over the entire repeat as TRF does. In addition, STEPSTONE detects repeat sequences with longer periods than TRF. Although the sequence similarity of the repeats detected by STEPSTONE is lower than that of the repeats detected by TRF, STEPSTONE detects a wider variety of repeats, especially in terms of period. Although the number of repeats in the intra-segmental duplication regions is larger than that in the inter-segmental duplication regions, the number of repeats decreases with increasing period at almost the same rate in both types of regions, for both TRF and STEPSTONE.

Table 1: Periods of the repeat sequences detected by TRF and STEPSTONE.

                 TRF                    STEPSTONE
Period (bp)      all    intra  inter    all     intra  inter
1-20             3181   139    50       10168   446    144
21-40            2013   100    34       10275   420    154
41-60            772    47     23       6052    283    97
61-80            336    19     16       4883    197    80
81-100           215    11     7        4102    212    73
101-120          85     0      0        967     39     8
121-140          43     6      4        229     20     8
141-160          30     3      2        101     6      3
161-180          31     2      1        72      5      1
181-200          8      3      3        39      2      2
201-300          21     2      2        85      3      3
301-400          2      0      0        53      4      2
>=401            12     1      1        167     14     5

The total numbers of detected repeat sequences in each period class are listed in the ‘all’ columns, and the numbers of repeats around the intra- and inter-segmental duplications are listed in the ‘intra’ and ‘inter’ columns, respectively.
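The period classes of Table 1 (20-bp classes up to 200 bp, two 100-bp classes, then an open class) can be reproduced with a small binning helper. This is an illustrative sketch: the function names are hypothetical, and the input is assumed to be one integer period in bp per detected repeat, for example taken from a TRF or STEPSTONE report.

```python
# Period classes used in Table 1: 1-20, 21-40, ..., 181-200, 201-300, 301-400, >=401.
BINS = [(lo, lo + 19) for lo in range(1, 200, 20)] + [(201, 300), (301, 400)]
LABELS = [f"{lo}-{hi}" for lo, hi in BINS] + [">=401"]

def period_class(period_bp: int) -> str:
    """Return the Table 1 period class label for a repeat period in bp."""
    if period_bp < 1:
        raise ValueError("period must be a positive number of base pairs")
    for lo, hi in BINS:
        if lo <= period_bp <= hi:
            return f"{lo}-{hi}"
    return ">=401"

def tabulate(periods: list[int]) -> dict[str, int]:
    """Count repeats per period class, keeping the row order of Table 1."""
    table = dict.fromkeys(LABELS, 0)
    for p in periods:
        table[period_class(p)] += 1
    return table

print(tabulate([5, 130, 135, 450]))
```

Running `tabulate` separately on all repeats and on the repeats flagged as intra or inter gives the three columns per tool.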

Apart from the absolute numbers of repeats, the relative proportions of the period classes were compared by χ² tests for four pairs of distributions (Table 2): the total number of repeats versus the number detected in both the intra- and inter-segmental duplication regions; the total number versus the number detected in the intra-segmental duplication regions; the total number versus the number detected in the inter-segmental duplication regions; and the number detected in the intra-segmental duplication regions versus that in the inter-segmental duplication regions. Thus, four tests were performed for each of the two detection methods, TRF and STEPSTONE. The tests revealed that the period distributions in the segmental duplication regions are significantly biased relative to the distribution over the entire chromosome, while the difference between the intra- and inter-segmental duplication regions is not significant at the 5% level. These results suggest the existence of repeat sequences with periods specific to the segmental duplications.

Table 2: Distribution bias of the periods in segmental duplication regions.

             all vs. (intra + inter)   all vs. intra   all vs. inter   intra vs. inter
TRF          p < 0.001                 p < 0.005       p < 0.001       NS
STEPSTONE    p < 0.005                 p < 0.01        p < 0.05        NS

NS indicates not significant (p ≥ 0.05).
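A minimal sketch of such a comparison follows. It assumes each pair is tested as a 2 × k contingency (homogeneity) table with one column per period class of Table 1; the text does not state the exact construction, and the function names are hypothetical. The p-value uses the standard series for the regularized incomplete gamma function so that only the Python standard library is needed.

```python
import math

def chi2_homogeneity(a: list[int], b: list[int]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for the 2 x k table
    whose rows are the count vectors a and b (one column per period class)."""
    cols = [(x, y) for x, y in zip(a, b) if x + y > 0]  # skip classes empty in both rows
    n_a = sum(x for x, _ in cols)
    n_b = sum(y for _, y in cols)
    n = n_a + n_b
    stat = 0.0
    for x, y in cols:
        col_total = x + y
        for observed, row_total in ((x, n_a), (y, n_b)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(cols) - 1

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15 and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

# Period-class counts from Table 1 (TRF, classes 1-20 ... >=401)
trf_all = [3181, 2013, 772, 336, 215, 85, 43, 30, 31, 8, 21, 2, 12]
trf_intra = [139, 100, 47, 19, 11, 0, 6, 3, 2, 3, 2, 0, 1]
stat, df = chi2_homogeneity(trf_all, trf_intra)
print(f"TRF all vs. intra: chi2 = {stat:.1f}, df = {df}, p = {chi2_sf(stat, df):.2g}")
```

Two caveats apply when this is used on Table 1. The ‘all’ counts include the intra and inter repeats, so the two rows are not independent samples. Several long-period classes also have expected counts well below 5, where the χ² approximation is rough. Whether this sketch reproduces the exact p-values of Table 2 depends on details the text does not give.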

The periods specific to the segmental duplications, which are responsible for the bias of the period distributions, were identified by the normalized residual analysis (Table 3); each residual test was performed on the three period distributions (all, intra, and inter) obtained by TRF and by STEPSTONE. The tests for the TRF and STEPSTONE distributions shared many biased periods at the 5% significance level. Indeed, the periods of 121 to 140 bp, 181 to 200 bp, and more than 400 bp appear frequently in the table. As expected from the overall features of Figure 2, relatively long periods account for the bias in the period distribution. In particular, the periods of 121 to 140 bp and of more than 400 bp are identified as biased periods at the 5% significance level in three of the four cases. Below, we further investigate the features of the repeats with these two long biased periods.

Table 3: Characteristic periods (bp) in the biased distribution.

             all vs. intra              all vs. inter
TRF          181-200, ≥401              61-80, 121-140, 181-200, ≥401
STEPSTONE    81-100, 121-140, ≥401      121-140
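The text does not give the formula behind the normalized residual analysis. A common choice for a 2 × k table is Haberman's adjusted standardized residual, (O − E) / sqrt(E (1 − row/N)(1 − col/N)), which is approximately standard normal under homogeneity, so |r| > 1.96 flags a class at the 5% level. The sketch below assumes that definition, which is not necessarily the one the authors used; function names are hypothetical.

```python
import math

def adjusted_residuals(a: list[int], b: list[int]) -> list[tuple[float, float]]:
    """Adjusted standardized residuals (Haberman) for every cell of the 2 x k
    table with rows a and b; one (row a, row b) pair per period class."""
    n_a, n_b = sum(a), sum(b)
    n = n_a + n_b
    residuals = []
    for x, y in zip(a, b):
        col_total = x + y
        pair = []
        for observed, row_total in ((x, n_a), (y, n_b)):
            expected = row_total * col_total / n
            variance = expected * (1 - row_total / n) * (1 - col_total / n)
            # Empty or degenerate classes get a residual of 0 instead of dividing by 0.
            pair.append((observed - expected) / math.sqrt(variance) if variance > 0 else 0.0)
        residuals.append((pair[0], pair[1]))
    return residuals

def biased_classes(labels: list[str], a: list[int], b: list[int], z_crit: float = 1.96) -> list[str]:
    """Period classes whose residual exceeds the two-sided 5% normal cutoff.
    In a 2-row table the two residuals of a column are equal and opposite,
    so checking row b is enough."""
    return [label for label, (_, r_b) in zip(labels, adjusted_residuals(a, b)) if abs(r_b) > z_crit]

labels = ["1-20", "21-40", "41-60", "61-80", "81-100", "101-120", "121-140",
          "141-160", "161-180", "181-200", "201-300", "301-400", ">=401"]
trf_all = [3181, 2013, 772, 336, 215, 85, 43, 30, 31, 8, 21, 2, 12]
trf_inter = [50, 34, 23, 16, 7, 0, 4, 2, 1, 3, 2, 0, 1]
print("TRF all vs. inter:", biased_classes(labels, trf_all, trf_inter))
```

The classes this flags can be compared with Table 3, although agreement is not guaranteed if the authors normalized the residuals differently.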

Another remarkable feature of the repeats with the biased periods is that the G+C content of most of them is highly biased. Indeed, a G+C bias significant at the 1% level is found in 20 of the 28 repeats with 121 to 140 bp periods, and in 13 of the 19 repeats with periods longer than 400 bp. Interestingly, regions of high G+C content favor the formation of Z-form DNA, which is involved in the release of negative supercoiling [8]. Although the relationship between the release of negative supercoiling and segmental duplication is still unclear, the observed G+C content bias might be related to the frequent occurrence of these repeats in the segmental duplication regions.
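The test used for the G+C bias is not specified in the text. One simple possibility, sketched below under that assumption, treats each base of a repeat as a Bernoulli trial against a background G+C proportion and uses the normal approximation to the binomial; at the 1% level (two-sided) the cutoff is |z| > 2.576. The background value of 0.41 is an assumed, roughly genome-wide human figure rather than a value from this study, and because neighboring bases are not independent, this test is optimistic.

```python
import math

def gc_counts(seq: str) -> tuple[int, int]:
    """Return (number of G/C bases, number of unambiguous A/C/G/T bases)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc, acgt

def gc_bias_z(seq: str, background_gc: float) -> float:
    """z-score of the observed G+C count against a binomial(n, background_gc) expectation."""
    gc, n = gc_counts(seq)
    if n == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    expected = n * background_gc
    return (gc - expected) / math.sqrt(n * background_gc * (1 - background_gc))

def gc_biased(seq: str, background_gc: float = 0.41, z_crit: float = 2.576) -> bool:
    """True if G+C content deviates from the background at the two-sided 1% level."""
    return abs(gc_bias_z(seq, background_gc)) > z_crit

print(gc_biased("GC" * 50), gc_biased("ACGT" * 25))
```

Here a 100-bp pure GC repeat is flagged, while a 100-bp repeat at 50% G+C is not (z ≈ 1.83).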

4 Discussion

We have analyzed the relationship between segmental duplications and repeat sequences in human chromosome 7 by graphical plots and statistical analyses. The plots revealed that repeats, especially those with long periods, are frequently found around the segmental duplication regions. The statistical analyses showed that some periods are biased between the segmental duplication regions and the other regions, but not between the intra- and inter-segmental duplication regions.

In a previous study [6], two characteristic features were described for the two types of segmental duplications. First, the intra-segmental duplications of chromosome 7 are larger and share higher sequence similarity than the inter-segmental duplications. Second, the distributions of their locations differ: the inter-segmental duplications are frequently observed in the pericentromeric and subtelomeric regions [2, 6], while the intra-segmental duplications are found throughout the entire chromosome. Interestingly, despite these differences, little bias of periods exists between the two types of segmental duplications. Thus, the mechanism by which the duplications arise might be similar for the intra- and inter-segmental duplications.

In this study, we detected 28 repeat sequences with 121 to 140 bp periods and 19 repeat sequences with periods longer than 400 bp as the biased periods by a statistical test. Interestingly, although their G+C content is frequently biased with significant probability, little sequence similarity is observed among these repeat sequences by visual inspection. Thus, the biased repeat sequences related to the segmental duplications may be characterized by physical properties, such as G+C content, rather than by sequence similarity.

Acknowledgments

One of the authors (K. H.) was partly supported by a Grant-in-Aid for Scientific Research on Priority Areas “Genome Information Science” (grant 17017015) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References

[1] Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E., Recent segmental duplications in the human genome, Science, 297:1003–1007, 2002.

[2] Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J., and Eichler, E.E., Segmental duplications: Organization and impact within the current human genome project assembly, Genome Res., 11:1005–1017, 2001.

[3] Benson, G., Tandem repeats finder: A program to analyze DNA sequences, Nucleic Acids Res., 27(2):573–580, 1999.

[4] Cheung, J., Estivill, X., Khaja, R., MacDonald, J.R., Lau, K., Tsui, L.-C., and Scherer, S.W., Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence, Genome Biol., 4:R25, 2003.

[5] Eichler, E.E. and Sankoff, D., Structural dynamics of eukaryotic chromosome evolution, Science, 301:793–797, 2003.

[6] Hillier, L.W., Fulton, R.S., Fulton, L.A., Graves, T.A., Pepin, K.H., Wagner-McPherson, C., Layman, D., Maas, J., Jaeger, S., Walker, R., Wylie, K., Sekhon, M., Becker, M.C., O’Laughlin, M.D., Schaller, M.E., et al., The DNA sequence of human chromosome 7, Nature, 424:157–164, 2003.

[7] Murakami, H., Sugaya, N., Sato, M., Imaizumi, A., Aburatani, S., and Horimoto, K., Detection of inter-spread repeat sequence in genomic DNA sequence, Genome Informatics, 15(1):170–179, 2004.

[8] Rich, A. and Zhang, S., Z-DNA: The long road to biological function, Nat. Rev. Genet., 4:566–572, 2003.

[9] Zhang, L., Lu, H.H.S., Chung, W., Yang, J., and Li, W.H., Patterns of segmental duplication in the human genome, Mol. Biol. Evol., 22(1):135–141, 2005.


[10] Zhou, Y. and Mishra, B., Quantifying the mechanisms for segmental duplications in mammalian genomes by statistical analysis and modeling, Proc. Natl. Acad. Sci. USA, 102(11):4051–4056, 2005.

[11] http://ftp.genome.washington.edu/RM/RepeatMasker.html

[12] http://www.nature.com/nature/journal/v424/n6945/suppinfo/nature01782.html