Mol Biol Evol 2011 Paar 1877 92

Embed Size (px)

Citation preview

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    1/16

    Intragene Higher Order Repeats in NeuroblastomaBreakPoint Family Genes Distinguish Humans fromChimpanzees

    Vladimir Paar,*,1 Matko Gluncic,,1 Marija Rosandic,2 Ivan Basar,1 and Ines Vlahovic1

    1Faculty of Science, University of Zagreb, Zagreb, Croatia

    2Department of Internal Medicine, University Hospital Rebro, University of Zagreb, Zagreb, Croatia

    These authors contributed equally to this work.*Corresponding author: E-mail: [email protected].

    Associate editor: James McInerney

    Abstract

    Much attention has been devoted to identifying genomic patterns underlying the evolution of the human brain and itsemergent advanced cognitive capabilities, which lie at the heart of differences distinguishing humans from chimpanzees,our closest living relatives. Here, we identify two particular intragene repeat structures of noncoding human DNA,spanning as much as a hundred kilobases, that are present in human genome but are absent from the chimpanzee genomeand other nonhuman primates. Using our novel computational method Global Repeat Map, we examine tandem repeatstructure in human and chimpanzee chromosome 1. In human chromosome 1, we find three higher order repeats (HORs),

    two of them novel, not reported previously, whereas in chimpanzee chromosome 1, we find only one HOR, a 2mer alphoidHOR instead of human alphoid 11mer HOR. In human chromosome 1, we identify an HOR based on 39-bp primary repeatunit, with secondary, tertiary, and quartic repeat units, fully embedded in human hornerin gene, related to regeneratingand psoriatric skin. Such an HOR is not found in chimpanzee chromosome 1. We find a remarkable human 3mer HORorganization based on the ;1.6-kb primary repeat unit, fully embedded within the neuroblastoma breakpoint familygenes, which is related to the function of the human brain. Such HORs are not present in chimpanzees. In general, we findthat humanchimpanzee differences are much larger for tandem repeats, in particularly for HORs, than for genesequences. This may be of great significance in light of recent studies that are beginning to reveal the large-scale regulatoryarchitecture of the human genome, in particular the role of noncoding sequences. We hypothesize about the possibleimportance of human accelerated HOR patterns as components in the gene expression multilayered regulatory network.

    Key words:human brain evolution, chromosome 1, higher order repeats, NBPF genes, human hornerin gene, global repeatmap.

    Introduction

    HumanChimpanzee Divergence and RegulatoryElementsIdentification of genomic differences that set humans apartfrom chimpanzees is important from evolutionary, medical,and cultural perspectives (Dorus et al. 2004;Enard and Paa-bo 2004;Hayakawa et al. 2005;Haygood et al. 2007,2010;Kleinjan and Lettice 2008; Portin 2008; Varki et al. 2008;Woolfe and Elgar 2008; Enard et al. 2009; Taft et al.2010). The most striking trend in human evolution isthe rapid increase in brain size over the past 34 My

    and the associated increase in cognitive capacity and com-plexity (Jerison 1975;Mekel-Bobrov et al. 2007). It could beexpected that the genetic changes, potentially underlyinghuman brain evolution, span a wide range from nucleotidesubstitutions to large-scale structural alteration of the ge-nome (Vallender 2008;Vallender et al. 2008). Chimpanzeesshare nearly 99% of human DNA in genes, but given thesubstantial number of neutral mutations, only a small sub-set of the observed gene differences is likely to be respon-sible for the key phenotypic changes between humans and

    chimpanzees (The chimpanzee sequencing and analysisconsortium 2005). Developmental and evolutionarymechanisms underlying the characteristics that set thehuman brain apart from other human primates are poorlyunderstood (Pollard et al. 2006a).

    Recently, computational search was performed for DNAsequences that have changed most since humans andchimpanzees diverged from their last common ancestor(Amadio and Walsh 2006; Pollard et al. 2006b; Prabhakaret al. 2006;Bird et al. 2007;Beniaminov et al. 2009). Amongsuch human accelerated regions (HARs) the most pro-nounced is a 118-bp stretch on human chromosome 20,

    known as HAR1. It is highly conserved among species dur-ing most of vertebrate evolution, while changing signifi-cantly after the chimpanzeehuman split: The humanand chimpanzee HAR1 sequences differ by 18 bp. This rapidburst of base pair substitutions in HAR1 was interpreted asevidence that HAR1 acquired an important new functionin humans and in particular may have altered human brainby influencing the activity of a whole network of genes(Pollard et al. 2006a). On the other hand, many highly con-served noncoding sequences have been found (Dermitzakis

    The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, pleasee-mail: [email protected]

    Mol. Biol. Evol. 28(6):18771892. 2011 doi:10.1093/molbev/msr009 Advance Access publication January 27, 2011 1877

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    2/16

    et al. 2002; Bejerano et al. 2004, 2006; Ovcharenko 2008;Stephen et al. 2009).

    Half a century ago,Jacob and Monod (1961) discoveredthat specific noncoding sequences are required to activategenes inEscherichia coliand noted that mutations in theseregulatory sequences might play a role in the evolution oforganismal traits. Britten and Davidson (1969, 1971) pro-posed a model that discrete regulatory sequences are abun-

    dant in the genomes of higher eukaryotes and that increasesin the number of regulatory elements and the complexity ofgene regulatory networks drive increased biological com-plexity in evolution.King and Wilson (1975)proposed thatphenotypic differences between humans and chimpanzeesare mainly caused by regulatory changes in gene expression.Raff and Kaufman (1983) hypothesized that mutations af-fecting the function of regulatory genes are likely to underliemany evolutionary changes in morphology: changes in justa handful of regulatory interactions could cause large-scalemorphological changes. Recent investigations are progress-ing along such general framework.

    Components of regulatory control in the human ge-nome include proximal regulatory elements and long-rangeelements that can act across large genomic distances toinfluence the spatial and temporal distribution of gene ex-pression, with enhancers and repressors of gene activity of-ten located in evolutionary conserved noncoding regions ofthe genome (Pennacchio and Rubin 2001; Caceres et al.2003;Cooper and Sidow 2003;Gu JY and Gu X 2003;No-brega et al. 2003;Wray et al. 2003; Khaitovich et al. 2004;Uddin et al. 2004;Heissig et al. 2005;Kleinjan and van Hey-ningen 2005;Evans et al. 2006;Maston et al. 2006;Pennac-chio et al. 2006;King et al. 2007;Xie et al. 2007;Portin 2008;Visel et al. 2008;Wray and Babbitt 2008;Duret and Galtier

    2009;Mercer et al. 2009;Garfield and Wray 2010;Haygoodet al. 2010; Noonan and McCallion 2010; Xu et al. 2010).Recent analyses provide evidence of many thousands ofdistant-acting noncoding sequences, beyond the proximalregulatory sequences and thus are beginning to reveal thelarge-scale regulatory architecture of the genome. At thepresent time, however, identifying regulatory elements inthe human genome still remains a major challenge (Woolfeand Elgar 2008; Noonan and McCallion 2010).

    Genomewide studies have shown that human genomein transcription produces many thousands of regulatorynoncoding RNAs (Huttenhofer et al. 2005; Johnson et al.

    2005; Costa 2007; Kapranov et al. 2007; Amaral et al.2008; Mattick 2009; Mercer et al. 2009; Van Bakel andHughes 2009;Taft et al. 2010;Xu et al. 2010). Transcriptomestudies indicated that a large proportion of transcriptiontakes place outside of the known gene boundaries (Merceret al. 2009;Van Bakel and Hughes 2009). Recent results ofXu et al. (2010)showed that close to 40% of all transcriptsexpressed in the human brain map are within repetitiveelements, whereas ,10% of the human brain transcrip-tome corresponds to nonrepetitive intergenic regions.

    Morphological and anatomical differences between hu-mans and chimpanzees are mainly caused by differencesin the regulation of function of genes arising from noncoding

    sequences (King and Wilson 1975;Bush and Lahn 2008), andthe chromosomal regions in man, that are gene-poor, harborgene regulatory elements that have the ability to modulategene expression over very long distances (Lettice et al. 2002).

    A possible functional role of noncoding sequences andin particular of repeats has been much discussed. It is ofimportance to investigate the role of repeat sequencesin the regulatory mechanism, related to a general concept

    that the regulatory system of genomes is encoded in net-works of repetitive sequence relationships (Britten andKohne 1968; Britten and Davidson 1969, 1971; Davidsonand Britten 1979). In this sense, the repetitive componentsmay play a major architectonic role in higher order physicalstructuring. It was argued that in this way one could thinkabout genomes as information storage systems with paral-lels to electronic information storage systems. From sucha perspective, repetitive DNA sequences are importantcomponents of genomes, required for formatting codinginformation so that it can be accurately expressed andfor formatting DNA molecules for transmission to new gen-

    erations of cells. The cooperative nature of proteinDNAinteractions provides another fundamental reason why re-peated sequence elements are essential to format genomicDNA (Shapiro and von Sternberg 2005). This was accompa-nied by observation that tandem arrays are often the regionsthat vary most between related taxi (Shapiro and von Stern-berg 2005). A probable importance of noncoding changes inthe evolution of human traits was underscored, particularlyfor cognitive traits (Haygood et al. 2010). Recent studies arebeginning to reveal the large-scale regulatory architecture ofthe human genome (Noonan and McCallion 2010).

    Neuroblastoma BreakPoint Family GenesThe recently described gene family neuroblastoma break-point family (NBPF) with an intricate genomic organizationwas expanded during recent primate evolution. There areindications that they are brain-related genes: At least one ofthem might suppress the development of neuroblastomaand possibly of other tumor types (Vandepoele et al. 2005,2008, 2009; Gregory et al. 2006; Popesco et al. 2006;Vandepoele and van Roy 2007; Diskin et al. 2009).

    Previous studies of genomic sequences of humanNBPFgenes using standard bioinformatics tools (Altschul et al.1990;Benson 1999;Gelfand et al. 2007) have reported tan-dem repeats in these genes, so-called NBPFrepeats, based

    on diverging ;1.51.6-kb repeat unit (Vandepoele et al.2005; Gregory et al. 2006; Gelfand et al. 2007; Warburtonet al. 2008). Several reports have shown that the copy num-ber of these genes is variable in humans (Tuzun et al. 2005;Redon et al. 2006). A remarkable reduction ofNBPFcopynumber with respect to the human genome was found inother primates, and in mouse, the NBPFcopies are absent(Vandepoele et al. 2005, 2009;Popesco et al. 2006).

    Higher Order RepeatsHigher order repeats (HORs) have been extensivelystudied in human and nonhuman primate chromosomes(Warburton and Willard 1996). The best known prototypes

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1878

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    3/16

    of HORs are alpha satellite (alphoid) arrays, located in thecentromeric region of all human chromosomes (Manuelidis1978; Wu and Manuelidis 1980; Willard 1985; Tyler-

    Smith and Brown 1987; Willard and Waye 1987a, 1987b;Warburton and Willard 1996; Jurka 2000; Alexandrovet al. 2001; Rosandic et al. 2003, 2006; Rudd and Willard2004; Jurka et al. 2005; Paar et al. 2005, 2007; Warburtonet al. 2008). Alpha satellite arrays consist of primary repeatunits (alphoid monomers) of;171 bp, tandemly arrangedin a head-to-tail fashion. Individual alphoid monomers di-verge by 2040% from each other. However, some stretchesof alpha satellites are hierarchically organized into HORs, sec-ondary repeat units with highly convergent HOR copies (di-vergencebetweenHORcopies,5%)(Warburton andWillard1996). In that case, divergence between HORcopies is muchsmaller than divergence between monomers within each

    HOR copy.Figure 1schematically describes the overall con-cept of HORs for an illustrative case of 11mer HOR. Most ofthe reported HORs are based on relatively short primary re-peat units, like 171-bp alpha satellite or less. The longest pri-mary repeat unit was reported by Warburton et al. (2008) asa ;3.3-kb repeat forming 18-kb HORs.

    Highly homogeneous arrays of HOR alpha satellitemonomers are relatively recent additions to human ge-nome (Warburton and Willard 1996; Alexandrov et al.2001). An explanation for generating HORs involves un-equal crossing-over between misaligned HOR units alignedon the register of homologous monomers. Unequal cross-

    ing-over, restricted to tandem sequences, explains the gen-eration and local homogenization of HOR units andaccounts for large size variation among HORs on homol-ogous chromosomes (Southern 1975;Smith 1976;Willardand Waye 1987b;Warburton and Willard 1996;Alkan et al.2004;Rudd et al. 2006). By the process of unequal crossing-over, HORs enable rapid evolutionary development.

    Global Repeat Map Analysis ofHumanChimpanzee DivergenceHere, we use our novel robust bioinformatics tool, theGlobal Repeat Map (GRM [see Methods], for identificationof tandem repeats and HORs in human and chimpanzee

    chromosome 1. In previous investigations, short stretchesof human accelerated sequences (up to several hundreds ofbase pairs; HARs) have been identified, mostly correspond-ing to nonhuman multispecies conserved regions. Here, wesearch for HARs containing large tandem repeat and HORunits. We are asking whether a more complex genomic pat-tern, involving long-range correlations, is accelerated inbrain-related human genes that could characterize the hu-

    man brain evolution. As a case study, we investigate com-putationally the genomic sequence of chromosome 1(National Center for Biotechnology Information [NCBI]Build 36.3 assembly) that contains the NBPFgene family.Here, we discover HORs within NBPF repeats in humanchromosome 1, whereas we find no HOR in arrays ofNBPFrepeats in chimpanzee chromosome 1. This is the first timethat HORs are found fully embedded within a gene. Fur-thermore, these HORs are based on an extraordinary largeprimary repeat unit. Besides the novel NBPFHOR pattern,highly accelerated in human chromosome 1 (in chimpan-zee chromosome 1 this HOR pattern is missing), we find two

    additional human HORs (one of them novel) and severaltandem repeats with large repeat units, all showing signifi-cant humanchimpanzee divergence, that is, the human ac-celerated pattern. We hypothesize on a possible role of thisacceleration as possibly important components in the geneexpression multilayered regulatory network.

    Materials and Methods

    Key String AlgorithmIn spite of powerful standard computational tools in bio-informatics, there are still difficulties to identify and analyzelong repeat units. For example, the Tandem Repeat Finder

    can identify tandem repeat units up to 2 kb (Benson 1999;Warburton et al. 2008). Here, we use a new robust ap-proach, useful in particular for investigating very longand/or complex repeats. The Key String Algorithm(KSA) framework (Rosandicet al. 2003, 2006; Paar et al.2005,2007) is based on the use of a short sequence of nu-cleotides, referred to as key string, which cuts a given ge-nomic sequence at each location where the key stringappears within a given genomic sequence. The ensuingKSA fragments form the KSA length array that could becompared with an array of lengths of restriction fragmentsresulting from hypothetical complete digestion cutting ge-

    nomic sequence at recognition sites corresponding to theKSA key string.

    Global Repeat MapThe GRM algorithm is an extension of KSA framework andconsists of the following steps: 1) GRM-Total module: com-putes the frequency versus fragment length distribution fora given genomic sequence by superposing results of con-secutive KSA segmentations computed for an ensemble ofall 8-bp key strings (48 5 65,536 key strings) (Paar et al.2007). In the GRM diagram, each pronounced peak corre-sponds to one or more repeats at that length, tandem ordispersed. GRM computation is fast and can be easily

    FIG. 1 Schematic illustrating of 11mer HOR. Seven HOR copies are

    denoted h1, h2,. . .,h7 and 11 constituent primary repeat monomers

    m1, m2,. . .,m11. Divergence div (hi, hj) between HOR copies is

    sizably smaller than divergence div(mk, ml) between primary repeat

    monomers within each HOR copy. Monomers within each HOR

    copy can significantly diverge from each other, whereas divergence

    between HOR copies is sizably smaller. For a particular case of

    ;171-bp alpha satellite monomers in chromosome 1, monomers

    can be diverged ;35% from each other, whereas the HORs are

    highly similar (mutual divergence among HOR copies ,5%).

    Intragene HORs Distinguish Humans from Chimpanzees doi:10.1093/molbev/msr009 MBE

    1879

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    4/16

    executed for a human chromosome using PC. 2) GRM-Dom module: determines the dominant key string corre-sponding to the fragment length for each peak in theGRM diagram from step 1. An 8-bp key string (or a groupof 8-bp key strings) that gives the largest frequency fora fragment length under consideration is referred to asa dominant key string. 3) GRM-Seg module: performs seg-mentation of a given genomic sequence into KSA frag-ments using dominant key string from the step 2. Anyperiodic segment within the KSA length array revealsthe location of repeats and provides genomic sequences

    of the corresponding repeat copies. 4) GRM-Cons module:aligns all sequences of repeat copies from step 3 and con-structs consensus sequence. 5) NW module: computes di-vergence between each repeat copy from step 3 andconsensus sequence from step 4 using the NeedlemanWunsch (Needleman and Wunsch 1970) algorithm.

    Diagram outlining the main steps in the GRM process isshown infigure 2. Code for GRM modules is available uponthe request to the authors.

    Regarding the 8-bp choice of the key string size, using anensemble of allr-bp key strings, the average length of KSAfragments is ;4r. With increasing length of key strings, theoverall frequency of large fragment lengths increases. For an

    ensemble of all 8-bp key strings, from computed GRM dia-grams, we can identify the primary and secondary repeatunits as large as 100 kb.

    Characteristics of GRM are 1) robustness with respect todeviations from perfect repeats, that is, substitutions, inser-tions, and deletions; 2) straightforward and parameter-freeidentification of simple repeats (tandem and dispersed),applicable to very large repeat units (as large as several kilo-bases); 3) straightforward identification of HORs (second-ary repeats) for very large primary and/or secondary repeatunits; 4) straightforward determination of consensus

    lengths and consensus sequences for simple repeats andHORs; and 5) modest scope of computation using a PC.

    The GRM method is a straightforward method to pro-vide a GRM in a single diagram identifying all pronouncedrepeats in a given sequence, without any prior knowledgeof the sequences structure. Once the size of the repeat isdetermined, GRM provides in a straightforward way thelocation of the corresponding repeat arrays and their pre-cise analysis. GRM is particularly useful for precise sequenceanalysis because the method does not involve any averag-ing procedure. It is also useful that the method is ratherrobust with respect to sizeable substitutions and indels.Once the consensus repeat unit is determined using

    FIG. 2 Diagrammatic outline of five steps in the process of applying the GRM algorithm.

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1880

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    5/16

    A B

    C D

    E F

    FIG. 3 GRM diagrams for human and chimpanzee chromosome 1 assemblies, Build 36.3 and Build 2.1, respectively. Horizontal axis: fragment

    length (length of repeat unit); vertical axis: frequency of appearance. Upper panel: human GRM diagram; lower panel: chimpanzee GRM

    diagram. (AD) GRM for the whole chromosome (except some smaller segments in inserts). (A) Insert of enlarged segments of human and

    chimpanzee GRM diagrams (fragment lengths 160180 bp) for contigs NT_077389.3 and NW_001230099.1, respectively (peak 171 bpupper

    Intragene HORs Distinguish Humans from Chimpanzees doi:10.1093/molbev/msr009 MBE

    1881

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    6/16

    GRM, in the next step it could be well combined with basiclocal alignment search tool (Blast) search for dispersedunits or their fragments. For very large repeat units, Tan-dem Repeat Finder has limitations, whereas GRM has nosuch size limitations.

    Results and Discussion

    GRM Diagrams and Repeat Patterns for Human andChimpanzee Chromosomes 1We compute the GRM diagrams for human chromosome 1(Build 36.3) and chimpanzee chromosome 1 (Build 2.1) (fig.3). The GRM analysis of pronounced peaks leads to theidentification of pronounced repeats with large repeatunits (table 1,supplementary tables S1andS2,Supplemen-tary Materialonline).

    Of particular interest are three human HORs that distin-guish humans from chimpanzees: 1) human 1,410-bp36mer HOR based on 39-bp primary repeat unit, withouta chimpanzee counterpart; 2) human 1,866-bp 11mer

    alphoid HOR based on ;171-bp alpha satellite primaryrepeat unit, with a chimpanzee counterpart 340-bp2mer alphoid HOR; and 3) human ;4,770-bp 3merHOR based on ;1.6-kb primary repeat unit, without thechimpanzee counterpart.

    Furthermore, using GRM, we find eight tandem repeatunits and some peaks that correspond to dispersed repeats(Alu,long interspersednuclear elements [LINE]). Intable 1,we compare human and chimpanzee GRM results and pre-vious identification of repeats in human chromosome 1(Warburton et al. 2008). We find a number of GRM peakscorresponding to novel tandem repeat units. There is a sig-

    nificant difference in the repeat pattern of human andchimpanzee chromosomes 1 (fig. 3). The lengths of se-quenced chromosome 1 assemblies of human Build 36.3and chimpanzee Build 2.1 are similar, which is reflectedin a similar level of noise in the upper and lower panelsoffigure 3.

    Human-Specific 1,410-bp Quartic HOR RepeatEmbedded in Human Hornerin GeneIn the GRM diagram for human chromosome 1 (fig. 3), weidentify five copies of the 1,410-bp HOR repeat unit (startposition in chromosome 1: 150452728, start position incontig NT_00487.18: 2676458) and determine the consen-

    sus HOR unit (supplementary table S3A, SupplementaryMaterial online). The average divergence of HOR copieswith respect to consensus is ;4%. (These five HOR copiesappear as two subsets: average divergence among the firstthree copies is;1%, whereas for pairs involving one or two

    of the remaining two copies, the divergence is sizably larger,712%.)

    The GRM diagram of 1,410-bp HOR consensus sequence(fig. 3F) reveals its internal repeat structure. First, there isa series of equidistant peaks at multiples of 39 bp ( n 39bp,n 5 1, 2, 3. . .). This shows that the 39-bp monomer isa primary repeat unit for the 1,410-bp HOR unit (pro-nounced peak infig. 3F). The consensus sequence of the

    39-bp primary repeat unit is displayed in supplementarytable S3A (Supplementary Material online). Among thepeaks at multiples of 39 bp infigure 3F, the strongest peakis at ;0.7 kb, and the second strongest is at ;0.35 kb.

    This reveals three levels of HOR organization: nine 39-bpprimary repeat units are organized into ;0.35-kb second-ary repeat unit, two ;0.35-kb secondary repeat units areorganized into ;0.7-kb tertiary HOR repeat unit, and fi-nally, two ;0.7-kb tertiary repeat units are organized intothe ;1.4-kb HOR quartic repeat units. Thus, the 1,410-bprepeat unit represents the quartic HOR repeat unit (fig. 4).

    By using the consensus sequence, we find that the av-

    erage divergence between neighboring copies is graduallydecreasing with increasing level of HOR organization, from;32% between 39-bp copies (first levelprimary repeatunit) to ;4% between 1,410-bp HOR copies (fourthlevelquartic repeat unit) (see fig. 4). Such hierarchy ofdivergences is a signature of HOR organization.

    Tandem repeats are of biological interest because theyrepresent a rapidly evolving type of DNA sequences thatcan show profound differences even between closely re-lated species, as are humans and chimpanzees, and maycontribute to phenotypic differences. In the present caseof 36mer HOR, the whole HOR array is embedded withinthe human hornerin (HRNR) gene that produces hornerin

    protein, expressed in regenerating and psoriatic skin(Takaishi et al. 2005). Comparing with positions within ge-nomic sequence in the NCBI database, this HOR array ispositioned within one lengthy exon inHRNR gene (fig. 5).

    Using the structure of human hornerin protein,Takaishiet al. (2005)have deduced the corresponding amino acidsequence. The repetitive region was divided into segments.The smallest units of;39-bp amino acids showed a mod-erate homology to each other. However, Takaishi et al.(2005) concluded that the analysis was compatible withthe notion that unit of;39 bp was at first amplified 4-foldand then triplicated to form three segments in tandem,

    these being further amplified 6-fold, implying a {[(39 bp) 4] 3} 6 5 2.8-kb organization, which differs from theHOR organization {[(39 bp) 9] 2} 2 5 1.4 kb foundhere applying GRM to the Build 36.3 genomic assembly. Inthe corresponding GRM diagram (fig. 3E), we clearly see

    panel and peaks 169 and 171 bplower panel). (B) Upper panel: peak at 1,866 bp: 11mer alphoid HOR in the centromeric region. Peak at 2,241

    bp: tandem of 17 2,241-bp copies (divergence with respect to consensus ,1%). Peak at 1,410 bp: tandem of five copies. Insert of enlarged

    segment of chimpanzee GRM diagram (fragment lengths 9001,100 bp) for contig NW_001229599.1. (C): Peaks at 4,770 bp and 3,183, 3,193 bp

    correspond to 3mer HOR and its 2mer subset, respectively (see the text). Peak at 3,731 bp: tandem of two copies (consensus length 3,731 bp,

    divergence 0.01%). Peak at 7,380: tandem of four copies (divergence,0.5%). (E) GRM diagram for genomic segment 02.7 Mb in human contig

    NT_113799.1. (F) GRM diagram for 1,410-bp human consensus HOR.

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1882

    http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1
  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    7/16

    Table 1 Large Repeat Units in Human and Chimpanzee Chromosome 1: GRM Computation and Previous Identification for Humans and GRM Com

    GRM Human Previous Identificationa Human

    135 bp, Signature of Alu

    166 bp, Signature of Alu

    ;171 bp, PRU;171 bp (alpha satellite, div. ;20%) ;171 bp

    ;310 bp, Signature of Alus in tandem

    342 bp, 2 3 171 PRU 342 bp, 2 3171 PRU

    513 bp, 3 3 171 PRU 513 bp, 3 3171 PRU

    684 bp, 4 3 171 PRU 684 bp, 4 3 171PRU

    972 bp, PRU 5 972 (TA, 11 copies, div. ;5%) 972, PRU 5 972 bp (TA, 11 copies)

    1,410 bp, HOR 36mer (TA, 5.5 copies, div. ;3%) (QRU;1,410 bp,

    TRU;0.7 kb, SRU;0.35 kb, PRU39 bp)

    ;1.4 kb (TA, 5.6 copies)

    ;1,600 bp, PRU;1.6 kb (5 families, 165 copies in 13 arrays) ;1.5 kb (84 copies in 3 arrays, div.

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    8/16

    GRM peaks at 39, ;0.35, ; 0.7, and ; 1.4-kb (table 2), inaccordance with our annotation of 1.4-kb quartic HOR,

    whereas there is no peak at 2.8 kb predicted byTakaishiet al. (2005).

    By using Tandem Repeat Finder, a tandem repeat unit of1.4 kb was previously detected (Warburton et al. 2008), butthe HOR structure was not found.

    Because the 1.4-kb repeat is within a coding region (fig.5), it is of evolutionary interest to compare the resultantnucleotide structure of this highly repetitive gene, thatis, how variation between repeat copies affects the proteinstructure. Using the EMBOSS Transeq code (Williams 2002)for translating nucleotide into protein sequence, we findthat the variation within the genomic sequence mostlycauses nonsynonymous changes, predicting a variable pro-tein domain structure (see supplementary table S3B,Supplementary Materialonline).

    In chimpanzee GRM diagram, we find no counterpartsof 1.4-kb human HOR.

    Human 11mer and Chimpanzee 2mer AlphoidHORsFor 11mer alphoid HOR in chromosome 1 (Waye et al.1987), the consensus length obtained using the GRM algo-rithm is 1,866 bp (fig. 3), in accordance with the precisevalue from previous bioinformatics studies (Paar et al.2005;Rosandicet al. 2006;Warburton et al. 2008). The array

    of alpha satellite monomers constituting HOR gives rise to

    GRM peaks at multiples of alpha satellite monomer length171:171 bp, 2 171 5 342 bp, 3 171 5 513 bp,. . .with

    decreasing frequency (fig. 3). The next higher alphoidGRM fragment length corresponds to a tandem oftwo 11mer HOR copies, that is, 2 1,866 bp 5 3,732bp (table 1).

    For the chimpanzee chromosome 1, we find a differentalphoid HOR pattern: Instead of the 11mer HOR, we obtainthe 2mer HOR. This 2mer HOR is based on two alpha sat-ellite monomers of the lengths 169 and 171 bp (mutualdivergence;25%). Thus, the length of the 2mer HOR unitis 340 bp; this is the length of a peak in the chimpanzeeGRM diagram. Accordingly, the next pronounced alphoidGRM fragment length corresponds to a tandem of two2mer HOR copies (mutual divergence ,1%), that is,2 340 5 680 bp, whereas there is no pronounced peakat;3 1715 513 bp. We compare the alphoid HOR unitsof human chromosome 1 (11mer) and chimpanzee chro-mosome 1 (2mer) (supplementary table S4,SupplementaryMaterialonline); the average divergence between the 169-and 171-bp chimpanzee monomers and m01m11 humanmonomers is 27%.

    Thepresentresultfor2merHORinchimpanzeeisinaccor-dance with previous observations that some early primatealpha satellites have a dimeric organization (Alexandrovet al. 2001; Schueler andSullivan 2006). Itis not surprisingthatthe alphoid HORs from human and chimpanzee chromo-

    somes 1 are different, because different alphoid HORs can

    FIG. 5 Human 1,410-bp HOR (five copies) embedded within an exon in the human hornerin gene. Upper strip: tandem array of five HOR copies

    w01, w02, w03, w04, and w05 (start and end positions within chromosome 1 are given). Lower strip: exon (shadowed) and intron (blank)

    positions in Human Hornerine Gene. HOR array is fully embedded within one exon.

    FIG. 4 Schematic illustrating hierarchical structure of 1,410-bp quartic HOR. Nine 39-bp primary repeat units denoted m01,. . .,m09 (divergence

    ;30%) (bottom) form a ;0.35-kb secondary repeat unit; two ;0.35-kb secondary repeat units denoted n01 and n02 (divergence;21%) form

    a ;0.7-kb tertiary repeat unit; two ;0.7-kb tertiary repeat units denoted v01 and v02 (divergence;14%) form ;1.4-kb quartic HOR repeat

    unit (HOR copies average divergence ;4%) (top). In chimpanzee chromosome 1, such HOR organization is absent.

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1884

    http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1
  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    9/16

    be found on homologous chromosomes on these species,representing a rapid expansion since speciation.

    Human-Specific 4,770-bp 3mer HOR inNBPFGenesin NT_113799.1Performing GRM computation for Build 36.3 genomic as-sembly of human chromosome 1, we obtain an interestingnew result: The two GRM peaks, at 4,770 and 3,193 bp, cor-respond to HORs based on the ;1.6-kb primary repeat

    unit in the NBPF genes. The largest contribution to the4,770-bp peak arises from the contig NT_113799.1 in chro-mosome 1. The GRM diagram for the 4,770-bp consensussequence shows two peaks, at;1.6 and;3.2 kb (fig. 6A). Ina further step, the GRM diagram computed for the 3.2-kbpeak shows only one pronounced peak, at;1.6-kb (fig. 6B).This shows that the ;1.6-kb peak corresponds to the pri-mary repeat unit for the 4,770- and 3,193-bp peaks. UsingGRM, we determine a dominant key string CACTGACC forsegmentation of the contig NT_113799.1 into a maximumnumber of the;1.6-kb fragments. Using this key string, weobtain an array of KSA fragment lengths (supplementary ta-ble S5,Supplementary Materialonline). Aligning the resulting

    length arrays, we see that the three ;1.6-kb fragments, rep-resenting monomers, form a tandemly repeating 3mer: Itconsists of three ;1.6-kb NBPF monomers, belonging tothree monomer families, with consensus lengths 1,623,1,593, and 1,554 bp, respectively. Consensus length for eachfamily is determined as the most frequent length for primaryrepeat monomers belonging to that family. Explicit relationsbetween consensus sequence and copies are illustrated inSupplementary figure S13, Supplementary Material online.The corresponding basic consensus monomers are denotedas m01, m02, and m03, respectively (see consensus sequencesinsupplementary table S6,Supplementary Materialonline).

    Finally, the computed GRM diagram for each of three ;1.6-kb basic monomers is without any pronounced peak, reflect-ing its monomer character, that is, the absence of any internalrepeat structure.

    The broadened peak consisting of a cluster of scattered;1.6-kb lengths in the human GRM diagram (fig. 3B) is dueto cases with the presence of the same key string in bothm01 and m02 and/or in both m02 and m03 within the3mer m01m02m03. A peak at ;3.2 kb appears in the casewhen the same key string is present in m01 and m03 butnot in m02. Table 3 shows the detailed NBPFmonomerstructure of the 3mer HOR. It is seen that the m03 mono-mer is deleted from 3mer HOR copies no. 68. This mono-

    mer deletion, resulting in the appearance ofNBPFdimerswithin theNBPFarray, contributes to the ;3.2-kb peak inthe GRM diagram.

    Divergence between the three consensus monomers is19% (m01 vs. m02), 15% (m01 vs. m03) and 20% (m02 vs.m03). On the other hand, the average divergence betweenthe 3mer HOR copies is mostly ,0.5%, that is, the 3mercopies are highly identical. This substantially smaller diver-gence between HOR copies than between constitutingmonomers is a signature of well-developed HOR pattern

    (see Introduction). Thus, the 4,770-bp repeat unit corre-sponds to strongly convergent HORs based on three di-verging monomers m01, m02, and m03. This HOR arrayconsists of a tandem of 17 highly identical HOR copies(in the interval in chromosome 1: 146618386146694662).(We find noNBPFmonomers outside the HOR array.) ThisHOR is fully embedded within the NBPF20 gene(NM_001037675.2). (Note that the Build 36.3 sequenceis reverse complement to the sequence of NBPF20 genefrom NCBI database.) All 17 NBPFHOR copies (horizontalstretches) are displayed by aligning the constituent NBPFmonomers and the corresponding regions of exons are

    shown below HOR copies (vertical bars; fig. 7).An example of divergence between segments at position

    of exons in the 15th and 16th HOR copies and between thecorresponding protein sequences is shown insupplemen-tary table S9(Supplementary Materialonline).

    The 4,770-bp 3mer HOR was not reported previously.However, the self-similarity dot plot of the NBPF regioncan also detect the presence of this HOR.

    Human-Specific 4,770-bp 3mer HOR inNBPFGenesin other Contigs in Chromosome 1We perform a similar search and analysis of the 4,770-bp3mer HORs in other contigs in human chromosome 1

    FIG. 6 (A) GRM diagram of the 4,770-bp 3mer HOR consensus

    sequence (m01m02m03). (B) GRM diagram of the 3.2-kb 2mer HOR

    consensus sequence (one monomer from 3mer HOR is missing).

    Table 2 Scheme of Quartic HOR Organization Based on 39-bp

    Primary Repeat Unit Embedded within Human Hornerin Gene in

    Human Chromosome 1.

    Repeat Unit Structure Length

    Primary Monomer 39 bp

    Secondary 9 3 39 bp ;0.35 kb

    Tertiary 2 3 0.35 kb ;0.7 kb

    Quartic 2 3 0.7 kb ;1.4 kb

    Intragene HORs Distinguish Humans from Chimpanzees doi:10.1093/molbev/msr009 MBE

    1885

    http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1
  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    10/16

    (supplementary table S7, Supplementary Material on-line). Most of the NBPFrepeats in other contigs are or-ganized in HORs; at the edges ofNBPFregions, increaseddivergence and/or truncation appears in HOR copies.HOR arrays or their segments are present in 14 distinctarrays in 10 contigs (supplementary table S8,SupplementaryMaterial online). Only three NBPFHOR tandem arrays areof significant size, located in contigs NT_113799.1,

    NT_079497.3, and NT_004434.18: They contain 17, 16,and 12 NBPF HOR copies, respectively (fig. 8). All threeHOR arrays are highly similar: The average divergence be-tween 17 HOR copies in NT_113799.1 and 16 HOR copiesin NT_079497.3 is 0.6%, and between 17 HOR copies inNT_113799.1 and 12 HOR copies in NT_004434.18, the av-erage divergence is 2%.

    Absence ofNBPFHORs in Nonhuman PrimatesTo search forNBPFHORs in nonhuman primate genomes,we used human NBPF consensus monomers m01, m02,m03 (supplementary table S6,Supplementary Materialon-

    line) for Blast (Altschul et al. 1990) search against the chim-panzee, orangutan, and rhesus macaque genomicsequences (supplementary tables S10S12,SupplementaryMaterialonline). The arrays ofNBPF repeats in chimpan-zees and other nonhuman primates are not organized intoHORs. This is shown by the absence of pronounced GRMpeaks at ;4,770 bp (fig. 3).

    Human and Chimpanzee GRM Diagrams andTandem Repeats without Higher OrderOrganizationFor most of monomeric tandem repeats, there is a sizeabledifference between human and chimpanzee chromosomes

    1 (seetable 1). For example, pronounced repeat units at2,241, 7,380, and 18,555 bp in human chromosome 1 areabsent in chimpanzee chromosome 1. On the other hand,a sizeable 9,734-bp GRM peak in chimpanzee chromosome1 is absent in human chromosome 1.

    The exception is, for example, a tandem based on the972-bp repeat unit that is present in both humans andchimpanzees. The corresponding chimpanzee peak in anoverall GRM diagram of the whole chromosome 1 is weak,almost immersed into noise (weak peak in the lower paneloffig. 3B). To reduce the noise, we additionally computethe GRM diagram for the contig NW_001229599.1 that

    contains tandem array based on the 972-bp repeat unit.

    FIG. 7Aligned human 4,770-bp 3mer HOR copies in humanNBPF20

    gene from contig NT_113799.1. Each horizontal stretch represents

    a 3mer HOR copy with its three constituting monomers m01, m02,

    and m03. Below a stretch the corresponding regions of exons in the

    NBPF gene are shown (vertical bars). Exons reported in the NCBI

    database are denoted a, b,. . ., h. The segments from Build 36.3

    genomic sequence that are .98% identical to the sequences

    corresponding to exons, but belong to NCBI introns, are denoted a,

    b,. . .,h. Note that the Build 36.3 sequence is reverse complement

    with respect to the NBPF20 gene from NCBI. For each HOR copy,

    the start position is shown (0 is assigned to the start of the first

    HOR copy). (Position zero corresponds to the position 146618386 in

    chromosome 1.) The NBPF gene starts at the position of the first

    HOR copy (top) and extends beyond the last HOR copy in the

    figure (bottom). Between the end of the last HOR copy and the end

    ofNBPF20gene, there are a number of additional exons (not shown

    in figure).

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1886

    http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1
  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    11/16

    Table 3 NBPFMonomer Repeat Structure of 3mer HORs in Contig NT_113799.1 in Chromosome 1.

    Consensus MonomeraCh1 Start

    Position

    Contig Start

    Position

    Monomer

    Lengthb (bp)

    Monomer

    Divergencec (%)

    HOR

    Lengthd (bp)

    HOR

    Copy No.eHOR

    Divergencef (%)

    m01 146618386 75723 1,608 8.9

    m02 146619994 77331 1,587 0.7

    m03146621581 78918 1,554 0.1 4,749 1 3,3

    m01 146623135 80472 1,623 0.1

    m02 146624758 82095 1,587 0.6

    m03 146626345 83682 1,554 0.1 4,764 2 0,3m01 146627899 85236 1,623 0.1

    m02 146629522 86859 1,583 0.8

    m03146631105 88442 1,554 0.1 4,760 3 0,3

    m01 146632659 89996 1,623 0

    m02 146634282 91619 1,595 0.2

    m03146635877 93214 1,554 0.1 4,772 4 0,1

    m01 146637431 94768 1,625 0.3

    m02 146639056 96393 1,593 0

    m03146640649 97986 1,546 0.8 4,764 5 0,4

    m01 146642195 99532 1,625 0.1

    m02g146643820 101157 1591 0.2 3,216 6 0,1

    m13h 146645411 102748 1,600 0.1

    m02

    g

    146647011 104348 1,593 0.1 3,193 7 0,1m13h 146648604 105941 1,600 0

    m02g146650204 107541 1,593 0.1 3,193 8 0

    m13h 146651797 109134 1,600 0

    m02 146653397 110734 1,593 0.2

    m03146654990 112327 1,554 0.1 4,747 9 0,1

    m01 146656544 113881 1,623 0.1

    m02 146658167 115504 1,593 0.2

    m03146659760 117097 1,554 0.1 4,770 10 0,1

    m01 146661314 118651 1,623 0.1

    m02 146662937 120274 1,593 0.2

    m03146664530 121867 1,554 0.1 4,770 11 0,1

    m01 146666084 123421 1,623 0.1

    m02 146667707 125044 1,583 0.8m03

    146669290 126627 1,554 0.1 4,760 12 0,3

    m01 146670844 128181 1,623 0.1

    m02 146672467 129804 1,593 0.3

    m03146674060 131397 1,558 0.3 4,774 13 0,2

    m01 146675618 132955 1,623 0.1

    m02 146677241 134578 1,585 0.7

    m03146678826 136163 1,554 0 4,762 14 0,3

    m01 146680380 137717 1,623 0.1

    m02 146682003 139340 1,591 0.2

    m03146683594 140931 1,558 0.3 4,772 15 0,2

    m01 146685152 142489 1,623 0.1

    m02 146686775 144112 1,571 1.5

    m03146688346 145683 1,554 0.1 4,748 16 0,6

    m01 146689900 147237 1,623 0.1

    m02 146691523 148860 1,593 0.1

    m03 146693116 150453 1,546 1 4,762 17 0,4

    a Consensus monomers m01 (1,623 bp), m02 (1,593 bp), and m03 (1,554 bp) (see supplementary table S2, Supplementary Materialonline).b Length of monomer copy.c Divergence of monomer copy with respect to consensus monomer.d Length of HOR copy (consensus length 4,770 bp).e Ordinal number of HOR copy in tandem array.f Divergence of HOR copy with respect to consensus HOR.g Monomer m03 in HOR copies No. 68 is missing.h Monomer m01 in HOR copies no. 79 is substituted by hybrid m13 (1,600 bp). Divergence of m13 with respect to consensus monomers m01, m02, and m03 is 7%,22%, and 9%, respectively. The m13 sequence exhibits a specific hybrid structure: the first 845 bases in m13 are identical to the first 841 bases from m03 (additionally,a 4-bp insertion AGAG is inserted into m03 after position 581), whereas the last 821-bp segment in m13 is identical to the last 821-bp segment from m01. Divergencebetween m13 and m01 (7%) is smaller than divergence between m01 and m03 but larger than the mutual divergence between different copies of m01 or of m03 (,0.5%).Considering the three 1,600-bp copies as belonging to the m01 family, divergence of the corresponding three HOR copies with respect to consensus HOR is still ,5%,whereas for the m13 classification of these three monomer copies, divergence between HOR copies is reduced to 0.5%.

    Intragene HORs Distinguish Humans from Chimpanzees doi:10.1093/molbev/msr009 MBE

    1887

    http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1
  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    12/16

    In this case with a shorter sequence, the noise level at theposition of the 972-bp peak is reduced (as seen from insertin the lower panel in fig. 3B).

    For short repeat units ,20 bp, the corresponding hu-man and chimpanzee GRM peaks are similar. In the lengthinterval from 20 to 100 bp, the most pronounced peak inthe human GRM diagram is at 39 and 40 bp, and in chim-panzee, it is at 32 bp. We find weaker peaks at multiples of

    39 bp (78, 117 bp,. . .

    ) and 40 bp (80, 120 bp,. . .

    ) in humanand at multiples of 32 bp (64, 96 bp, . . .) in chimpanzeeGRM diagrams. The human 39-bp repeat unit is the pri-mary repeat unit for the;1,410-bp HOR, and 40 bp is a tan-dem repeat unit of another array.

    GRM Signature of Alu SequencesThe pronounced GRM peaks at 135, 166, and ;310 bp inthe GRM diagram (fig. 3) are signature of Alu sequences.Human Alu elements are dimeric structures of about300 bp that comprise two similar but nonidentical mono-mers (Weiner et al. 1986; Britten et al. 1988; Jurka 1995,

    2004; Schmid 1996). The longer monomer (on the right)contains an insert of 31 nt that is absent in the shortermonomer (on the left). The two monomers are connectedby a short internal poly-A tract. The dimer sequence is fol-lowed by a long terminal poly-A tail of variable length. The135-bp GRM fragment length corresponds to a distancebetween the starts of the left and right monomers. The166-bp GRM fragment corresponds to the distance fromthe start of the segment of right monomer after the 31-bp insert and the start of a similar segment in the leftmonomer (thus, this distance is 135 31 bp 5 166bp). The ;310-bp length corresponds to a tandem of

    two Alu sequences; because of the variable length of thepoly-A tail, this GRM peak is broadened in the intervalof about a dozen base pairs, centered at ;310 bp.

    Statistical Significance of Peaks in the GRM DiagramAn important question can be raised about the statisticalsignificance of the peaks appearing in the GRM diagram,because they arise by applying an ensemble of a large num-

    ber (485 65,536) of key strings. We use three strong tests ofthis significance.

    First, we compute the corresponding GRM diagrams forartificial sequences, constructed using a random numbergenerator, of the same nucleotide composition as in thegenomic sequences under study. We compute the GRMdiagram for a pseudochromosome 1, constructed in sucha way (fig. 9). This randomly constructed sequence froma given number of nucleotides does not produce any sig-nificant peak above the level of noise in the GRM diagram.

    The second test of statistical significance is provided bythe steps of the GRM algorithm (see Methods): For each

    length at position of a peak, we determine the dominantkey string and determine the corresponding repeat arraygiving rise to this peak. In this way, we find explicitly thatto each GRM peak corresponds a concrete repeat pattern.

    The third test of statistical significance is the identifica-tion of previously known HORs in human chromosomes(e.g., alphoid 11mer HOR in chromosome 1, alphoid18mer HOR in chromosome 7, and so on). In this way,we identify GRM peaks for all previously known human al-phoid HORs. Additionally, for the Build 36.3 assembly of hu-man chromosome 1, we find peaks corresponding to allrepeats previously identified in human chromosome 1 usingstandard bioinformatics tools (Warburton et al. 2008) and

    some novel peaks reported for the first time in this work.

    Human AcceleratedNBPFandHRNRHORsComponents of Genomic RegulatoryNetwork?Previously, it was found that the number ofNBPFmonomerrepeats is related to the evolutionary level of higher pri-mates, showing a gradual increase with evolutionary devel-opment (Vandepoele et al. 2005,2009;Popesco et al. 2006),but the NBPFHOR pattern was not identified. Our novelGRM results for human chromosome 1 show that the;1.6-kb NBPF monomers, mutually diverging 2025%,

    FIG. 8 Schematic illustrating three NBPF 3mer HOR copies based on the ;1.6-bp monomers in human chromosome 1. In chimpanzee

    chromosome 1, this HOR organization is absent.

    FIG. 9 GRM diagram for pseudochromosome 1 generated by

    a random number generator.

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1888

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    13/16

    are in fact organized into highly identical 3mer HORs(4,770-bp consensus). With respect to previously knownHORs in the human genome, this HOR pattern is interest-ing in several aspects: 1) The repeat unit is much longer

    than most of the primary repeat units identified so far.2) Most of the NBPFrepeat monomers are organized into3mer HORs. 3) There are three distinct major NBPFHORarrays, having practically the same consensus sequence. 4)NBPFHOR is fully embedded within a gene. Because theNBPFgenes might suppress the development of neuroblas-toma and possibly other tumor types (Vandepoele et al.2005, 2008, 2009; Gregory et al. 2006; Popesco et al. 2006),there is an interesting question on a possible relation of tu-mor dynamics to this peculiar 4,770-bp HOR pattern. Com-parison of the appearance of NBPF monomers and HORcopies and of tandem repeats ofNBPFHOR copies in chro-mosome 1 of humans and some nonhuman vertebrates is

    shown intable 4. The tandem repeat of theNBPFHOR copypattern shows a discontinuous jump in the evolutionary stepfrom chimpanzee to the human genome: from total absenceof tandem repeats ofNBPFHOR copies in chimpanzees to 47tandem repeat HOR copies in humans. This human acceler-ated HOR pattern (referred to as HAHOR) is one of the fac-tors that distinguish humans from nonhuman primates.

    Another case of HOR being fully embedded withina gene is a 1.4-kb quartic HOR based on 39-bp primaryrepeat unit. This HOR is fully embedded within a singlelarge exon in the human hornerin gene (HRNR), relatedto human skin. It is of substantial interest evolutionary to

    compare the resultant amino acid structure of this highlyrepetitive gene.

    We show that the presence ofNBPFHOR and HRNRHORtandem repeats is an exclusively human specific. We hypoth-esize that this may be related to the possible regulatory roleof the HOR pattern. In both of these within-the gene HORs,we find no HOR counterparts in the chimpanzee genome:These HORs represent a human accelerated pattern. Itseems of interest to look systematically for higher order re-peats (HORs), particularly in or near brain-related and othergenes and for their possible regulatory function.

    More generally, it is an intriguing question related toa possible genomic regulatory network (GRN) with vari-

    ous components including known gene regulators and asyet hidden regulatory elements at different levels of co-ordination and hierarchical organization. In this sense,it is challenging whether some types of accelerated se-

    quence may be functional, acting dynamically withina network of genes, gene regulators, and higher orderGRN components.

    Small differences among repeat copies, even when thesecopies exhibit a high degree of mutual similarity, mighthave a significant impact on the function of the multilayerregulatory network (circuitry). Their influence on geneexpression may be direct, as in the case of some knownHAR regulators, or indirect along a hierarchy of possiblehigher level organization. In this connection, it shouldbe noted that growing evidence from the field of develop-mental genetics is supporting the hypothesis that small dif-ferences in the time of activation or in the level of activity

    of even a single gene could have important evolutionaryconsequences owing to the extensive consequences thatchanges in regulatory interactions could have on develop-mental processes (Garfield and Wray 2010). In general, onecould argue that because any vanishingly small input ap-plied at critical points in nonlinear dynamical systemsmight cause a finite response (Otto 1993;Hilborn 1994;Al-ligood et al. 1997;Zak et al. 1997), a system could be con-trolled by a microdynamical device that operates by signstrings, as a genetic code.

    Tandem repeats and HORs are the expected result ofrapid expansion, given models of unequal crossing-over.

    They are of interest in the framework of a large-scaleregulatory architecture of genome as a rapidly evolvingtype of DNA sequence that can show a profound differencebetween closely related species and may contribute tophenotypic differences. This may shed more light onhumanchimpanzee evolutionary divergence from thepoint of view of long-range higher order periodicity.

    Supplementary Material

    Supplementary tables S1S12 and Supplementary figureS13are available atMolecular Biology and Evolutiononline(http://www.mbe.oxfordjournals.org/).

    Table 4 Evolutionary Acceleration ofNBPFMonomer and HOR Pattern.

    Monomer Copiesa Total No. Of HOR Copiesb No. of Tandem HOR Copiesc

    Humand 165 57 47

    Chimpanzeee 48 14 0

    Orangutanf 17 7 0

    Rhesus macaqueg 7 2 0

    Mouseh 0 0 0

    a Total number ofNBPFmonomers.b

    Number of all NBPFHOR copies (both tandemly organized and dispersed).c Total number ofNBPF HOR copies organized in tandem repeats (two or more HOR copies in tandem array).d Genomic assembly NCBI Build 36.3.e Genomic assembly NCBI Build 2.1. Most of the contigs containing NBPF repeats are in chromosome 1, whereas eight monomers and two distinct HOR copies are incontigs not yet assigned to the chromosome.f Genomic assembly WUSTL Pongo_albelii-2.0.2. All contigs containing NBPF repeats are in chromosome 1.g Genomic assembly NCBI Build 1.1. Most of the contigs containing NBPFrepeats are in chromosome 1, whereas the one monomer is in a contig not assigned to thechromosome.h Genomic assembly NCBI Build 2.1.

    Intragene HORs Distinguish Humans from Chimpanzees doi:10.1093/molbev/msr009 MBE

    1889

    http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://www.mbe.oxfordjournals.org/http://www.mbe.oxfordjournals.org/http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1http://mbe.oxfordjournals.org/cgi/content/full/msr009/DC1
  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    14/16

    Acknowledgments

    We thank two anonymous reviewers for valuable com-ments and suggestions, contributing significantly to thequality of our manuscript. We also thank D. Ugarkovic, J.Brajkovic, and K. Vlahovicek for useful discussion.

    References

    Alexandrov IA, Kazakov A, Tumeneva I, Shepelev V, Yurov Y. 2001.Alpha-satellite DNA of primates: old and new families.

    Chromosoma 110:253266.

    Alkan C, Eichler EE, Bailey JA, Sahinalp SC, Tuzun E. 2004. The role of

    unequal crossover in alpha-satellite DNA evolution: a computa-

    tional analysis. J Comp Biol. 11:933944.

    Alligood KT, Sauer TD, Yorke JA. 1997. Chaos. An introduction to

    dynamical systems. New York: Springer.

    Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic

    local alignment search tool. J Mol Biol. 215:403410.

    Amadio JP, Walsh CA. 2006. Brain evolution and uniqueness in the

    human genome. Cell 126:10331035.

    Amaral PP, Dinger ME, Mercer TR, Mattick JS. 2008. The eukaryotic

    genome as an RNA machine. Science 319:17871789.

    Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR,Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an

    ultraconserved exon are derived from a novel retroposon.

    Nature 441:8790.

    Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS,

    Haussler D. 2004. Ultraconserved elements in the human

    genome. Science 304:13211325.

    Beniaminov A, Westhof E, Krol A. 2009. Distinctive structures

    between chimpanzee and human in a brain noncoding RNA.

    RNA Biol. 6:113121.

    Benson G. 1999. Tandem repeats finder: a program to analyze DNA

    sequences. Nucleic Acids Res. 27:573580.

    Bird CP, Stranger BE, Liu M, Thomas DJ, Ingle CE, Beazley C, Miller W,

    Hurles ME, Dermitzakis ET. 2007. Fast-evolving noncoding

    sequences in the human genome.Genome Biol. 8:R118.Britten RJ, Baron WF, Stout DB, Davidson EH. 1988. Sources and

    evolution of human Alu repeated sequences.Proc Natl Acad Sci

    U S A. 85:47704774.

    Britten RJ, Davidson EH. 1969. Gene regulation for higher cellsa

    theory. Science 165:349357.

    Britten RJ, Kohne DE. 1968. Repeated sequences in DNA. Science

    161:529540.

    Britten RJ, Davidson EH. 1971. Repetitive and nonrepetitive DNA

    sequences and a speculation on the origins of evolutionary

    novelty. Quart Rev Biol. 46:111138.

    Bush EC, Lahn BT. 2008. A genome-wide screen for noncoding

    elements important in primate evolution.BMC Evol Biol. 8:17.

    Caceres M, Lachuer J , Zapala MA, Redmond JC, Kudo L,

    Geschwind DH, Lockhart DJ, Preuss TM, Barlow C. 2003.Elevated gene expression levels distinguish human from non-

    human primate brains. Proc Natl Acad Sci U S A. 100:

    1303013035.

    The chimpanzee sequencing and analysis consortium. 2005. Initial

    sequence of the chimpanzee genome and comparison with the

    human genome. Nature 437:6987.

    Cooper GM, Sidow A. 2003. Genomic regulatory regions: insights

    from comparative sequence analysis. Curr Opin Genet Dev.

    13:604610.

    Costa FF. 2007. Non-coding RNAs: lost in translation? Gene

    386:110.

    Davidson EH, Britten RJ. 1979. Regulation of gene expression:

    possible role of repetitive sequences.Science 204:10521059.

    Dermitzakis ET, Reymond A, Lyle R, et al. (11 co-authors). 2002.

    Numerous potentially functional but non-genic conserved

    sequences on human chromosome 21.Nature 420:578582.

    Diskin SJ, Hou C, Glessner JT, et al. (27 co-authors). 2009. Copy

    number variation at 1q21.1 associated with neuroblastoma.

    Nature459:987991.

    Dorus S, Vallender EJ, Evans PD, Anderson JR, Gilbet SL,

    Mahowald M, Wyckoff GJ, Malcom CM, Lahn BT. 2004.

    Accelerated evolution of nervous system genes in the origin of

    Homo sapiens. Cell 119:10271040.Duret L, Galtier N. 2009. Biased gene conversion and the evolution

    of mammalian genomic landscapes. Annu Rev Genom Hum

    Genet. 10:285311.

    Enard W, Gehre S, Hammerschmidt K, et al. (56 co-authors). 2009. A

    humanized version of Foxp2 affects cortico-basal ganglia circuits

    in mice. Cell 137:961971.

    Enard W, Paabo S. 2004. Comparative primate genomics.Annu Rev

    Genom Hum Genet. 5:351378.

    Evans PD, Vallender EJ, Lahn BT. 2006. Molecular evolution of the

    brain size regulatory genes CDK5RAP2 and CENPJ. Gene

    375:7579.

    Garfield DA, Wray GA. 2010. The evolution of gene regulatory

    interactions. BioScience 60:1523.

    Gelfand Y, Rodriguez A, Benson G. 2007. TRDBthe tandemrepeats database. Nucleic Acids Res. 35:D80D87.

    Gregory SG, Barlow KF, McLay KE, et al. (178 co-authors). 2006. The

    DNA sequence and biological annotation of human chromo-

    some 1. Nature. 441:315321.

    Gu JY, Gu X. 2003. Induced gene expression in human brain after

    the split from chimpanzee.Trends Genet. 19:6365.

    Hayakawa T, Altheide TK, Varki A. 2005. Genetic basis of human

    brain evolution: accelerating along the primate speedway. Dev

    Cell. 8:24.

    Haygood R, Babbitt CC, Fedrigo O, Wray GA. 2010. Contrasts

    between adaptive coding and noncoding changes during human

    evolution.Proc Natl Acad Sci U S A. 107:78537857.

    Haygood R, Fedrigo O, Hanson B, Yokoyama KD, Wray GA. 2007.

    Promoter regions of many neural- and nutrition-related geneshave experienced positive selection during human evolution.

    Nat Genet. 39:11401144.

    Heissig F, Krause J, Bryk J, Khaitovich P, Enard W, Paabo S. 2005.

    Functional analysis of human and chimpanzee promoters.

    Genome Biol. 6:R57.

    Hilborn RC. 1994. Chaos and nonlinear dynamics. Oxford: Oxford

    University Press.

    Huttenhofer A, Schattner P, Polacek N. 2005. Non-coding RNAs:

    hope or hype?Trends Genet. 21:289297.

    Jacob F, Monod J. 1961. Gen etic regulatory mechanisms in synthesis

    of proteins. J Mol Biol. 3:318356.

    Jerison H. 1975. Evolution of brain and intelligence. New York:

    Academic Press.

    Johnson JM, Edwards S, Shoemaker D, Schadt EE. 2005. Dark

    matter in the genome: evidence of widespread transcription

    detected by microarray tiling experiments. Trends Genet. 21:

    93102.

    Jurka J. 1995. Origin and evolution of Alu repetitive elements. In:

    Maraia RJ, editor. The impact of short interspersed elements

    (SINEs) on the host genome. RG Landes, Austin, Texas: Springer.

    p. 2542.

    Jurka J. 2000. Repbase update: a database and an electronic journal

    of repetitive elements. Trends Genet. 9:418420.

    Jurka J. 2004. Evolutionary impact of human Alu repetitive elements.

    Curr Opin Genet Dev. 14:603608.

    Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O,

    Walichiewicz J. 2005. Repbase update, a database of eukaryotic

    repetitive elements.Cytogenet Genome Res. 110:462467.

    Paar et al. doi:10.1093/molbev/msr009 MBE

    1890

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    15/16

    Kapranov P, Willingham AT, Gingeras TR. 2007. Genome-wide

    transcription and the implications for genomic organization.Nat

    Rev Genet. 8:413423.

    Khaitovich P, Muetzel B, She XW, et al. (15 co-authors). 2004.

    Regional patterns of gene expression in human and chimpanzee

    brains. Genome Res. 14:14621473.

    King DC, Taylor J, Zhang Y, Cheng Y, Lawson HA, Martin J,

    Chiaromonte F, Miller W, Hardison RC. 2007. Finding cis-

    regulatory elements using comparative genomics: some lessons

    from ENCODE data. Genome Res. 17:775786.King MC, Wilson AC. 1975. Evolution at two levels in humans and

    chimpanzees. Science 188:107116.

    Kleinjan DA, Lettice LA. 2008. Long-range gene control and genetic

    disease. Adv Genet. 61:339388.

    Kleinjan DA, van Heyningen V. 2005. Long-range control of gene

    expression: emerging mechanisms and disruption in disease.Am

    J Hum Genet. 76:832.

    Lettice LA, Horikoshi T, Heaney SJH, et al. (21 co-authors). 2002.

    Disruption of long-range cis-acting regulator for Shh

    causes preaxial polydactyly. Proc Natl Acad Sci U S A.

    99:75487553.

    Manuelidis L. 1978. Complex and simple sequences in human

    repeated DNAs. Chromosoma 66:121.

    Maston GA, Evans SK, Green MR. 2006. Transcriptional regulatoryelements in the human genome.Annu Rev Genom Hum Genet.

    7:2959.

    Mattick JS. 2009. The genetic signatures of noncoding RNAs. PLoS

    Genet. 5:e1000459.

    Mekel-Bobrov N, Posthuma D, Gilbert SL, et al. (22 co-authors).

    2007. The ongoing adaptive evolution of ASPM and micro-

    cephalin is not explained by increased intelligence. Hum Mol

    Genet. 16:600608.

    Mercer TR, Dinger ME, Mattick JS. 2009. Long non-coding RNAs:

    insights into functions.Nat Rev Genet. 10:155159.

    Needleman SB, Wunsch CD. 1970. A general method applicable to

    the search for similarities in the amino acid sequence of two

    proteins. J Mol Biol. 48:443453.

    Nobrega MA, Ovcharenko I, Afzal V, Rubin EM. 2003. Scanninghuman gene deserts for long-range enhancers.Science. 302:413.

    Noonan JP, McCallion AS. 2010. Genomics of long-range regulatory

    elements. Annu Rev Genom Hum Genet. 11:123.

    Otto E. 1993. Chaos in dynamical systems. Cambridge: Cambridge

    University Press.

    Ovcharenko I. 2008. Widespread ultraconservation divergence in

    primates. Mol Biol Evol. 25:16681676.

    Paar V, Basar I, Rosandic M, Gluncic M. 2007. Consensus higher

    order repeats and frequency of string distributions in human

    genome. Curr Genom. 8:93111.

    Paar V, Pavin N, Rosandic M, Gluncic M, Basar I, Pezer R, Durajlija-

    ZinicS. 2005. ColorHORnovel graphical algorithm for fast scan

    of alpha satellite higher-order repeats and HOR annotation for

    GenBank sequence of human genome. Bioinformatics21:846852.

    Pennacchio LA, Ahituv N, Moses AM, et al. (19 co-authors). 2006. In

    vivo enhancer analysis of human conserved non-coding

    sequences. Nature 444:499502.

    Pennacchio LA, Rubin EM. 2001. Genomic strategies to identify

    mammalian regulatory sequences. Nat Rev Genet. 2:100109.

    Pollard KS, Salama SR, King B, et al. (13 co-authors). 2006a. Forces

    shaping the fastest evolving regions in the human genome.PLoS

    Genet. 2:15991611.

    Pollard KS, Salama SR, Lambert N, et al. (16 co-authors). 2006b. An

    RNA gene expressed during cortical development evolved

    rapidly in humans. Nature 443:167172.

    Popesco MC, Maclaren EJ, Hopkins J, Dumas L, Cox M, Meltesen L,

    McGavran L, Wyckoff GJ, Sikela JM. 2006. Human lineage-

    specific amplification, selection, and neuronal expression of

    DUF1220 domains. Science 313:13041307.

    Portin P. 2008. Evolution of man in the light of molecular genetics:

    a review. Part II. Regulation of gene function, evolution of

    speech and of brains.Hereditas 145:113125.

    Prabhakar S, Noonan JP, Paabo S, Rubin EM. 2006. Accelerated

    evolution of conserved noncoding sequences in humans.Science

    314:786.

    Raff RA, Kaufman TC. 1983. Embryos, genes, and evolution: the

    developmental-genetic basis of evolutionary change. New York:Macmillan.

    Redon R, Ishikawa S, Fitch KR, et al. (43 co-authors). 2006. Global

    variation in copy number in the human genome. Nature

    444:444454.

    Rosandic M, Paar V, Basar I. 2003. Key-string segmentation

    algorithm and higher-order repeat 16mer (54 copies) in human

    alpha satellite DNA in chromosome 7. J Theor Biol. 221:2937.

    RosandicM, Paar V, Basar I, GluncicM, Pavin N, PilasI. 2006. CENP-

    B box and pJa sequence distribution in human alpha satellite

    higher-order repeats (HOR). Chromosome Res. 14:735753.

    Rudd MK, Willard HF. 2004. Analysis of the centromeric regions of

    the human genome assembly.Trends Genet. 20:529533.

    Rudd MK, Wray GA, Willard HF. 2006. The evolutionary dynamics of

    alpha-sarellite.Genome Res. 16:8896.Schmid CW. 1996. Alu: structure, origin, evolution, significance, and

    function of one-tenth of human DNA.Prog Nucleic Acid Res Mol

    Biol. 53:283319.

    Schueler MG, Sullivan BA. 2006. Structural and functional dynamics

    of human centromeric chromatin. Annu Rev Genomics Hum

    Genet. 7:301313.

    Shapiro JA, von Sternberg R. 2005. Why repetitive DNA is essential

    to genome function.Biol Rev. 80:227250.

    Smith GP. 1976. Evolution of repeated DNA sequences by unequal

    crossover. Science 191:528535.

    Southern EM. 1975. Long range periodicities in mouse satellite DNA.

    J Mol Biol. 94:5169.

    Stephen S, Pheasant M, Makunin IV, Mattick JS. 2009. Large-scale

    appearance of ultraconserved elements in tetrapod genomesand slowdown of molecular clock. Mol Biol Evol. 25:402408.

    Taft RJ, Pang KC, Mercer TR, Dinger M, Mattick JS. 2010. Non-coding

    RNAs: regulators of disease. J Pathol. 220:126139.

    Takaishi M, Makino T, Marohashi M, Huh N. 2005. Identification of

    human hornerin and its expression in regenerating and psoriatic

    skin. J Biol Chem. 280:46964703.

    Tuzun E, Sharp AJ, Bailey JA, et al. (12 co-authors). 2005. Fine-scale

    structural variation of the human genome. Nat Genet. 37:

    727732.

    Tyler-Smith C, Brown WRA. 1987. Structure of the major block of

    alphoid satellite DNA on the human Y chromosome.J Mol Biol.

    195:457470.

    Uddin M, Wildman DE, Liu G, Xu W, Johnson RM, Hof PR,

    Kapatos G, Grossman LI, Goodman M. 2004. Sister grouping of

    chimpanzees and humans as revealed by genome-wide

    phylogenetic analysis of brain gene expression profiles. Proc

    Natl Acad Sci U S A. 101:29572962.

    Vallender EJ. 2008. Exploring the origins of the human brain through

    molecular evolution. Brain Behav Evol. 72:168177.

    Vallender EJ, Mekel-Bobrov N, Lahn BT. 2008. Genetic basis of

    human brain evolution.Trends Neurosci. 31:637644.

    Van Bakel H, Hughes TR. 2009. Establishing legitimacy and function

    in the new transcriptome. Brief Funct Genomic Proteomic.

    8:424436.

    Vandepoele K, Andries V, Van Roy F. 2009. The NBPF1 promoter has

    been recruited from the unrelated EVI5 gene before simian

    radiation.Mol Biol Evol. 26:13211332.

    Intragene HORs Distinguish Humans from Chimpanzees doi:10.1093/molbev/msr009 MBE

    1891

  • 8/13/2019 Mol Biol Evol 2011 Paar 1877 92

    16/16

    Vandepoele K, Andries V, Van Roy N, Staes K, Vandesompele J,

    Laureys G, De Smet E, Berx G, Speleman F, Van Roy F. 2008. A

    constitutional translocation t(1;17) (p36.2;q11.2) in a neuroblas-

    toma patient disrupts the human NBPF1 and ACCN1 genes.

    PLoS ONE. 3:e2207.

    Vandepoele K, van Roy F. 2007. Insertion of an HERV(K) LTR in the

    intron of NBPF3 is not required for its transcriptional activity.

    Virology. 362:15.

    Vandepoele K, Van Roy N, Staes K, Speleman F, Van Roy F. 2005. A

    novel gene family NBPF: intricate structure generated by geneduplications during primate evolution. Mol Biol Evol. 22:

    22652274.

    Varki A, Geschwind DH, Eichler EE. 2008. Explaining human

    uniqueness: genome interactions with environment, behaviour

    and culture. Nat Rev Genet. 9:749763.

    Visel A, Akiyama JA, Shoukry M, Afzal V, Rubin EM, Pennacchio LA.

    2008. Functional autonomy of distant-acting human enhancers.

    Genomics 93:509513.

    Warburton PE, Hasson D, Guillem F, Lescale C, Jin X, Abrusan G.

    2008. Analysis of the largest tandemly repeated DNA families in

    the human genome.BMC Genomics. 9:533.

    Warburton PE, Willard HF. 1996. Evolution of centromeric alpha

    satellite DNA: molecular organization within and between

    human and primate chromosomes. In: Jackson M, Strachan T,Dover G, editors. Human genome evolution. Oxford: BIOS

    Scientific Publishers. p. 121145.

    Waye JS, Durfy SJ, Pinkel D, Kenwrick S, Patterson M, Davies KE,

    Willard HF. 1987. Chromosome-specific alpha satellite DNA

    from human chromosome 1: hierarchical structure and genomic

    organization of a polymorphic domain spanning several

    hundred kilobase pairs of centromeric DNA. Genomics 1:

    4351.

    Weiner AM, Deininger PL, Efstratiaadis A. 1986. The reverse flow of

    genetic information: pseudogenes and transposable elements

    derived from nonviral cellular RNA. Annu Rev Biochem. 55:

    631661.

    Willard HF. 1985. Chromosome-specific organization of human

    alpha satellite DNA. Am J Hum Genet. 37:524532.

    Willard HF, Waye JS. 1987a. Chromosome-specific subsets of human

    alpha satellite DNA: analysis of sequence divergence within and

    between chromosomal arrays and evidence for an ancestral

    repeat. J Mol Evol. 25:207214.

    Willard HF, Waye JS. 1987b. Hierarchical order in chromosome-

    specific human alpha satellite DNA.Trends Genet. 3:192198.

    Williams G. 2002. EMBOSS Transeq. Available from:www.ebi.ac.uk/

    Tools/emboss/transeq/index.html.

    Woolfe A, Elgar G. 2008. Organization of conserved elements near

    key developmental regulators in vertebrate genomes.Adv Genet.

    61:307338.

    Wray GA, Babbitt CC. 2008. Enhancing gene regulation. Science

    321:13001301.

    Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV,

    Romano LA. 2003. The evolution of transcriptional regulation in

    eukaryotes. Mol Biol Evol. 20:13771419.

    Wu JC, Manuelidis L. 1980. Sequence definition and organization of

    a human repeated DNA. J Mol Biol. 142:363386.

    Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES.2007. Systematic discovery of regulatory motifs in conserved

    regions of the human genome, including thousands of CTCF

    insulator sites. Proc Natl Acad Sci U S A. 104:71457150.

    Xu AG, He L, Li Z, et al. (15 co-authors). 2010. Intergenic and repeat

    transcription in human, chimpanzee and macaque brains

    measured by RNA-Seq.PLoS Comput Biol. 6:e1000843.

    Zak M, Zbilut JP, Meyers RE. 1997. From instability to intelligence.

    Complexity and predictability in nonlinear dynamics. Berlin

    (Germany): Springer.

    Paar et al. doi:10.1093/molbev/msr009 MBE

    http://www.ebi.ac.uk/Tools/emboss/transeq/index.htmlhttp://www.ebi.ac.uk/Tools/emboss/transeq/index.htmlhttp://www.ebi.ac.uk/Tools/emboss/transeq/index.htmlhttp://www.ebi.ac.uk/Tools/emboss/transeq/index.html