1983 Kimura - Synon-Nonsynon Mut Rate

  • Upload
    milly

  • View
    222

  • Download
    0

Embed Size (px)

Citation preview

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    1/10

    J. Mol. Evol. 16, 111--120 (1980)Journal ofMolecular Evolution by Springer-Verlag - 1980

    A Simple Method for Estimating EvolutionaryRates of Base SubstitutionsThrough ComparativeStudies of Nucleotide Sequences

    Motoo KimuraNational Inst itute of Genetics, Mishima 411, Japan

    Summar y. Some simple formulae were obta ined which enable us to est imateevolutionary distances in terms of the numbe r of nucleotide substitutions(and, also, the evolu tionary rates when the divergence times are known). Incomparing a pair of nucleotide sequences, we distinguish two types of differ-ences; if homologous sites are occupied by di fferent nucleotide bases butboth are purines or both pyrimidines, the difference is called type I (or "tran-sit ion" type), while, if one of the two is a purine and the other is a pyrimi-dine, the diffe rence is called type I1 (or "t rans vers ion" type). Lett ing P andQ be respectively the fractions of nuc leotide sites showing type I and type IIdifferences between two sequences compared, then the ev olutio nary distanceper site is K = -- (1/2) In {(1 -- 2P -- Q) ~z - ~ } . The evolut ionary rate peryear is then given by k = K/(2T), where T is the time since the divergence ofthe two sequences. If only the third codon positions are compared, the syn-ony mou s co mpo nen t of the evolutionar y base substituti ons per site is esti-mated by K ' S = -- (1/2) In (1 -- 2P -- 0.). Also, formula e for st andard errorswere obtained. Some examples were worked o ut using reported globin se-quences to show that s yno nym ous substi tuti ons occur at much higher ratesthan amino acid-altering subst itut ions in evolution.Key words: Molecular evolution - Evolut ionary distance estimat ion - Syn-onymous substitution rate

    During the last few years, rapid sequencing of DNA has become feasible, and dataon nucleo tide sequences of various parts of th e g enome in diverse organisms havestarted to accumulat e at an accelerated pace. Each new report on such sequences

    Contribution No. 1330 from the National Institute of Genetics, Mishima, 411 Japan0022--2844/80/0016/ 0111/~ 02.00

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    2/10

    112 M. Kimurainvites evolut ionary considerations through comparative studies. Therefore, it is de-sirable if good statistical method s are established for estimating evolution ary distancesbetween homologous sequences in terms of the numb er of nucleoti de base substituti ons.

    Recently, Miyata and Yasunaga (1980) proposed a new metho d for this purpose. Theilmethod consists in tracing, through successive one step changes, all the possible paths(restricted by the assumption of "the min imu m substitu tion numbe r") for each pairof codons compared, giving each step its due weight based on relative acceptance ratesof amino acid substitut ions. Using this meth od they succeeded in revealing some inter-esting properties of syno nymous base substitutions. One drawback of their method,however, is that it is too tedious.

    The purpose of this note is to derive some simple formulae which enable us to ob-tain reliable estimates on the evoluti onary distances between two nucleotide sequencescompared. The present method of estimation is not on ly handy but also has the meri tof incorporating the possibility that sometimes "trans ition " type substitutions may oc-cur more frequent ly than "transversion" type substitutions.

    Let us compare two homologous sequences and suppose that n nucleotide sites areinvolved. In what follows, we express sequences in terms o f RNA codes, so that t hefour bases are designated by letters U, C, A and G.

    Now, we fix our att enti on on one of the n sites, and investigate how the homologoussites in two species differentiate from each other in the course of evolution, startingfrom a com mo n ancestor T years back. Since there are 4 possibilities with respect toa base occupied at each site, there are 16 comb ina tio ns of base pairs when the homol -ogous sites in two species are compared. These are listed in Table 1. For example,UU in the first line represents the case in which homologous sites in the first and sec-ond species are bo th occupied by base U. Similarly, UC in the second line representsthe case in which homologous sites in the first and second species are occupied by Uand C.Table 1. Types of nucleotide base pairs occupied at homologous sites in two species. TypeI difference includes four cases in which both are purines or both are pyrimidines (line 2). TypeII difference consists of eight cases in which one of the bases is a purine and the other is a pyri-midine (lines 3 and 4).Same UU CC AA GG Total(Frequency) (R 1 (R 2 ) (R 3 (R4) (R)Different,Type I UC CU AG GA Total(Frequency) (P1) (P 1 (P2) (P2) (P)

    UA AU UG GUDifferent,Type I1 (Q1) (Q1) (Q2) (Q2) Total

    CA AC CG GC (Q)(Frequency) (Q3) (Q3) (%) (%)

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    3/10

    E s t i m a t i o n o f E v o l u t i o n a r y R a t e s o f N u c l e o t i d e S u b s t i t u ti o n s 1 1 3W h e n t h e h o m o l o g o u s s it e s a r e o c c u p i e d b y d i f f e r e n t b a s es , w e s h a l l d i s ti n g u is h t w o

    t y p e s o f d i ff e r e n c e s , n a m e l y , t y p e I a n d t y p e I I d i f fe r e n c e s . A s sh o w n i n T a b l e 1, t h et y p e I d i f f e r e n c e i n c l u d e s 4 c a se s i n w h i c h b o t h a r e e i th e r p u r i n e s o r p y r i m i d i n e s , a n dt h e t y p e I I d i f f e r e n c e i n c l u d e s 8 c a s es in w h i c h o n e is a p u r i n e a n d t h e o t h e r i s a p y r -i m i d i n e .L e t ~ a n d ~ b e t h e r a t e s o f b a s e s u b s t i t u t i o n s a s s h o w n i n F i g . 1 . I n o t h e r w o r d s ,c~ i s t h e r a t e o f t r a n s i t i o n t y p e s u b s t i t u t i o n s , a n d 2/3 i s t h a t o f t r a n s v e r s i o n t y p e s u b s t i -t u t i o n s , s o t h a t t h e t o t a l r a t e o f s u b s t i t u t i o n s p e r s i te p e r u n i t t i m e ( y e a r ) is k = e + 2/3.N o t e t h a t ~ a n d / 3 r e f e r to e v o l u t i o n a r y r a t e s o f m u t a n t s u b s t i t u t io n s i n t h e s p e c ie sr a t h e r t h a n t h e o r d i n a r y m u t a t i o n r a t e s at t h e l e ve l o f a n i n d iv i d u a l. T o s i m p l i f y t h ef o l l o w i n g t r e a t m e n t s , w e a s s u m e t h a t t h e s e r a t e s a r e e q u a l f o r a l l b a s e s a s s h o w n i nF i g . 1 .

    W e n o w d e f i n e t w o p r o b a b i l i t i e s d e n o t e d b y P a n d ( 2, w h e r e P i s t h e p r o b a b i l i t y o fh o m o l o g o u s s i t e s s h o w i n g a t y p e I d i f fe r e n c e , w h i l e (2 i s t h a t o f t h e s e s i t e s s h o w i n g at y p e I I d i f fe r e n c e . M o r e d e t a i l e d s p e c i fi c a ti o n s o f p r o b a b i l i t i e s o f i n d i v i d u a l c o m b i n a -t i o n s o f b a s e s a r e g i v e n i n T a b l e 1 . F o r e x a m p l e , t h e p r o b a b i l i t y o f U C ( a n d a l s o C U )i s d e n o t e d b y P I " L e t T b e t h e t i m e s i n c e d i v e r g e n c e o f t h e t w o s p e c i e s ( m e a s u r e d i ny e a r s ) , a n d d e n o t e b y P ( T ) a n d Q ( T ) t h e p r o b a b i l it i e s o f t y p e I a n d t y p e I I d i f f e re n c e sa t t im e T . N o t e t h a t t h e p r o b a b i l i t y o f i d e n t i t y at h o m o l o g o u s s i te s w h i c h w e d e n o t eb y R ( T ) i s e q u a l t o 1 - - P ( T ) - - ( 2 (T ) .

    T h e n , w e c a n d e r i v e th e e q u a t i o n s f o r P a n d ( 2 a t t i m e T + A T i n t e r m s o f P , O~ a n dR a t t i m e T a s f o l l o w s , w h e r e A T s t a n d s f o r t h e l e n g t h o f a s h o r t t i m e i n t e r v a l . C o n s i -d e r a p a r t i c u l a r b a s e p a i r o f t y p e I , s a y U C , w h i c h o c c u r s w i t h p r o b a b i l i t y P 1 ( T + A T )( s ee s e c o n d l i n e in T a b l e 1 ). W e c a n d i s t in g u i s h th r e e w a y s b y w h i c h U C a t t i m e T + A Ti s d e r i v e d f r o m v a r i o u s b a s e p a i r s a t t i m e T .( i) U C is d e r i v e d f r o m U C ( w h i c h o c c u r s w i t h p r o b a b i l i t y P $ ( T ) ) w h e n b o t h U a n d Cr e m a i n u n c h a n g e d . S i n c e t h e p r o b a b i l i t y o f o c c u r r e n c e o f s u b s t i t u t i o n p e r s it e d u r i n g

    Fig . 1 . Scheme of evolu t ionary base subs t i tu t ionsand the i r ra t es per un i t t im e

    P y r i m i d i n e s ( U . .. . ~ ' C )

    P u r i n e s ( A ~ ' G )o (a s h o r t t i m e i n t e r v a l o f l e n g t h A T i s ( a + 21 3)A T, t h e p r o b a b i l i t y o f n o c h a n g e o c c u r r i n ga t b o t h h o m o l o g o u s s i te s i s [ 1 - ( ~ + 2 /3 )A T ] 2 s o th a t t h e c o n t r i b u t i o n c o m i n g f r o mt h i s c l a ss i s [ 1 - ( ~ + 2 / 3 ) A T ] 2 P I ( T ) , w h i c h r e d u c e s t o [ 1 - 2 ( ~ + 2 / 3 ) A T ] P I ( T ) i f w en e g l e c t s m a l l te r m s o f t h e o r d e r ( A T ) 2 . W e a l so i g n o r e t h e r a r e p o s s i b i l i t y t h a t C Uc h a n g e s t o U C b y a d o u b l e c h a n g e .( ii ) U C i s d e r i v e d f r o m U U w h e n U in t h e s e c o n d s p e c i e s i s r e p l a c e d b y C , a n d f r o mC C w h e n C i n t h e f i r s t s p e c i e s is r e p l a c e d b y U ( s e e t h e f i r s t li n e i n T a b l e 1 ). S i n c e U Ua n d C C o c c u r w i t h r e s p e c t iv e f r e q u e n c i e s R I ( T ) a n d R 2 ( T ) , a n d s i n c e e a c h c h a n g e s t oU C a t t h e r a t e (x, t h e t o t a l c o n t r i b u t i o n c o m i n g f r o m t h i s c la s s i s eEX T [ R I ( T ) + R 2 ( T ) ] .W e n e g l e c t c as e s i n w h i c h s u b s t i t u t i o n s o c c u r s i m u l t a n e o u s l y a t b o t h s i te s b e c a u s e s u c hp r o b a b i l i t i e s a r e o f t h e o r d e r o f ( A T ) 2 .

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    4/10

    1 1 4 M . K i m u r a( ii i) U C c a n a ls o b e d e r i v e d f r o m U A , U G , A C a n d G C ( s ee t h e t h i r d l i n e i n T a b l e 1 )w h i c h o c c u r w i t h r e sp e c t i v e f r e q u e n c i e s Q I ( T ) , Q 2 ( T ) , Q 3 ( T ) a n d Q 4 ( T ) , a n d e a c h o fw h i c h c h a n g e s t o U C a t t h e ra t e/ 3 . T h e c o n t r i b u t i o n o f t h is c la ss t o U C a t T + A T i s/3A T [ Q I ( T ) + Q 2 ( T ) + Q 3 ( T ) + Q 4 ( T ) ] o r / 3 A T Q ( T ) / 2 .

    C o m b i n i n g a l l t h e s e c o n t r i b u t i o n s c o m i n g f r o m v a r i o u s b a s e p a ir c la ss es at t i m e T ,w e g e t

    P I ( T + A T ) = [ 1 - - ( 2& + 4 /3 ) A T ] P I ( T ) + 0 ~ T [ R I ( T ) + R 2 ( T ) ] + /3 A T " Q ( T ) /2 .S i m i l a r l y , f o r t h e b a s e p a i r A G ,

    P 2 ( T + A T ) = [ 1 - ( 2 & + 4 [3 ) A T ] P 2 ( T ) + Z 6 T [ R 3 ( T ) + R 4 ( T ) ] + / 3 A T Q ( T ) / 2S u m m i n g t h es e t w o e q u a t i o n s , a n d n o t i n g P ( T ) = 2 P I ( T ) + 2 P 2 ( T ) a n d R ( T ) = R I ( T ) +R 2 ( T ) + R 3 ( T ) + R 4 ( T ) = 1 - - P ( T ) - - Q ( T ) , a n d w r i t i n g A P ( T ) -- P ( T + A T ) - P ( T ) w eg e t

    A P ( T ) / A T -- 2 & - 4 ( a + /3 ) P ( T ) - 2 ( a - / 3 ) Q ( T ) . ( 1 )C a r r y i n g o u t a s i m i l a r s e ri es o f c a l c u l a t i o n s fo r b a s e p a i r s o f t y p e I I, w e o b t a i nA Q ( T ) / A T = 4 /3 - 8 / 3Q ( T ) ( 2 )

    F r o m t h e s e t w o f i n i t e d i f fe r e n c e e q u a t i o n s ( E q s . 1 a n d 2 ) , w e o b t a i n t h e f o l l o w i n g s e to f d i f f e r e n t i a l e q u a t i o n s

    d P ( T ) = 2 ~ - - 4 ( & + / 3 ) P ( T ) - - 2 (0 t - / 3 ) Q ( T )d T (3 )d Q ( T ) = 4 /3 - 8 /3Q (T )d TT h e s o l u t i o n o f t h i s s et o f e q u a t i o n s w h i c h s a ti s fi e s t h e c o n d i t i o n

    P ( 0 ) -- Q ( 0 ) = 0 , ( 4 )i .e . , n o b a s e d i f f e r e n c e s e x i s t a t T = 0 , i s a s f o l l o w s ,

    11 1 e-4(&+/3)T + e.8/3TP( T) - 4 2 4 - (5 )

    Q (T ) - 1 1 e_8V a2 2W r i ti n g P T a n d O .T f o r P ( T ) a n d Q ( T ) , w e g e t, f r o m t h e s e t w o e q u a t i o n s ,

    4 ( a + f l )T = - - l o g e ( 1 - - 2 PT - Q T )a n d

    8/YI" = - l o g e ( 1 - 2 Q T ) ,s o t h a t

    4 a T = - - l o g e ( 1 - - 2 P T - - Q T ) + ( 1 1 2 ) l o g e ( 1 - - 2 Q T )S i n c e t h e r a t e o f e v o l u t i o n a r y b a se s u b s t i t u t i o n s p e r u n i t t i m e i sk = o e + 2 ~ ,

    ( 6 )

    ( 7 )

    (8 )

    (9 )

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    5/10

    Estimation of Evolutionary Rates of Nucleotide Substitutions 115the t otal num ber of subs titut ions (including revertant and superimposed changes) persite which separate the two species (and therefore involve two b ranches each with lengthT) is

    K=2T k =2~T +4~T ,where aT and ~3T are given by Eqs. (8) a nd (9). Then , omi tti ng the subscript T fromPT and QT'we obtain

    K=-- l lo g e { (1 - 2P- Q) x /1 - 2Q) . (10)It is remarkable that, as can be seen from Eqs. (3), at equilibrium 2P -- Q = 1/2, evenwhen a :/: 3.

    This equati on may be used to estimate the evolut ionary distance between two se-quences in terms of the n umbe r of base substi tuti ons per site that have occurred in thecourse of evolution exten ding over T years. In this equation, P = nl / n and Q = n2/n,where n I and n 2 are respectively the n umbers of sites for which two sequences differfrom each other with respect to type I ("transi tion" type) and type II ("transversion"type) subst itut ions and n is the t otal numb er of sites compared.

    In the special case of a = J3, Eqs. (5) and (6) reduce toPT = QT/2 = (1 - e-8tT)/4 (11)

    Then, s ubsti tuti ng P = Q/2 in (10), we getK= -- ~l og e(1 -- X) , (12)

    where X = P + Q = 3Q/2 is the frac tion of sites for which two sequences differ fromeach other. This formula is well -know n (see Kimur a and Ohta 1972), and it was firstobta ined by Jukes and Cantor (1969). However, in actual situations, particularly whenthe third positions of codons are compared, P is often larger than Q, and therefore theass umption of a = ~ or P = Q/2 is no t always realistic. This is one reason why the newfor mul a (10) is bett er than (12). Eq. (10) also has a desirable prop erty in tha t as P andQ get small, it converges to K = P + Q ind ependent of 0t and/3.

    Since a large fraction of s ubsti tuti ons at the third positions are synon ymou s, thatis, they do not cause amino acid changes, it would be interesting to est imate the syn-onym ous co mponen t of the substitu tion rate at this position.

    As is evident from the standard RNA code table, for a given pair of bases in the firstand the second codon positions, roughly speaking there are two situati ons; either thethird positi on is completely s yno nym ous (four-fold degeneracy) or syn on ymy is re-stricted within purines or pyrimidines (two-fold degeneracy). These two situationsoccur roughly in equal numbers. Thus, the synon ymous component of substitutionsat the t hird pos ition which we denote by k S may be estimated by

    1 1k S = ~-( a+ 2fl )+~ a= a+ /3 . (13)Let K S' = 2Tk S' = 2(a +/3)T be the syn ony mou s com pon ent of the distance, then,

    we get1K S =- -~ - l og e( 1-2 P- Q) , (14)

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    6/10

    1 16 M . K i m u r ai f w e a p p l y E q . (7 ) , so th a t t h e s y n o n y m o u s c o m p o n e n t o f s u b s t i t u t i o n r a t e p e r u n i tt i m e m a y b e e s t i m a t e d f r o m k S ' = K S ' / ( 2 T ) . W r i t in g k 'n u c (S ) r a t h e r t h a n k S ' t o e m -p h a s i z e t h a t t h i s r e f e rs t o t h e r a t e p e r n u c l e o t i d e s i t e , w e g e t , u s in g E q . 1 4 , t h e f o l l o w -i ng f o r m u l a .

    1k ' n u c ( S ) = - ~ l og e (1 - 2P - Q) (15 )T h e c o r r e s p o n d i n g f o r m u l a f o r a m i n o a c i d - a l te r i n g s u b s t i t u t i o n s m a y b e o b t a i n e d b y

    k n u c ( A ) = K " / ( 2 T ) , ( 1 6 )w h e r e K " is t h e e v o l u t i o n a r y d i s t a n c e c o m p u t e d b y a p p l y i n g E q . ( 1 0 ) t o a s et o f th ef i r s t a n d s e c o n d p o s i t i o n s o f c o d o n s ( i .e . , b y e x c l u d i n g t h e t h i r d p o s i t i o n s in t h e s e-q u e n c e c o m p a r i s o n ) .

    W e c a n a l so d e r i v e f o r m u l a e f o r t h e e r r o r v a ri a n c e s o f t h e e s t i m a t e s K a n d K S ' . L e t6 K , ~ P a n d ~ Q b e r e s p e c t i v e l y s m a l l c h an g e s in K , P a n d Q . T h e n

    t~ K = a tS P + b ~ Qw h e r e

    a -

    a n d1 - 2 p - Q1(1

    b = ~ - 1 - 2 P - qs o t h a t

    OK 2 = E { (SK ) 2} =

    ( 1 7 )

    1) 1 - - 2 Q ' ( 1 8 )a 2 E { ( ~ p ) 2 } + 2 a b E { ~ P ~ Q } + b 2 E { ( 6 Q ) 2 } ,

    w h e r e E s ta n d s f o r t h e e x p e c t a t i o n o p e r a t o r . T h e n n o t i n g t h a t t h e s a m p l i n g v a r ia n c e sa n d t h e c o v a r i a n c e a r e E ( ( 8 p ) 2 } = P ( 1 - P ) / n , E { ( ~ Q ) 2 ) = Q ( 1 - Q ) / n a n d E ( 8 P 8 0 ,}= - P Q / n . T h e s t a n d a r d e r r o r o f K i s t h e n

    1 I X |a 2 p + b 2 O D - ( a P + b Q ) 2 IO K ( 1 9 )I n a s i m i l a r m a n n e r , w e c a n d e r i v e t h e s t a n d a r d e r r o r o f K S ' w h i c h t u r n s o u t t o b e a sf o l l o w s .

    v /4 P + Q - ( 2 P + Q )2 K ' S - 2 ( 1 - 2 P - Q ) c -n - ( 2 0 )A s a n e x a m p l e , l e t u s c o m p a r e t h e n u c l e o t i d e s e q u e n c e o f t h e r a b b i t / 3 g l o b i n ( E fs tr a-

    t i a d i s a n d K a f a t o s 1 9 7 7 ) w i t h t h a t o f c h i c k e n 13 g l o b i n ( R i c h a r d s e t a l 1 9 7 9 ) . T h e r e a r e4 3 8 n u c l e o t i d e s i te s t h a t c a n b e c o m p a r e d , c o r r e s p o n d i n g t o 1 4 6 a m i n o a c i d si te s (c o -d o n s ) . A m o n g t h e s e si t es , w e f i n d t h a t t h e r e a r e 5 8 s it e s f o r w h i c h t h e s e t w o s e q u e n c e sh a v e t y p e I d i f f e r e n c e s , a n d 6 3 s i t es w i t h t y p e I I d i f f e r e n c e s . T h u s , P = 0 . 1 3 2 , Q = 0 . 1 4 4a n d w e o b t a i n K = 0 . 3 4 8 . M a m m a l s a n d b i r d s p r o b a b l y d i v e r g ed d u r i n g t h e C a r b o n i f e r -o u s p e r i o d ( se e R o m e r 1 9 6 8 ) , s o w e t e n t a t i v e l y t a k e T = 3 00 X 1 0 6 y ea r s . T h e e v o l u -t i o n a r y r a t e p e r s i t e i s t h e n k n u c = K / ( 2 T ) = 0 . 5 8 1 0 9 p e r y e a r . T h i s is t h e o v e r a l l

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    7/10

    E s t i m a t i o n o f E v o l u t i o n a r y R a t e s o f N u c l e o t i d e S u b s t i t u t i o n s 1 17r a t e p e r s i t e, b u t i t is m u c h m o r e i n t e r e s ti n g to e s t i m a t e s e p a r a t e l y th e e v o l u t i o n a r yra t es fo r t he th ree co don pos i t i ons . Fo r t he f i r s t pos i t i on , t he re a re 146 nuc leo t id es i tes com pa red , and we f ind P = 15 /146 and Q = 21 /146 , g iv ing K 1 = 0 .300 , where sub -sc r ip t 1 deno tes t ha t i t r e fe r s t o t he f i r s t codon pos i t i on . S im i l a r ly , fo r t he second pos i -t i on , we f ind P = 7 /146 and Q = 18 /146 so tha t K 2 = 0 .195 . F ina l ly , fo r t he th i rd pos i -t i on , P = 36 /146 and Q = 24 /146 , and we ge t K 3 = 0 .635 , wh ich i s m uch h igher t hanthe co r resp ond in g es t im ates fo r t he f i r s t and sec ond pos i t i ons . We can a l so es t im a te thes y n o n y m o u s c o m p o n e n t o f t h e e v o l u t i o n a r y d i s t a n c e p e r t h i r d c o d o n p o s i ti o n . F r o mEq. (14) , th is tu r ns ou t to be K S ' = 0 .535 .

    In Tab le 2 , r esu l ts o f s im i l ar ca l cu la t ions a re l i s t ed fo r var ious com par i son s invo lv ingthe hum an /3 (Maro t t a e t a l . 1977 ) and the m ouse /3 g lob in sequences (Konke l e t a l . 1978 ) ,in add i t ion to the ch icken and rabb i t /3 -g lob in sequences . Exc ep t fo r t he l as t two co m -par is ons involv ing the a bno rma l , g lob in l ike ~-3 gene (N ish ioka et a l . 1980) i t i s c leartha t t he r e l a t ionsh ip K 2 ~ K 1 ~ K3 ho ld s genera l ly ; t he evo lu t ionary m u tan t subs t i t u -t i o n s a r e m o s t r a p i d a t t h e t h i r d p o s i t i o n , a n d t h i s is f o l l o w e d b y t h e f i r s t p o s i t i o n a n dt h e n , a t t h e s e c o n d p o s i t i o n t h e s u b s t i t u t i o n s a re t h e s lo w e s t.

    T h i s c a n be r e a d i l y i n t e r p r e t e d b y t h e n e u t r a l t h e o r y o f m o l e c u l a r e v o l u t i o n ( K i m u r a1968 ; King and Jukes 1969 ; see a l so Kim ura 1979 ) as fo l lows . Am ong the th ree co donp o s i t i o n s , ba s e s u b s t i tu t i o n s a t t h e s e c o n d p o s i t i o n s t e n d t o p r o d u c e m o r e d r a s t i cchanges in the phys i co -c hem ica l p rop er t i e s o f am ino ac id s than those a t t he f i r s t posi -t i ons . Take fo r exam ple a cod on fo r P ro (CCN) . Subs t i t u t ions fo r base C a t the f i r s tpos i t i o n o f U , A , and G , l ead respec t ive ly to Ser , Th r and Ala . In te rm s o f Miy a ta ' sd i s t a n c e ( b a s e d o n p o l a r i t y a n d v o l u m e d i f f e r e n c e s b e t w e e n a n a m i n o a c i d p ai r ; s eeMi ya ta e t a l. 1979 ) , t h ey a re 0 .56 , 0 .87 and 0 .06 un i t s app ar t f rom Pro , wi th the averaged i s t ance o f abo u t 0 .5 . On the o ther hand , t he c o r resp ond ing average d i s t ance resu l t i ngf r o m s u b s t i t u t i o n s f o r C a t t h e s e c o n d p o s i t i o n o f t h e c o d o n t u r n s o u t t o b e a b o u t 2 .5 .

    Th i s m eans tha t m u ta t iona l changes a t t he f i r s t pos i t i on have a h igher chance o f no tbe ing harm fu l ( i .e . , s e l ec t ive ly neu t ra l o r equ iva len t ) t ha n those a t t he seco nd pos i t i on ,and there fo re , have a h igher chance o f be ing f ixed in the spec ies by r a ndom d r i f t (Ki -m u ra and O h ta 1974 ) . Th i s t yp e o f r eason ing app l i es m ore fo r c ib ly to the th i rd pos i -t i on ( as com pared wi th the f i r s t and the second pos i t i ons ) , s ince a m ajo r i ty o f m u ta -t iona l changes a t t h i s pos i t i on do no t cause am ino ac id changes .

    T h e r a t i o p e r s i te o f s y n o n y m o u s t o a m i n o a c i d -a l t e ri n g s u b s t i t u ti o n s a s m e a s u r e dby 2K s ' / ( K 1 + K2) , i s 4 .17 fo r t he hum an /3 - rab b i t /3 com par i son (T = 8 X 107 year s ) ,2 .16 fo r t he ch icken /3 - rabb i t /3 com par i son (T = 3 X 108 year s ) bu t on ly 1 .41 fo r t herab b i t ~o rabb i t/3 com p ar i son (T = 5 X 108 year s ) . I t l ooks as i f syn ony m o us s ubs t i t u -t i ons occu r m ore f r equen t ly du r ing the l a t e r ( i . e . m ore r ecen t ) s t ages o f g lob in evo lu -t ion than i t s ea r ly s tages, bu t such a t ende ncy i s p rob ab ly m or e a ppare n t t han r ea l. Inm y o p i n i o n , t h is is d u e t o l o w e r d e t e c t a b i l i t y o f s y n o n y m o u s s u b s t i t u t io n s f o r m o r er e m o t e c o m p a r i s o n s ; as m o r e a n d m o r e s y n o n y m o u s s u b s t i t u t i o n s a c c u m u l a t e a t th eth i rd po s i t i ons o f codons , i t b ecom es p rog ress ive ly d i f f i cu l t t o de t ec t a l l o f t hem . Inf a c t , a s c o m p a r e d w i t h t h e f i r s t a n d t h e s e c o n d p o s i t i o n s , t h e t h i r d p o s i t i o n s h o w sm arked dev ia t ion o f t he base com pos i t i on f rom equa l i t y ( e .g . , G 36%, C 30%, U 27%, A7 % i n h u m a n / 3 ) s o t h a t t h e p r e s e n t m e t h o d o f e s t i m a t i o n m a y b e c o m e i m p r e c is e f o r avery l a rge va lue o f T .

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    8/10

    11 8

    ~ . ~ f i ~

    u~

    -~ ~.~ ~

    ~.~ ~.

    ~ ga .~ 'g

    e,i ,~ g~ o

    r

    e

    +1 +l +1 +I +1 +l +1 +l

    +1 +l +~ . +[ +1 +~ +1 +1

    +i +1 +F +1 +1 +1 +1 +1

    N d d ~ d N ~ d

    +1 I +1 +l +1 +1 +t +1

    d d d d d d d d

    Z~ Z~ ~ ~ ~ :~ :~.

    M. Kimura

    The l a s t two compar i so ns in Tab le 2 invo lve the g lob in - l ike c t-~ gene recen t ly se-quenced in the mouse (N ish ioka e t a l. 1980). Th i s gene comp le te ly l acks two in t e rven -ing sequences no rma l ly p re se n t in a l l t he a and ~ g lob in genes , and i t does no t encodeg lob in . In o the r words , i t i s i nac t ive in the p rod uc t io n o f s t ab le mR NA . Howeve r ,f rom a compar i son o f th i s sequence w i th the no rma l , adu l t mouse a g lob in (a - l ) and therabb i t a g lob in nuc leo t id e sequences , i t is ev iden t tha t t h i s gene evo lved f rom a no rma lances t ra l a g lob in gene th rou gh dup l i c a t ion and subsequen t lo ss o f it s in t e rven ing se -quences . Th i s mu s t have occu r red a f t e r the mouse and the rab b i t d iverged f rom the i r

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    9/10

    E s t i m a t i o n o f E v o l u t i o n a r y R a t e s o f N u c l e o t i d e S u b s t i t u t i o n s 11 9co mm on ances to r some 80 mi l l ion yea rs ago . Th is ~ -g lob in -l ike gene acqu i red in i t scod ing reg ion a num ber o f in se r tions and de le t ions o f nuc leo t ide s . In mak in g sequencecompar i sons , the re fo re , I chose on ly those codons o f the ~ -3 gene which do no t con ta insuch changes and which a re e i the r iden t i ca l w i th o r d i f fe r on ly th rough base subs t itu -t ions f rom the co r re spo nd ing (ho molo gous ) cod ons o f the mouse cz-1 and the ra bb i t cxgenes.

    As seen f ro m the las t three l ines of Table 2 , th is unexpr esse d c~-3 gene evolved a t amuch fa s t e r ra t e than i t s no rm a l coun te rp a r t (mouse ~ -1 gene ) , pa r t i cu la r ly w i th re spec tto the f i r s t and the second cod on pos i t ions . Th i s i s e a sy to unde rs t and f r om th e neu t ra ltheor y . Unde r a no rma l s i tua t ion , each gene i s sub jec t to a se l ec tive cons t r a in t com ingf r o m t h e r e q u i r e m e n t t h a t t h e p r o t e i n w h i c h i t p r o d u c e s m u s t f u n c t i o n n o r m a l l y . E v-o lu t iona ry changes are re s t r i c t ed wi th in such a se t o f ba se subs t i tu t ion s . Howeve r , oncea gene is f ree d f ro m th is con st ra in t , as i s the case for th is g lobin- l ike ~-3 gene , prac t i -ca l ly a ll t he base subs t i tu t io ns in i t become ind i f fe ren t to D a rwin ian f i tne ss , and ther a t e o f b as e s u b s t i t u t io n s s h o u l d a p p r o a c h t h e u p p e r l i m i t s e t b y t h e m u t a t i o n r a t e( T h is h o l d s o n l y i f t h e n e u t r a l t h e o r y i s v a li d , b u t n o t i f t h e m a j o r i t y o f b a s e s u b s t it u -t ions a re d r iven by po s i t ive se l ec t ion ; see K im ura 1977) . I f t he ra t e s o f syn ony mo uss u b s t i t u t i o n s a r e n o t v e r y fa r f r o m t h i s l im i t ( K i m u r a 1 9 7 7 ) w e m a y e x p e c t t h a t t h er a t es o f e v o l u t i o n o f a " d e a d g e n e " a r e r o u g h l y e q u a l t o t h o s e o f s y n o n y m o u s s u b s t i tu -t i o n s. R e c e n t l y M i y a t a ( p e r s o n al c o m m u n i c a t i o n ) c o m p u t e d t h e e v o l u t i o n a r y r a te o fa mous e pse udo a lph a g lob in gene ($ ~ 30 .5 ) wh ich was sequenced by Van in e t a l. (1980)and w hich appea rs to be e ssen t i a l ly equ iva len t to the 0~-3 gene . He a l so ob ta ined a re su l ts u p p o r t i n g t h i s p r e d i c t i o n .

    F ina l ly , i t wou ld be in te re s t ing to e s t ima te base subs t i tu t ion ra t e s in non-cod ing re -g ions such a s in t rons . I u se da ta p re sen ted b y van Ooye n e t a l. (1979) who inves t iga teds imi la r i ty be twee n the nu c leo t ide sequences o f r abb i t and mouse /3 -g lob in genes . The yl i s t ( see the i r Tab le 2 ) sepa ra te ly the numbe rs o f " t r an s i t io n" and " t rans ve rs io n" typed i f f e r en c e s b e t w e e n t h e h o m o l o g o u s p a r t s o f t h e s e s eq u e nc e s . F o r t h e s m a l l i n t r o n sexc lud ing 5 gaps tha t a mo un t to 6 nuc leo t ide s , P = 27 /113 , Q = 18 /113 and n = 113.Using Eqs. 10 and 19, we ge t K -- 0 .60 -+ 0 .12. This i s not s ignif ic ant ly d i f f e ren t f romthe su bs t i tu t ion ra t e a t t he th i rd pos i t ion K 3 = 0 .43 -+ 0 .07 . The l a rge in t rons o f r abb i tand mo use /3 g lob in genes d i f fe r cons ide rab ly in leng th , be ing sepa ra ted f rom eacho t h e r b y 1 4 ga p s ( d e t e r m i n e d b y o p t i m i z a t i o n o f a l i g n m e n t o f th e s e q u e n ce s ) w h i c ham oun t to 109 nuc leo t ide s . Exc lud ing the se pa r ts , P = 113 /557 and Q = 179 /557 , f romwhich we ge t K = 0 .90 -+ 0 .07. This va lue is s ignif ican t ly la rger than K 3 . I t i s l ike ly , aspo in ted ou t by van Ooyen e t a l . (1979) tha t in se r t ions and de le t ions occur ra the r f re -q u e n t l y i n t h i s p a r t i n a d d i t i o n t o p o i n t m u t a t i o n s , a n d t h a t t h e y i n f l at e t h e e s t i m a t e dva lue o f the "nuc leo t ide subs t i tu t ion ra t e , " s ince a ma jo r i ty o f the se changes may a l sobe se lec t ive ly neu t ra l and sub jec t to rando m f ixa t ion by gene t i c d r i ft .

    Aknowledgments. I thank Drs. Y. Tateno, N. Takahata, and J.F. Crow for reading carefullythe first draft of this paper and for giving many useful suggestions for improving its presenta-tion.

  • 8/8/2019 1983 Kimura - Synon-Nonsynon Mut Rate

    10/10

    120 M. KimuraRe f e r e n c e sEfstratiadis A, Kafatos FC (1977) Cell 10 :5 71 -5 85Heindell HC, Liu A, Paddock GV, Studnicka GM, Salser WA (1978) Cell 15:43-54Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed),Mammalian protein metabolism, II. New York, Academic Press, p 21Kimura M (1968) Nature 217:624-626Kimura M (1977) Nature 267:275--276Kimura M (1979) Sci Am 241 (No.5, Nov. ):94--104Kimura M, Ohta T (1972) J Mol Evol 2:87--90Kimura M, Ohta T (1974) Proc Natl Acad Sci USA 71:2848--2852King JL, Jukes TH (1969) Science 164:788-798Konkel DA, Tilghman SM, Leder P (1978) Cell 15:1125-1132Marotta CA, Wilson JT, Forget BG, Weissman SM (1977) J Biol Chem 252:5040-5053Miyata T, Miyazawa S, Yasunaga T (1979) J Mol Evol 1 2: 219- 23 6Miyata T, Yasunaga T (1980) J Mol Evol 16:23--36Nishioka Y, Leder A, Leder P (1980) Proc Natl Acad Sci USA 77:28 06-2 80 9Richards RI, Shine J, Ullrich A, Wells JRE, Goodman HM (1979) Nucleic Acids Res 7:

    1137-1146Romer AS (1968) The procession of life. Weidenfeld and Nicolson, London.Vanin EF, Goldberg GI, Tucker PW, Smithies O (1980). Nature 286:222--226van Ooyen A, van den Berg J, Mantel N, Weissmann C (1979) Science 206:337-344Received July 24, 1980