Comparative Analysis of Homologous Amino Acid Sequences (sci-tech.ksc.kwansei.ac.jp/~tohhiro/MCB-2-16.pdf)

Department of Life Sciences, Year 3, Second Semester: Molecular Cell Biology I

Comparative Analysis of Homologous Amino Acid Sequences

Hiroyuki Toh, Department of Biomedical Chemistry, School of Science and Technology, Kwansei Gakuin University

Outline
1. Introduction
2. Topology inversion of membrane proteins
3. Chemokine receptors

Evolutionary fate and functional consequence of gene duplication

Gene A (Function A) → duplication → Gene A + Gene A'
・Neofunctionalization: Gene A + Gene B (Functions A + B)
・Subfunctionalization: Gene A' + Gene A'' (Functions A' + A'' = A)
・Non-functionalization (pseudogenization): Gene A + non-processed pseudogene A



Which amino acid sites are related to the functional divergence?

Classical approach to identifying the critical sites for functional divergence

- Evolution of Prostaglandin D2 Synthase -

Nagata, A., Suzuki, Y., Igarashi, M., Eguchi, N., Toh, H., Urade, Y., Hayaishi, O. Proc. Natl. Acad. Sci. USA 88, 4020-4024 (1991).
Igarashi, M., Nagata, A., Toh, H., Urade, Y., Hayaishi, O. Proc. Natl. Acad. Sci. USA 89, 5376-5380 (1992).

[Figure: PGD synthase converts PGH2 into PGD2]

PGD synthase: about 190 amino acids

[Figure: database searching of an amino acid sequence database shows that PGD synthase belongs to the lipocalins; panels: human neutrophil gelatinase-associated lipocalin, mouse PGD synthase]

Lipocalin family: a diverse family of secretory proteins involved in the binding and transport of small hydrophobic molecules.

[Figure: a lipocalin secreted from the secretory tissue carries small hydrophobic molecules to the target cell]

PGD synthases: enzyme; found in vertebrates
Lipocalins: transporter (non-enzyme); found from bacteria to eukaryotes

Which sites are involved in the acquisition of the catalytic activity?

PGD synthase is inactivated by treatment with an SH-modifier (a reagent that blocks the SH group of Cys).
→ Cys residues may be involved in the catalytic reaction of the enzyme.

[Figure: positions of the Cys residues in the aligned sequences]

Site-directed mutagenesis: Cys → Ala, Ser
・Some mutants lost the enzyme activity.
・Other mutants showed activity comparable to that of their parent enzyme.

A more systematic and automatic method to detect the critical sites for functional divergence would provide:
・Clues for the design and/or alteration of protein function
・Deep insight into the evolution of protein function

[Figure: amino acid sequence alignment consisting of groups with different functions (substrate A vs. substrate B)]

Site properties compared between the groups: conservation, evolutionary rate, amino acid composition

Existing methods and measures: Evolutionary Trace; Quantitative Evolutionary Trace; Conservation; Diverge; Branch Length; Cumulative Relative Entropy; Amino Acid Composition; Evolutionary Rate; Hierarchical Conservation Analysis; Relative Entropy among Paralogs; Level Entropy
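The site properties above can be made concrete with a small sketch. It assumes a toy alignment (hypothetical sequences) and uses Shannon entropy as the conservation measure; that is a common choice, not necessarily the score used by any of the methods listed. The code computes each column's amino acid composition and its entropy, so a fully conserved column scores 0.

```python
from __future__ import annotations

import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def column_composition(column: str) -> dict[str, float]:
    """Unweighted amino acid frequencies in one alignment column (gaps '-' ignored)."""
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    if total == 0:
        return {}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy in bits: 0 for a fully conserved column, at most log2(20)."""
    return sum(f * math.log2(1.0 / f) for f in freqs.values() if f > 0)


# Hypothetical toy alignment of four sequences (not data from the slides).
alignment = ["ACK", "ACR", "ACE", "GCD"]
for j in range(len(alignment[0])):
    column = "".join(seq[j] for seq in alignment)
    print(j + 1, column, round(shannon_entropy(column_composition(column)), 3))
```

Column 2 (all Cys) scores 0 and column 3 (four different residues) scores 2 bits; comparing such per-site values between functional groups is the basic idea behind the conservation-based methods.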


Outline: 2. Topology inversion of membrane proteins

Aquaporin: Fu et al. Science 290, 481-486 (2000); Murata et al. Nature 407, 599-605 (2000).
ClC chloride ion channel: Dutzler et al. Nature 415, 287-294 (2002).

[Figure: N- and C-terminal halves, each containing segments a, b, c]

・The N-terminal domain is homologous to the C-terminal domain.
・The two domains are arranged in opposite orientations relative to each other.

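Membrane proteins show a bias in amino acid composition at the membrane boundary, e.g. the positive-inside rule: positively charged residues (Lys, Arg) are enriched in the segments on the cytoplasmic side.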


von Heijne, Nature 341, 456-458 (1989); Nakashima & Nishikawa, FEBS Lett. 303, 141-146 (1992)
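A minimal sketch of how the positive-inside bias can be quantified: count Lys and Arg in loop segments assigned to the cytoplasmic ("in") versus extracellular ("out") side. The loop sequences and side labels below are hypothetical; in practice they would come from a topology prediction or a solved structure.

```python
from __future__ import annotations


def kr_count(seq: str) -> int:
    """Number of positively charged residues (Lys K, Arg R) in a segment."""
    return sum(1 for aa in seq.upper() if aa in "KR")


def positive_inside_bias(loops: list[tuple[str, str]]) -> tuple[int, int]:
    """Total K+R in loops labelled 'in' (cytoplasmic) and 'out' (extracellular)."""
    inside = sum(kr_count(seq) for side, seq in loops if side == "in")
    outside = sum(kr_count(seq) for side, seq in loops if side == "out")
    return inside, outside


# Hypothetical loop segments of a multi-spanning membrane protein.
loops = [("in", "MSKRLE"), ("out", "GDNSTE"), ("in", "PKKRAG"), ("out", "QSDGY")]
print(positive_inside_bias(loops))  # → (5, 0)
```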

The amino acid composition of cytoplasmic proteins differs from that of extracellular proteins.

Nakashima & Nishikawa J. Mol. Biol. 238, 54-61 (1994)

[Figure: membrane, extracellular region, cytoplasmic region]


The two domains are therefore expected to have evolved under different constraints.

[Figure: extracellular region, cytoplasmic region, membrane, and pore surface]

Different constraints are thought to have acted on the interfaces facing the extracellular and cytoplasmic environments.

3. Methods

(a) Collection of homologous amino acid sequences by database searching

(b) Splitting of the obtained sequences into the N- and C-terminal domains

(c) Multiple alignment of each domain

(d) Profile alignment between the alignments of the N- and the C-terminal domains

[Figure: (1) multiple alignment; sequences retrieved from the sequence DB are split into N-terminal domains and C-terminal domains]

Evaluation of the difference between two domains at each alignment site

[Figure: two multiple alignments, chemokine receptors (proteins 1 … M) and a cluster of decoy or viral receptors (proteins 1 … N), over alignment sites 1 … L. At site i, the amino acid residue frequencies are estimated for each group, e.g. Ala 0.05, Arg 0.03, …, Trp 0.01 in the chemokine receptors and Ala 0.02, Arg 0.05, …, Trp 0.03 in the decoy or viral receptors.]

Estimation of amino acid composition at each alignment site
・Taxonomic bias → Henikoff & Henikoff weights
・Unobserved residues → pseudocounts adopted in PSI-BLAST

※ This is the same method used to calculate the PSSM in PSI-BLAST (β = 0.1).
※ The BLAST parameter λu was obtained by the Newton-Raphson method at each calculation.
※ CRE (Cumulative Relative Entropy) uses a Dirichlet mixture as a prior instead of pseudocounts.
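The Henikoff & Henikoff (1994) position-based weights counter taxonomic bias: in each column, a sequence whose residue occurs n times among r distinct residue types receives 1/(r·n); summing over columns and normalizing gives the sequence weight. The sketch below is a minimal version; skipping gaps is an assumption (implementations differ), and `weighted_composition` is a hypothetical helper for the weighted, pseudocount-free column composition.

```python
from __future__ import annotations

from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def henikoff_weights(alignment: list[str]) -> list[float]:
    """Position-based sequence weights (Henikoff & Henikoff, 1994).

    In each column, a sequence whose residue occurs n times among r distinct
    residue types receives 1 / (r * n). Gap characters '-' are skipped, which
    is one of several possible conventions. Weights are normalized to sum to 1.
    """
    weights = [0.0] * len(alignment)
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        counts = Counter(aa for aa in column if aa != "-")
        r = len(counts)
        for k, aa in enumerate(column):
            if aa != "-":
                weights[k] += 1.0 / (r * counts[aa])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_composition(alignment: list[str], weights: list[float], j: int) -> dict[str, float]:
    """Weighted amino acid frequencies at column j, without pseudocounts."""
    freqs = dict.fromkeys(AMINO_ACIDS, 0.0)
    for seq, w in zip(alignment, weights):
        if seq[j] in freqs:
            freqs[seq[j]] += w
    norm = sum(freqs.values())
    return {aa: f / norm for aa, f in freqs.items()}


# Two identical (redundant) sequences and one distinct sequence.
aln = ["AAA", "AAA", "CCC"]
w = henikoff_weights(aln)
print(w)  # → [0.25, 0.25, 0.5]
comp = weighted_composition(aln, w, 0)
print(comp["A"], comp["C"])  # → 0.5 0.5
```

Without weights, column 1 would be 2/3 Ala and 1/3 Cys; the weights stop the two redundant sequences from dominating the estimate.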


The Kullback-Leibler information between the two groups is calculated at each alignment site.


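The difference between two probability distributions can be quantitatively evaluated with the Kullback-Leibler information (KLI). Here p(i) and q(i) are the frequencies of amino acid i at the site in the two groups:

$$\sum_{i=1}^{20} p(i) = 1.0, \qquad \sum_{i=1}^{20} q(i) = 1.0$$

(1) Definition of KLI

$$\mathrm{KLI}(p \,\|\, q) = \sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)}$$

(2) Asymmetry of KLI: the deviation of p from q generally differs from that of q from p.

$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} \neq \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$

(3) Modified (symmetrized) KLI used in this study

$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} + \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$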
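A minimal sketch of the three KLI formulas, assuming natural logarithms (the slides do not state the base) and q(i) > 0 wherever p(i) > 0, which pseudocounts guarantee in practice. The two-letter example distributions are hypothetical and chosen to make the asymmetry in (2) visible.

```python
import math


def kli(p, q):
    """Kullback-Leibler information sum_i p(i) * log(p(i) / q(i)), natural log.

    Assumes q(i) > 0 wherever p(i) > 0 (pseudocounts guarantee this in practice).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def modified_kli(p, q):
    """Symmetrized KLI used in (3): KLI(p || q) + KLI(q || p)."""
    return kli(p, q) + kli(q, p)


# Hypothetical two-letter distributions, chosen to make the asymmetry visible.
p = [0.9, 0.1]
q = [0.5, 0.5]
print(round(kli(p, q), 4), round(kli(q, p), 4), round(modified_kli(p, q), 4))
# → 0.3681 0.5108 0.8789
```

The symmetrized form gives the same score whichever domain is taken as p, which is why it suits a comparison in which neither domain is a natural reference.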

Sites in the top 5% of KLI values are selected.
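A sketch of the selection step. It assumes that sites are ranked by the modified KLI and that ceil(5% of the number of sites) sites are taken; the rounding and tie-breaking rules are assumptions, since the slides do not specify them.

```python
import math


def top_fraction_sites(scores, fraction=0.05):
    """Return the alignment sites whose score is in the top `fraction`.

    `scores` maps site number -> modified KLI. The number of selected sites is
    ceil(fraction * number_of_sites), with at least one site; ties are broken
    by sort order. Both rules are assumptions made for this sketch.
    """
    # round() guards against floating-point noise in len(scores) * fraction
    n_select = max(1, math.ceil(round(len(scores) * fraction, 9)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:n_select])


# Hypothetical scores for 40 alignment sites: site i has KLI i / 10.
scores = {site: site / 10 for site in range(1, 41)}
print(top_fraction_sites(scores))  # → [39, 40]
```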

If an alignment site shows a large difference between the N- and C-terminal domains, the site is considered to have been subject to different constraints in the two domains.

(1) How should the difference between the two domains be evaluated?

(2) How large a difference should be considered significant?

Different constraints → difference in amino acid composition or conservation pattern

Two problems in estimating the residue frequencies at each alignment site:
・Taxonomic bias → Henikoff & Henikoff weights
・Unobserved residues → pseudocounts adopted in PSI-BLAST

※ The BLAST parameter λ is obtained by the Newton-Raphson method at each calculation.
※ As the background composition of amino acid residues, both the composition obtained from database analysis and the composition obtained from the multiple alignment under consideration were examined; no significant difference was observed.
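PSI-BLAST mixes the observed (weighted) frequencies f_i with pseudocount frequencies g_i as Q_i = (α f_i + β g_i) / (α + β). In PSI-BLAST, g_i is derived from substitution-matrix target frequencies; the sketch below swaps in the plain background composition for g_i, so it is a simplified stand-in rather than PSI-BLAST's procedure. The function name and the example α and β values are hypothetical.

```python
from __future__ import annotations

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def mix_pseudocounts(observed: dict[str, float], background: dict[str, float],
                     alpha: float, beta: float) -> dict[str, float]:
    """Q_i = (alpha * f_i + beta * g_i) / (alpha + beta).

    Simplification: g_i is the background frequency of residue i. PSI-BLAST
    instead derives g_i from substitution-matrix target frequencies.
    """
    return {aa: (alpha * observed.get(aa, 0.0) + beta * background[aa]) / (alpha + beta)
            for aa in AMINO_ACIDS}


# Hypothetical column in which only Ala was observed; uniform background.
uniform = dict.fromkeys(AMINO_ACIDS, 1.0 / 20)
Q = mix_pseudocounts({"A": 1.0}, uniform, alpha=9.0, beta=1.0)
print(round(Q["A"], 3), round(Q["W"], 3), round(sum(Q.values()), 6))  # → 0.905 0.005 1.0
```

Every residue now has a non-zero frequency, which keeps the log ratios in the KLI finite.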

・Number of sequences used for the analysis: aquaporin family, 50 sequences (the alignment consists of 100 sequences); ClC chloride channel family, 50 sequences (the alignment consists of 100 sequences).

・The distribution of the KL information follows a Γ distribution.

[Figure: histograms of observed frequencies and expected frequencies (fitted Γ distribution) of the KLI values; left: aquaporin family, right: ClC chloride channel family]

Aquaporin family: χ² = 4.176 < χ²(6, 0.01) = 16.819
ClC chloride channel family: χ² = 5.625 < χ²(7, 0.01) = 18.473

In both families, the χ² goodness-of-fit test does not reject the Γ distribution at the 1% significance level.
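A self-contained sketch of a χ² goodness-of-fit test against a Γ distribution, using only the standard library. The slides do not state how the Γ parameters or the bins were chosen, so the method-of-moments fit, the bin boundaries and the function names are assumptions; degrees of freedom are taken as (number of bins - 1 - 2) because two parameters are estimated from the data.

```python
import bisect
import math
import random
import statistics


def reg_lower_gamma(s: float, x: float, max_iter: int = 10_000) -> float:
    """Regularized lower incomplete gamma P(s, x) via its power series (moderate x)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    for n in range(1, max_iter):
        term *= x / (s + n)
        total += term
        if term < 1e-15 * total:
            break
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    return reg_lower_gamma(shape, x / scale)


def chi2_ppf(prob: float, df: int) -> float:
    """Quantile of the chi-square distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(df / 2, hi / 2) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if reg_lower_gamma(df / 2, mid / 2) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fit_gamma_moments(values):
    """Method-of-moments estimates (shape, scale) of a Gamma distribution."""
    m = statistics.fmean(values)
    v = statistics.pvariance(values)
    return m * m / v, v / m


def chi_square_vs_gamma(values, cuts, alpha=0.01):
    """Bins: [0, cuts[0]), [cuts[0], cuts[1]), ..., [cuts[-1], inf)."""
    shape, scale = fit_gamma_moments(values)
    observed = [0] * (len(cuts) + 1)
    for v in values:
        observed[bisect.bisect_right(cuts, v)] += 1
    edges = [0.0, *cuts, math.inf]
    n = len(values)
    stat = 0.0
    for k, obs in enumerate(observed):
        upper = 1.0 if math.isinf(edges[k + 1]) else gamma_cdf(edges[k + 1], shape, scale)
        expected = n * (upper - gamma_cdf(edges[k], shape, scale))
        stat += (obs - expected) ** 2 / expected
    df = len(observed) - 1 - 2  # two parameters were estimated from the data
    return stat, df, chi2_ppf(1.0 - alpha, df)


random.seed(0)
sample = [random.gammavariate(2.0, 0.5) for _ in range(500)]  # hypothetical KLI-like values
stat, df, critical = chi_square_vs_gamma(sample, cuts=[0.25, 0.5, 0.75, 1.0, 1.5, 2.0])
print(f"chi2 = {stat:.3f}, df = {df}, chi2({df}, 0.01) = {critical:.3f}")
```

The Γ model is rejected at the 1% level only when the statistic exceeds the critical value. Because the bins and fitting method used on the slides are unknown, this sketch is not expected to reproduce their χ² values.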

4. Results & Discussion

Residues of 1J4N and the amino acid compositions at the corresponding sites selected from the alignment of aquaporins

Site      A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
N (S28)   0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.19 0.00 0.00 0.00
C (V155)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.39 0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.47
N (F58)   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.14 0.00 0.00 0.00 0.00 0.53 0.00 0.00 0.00 0.28 0.00 0.00
C (I174)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.07 0.06 0.00 0.00 0.00 0.00 0.22
N (H76)   0.05 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (G192)  0.20 0.00 0.00 0.00 0.05 0.00 0.00 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.37 0.00 0.00 0.00 0.00
N (V81)   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.03 0.00 0.05 0.05 0.00 0.00 0.00 0.00 0.00 0.86
C (R197)  0.00 0.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05
N (L85)   0.07 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.49 0.00 0.14 0.19 0.00 0.00 0.00 0.00 0.05 0.04
C (S201)  0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.78 0.05 0.00 0.00 0.00 0.03
N (I97)   0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.22 0.00 0.00 0.06 0.23 0.00 0.00 0.00 0.00 0.22
C (W212)  0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.06 0.00 0.00 0.00 0.04 0.00 0.00 0.83 0.00 0.00
N (Q103)  0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (P218)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.97 0.00 0.03 0.00 0.00 0.00
N (L114)  0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.45 0.00 0.03 0.03 0.00 0.00 0.05 0.00 0.00 0.31
C (Y229)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.15 0.03 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.04 0.73 0.00

(The amino acid compositions are calculated with Henikoff & Henikoff weights, but without pseudocounts.)

Residues of 1KPL and the amino acid compositions at the corresponding sites selected from the alignment of ClC chloride ion channels

Site      A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
N (R147)  0.03 0.38 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.52 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00
C (I356)  0.06 0.00 0.00 0.00 0.00 0.03 0.04 0.02 0.00 0.27 0.23 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.06 0.27
N (E148)  0.00 0.03 0.00 0.00 0.00 0.00 0.91 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.02
C (F357)  0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.83 0.00 0.00 0.00 0.00 0.05 0.07
N (Q153)  0.00 0.00 0.00 0.00 0.00 0.36 0.04 0.00 0.55 0.00 0.02 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (A362)  0.24 0.00 0.05 0.00 0.00 0.00 0.00 0.02 0.00 0.03 0.14 0.00 0.03 0.13 0.00 0.07 0.00 0.00 0.06 0.23
N (R174)  0.07 0.60 0.00 0.00 0.00 0.03 0.00 0.07 0.00 0.00 0.00 0.15 0.00 0.00 0.00 0.02 0.02 0.00 0.04 0.00
C (A386)  0.13 0.00 0.00 0.00 0.00 0.07 0.05 0.00 0.00 0.08 0.00 0.00 0.00 0.00 0.53 0.06 0.00 0.00 0.00 0.08
N (H175)  0.05 0.59 0.03 0.00 0.00 0.00 0.00 0.02 0.10 0.00 0.00 0.05 0.04 0.04 0.00 0.02 0.04 0.00 0.03 0.00
C (G387)  0.10 0.00 0.00 0.00 0.00 0.00 0.02 0.73 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.03 0.07 0.00 0.00 0.00
N (G185)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (L397)  0.00 0.00 0.00 0.00 0.09 0.07 0.00 0.00 0.00 0.00 0.17 0.00 0.05 0.29 0.00 0.02 0.09 0.00 0.13 0.08
N (F190)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
C (V402)  0.11 0.00 0.00 0.00 0.03 0.00 0.00 0.07 0.00 0.02 0.00 0.00 0.13 0.00 0.00 0.06 0.30 0.00 0.00 0.28
N (gap)   0.03 0.07 0.09 0.00 0.06 0.04 0.00 0.24 0.05 0.03 0.02 0.00 0.00 0.09 0.00 0.02 0.00 0.00 0.05 0.21
C (gap)   0.00 0.00 0.00 0.68 0.00 0.00 0.26 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Aquaporin (1J4N): out of 249 residues, 50 residues constitute the pore surface. Out of the 16 residues corresponding to the selected alignment sites, 11 residues are present on the pore surface.

Pore-surface residues of 1J4N: M21, F24, I25, S28, I29, A32, L33, F35, H36, Q43, F58, I62, A75, H76, L77, N78, A80, V81, L85, S88, Q90, T111, L114, T118, L121, N124, S125, G127, N129, T148, L151, V152, V155, L156, 159, T160, R161, R162, I174, V178, H182, G190, C191, G192, I193, N194, R197, S201, V226, R236

$$\sum_{i=11}^{16} \frac{16!}{i!\,(16-i)!} \left(\frac{50}{249}\right)^{i} \left(\frac{249-50}{249}\right)^{16-i} = 0.00003393$$

ClC chloride channel (1KPL): out of 451 residues, 36 residues constitute the pore surface. Out of the 14 residues corresponding to the selected alignment sites, 5 residues are present on the pore surface.

Pore-surface residues of 1KPL: V51, E54, S107, I109, P110, G146, R147, E148, G149, P150, T151, V152, A188, A189, F190, F229, N233, G234, A236, I238, N270, V273, L274, Q277, D278, F317, F348, G355, I356, F357, A358, P359, M360, L444, Y445, I448

[Figure: structures with the extracellular region and cytoplasmic region indicated]

$$\sum_{i=5}^{14} \frac{14!}{i!\,(14-i)!} \left(\frac{36}{451}\right)^{i} \left(\frac{451-36}{451}\right)^{14-i} = 0.00351064$$

Each sum is the binomial probability that at least this many of the selected residues would fall on the pore surface by chance. The clustering of the residues corresponding to the selected alignment sites on the pore surface is therefore statistically significant.
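The two sums are upper tails of a binomial distribution. The sketch below evaluates them with the standard library, using the counts stated on the slides (50 of 249 and 36 of 451 pore-surface residues; 11 of 16 and 5 of 14 selected residues on the pore). `hypergeom_tail` is an added alternative that models drawing residues without replacement; it is not something the slides compute.

```python
from math import comb


def binomial_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p), as in the sums above."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_min, n + 1))


def hypergeom_tail(total: int, marked: int, drawn: int, k_min: int) -> float:
    """P(X >= k_min) when `drawn` residues are picked without replacement from
    `total` residues of which `marked` line the pore (exact alternative)."""
    hits = sum(comb(marked, i) * comb(total - marked, drawn - i)
               for i in range(k_min, min(drawn, marked) + 1))
    return hits / comb(total, drawn)


# Aquaporin (1J4N): 50 of 249 residues on the pore surface; 11 of 16 selected.
print(f"{binomial_tail(16, 11, 50 / 249):.8f}")
# ClC (1KPL): 36 of 451 residues on the pore surface; 5 of 14 selected.
print(f"{binomial_tail(14, 5, 36 / 451):.8f}")
print(hypergeom_tail(249, 50, 16, 11), hypergeom_tail(451, 36, 14, 5))
```

The binomial values can be compared directly with 0.00003393 and 0.00351064 above.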


[Figure: positive-inside rule; positive charges (+) are enriched on the side facing the cytosolic environment rather than the extracellular environment]

The efficiency of detecting the residues related to the positive-inside rule was not very good.

The method was modified to increase the efficiency.

Reorganization of amino acid composition

[Figure: alignments of the N-terminal and C-terminal domains (proteins 1 … N, sites 1 … L). At site i, the 20-residue frequencies (e.g. Ala 0.05, Arg 0.03, …, Trp 0.01 and Ala 0.02, Arg 0.05, …, Trp 0.03) are regrouped into two classes, Lys+Arg and the remaining residues, e.g. 0.012 / 0.988 in one domain and 0.134 / 0.866 in the other.]

KLI between the two domains is calculated at each alignment site.
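A sketch of the regrouping step: the 20-residue composition at a site is collapsed into (Lys+Arg, remaining) and the modified KLI is computed on the two-class distributions. The function names are hypothetical; the example uses the N (R147) composition from the ClC table and the illustrative pair 0.012/0.988 vs. 0.134/0.866 from the figure, and assumes natural logarithms.

```python
import math

POSITIVE = ("K", "R")


def regroup_kr(freqs):
    """Collapse a 20-residue composition into (Lys+Arg, remaining residues)."""
    kr = sum(freqs.get(aa, 0.0) for aa in POSITIVE)
    return kr, 1.0 - kr


def modified_kli(p, q):
    """KLI(p||q) + KLI(q||p), written as sum_i (p_i - q_i) * log(p_i / q_i).

    Assumes all entries are strictly positive.
    """
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Composition of N (R147) in the ClC table (only non-zero entries listed).
print(regroup_kr({"A": 0.03, "R": 0.38, "L": 0.03, "K": 0.52, "P": 0.04}))
# Illustrative two-class pair from the figure.
print(round(modified_kli((0.012, 0.988), (0.134, 0.866)), 4))
```

Collapsing to two classes makes the score sensitive only to the Lys+Arg content, which is the property the positive-inside rule concerns.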

Distributions of KLI

[Figure: histograms of the KLI values after reorganization (bins from -0.5 to >3.5); left: aquaporin family, right: ClC chloride ion channel family]

・The distributions do not follow the Γ distribution: aquaporin family, χ² = 28.110 > χ²(2, 0.01) = 9.21; ClC chloride ion channel family, χ² = 36.717 > χ²(3, 0.01) = 11.34.

・We therefore estimated the distributions non-parametrically with the kernel method, using a quartic kernel.
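The quartic (biweight) kernel is K(u) = 15/16 (1 - u²)² for |u| ≤ 1 and 0 otherwise. The sketch below estimates the KLI density with a user-chosen bandwidth h and integrates the kernel analytically to get the upper-tail probability of a KLI value. The KLI values, the bandwidth and the use of the tail probability for significance are assumptions; the slides only state that a quartic kernel was used.

```python
from __future__ import annotations


def quartic_kernel(u: float) -> float:
    """Quartic (biweight) kernel: 15/16 * (1 - u^2)^2 on [-1, 1], else 0."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def quartic_kernel_cdf(u: float) -> float:
    """Integral of the quartic kernel from -1 to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 15.0 / 16.0 * (u - 2.0 * u**3 / 3.0 + u**5 / 5.0)


def kde_density(x: float, data: list[float], h: float) -> float:
    """Kernel density estimate f(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    return sum(quartic_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def kde_upper_tail(x: float, data: list[float], h: float) -> float:
    """Estimated P(KLI > x) under the kernel density estimate."""
    return sum(1.0 - quartic_kernel_cdf((x - xi) / h) for xi in data) / len(data)


# Hypothetical KLI values and bandwidth (bandwidth selection is not specified on the slides).
kli_values = [0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 1.1, 1.6, 2.4, 3.8]
h = 0.5
print(round(kde_density(0.3, kli_values, h), 3), round(kde_upper_tail(2.0, kli_values, h), 3))
```

Unlike the Γ fit, this estimate makes no assumption about the shape of the distribution, at the cost of having to choose h.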

Selected sites and their Arg (R) and Lys (K) frequencies

ClC chloride ion channel
Site      Reg.  R     K      | Site      Reg.  R     K
N (L77)   e     0.07  0.00   | C (gap)   -     0.22  0.37
N (P80)   e     0.00  0.00   | C (G285)  c     0.27  0.09
N (A90)   ?     0.00  0.00   | C (G295)  ?     0.19  0.15
N (E113)  c     0.00  0.58   | C (P321)  e     0.00  0.00
N (R126)  c     0.46  0.26   | C (F335)  e     0.00  0.00
N (R147)  p     0.38  0.52   | C (I356)  p     0.00  0.00
N (R174)  c     0.60  0.15   | C (A386)  e     0.00  0.00
N (R175)  c     0.59  0.05   | C (G387)  e     0.00  0.00
N (L212)  c     0.21  0.18   | C (P424)  e     0.00  0.00
N (K216)  c     0.43  0.11   | C (T428)  e     0.00  0.00
N (G234)  e     0.00  0.00   | C (gap)   -     0.07  0.33

Aquaporin
Site      Reg.  R     K      | Site      Reg.  R     K
N (R12)   c     0.51  0.20   | C (Q139)  e     0.03  0.02
N (F35)   e     0.00  0.04   | C (R162)  c     0.31  0.23
N (V81)   p     0.00  0.00   | C (R197)  p     0.95  0.00
N (L86)   c     0.00  0.00   | C (S202)  e     0.32  0.05
N (C89)   c     0.25  0.04   | C (T205)  e     0.00  0.00
N (R95)   c     0.41  0.18   | C (D210)  e     0.00  0.00

[Figure: positive charges (+) on the extracellular and cytoplasmic sides]

N: N-terminal domain; C: C-terminal domain; c: cytosolic region; e: extracellular region; p: pore surface; R, K: frequencies of Arg and Lys at the site

What we have learned from the study:

Different glasses are required to see different constraints.

But how do we select the proper glasses? Ordinarily, we have no prior knowledge about the constraints.

Is it possible to make flexible glasses that work for any constraint?


Outline: 3. Chemokine receptors

GPCRs
•  Membrane proteins
•  Bind neurotransmitters (physiologically active peptides, amines, nucleic acids, etc.)
•  Ligand binding causes conformational changes in GPCRs.
•  These changes trigger several signal transduction pathways coupled to trimeric G proteins.

GPCRs
•  About 1000 genes in the human genome
•  Targets of ~45% of clinically marketed drugs
•  Divided into 5 classes based on sequence similarity (classes A-E, and others)
•  Atomic-resolution structure of a class A GPCR: bovine rhodopsin

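Decoy receptors

                  Decoy receptor   Chemokine receptor   Viral receptor
Ligand binding    ○                ○                    △
Signaling         ×                ○                    ○
(○: yes, △: limited, ×: no)

Can differences in functional constraints be detected by sequence comparison? Toward elucidating the pathway from ligand binding to signaling.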

[Figure: alignments of chemokine receptors (proteins 1 … M) and of a group of decoy or viral receptors, over alignment sites 1 … L; at each site i, the amino acid residue frequencies are estimated in each group (e.g. Ala 0.05, Arg 0.03, …, Trp 0.01 vs. Ala 0.02, Arg 0.05, …, Trp 0.03)]

The Kullback-Leibler information between the two groups is calculated at each alignment site. As in Section 2, the frequencies p(i) and q(i) each sum to 1.0 over the 20 amino acids, and because the KLI representing the deviation of p from q differs from that of q from p, the modified (symmetrized) KLI is used:

$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} + \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$

[Figure: decoy receptors and viral receptors]

Conclusion

By examining the functional differences between duplicated genes, we can obtain more functional information than by examining motifs alone.

Analysis of biological function based on molecular evolution

[Figure: related fields: molecular biology, molecular evolutionary genetics, statistics, information science, physics, developmental biology, bioinformatics, structural biology, genomics, protein informatics]

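Collaborators
1. Topology inversion of membrane proteins: Hisako Ichihara (Kazusa DNA Research Institute), Hiromi Daiyasu (Osaka University)
2. Chemokine receptors: Hiromi Daiyasu (Osaka University), Wataru Nemoto (Tokyo Denki University)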
