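Department of Life Science, 3rd year, second semester: Molecular and Cellular Biology I
Comparative analysis of homologous amino acid sequences
Hiroyuki Toh, Department of Biomedical Chemistry, School of Science and Technology, Kwansei Gakuin University
Outline
1. Introduction
2. Topology inversion of membrane proteins
3. Chemokine receptors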
Evolutionary fate and functional consequence
[Figure: Gene A undergoes duplication, giving Gene A and Gene A'. Three fates are shown.]
Neofunctionalization: Gene A + Gene B; Functions A + B
Subfunctionalization: Gene A' + Gene A''; Functions A' + A'' = A
Non-functionalization (pseudogenization): Gene A + non-processed pseudogene A; Function A
Which amino acid sites are related to the functional divergence?
Classical approach to identify the critical sites for functional divergence
- Evolution of Prostaglandin D2 Synthase -
Nagata, A., Suzuki, Y., Igarashi, M., Eguchi, N., Toh, H., Urade, Y., Hayaishi, O. Proc. Natl. Acad. Sci. USA 88, 4020-4024 (1991).
Igarashi, M., Nagata, A., Toh, H., Urade, Y., Hayaishi, O. Proc. Natl. Acad. Sci. USA 89, 5376-5380 (1992).
[Figure: reaction scheme in which PGD synthase converts PGH2 to PGD2.]
PGD synthase: about 190 a.a.
Database searching against the amino acid sequence database: lipocalins
[Figure: mouse PGD synthase compared with human neutrophil gelatinase-associated lipocalin.]
Lipocalin family
A diverse family of secretory proteins involved in the binding and transport of small hydrophobic molecules.
[Figure: a lipocalin carries small hydrophobic molecules from secretory tissue to a target cell.]
PGD synthases: enzyme; vertebrates
Lipocalins: transporter (= non-enzyme); from bacteria to eukaryotes
Which sites are involved in acquisition of the catalytic activity?
PGD synthase is inactivated by treatment with SH modifiers.
Cys residues may be involved in the catalytic reaction of the enzyme.
[Figure: Cys residues (C) and their thiol (SH) groups, the targets of the SH modifier.]
Site-directed mutagenesis: Cys → Ala, Ser
Some mutants lost the enzyme activity.
Other mutants showed activity comparable to that of their parent enzyme.
[Figure labels: Cys, S, SH.]
More systematic and automatic method to detect the critical sites for functional divergence
Clue for design and/or alteration of protein function
Deep insight into evolution of protein function
[Figure: amino acid sequence alignment consisting of groups with different functions (Substrate A, Substrate B).]
Properties compared between the groups: conservation, evolutionary rate, amino acid composition
Evolutionary Trace
Quantitative Evolutionary Trace
Conservation
Diverge
Branch Length
Cumulative Relative Entropy
Amino Acid Composition
Evolutionary Rate Hierarchical Conservation Analysis
Relative Entropy among Paralogs
Level Entropy
Outline
1. Introduction
2. Topology inversion of membrane proteins
3. Chemokine receptors
Aquaporin
Fu et al., Science 290, 481-486 (2000). Murata et al., Nature 407, 599-605 (2000).
ClC chloride ion channel
Dutzler et al., Nature 415, 287-294 (2002).
[Figure: N- and C-terminal domains, each consisting of segments a, b, c.]
・The N-terminal domain is homologous to the C-terminal domain.
・The two domains are arranged in opposite orientations to each other.
Membrane proteins have a bias in amino acid composition at the membrane boundary, e.g. the positive-inside rule.
[Figure: positive charges (+) clustered on one side of the membrane.]
von Heijne, Nature 341, 456-458 (1989). Nakashima & Nishikawa, FEBS Lett. 303, 141-146 (1992).
The amino acid composition of cytoplasmic proteins is different from that of extracellular proteins.
Nakashima & Nishikawa, J. Mol. Biol. 238, 54-61 (1994).
[Figure labels: membrane, extracellular region, cytoplasmic region.]
The two domains are expected to have evolved under different constraints.
[Figure labels: extracellular region, cytoplasmic region, pore surface, membrane.]
Different constraints are considered to have acted on the interfaces with the extracellular and cytoplasmic environments.
3. Methods
(a) Collection of homologous amino acid sequences by database searching
(b) Cleavage of the obtained sequences into the N- and the C-terminal domains
(c) Multiple alignment of each domain
(d) Profile alignment between the alignments of the N- and the C-terminal domains
[Figure: (1) Multiple alignment. Sequences from the sequence DB are aligned separately as N-terminal domains and C-terminal domains.]
[Figure: two aligned groups, the chemokine receptors (Protein 1 ... Protein M) and a cluster of decoy or viral receptors (Protein 1 ... Protein N), compared at each alignment site i (Site 1 ... Site L).]
Amino acid residue frequency at alignment site i in the chemokine receptors: Ala 0.05, Arg 0.03, ..., Trp 0.01
Amino acid residue frequency at alignment site i in the group of decoy or viral receptors: Ala 0.02, Arg 0.05, ..., Trp 0.03
Evaluation of the difference between two domains at each alignment site
Estimation of amino acid composition at each alignment site
・Taxonomic bias: Henikoff & Henikoff weights
・Unobserved residues: pseudocounts adopted in PSI-BLAST
※ This is the same method as used for the calculation of the PSSM in PSI-BLAST (β = 0.1).
※ The BLAST parameter λu was obtained by the Newton-Raphson method at each calculation.
※ CRE uses a Dirichlet mixture as a prior instead of pseudocounts.
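The weighting step can be illustrated with a minimal Python sketch of position-based sequence weights in the style of Henikoff & Henikoff. The function names, the toy alignment, and the decision to skip gap characters are assumptions made for illustration; the pseudocount step of PSI-BLAST is not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def henikoff_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1."""
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        # Count each residue type in this column; gaps are skipped (assumption).
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        k = len(counts)  # number of distinct residue types in the column
        for s, res in enumerate(column):
            if res in counts:
                # A residue shared by many sequences contributes little weight.
                weights[s] += 1.0 / (k * counts[res])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_frequencies(alignment, weights, site):
    """Weighted amino acid frequencies at one alignment column (gaps ignored)."""
    freq = dict.fromkeys(AMINO_ACIDS, 0.0)
    for seq, w in zip(alignment, weights):
        if seq[site] in freq:
            freq[seq[site]] += w
    total = sum(freq.values())
    return {aa: (v / total if total else 0.0) for aa, v in freq.items()}


# Toy alignment (hypothetical): three identical sequences and one divergent one.
toy = ["AKL", "AKL", "AKL", "GRL"]
w = henikoff_weights(toy)
freq = weighted_frequencies(toy, w, site=0)
print(round(freq["A"], 3), round(freq["G"], 3))
```

In this toy case the unweighted frequencies at the first site would be 0.75 for Ala and 0.25 for Gly; the weights pull them toward each other, which is how over-represented, closely related sequences are down-weighted.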
Kullback-Leibler information between the two groups is calculated at each alignment site.
p(1)+p(2)+p(3)+ . . . +p(20)=1.0
q(1)+q(2)+q(3)+ . . . +q(20)=1.0
The difference between two probability distributions can be quantitatively evaluated with the Kullback-Leibler information (KLI).
(1) Definition of KLI
$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)}$$
(2) Asymmetry of KLI
$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} \neq \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$
(3) Modified KLI used in this study
$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} + \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$
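The three formulas can be checked numerically with a short Python sketch. The function names and the example distributions are hypothetical, the natural logarithm is an assumption (the slides do not state the base), and every q(i) is assumed to be positive wherever p(i) is positive, which is what the pseudocounts are meant to guarantee.

```python
import math


def kli(p, q):
    """Kullback-Leibler information of p relative to q (natural log assumed)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p(i) = 0 contributes nothing
        if qi == 0.0:
            raise ValueError("q(i) must be positive wherever p(i) is positive")
        total += pi * math.log(pi / qi)
    return total


def symmetric_kli(p, q):
    """Modified (symmetrized) KLI: the sum of both directions."""
    return kli(p, q) + kli(q, p)


# Hypothetical three-class distributions, used only to show the asymmetry.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print(kli(p, q), kli(q, p))      # the two directions differ
print(symmetric_kli(p, q))       # unchanged when p and q are swapped
```

Summing both directions removes the dependence on which group is called p and which is called q, so a single score can be assigned to each alignment site.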
Sites with top 5% KLI
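One simple reading of this selection step is an empirical ranking, sketched below in Python. The function name, the rounding rule, and the example scores are assumptions; the slides may instead take the upper 5% tail of a fitted distribution.

```python
def top_fraction_sites(scores, fraction=0.05):
    """Return the indices of the sites whose scores fall in the top fraction."""
    # Rounding to the nearest integer, with at least one site kept (assumption).
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical per-site KLI scores for a 40-column alignment.
scores = [0.1 * i for i in range(40)]
selected = top_fraction_sites(scores)
print(selected)
```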
If an alignment site shows a large difference between the N- and the C-terminal domains, the site is considered to have been subject to different constraints in the two domains.
(1) How to evaluate the difference between two domains?
(2) How large a difference is considered to be significant?
Different Constraints
Difference in Amino Acid Composition or Conservation Pattern
Two problems in estimating the residue frequency at each alignment site
・Taxonomic bias: Henikoff & Henikoff weights
・Unobserved residues: pseudocounts adopted in PSI-BLAST
※ The BLAST parameter λ is obtained by the Newton-Raphson method at each calculation.
※ As the background composition of amino acid residues, both the composition obtained from database analysis and the one obtained from the multiple alignment under consideration were examined. However, no significant difference was observed.
・Number of sequences used for the analysis
aquaporin family: 50 sequences (the alignment consists of 100 sequences)
ClC chloride channel family: 50 sequences (the alignment consists of 100 sequences)
・The distribution of the KL information follows the Γ distribution.
[Figure: histograms of observed and expected frequencies for the aquaporin family and the ClC chloride channel family.]
Goodness-of-fit tests (in slide order): χ2 = 4.176, χ2(6, 0.01) = 16.819; χ2 = 5.625, χ2(7, 0.01) = 18.473
4. Results & Discussion
The residues of 1J4N and the amino acid compositions corresponding to the sites selected from the alignment of aquaporins
Site      A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
N (S28)   0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.19 0.00 0.00 0.00
C (V155)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.39 0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.47
N (F58)   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.14 0.00 0.00 0.00 0.00 0.53 0.00 0.00 0.00 0.28 0.00 0.00
C (I174)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.07 0.06 0.00 0.00 0.00 0.00 0.22
N (H76)   0.05 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (G192)  0.20 0.00 0.00 0.00 0.05 0.00 0.00 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.37 0.00 0.00 0.00 0.00
N (V81)   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.03 0.00 0.05 0.05 0.00 0.00 0.00 0.00 0.00 0.86
C (R197)  0.00 0.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05
N (L85)   0.07 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.49 0.00 0.14 0.19 0.00 0.00 0.00 0.00 0.05 0.04
C (S201)  0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.78 0.05 0.00 0.00 0.00 0.03
N (I97)   0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.22 0.00 0.00 0.06 0.23 0.00 0.00 0.00 0.00 0.22
C (W212)  0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.06 0.00 0.00 0.00 0.04 0.00 0.00 0.83 0.00 0.00
N (Q103)  0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (P218)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.97 0.00 0.03 0.00 0.00 0.00
N (L114)  0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.45 0.00 0.03 0.03 0.00 0.00 0.05 0.00 0.00 0.31
C (Y229)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.15 0.03 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.04 0.73 0.00
(For the calculation of amino acid composition, Henikoff & Henikoff weights are used but pseudocounts are not introduced.)
The residues of 1KPL and the amino acid compositions corresponding to the sites selected from the alignment of ClC chloride ion channels
Site      A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
N (R147)  0.03 0.38 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.52 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00
C (I356)  0.06 0.00 0.00 0.00 0.00 0.03 0.04 0.02 0.00 0.27 0.23 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.06 0.27
N (E148)  0.00 0.03 0.00 0.00 0.00 0.00 0.91 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.02
C (F357)  0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.83 0.00 0.00 0.00 0.00 0.05 0.07
N (Q153)  0.00 0.00 0.00 0.00 0.00 0.36 0.04 0.00 0.55 0.00 0.02 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (A362)  0.24 0.00 0.05 0.00 0.00 0.00 0.00 0.02 0.00 0.03 0.14 0.00 0.03 0.13 0.00 0.07 0.00 0.00 0.06 0.23
N (R174)  0.07 0.60 0.00 0.00 0.00 0.03 0.00 0.07 0.00 0.00 0.00 0.15 0.00 0.00 0.00 0.02 0.02 0.00 0.04 0.00
C (A386)  0.13 0.00 0.00 0.00 0.00 0.07 0.05 0.00 0.00 0.08 0.00 0.00 0.00 0.00 0.53 0.06 0.00 0.00 0.00 0.08
N (H175)  0.05 0.59 0.03 0.00 0.00 0.00 0.00 0.02 0.10 0.00 0.00 0.05 0.04 0.04 0.00 0.02 0.04 0.00 0.03 0.00
C (G387)  0.10 0.00 0.00 0.00 0.00 0.00 0.02 0.73 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.03 0.07 0.00 0.00 0.00
N (G185)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (L397)  0.00 0.00 0.00 0.00 0.09 0.07 0.00 0.00 0.00 0.00 0.17 0.00 0.05 0.29 0.00 0.02 0.09 0.00 0.13 0.08
N (F190)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
C (V402)  0.11 0.00 0.00 0.00 0.03 0.00 0.00 0.07 0.00 0.02 0.00 0.00 0.13 0.00 0.00 0.06 0.30 0.00 0.00 0.28
N (gap)   0.03 0.07 0.09 0.00 0.06 0.04 0.00 0.24 0.05 0.03 0.02 0.00 0.00 0.09 0.00 0.02 0.00 0.00 0.05 0.21
C (gap)   0.00 0.00 0.00 0.68 0.00 0.00 0.26 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Aquaporin (1J4N): Out of 249 residues, 50 residues constitute the pore surface. Out of the 16 residues corresponding to the selected alignment sites, 11 residues are present on the pore surface.
Pore-surface residues: M21, F24, I25, S28, I29, A32, L33, F35, H36, Q43, F58, I62, A75, H76, L77, N78, A80, V81, L85, S88, Q90, T111, L114, T118, L121, N124, S125, G127, N129, T148, L151, V152, V155, L156, 159, T160, R161, R162, I174, V178, H182, G190, C191, G192, I193, N194, R197, S201, V226, R236
$$\sum_{i=11}^{16} \frac{16!}{i!\,(16-i)!} \left(\frac{50}{249}\right)^{i} \left(\frac{249-50}{249}\right)^{16-i} = 0.00003393$$
ClC chloride channel (1KPL): Out of 451 residues, 36 residues constitute the pore surface. Out of the 14 residues corresponding to the selected alignment sites, 5 residues are present on the pore surface.
Pore-surface residues: V51, E54, S107, I109, P110, G146, R147, E148, G149, P150, T151, V152, A188, A189, F190, F229, N233, G234, A236, I238, N270, V273, L274, Q277, D278, F317, F348, G355, I356, F357, A358, P359, M360, L444, Y445, I448
extracellular region
cytoplasmic region
Clustering of the residues corresponding to the selected alignment sites is statistically significant.
$$\sum_{i=5}^{14} \frac{14!}{i!\,(14-i)!} \left(\frac{36}{451}\right)^{i} \left(\frac{451-36}{451}\right)^{14-i} = 0.00351064$$
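Both probabilities are upper tails of a binomial distribution: the chance that at least k of n selected residues fall on the pore surface when each residue lands there with probability (pore residues)/(all residues). A minimal Python sketch, with a hypothetical function name, reproduces the two quoted values from the counts given on the slides.

```python
from math import comb


def binomial_upper_tail(n, k_min, n_pore, n_total):
    """P(X >= k_min) for X ~ Binomial(n, n_pore / n_total)."""
    p = n_pore / n_total
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k_min, n + 1)
    )


# Aquaporin (1J4N): 11 of 16 selected residues on a pore surface of 50/249.
aquaporin = binomial_upper_tail(16, 11, 50, 249)
# ClC chloride channel (1KPL): 5 of 14 selected residues, pore surface 36/451.
clc = binomial_upper_tail(14, 5, 36, 451)
print(f"{aquaporin:.8f} {clc:.8f}")
```

Both tails are far below 0.01, which is the basis for calling the clustering on the pore surface statistically significant.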
Positive-inside rule
[Figure: positive charges (+) are concentrated on the side of the membrane facing the cytosolic environment rather than the extracellular environment.]
The method was not very efficient at detecting the residues related to the positive-inside rule.
The method was therefore modified to increase the efficiency.
Reorganization of amino acid composition
[Figure: alignments of the N-terminal domain and the C-terminal domain (Protein 1 ... Protein N each), compared at each site i (site 1 ... site L).]
Original composition at site i: Ala 0.05, Arg 0.03, ..., Trp 0.01 in one domain; Ala 0.02, Arg 0.05, ..., Trp 0.03 in the other
Reorganized composition: Lys+Arg 0.012, remaining 0.988 in one domain; Lys+Arg 0.134, remaining 0.866 in the other
KLI between the two domains is calculated at each alignment site.
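The reorganization can be sketched in Python as collapsing a 20-class composition into two classes, Lys+Arg and everything else, before computing the symmetrized KLI. The function names are hypothetical, the natural logarithm is assumed, and the two-class values are the example numbers shown on the slide.

```python
import math


def collapse_to_basic(freq):
    """Collapse a residue -> frequency mapping into (Lys+Arg, remaining)."""
    basic = freq.get("K", 0.0) + freq.get("R", 0.0)
    return (basic, 1.0 - basic)


def symmetric_kli(p, q):
    """Symmetrized KLI; equals the sum of (p - q) * log(p / q) over classes.

    Assumes every class has a positive frequency in both distributions.
    """
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Example two-class compositions taken from the slide.
domain_a = (0.012, 0.988)
domain_b = (0.134, 0.866)
score = symmetric_kli(domain_a, domain_b)
print(round(score, 4))

# Hypothetical 20-class input reduced to the same two classes.
example = {"K": 0.10, "R": 0.034, "A": 0.866}
print(collapse_to_basic(example))
```

Pooling residues this way trades detail for sensitivity: differences in positive charge are no longer diluted across 20 classes, which is the point of choosing a representation matched to the constraint of interest.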
Distributions of KLI
[Figure: histograms of KLI for the aquaporin family and the ClC chloride ion channel family; horizontal-axis bins labeled from -0.5 to 3.5 and > 3.5.]
・The distributions do not follow the Γ distribution.
Goodness-of-fit tests (in slide order): χ2 = 28.110, χ2(2, 0.01) = 9.21; χ2 = 36.717, χ2(3, 0.01) = 11.34
・We used the kernel method to estimate the distribution non-parametrically. A quartic kernel is used for the estimation.
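A kernel density estimate with the quartic (biweight) kernel can be sketched in a few lines of Python. The function names, the sample values, and the bandwidth are hypothetical; the slides do not state how the bandwidth was chosen.

```python
def quartic_kernel(u):
    """Quartic (biweight) kernel: (15/16) * (1 - u^2)^2 on [-1, 1], else 0."""
    if abs(u) > 1.0:
        return 0.0
    return 15.0 / 16.0 * (1.0 - u * u) ** 2


def kde(x, samples, bandwidth):
    """Kernel density estimate at x from the observed samples."""
    return sum(
        quartic_kernel((x - s) / bandwidth) for s in samples
    ) / (len(samples) * bandwidth)


# Hypothetical KLI values and bandwidth, for illustration only.
samples = [0.1, 0.2, 0.25, 0.4, 0.9, 1.6, 2.8]
density_at_half = kde(0.5, samples, bandwidth=0.5)
print(density_at_half)
```

Because the estimate is built directly from the observed scores, no parametric form such as the Γ distribution has to be assumed when deciding which scores are unusually large.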
Frequencies of Arg (R) and Lys (K) at the selected sites
N: N-terminal domain; C: C-terminal domain
c: cytosolic region; e: extracellular region; p: pore surface
[Figure: positive charges (+) relative to the extracellular and cytoplasmic sides.]

aquaporin
Site      Loc  R    K     Site      Loc  R    K
N (R12)   c    0.51 0.20  C (Q139)  e    0.03 0.02
N (F35)   e    0.00 0.04  C (R162)  c    0.31 0.23
N (V81)   p    0.00 0.00  C (R197)  p    0.95 0.00
N (L86)   c    0.00 0.00  C (S202)  e    0.32 0.05
N (C89)   c    0.25 0.04  C (T205)  e    0.00 0.00
N (R95)   c    0.41 0.18  C (D210)  e    0.00 0.00

ClC chloride ion channel
Site      Loc  R    K     Site      Loc  R    K
N (L77)   e    0.07 0.00  C (gap)   -    0.22 0.37
N (P80)   e    0.00 0.00  C (G285)  c    0.27 0.09
N (A90)   ?    0.00 0.00  C (G295)  ?    0.19 0.15
N (E113)  c    0.00 0.58  C (P321)  e    0.00 0.00
N (R126)  c    0.46 0.26  C (F335)  e    0.00 0.00
N (R147)  p    0.38 0.52  C (I356)  p    0.00 0.00
N (R174)  c    0.60 0.15  C (A386)  e    0.00 0.00
N (R175)  c    0.59 0.05  C (G387)  e    0.00 0.00
N (L212)  c    0.21 0.18  C (P424)  e    0.00 0.00
N (K216)  c    0.43 0.11  C (T428)  e    0.00 0.00
N (G234)  e    0.00 0.00  C (gap)   -    0.07 0.33
What we have learned from the study is …
Different glasses are required to see different constraints.
But how do we select the proper glasses? Ordinarily, we don't have any prior knowledge about the constraints.
Is it possible to make flexible glasses for any constraints?
Outline
1. Introduction
2. Topology inversion of membrane proteins
3. Chemokine receptors
Chemokine receptors
GPCRs
• Membrane proteins
• Bind neurotransmitters (physiologically active peptides, amines, nucleic acids, etc.).
• Ligand binding to GPCRs causes conformational changes.
• This leads to signal transduction coupled with trimeric G proteins.
GPCRs
• About 1000 genes in the human genome
• Target for ~45% of clinically marketed drugs
• Divided into 5 classes based on sequence similarity (Class A-E, the other)
• Atomically resolved structure in class A GPCRs: bovine rhodopsin
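Decoy receptors
                         Decoy receptor   Chemokine receptor   Viral receptor
Ligand-binding ability   ○                ○                    △
Signaling ability        ×                ○                    ○
Could differences in functional constraints be detected by sequence comparison?
Toward elucidating the pathway from ligand binding to signaling.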
[Figure: two aligned groups, the chemokine receptors (Protein 1 ... Protein M) and the group of decoy or viral receptors (Protein 1 ... Protein N), compared at each alignment site i (Site 1 ... Site L).]
Amino acid residue frequency at alignment site i in the chemokine receptors: Ala 0.05, Arg 0.03, ..., Trp 0.01
Amino acid residue frequency at alignment site i in the group of decoy or viral receptors: Ala 0.02, Arg 0.05, ..., Trp 0.03
Kullback-Leibler information between the two groups is calculated at each alignment site.
Evaluation of the difference between the two groups at each alignment site
p(1)+p(2)+p(3)+ . . . +p(20)=1.0
q(1)+q(2)+q(3)+ . . . +q(20)=1.0
The difference between two probability distributions can be quantitatively evaluated with the Kullback-Leibler information (KLI).
(1) Definition of KLI
$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)}$$
(2) KLI representing the deviation of p from q is different from that of q from p.
$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} \neq \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$
(3) Modified KLI is used in this study.
$$\sum_{i=1}^{20} p(i) \log \frac{p(i)}{q(i)} + \sum_{i=1}^{20} q(i) \log \frac{q(i)}{p(i)}$$
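Decoy receptors, viral receptors
Conclusion
By examining the functional differences between duplicated genes, we can obtain more functional information than by examining motifs alone.
Analysis of biological function based on molecular evolution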
Molecular Biology
Molecular Evolutionary Genetics
Statistics
Information Science
Physics
Developmental Biology
Bioinformatics
Structural Biology
Genomics
Protein Informatics
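Collaborators
1. Topology inversion of membrane proteins
Hisako Ichihara (市原寿子), Kazusa DNA Research Institute
Hiromi Daiyasu (大安裕美), Osaka University
2. Chemokine receptors
Hiromi Daiyasu (大安裕美), Osaka University
Wataru Nemoto (根本 航), Tokyo Denki University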