Group Feature Extraction Based on Multiple Indexing Sequence Alignment
(Chinese title: 多重索引序列排比應用於群組特徵擷取, "Multiple Indexing Sequence Alignment Applied to Group Feature Extraction")
Dr. Tun-Wen Pai
Dept. of Computer Science and Engineering, National Taiwan Ocean University
2006.10.30
• Central idea: finding short approximate patterns
• Motivation: finding ordered combinatorial features
• Objectives:
– constructing evolutionary relationships
– providing key features for structural alignment
System Architecture
[Diagram: system pipeline]
Components: Sequences; Motif finding and indexing; Hierarchical clustering; Multiple Indexing Sequence Alignment; Exclusive group feature extraction; Background model
Results: Phylogenetic tree; Consensus motifs; Combinatorial features; Exclusive group features; Background information
Motif finding
• Short consensus motifs that tolerate limited variation
• Variable-site tolerance: the tolerated sites in a pattern may vary
• Substitutable tolerance: residues with similar chemical properties may substitute for one another in a pattern
Variable-site tolerance
• Exploits the uniqueness and efficient lookup of hashing techniques
• Original patterns → unique digital values
• Patterns are compared using a hash table structure
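The hashing idea above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes fixed-width windows, a base-21 positional encoding (20 residues plus a wildcard), and at most one tolerated site per pattern. The names `encode`, `masked_variants`, and `index_sequences` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
WILDCARD = "*"                     # marks a tolerated (variable) site
# Digit 0 is reserved for the wildcard; residues map to 1..20.
CODE = {WILDCARD: 0, **{aa: i + 1 for i, aa in enumerate(ALPHABET)}}
BASE = len(CODE)  # 21


def encode(pattern):
    """Map a pattern to an integer; unique among patterns of the same length."""
    value = 0
    for ch in pattern:
        value = value * BASE + CODE[ch]  # KeyError on non-standard residues
    return value


def masked_variants(window, max_variable_sites=1):
    """Yield the window with up to max_variable_sites positions set to '*'."""
    for k in range(max_variable_sites + 1):
        for sites in combinations(range(len(window)), k):
            chars = list(window)
            for s in sites:
                chars[s] = WILDCARD
            yield "".join(chars)


def index_sequences(sequences, width=5, max_variable_sites=1):
    """Build a hash table: pattern code -> set of (sequence id, offset)."""
    table = defaultdict(set)
    for seq_id, seq in sequences.items():
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            for variant in masked_variants(window, max_variable_sites):
                table[encode(variant)].add((seq_id, start))
    return table


# Two windows that differ only at the last site share the entry "CKNG*".
table = index_sequences({"a": "CKNGQ", "b": "CKNGK"})
shared = table[encode("CKNG*")]
```

Because every window is stored under each of its masked variants, two windows that differ at one site meet in the same hash bucket, so approximate matches are found by lookup rather than by pairwise comparison.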
Substitutable tolerance
• Depends on chemical properties
• Substitution matrix: BLOSUM62
• Bitwise clustering avoids misjudging two dissimilar residues as substitutable
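One way to read "bitwise clustering" is that each residue carries a bitmask of the property groups it belongs to, and two residues are substitutable only when their masks share a bit. The sketch below illustrates that reading. The groups in `GROUPS` are hypothetical placeholders chosen for illustration; they are not derived from BLOSUM62 and are not the groups used in the talk.

```python
# Hypothetical property groups (illustration only, not taken from the slides).
GROUPS = [
    "AVLIM",  # aliphatic / hydrophobic
    "FWYH",   # aromatic
    "ST",     # hydroxyl
    "NQ",     # amide
    "DE",     # acidic
    "KRH",    # basic
    "C",
    "G",
    "P",
]

# Each group owns one bit; a residue's mask is the OR of its groups' bits.
MASK = {}
for bit, members in enumerate(GROUPS):
    for aa in members:
        MASK[aa] = MASK.get(aa, 0) | (1 << bit)


def substitutable(a, b):
    """True if the residues are identical or share at least one group bit."""
    return a == b or (MASK.get(a, 0) & MASK.get(b, 0)) != 0


def windows_compatible(x, y):
    """True if two equal-length windows are substitutable site by site."""
    return len(x) == len(y) and all(substitutable(a, b) for a, b in zip(x, y))
```

With overlapping groups the test is deliberately not transitive: in this toy grouping H is compatible with both F and K, yet F and K share no bit, so the two dissimilar residues are not merged through H.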
Hierarchical clustering
• Reveals phylogenetic relationships
• The more consensus motifs two sequences share, the more similar they are
• Scoring matrix of pairwise similarities
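The clustering step can be sketched as below. The slides do not state the similarity measure or the linkage rule, so this example assumes Jaccard similarity over each sequence's motif set and average linkage; the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared motifs divided by all motifs seen in either sequence."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(motifs_by_seq):
    """Pairwise scoring matrix keyed by unordered sequence-name pairs."""
    names = sorted(motifs_by_seq)
    return {
        frozenset((x, y)): jaccard(motifs_by_seq[x], motifs_by_seq[y])
        for x, y in combinations(names, 2)
    }


def average_linkage(motifs_by_seq):
    """Agglomerative clustering; returns the merges in the order performed."""
    sim = similarity_matrix(motifs_by_seq)
    clusters = [(name,) for name in sorted(motifs_by_seq)]
    merges = []

    def cluster_sim(c1, c2):
        # Average similarity over all cross-cluster sequence pairs.
        pairs = [sim[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the most similar pair of clusters.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: cluster_sim(*p))
        merges.append((c1, c2, cluster_sim(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy input: motif sets per sequence (labels are illustrative).
toy = {
    "s1": {"m1", "m2", "m3"},
    "s2": {"m1", "m2", "m4"},
    "s3": {"m5", "m6"},
}
merges = average_linkage(toy)
```

The merge list read from first to last gives the tree from the leaves upward: `s1` and `s2` join first because they share two of four motifs, and `s3` joins last.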
Exclusive Group Feature Extraction
• Removes common motifs that also occur in other subgroups
• CP: combinatorial patterns
• ECP: exclusive combinatorial patterns
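Read as a set operation, the ECP of a subgroup is its CP minus every pattern that appears in any other subgroup's CP. A minimal sketch under that reading, with a hypothetical function name and toy pattern labels:

```python
def exclusive_patterns(cp_by_group):
    """For each group, keep only patterns absent from all other groups."""
    ecp = {}
    for group, patterns in cp_by_group.items():
        # Union of the patterns of every other group (empty if none).
        others = set().union(
            *(p for g, p in cp_by_group.items() if g != group)
        )
        ecp[group] = set(patterns) - others
    return ecp


# Toy labels, not patterns from the talk.
cp = {
    "groupA": {"p1", "p2", "p3"},
    "groupB": {"p2", "p4"},
    "groupC": {"p3", "p5"},
}
ecp = exclusive_patterns(cp)
```

Here `p2` and `p3` are dropped from `groupA` because they also occur in `groupB` and `groupC`, leaving `p1` as its only exclusive pattern.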
Background Model Analysis
• Verifies conspicuousness
• Hit ratio close to 0 → unique
• Hit ratio relatively large → insignificant
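The slides do not define the hit ratio. The sketch below assumes it is the fraction of background sequences that contain the motif, with `*` read as a single wildcard position. Both the definition and the function names are assumptions made for illustration.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'H*K*H' into a regex ('*' = any one residue)."""
    return re.compile(
        "".join("." if ch == "*" else re.escape(ch) for ch in motif)
    )


def hit_ratio(motif, background):
    """Fraction of background sequences containing the motif (assumed definition)."""
    if not background:
        raise ValueError("background set is empty")
    rx = motif_to_regex(motif)
    hits = sum(1 for seq in background if rx.search(seq))
    return hits / len(background)


# Toy background set, not data from the talk.
background = ["AAHKHAA", "CCCCC", "HAKAH", "GGG"]
ratio = hit_ratio("H*K*H", background)
```

Under this definition a motif that almost never occurs in the background scores near 0 and is treated as distinctive, while a motif found in a large share of unrelated sequences carries little information about the group.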
The combinatorial features of RNase A-like superfamily extracted by MISA
The combinatorial features of RNase A-like superfamily extracted by MISA (cont.)
• The known H-K-H active sites are identified exactly
The combinatorial features of RNase A-like superfamily extracted by ClustalW
• The first H was misaligned
The combinatorial features of RNase A-like superfamily extracted by ClustalW (cont.)
The combinatorial features of RNase A-like superfamily extracted by MEME
The combinatorial features of RNase A-like superfamily extracted by MEME (cont.)
• The first ‘H’ was not successfully detected
The combinatorial features of RNase A-like superfamily extracted by Gibbs Sampler
1, 1, 1    65 qekvt CKNGQ  gncyk   69 1.00 F 1E21:A
1, 2, 0   107 kerhi IVACE  gspyv  111 1.00 F 1E21:A
1, 3, 2   116 egspy VPVHFD asved  121 1.00 F 1E21:A
2, 1, 1    38 nyqrr CKNQN  tfllt   42 1.00 F 1GQV:A
2, 2, 0   109 anmfy IVACD  nrdqr  113 1.00 F 1GQV:A
2, 3, 2   127 pqypv VPVHLD rii    132 1.00 F 1GQV:A
3, 1, 1    37 nyrwr CKNQN  tflrt   41 1.00 F 1DYT:A
3, 2, 0   108 grrfy VVACD  nrdpr  112 1.00 F 1DYT:A
3, 3, 2   125 prypv VPVHLD tti    130 1.00 F 1DYT:A
4, 1, 1    65 ttniq CKNGK  mnche   69 1.00 F 1RNF:A
4, 2, 0   105 strrv VIACE  gnpqv  109 1.00 F 1RNF:A
4, 3, 2   114 egnpq VPVHFD g      119 1.00 F 1RNF:A
5, 1, 1    59 kaice NKNGN  phren   63 1.00 F 1B1I:A
5, 2, 0   104 gfrnv VVACE  nglpv  108 1.00 F 1B1I:A
5, 3, 2   111 aceng LPVHLD qsifr  116 1.00 F 1B1I:A
15 motifs
Column 1: Sequence Number, Site Number
Column 2: Motif type
Column 3: Left End Location
Column 4: Motif Element
Column 5: Right End Location
Column 6: Probability of Element
Column 7: Forward Motif (F) or Reverse Complement (R)
Column 8: Sequence Description from FASTA input
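For readers who want to work with this listing programmatically, one row can be split into the fields named in the column legend. This is a sketch based only on the rows shown here: it assumes both lowercase flanking segments are present and that the description contains no commas or spaces. `GibbsSite` and `parse_site` are hypothetical names, not part of the Gibbs Sampler software.

```python
from typing import NamedTuple


class GibbsSite(NamedTuple):
    sequence: int       # Column 1: sequence number
    site: int           # Column 1: site number
    motif_type: int     # Column 2
    left: int           # Column 3: left end location
    left_flank: str     # lowercase residues before the element
    element: str        # Column 4: motif element
    right_flank: str    # lowercase residues after the element
    right: int          # Column 5: right end location
    probability: float  # Column 6
    strand: str         # Column 7: 'F' or 'R'
    description: str    # Column 8


def parse_site(row):
    """Parse one row of the listing into a GibbsSite (layout assumed from the slide)."""
    tokens = row.replace(",", " ").split()
    if len(tokens) != 11:
        raise ValueError(f"expected 11 fields, got {len(tokens)}: {row!r}")
    seq, site, mtype, left, lflank, element, rflank, right, prob, strand, desc = tokens
    return GibbsSite(
        int(seq), int(site), int(mtype), int(left), lflank,
        element, rflank, int(right), float(prob), strand, desc,
    )


site = parse_site("1, 1, 1 65 qekvt CKNGQ gncyk 69 1.00 F 1E21:A")
```

A quick consistency check on a parsed row is that `right - left + 1` equals the length of `element`.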
The combinatorial features of RNase A-like superfamily extracted by Gibbs Sampler (cont.)
• The first ‘H’ was not successfully detected
• The motif colored in red is wrong
Comparison of Average RMSD and Aligned Residues
                           MISA       Gibbs Sampler   ClustalW   MEME
Average RMSD               1.039139   1.406796        1.361162   1.336590
Average aligned residues   95.25      38.25           60.50      83.00
• MISA has the lowest average RMSD and the highest average number of aligned residues (using a straightforward structure alignment)
MISA for primate map1b upstream sequences
Ref: D. Liu and I. Fischer, “Structural analysis of the proximal region of the microtubule-associated protein 1B promoter”, J Neurochem, 1997, 69: pp. 910-919
Hierarchical clustering for P450 family 1
• The family can be clustered into three subfamilies
Exclusive group features for P450 family 1
• cytochrome P450 subfamily 1A
^ E*L*A ^ *PK*L* ^ *W*ARR*LA* ^ L**FS ^ *SC*LEEH*S*E ^ G*F*P ^ *V*SV*NVI ^ *DF*P*LR*LP* ^ **EHY**F ^ **DIT**L ^ **ELD** ^ R*P*LS
• cytochrome P450 subfamily 1B
^ F*R*A ^ WK**R ^ R*F*T ^ **RYP**Q*R*Q ^ DQ**LP ^ G**NK*L* ^ **HQC** ^ **LLD**
• cytochrome P450 subfamily 1C
^ SI**EWSG**QPAL*A*F ^ **EAC*W* ^ F**YSKQW**HRK*AQS**RAFS*AN*QT* ^ EA**LV**FL ^ F*P*HE*T ^ N**FF**V**KV**HR ^ W**LL ^ *AK*RG*