8/3/2019 BINSFinalt
1/43
Muhammad Rehan
1358-MSCS-08
Department of Computer Science
GC University Lahore
MS(CS) Final Thesis
Background
Literature Review
Problem Statement
Hypothesis
Methodology
Result
Future Work
References
Bioinformatics
Any application of computation in biology, including data management, algorithm development and data mining.
Bioinformatics is the field of science in which computer science, information technology, statistics and various branches of biology merge to form a single discipline.
What are Proteins
Proteins
Antibody
Hormones
Structural Support & Transportation
Enzymes
What are Proteins made of
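[Figure: General formula of an amino acid. An alpha carbon (C) is bonded to a hydrogen atom (H), an amino group (H2N), a carboxyl group (COOH) and a side chain (R); the examples shown for R are H and CH3.]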
Sr. No   Amino Acid   Single Letter Code   Three Letter Code
1        Alanine      A                    Ala
2        Arginine     R                    Arg
3        Serine       S                    Ser
4        Threonine    T                    Thr
5        Tyrosine     Y                    Tyr
6        Glycine      G                    Gly
7        Histidine    H                    His
...      ...          ...                  ...
20       Leucine      L                    Leu
Experimental Approach
In vivo
In vitro
In silico
Post Translational Modifications (PTMs)
Protein modification is very important for biological activity and for performing the desired task.
This modification is done by the addition of phosphate, glycosyl or other groups to certain amino acids.
PTM               Target Amino Acid   Description
Phosphorylation   S, T, Y, H          Addition of a phosphate group to S, T, Y or H
Glycosylation     S, T, N             Addition of a glycosyl group to S, T or N
Sulfation         Y                   Addition of a sulfate group to Y
Acetylation                           Addition of an acetyl group, usually at the N-terminus of the protein
Methylation       R                   Addition of a methyl group, usually at R residues
List of PTM Types
Database Name     Description                                                                    Reference
PROSITE           Database of consensus patterns for various PTMs                                Sigrist et al. (2002)
HPRD              Human protein reference database of disease-related proteins and their PTMs    Peri et al. (2003)
RESID             Database with a collection of annotations and structures for PTMs              Garavelli (2003)
PhosphoBase ELM   Database of validated phosphorylation sites                                    Diella et al. (2004)
List of PTM Databases
Sr. No   Statement   Reference
1   Proteins often perform diverse and multiple functions. (Jeffery, 1999)
2   The proteome is more diverse and complex than the genome: the human genome has 22,000 to 25,000 genes, whereas the number of proteins is more than 100,000. (Nicolle H et al., 2007)
3   Identifying protein functions and events relies mainly on the particular 3-D structure of the protein as well as on the occurrence of targeted amino acid modifications. (Attwood, 2000)
4   PTMs regulate various functions of proteins by effecting conformational changes, such as enzyme activation. (Konstantinopoulos et al., 2007)
5   Detecting phosphorylated serine, threonine and tyrosine residues using MS is not easy in vivo. (Mann et al., 2002)
Sr. No   Statement   Reference
6   Many methods have been developed within the field of proteomics, but these methods are still in their early stages. (Blom, 2004)
7   Applications of machine learning and statistics in bioinformatics have always played a core role in understanding proteomics and in the analysis of PTMs. (Qazi et al., 2006)
8   ANN is one such approach that has been extensively used in biological sequence analysis. (Wu, C.H., 1997)
9   Most cellular proteins are regulated by reversible phosphorylation, and at least 30% of proteins have such an alteration. (Ficarro et al., 2002)
DISPHOS
PredPhosPho
GPS
PPSP
KinasePhos 1.0, KinasePhos 2.0
NetPhos, NetPhosK
Neural-genetic
Tools       DISPHOS               NetPhos   Neural-genetic   BPNN
Method      Logistic regression   ANN       ANN              ANN
Serine      76%                   69%       75%              72%
Threonine   81%                   72%       82%              77%
Tyrosine    83%                   61%       79%              74%
Tools              KinasePhos 2.0   KinasePhos 1.0   PredPhospho   GPS   PPSP
Method             SVM              HMM              SVM           GPS   BDT
Kinase PKA   Sn    92%              91%              88%           91%   90%
Kinase PKA   Sp    89%              86%              91%           89%   92%
Kinase PKA   Acc   90%              85%              90%           90%   91%
Kinase PKC   Sn    84%              80%              79%           82%   82%
Kinase PKC   Sp    86%              87%              86%           83%   86%
Kinase PKC   Acc   85%              83%              83%           82%   84%
Develop a new method, BINS, to evolve a new classification model by learning amino acid sequence data using a machine learning method, the artificial neural network.
BINS improves the prediction specificity, efficiency and accuracy for the machine learning simulator called GEARS (Genetic Evaluation of Classifier by Learning Residue Rules and Sequences).
The BINS classification method will reduce false negative and false positive predictions.
The BINS method gives highly accurate predictions about PTMs, including the specific sites affected and the kinases that act at each site, and discloses the important biological information from noisy data.
The BINS method can give the best results as compared to the existing PTM prediction methods.
Empirical research methodology with
Exploratory Development Life Cycle will be used
for the development of BINS Model.
BINS consists mainly of three parts:
BINS Data Preparation Module
BINS Bootstrapping Module
BINS ANN Module
BINS Data Preparation Module
  PTMs Database
  Create protein database grouped by target classes
  Create protein database grouped by non modified target classes
  Peptide Generator
  Removal of duplicate instances
BINS Bootstrapping Module
  Peptide dataset grouped by non modified classes
  Sparse Encoding
  Merge the sparse encoding datasets grouped by modified and non modified target classes
  Training and validation dataset generator
    Training dataset generator
    Validation dataset generator
BINS ANN Module
  Topology and Network Configuration
  Training [SN] [SP] [Acc] [MCC]
  Validation [SN] [SP] [Acc] [MCC]
BINS Data Preparation Module
BINS Database Inconsistency Analyzing Utility
BINS Balance Inverted Site Application
BINS Peptide Extraction Application
BINS Data Preparation Module
PID      Sequences         Position   Amino Acid   Modification
O08539   ASTSMNSYTLKSYA.   4          S            S
O14543   MVTHSKFPAAGS.     3          T            T
O14746   MPRAPRCRAVSTA     11         S            S
O14920   MSWYPSLTQTC.      4          Y            Y
O15117   ELSFKQGEQIYTA.    3          S            S
BINS Data Preparation Module
Target sites   No. of proteins   No. of positive peptides   No. of negative peptides   No. of balanced negative peptides   No. of merged positive and balanced negative peptides
S              5431              14467                      326396                     14837                               29304
T              1940              2907                       35795                      2983                                5890
Y              1156              2208                       16273                      2325                                4533
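As a quick arithmetic check on the table above, the merged column should equal the positive peptides plus the balanced negative peptides. The short sketch below confirms this and also reports how skewed the data are before balancing. The function names are illustrative only and are not part of BINS.

```python
# Counts copied from the table above:
# (positive, negative, balanced negative, merged) peptides per target site.
counts = {
    "S": (14467, 326396, 14837, 29304),
    "T": (2907, 35795, 2983, 5890),
    "Y": (2208, 16273, 2325, 4533),
}


def merged_size(positive, balanced_negative):
    """Size of the merged dataset: positives plus balanced negatives."""
    return positive + balanced_negative


def negative_to_positive_ratio(positive, negative):
    """Number of negative peptides per positive peptide before balancing."""
    return negative / positive


for site, (pos, neg, bal_neg, merged) in counts.items():
    # The merged column must be the sum of the two columns it combines.
    assert merged_size(pos, bal_neg) == merged
    print(site,
          "negatives per positive before balancing:",
          round(negative_to_positive_ratio(pos, neg), 1),
          "after balancing:",
          round(bal_neg / pos, 2))
```

The ratios show why a balancing step is needed: before balancing, negatives outnumber positives several times over for every target site.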
BINS Data Preparation Module
BINS Database Inconsistency Analyzing Utility
PID      Sequences         Position   Amino Acid   Modification   Length
O08539   ASTSMNSYTLKSYA.   4          S            S              350
O14543   MVTHSKFPAAGS.     3          T            T              1030
O14746   MPRAPRCRAVSTA     11         S            S              1250
O14920   MSWYPSLTQTC.      4          Y            Y              735
O15117   ELSFKQGEQIYTA.    3          S            S              952
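The slides do not say which checks the inconsistency utility performs. A minimal sketch, assuming it verifies that each annotated position lies within the protein length and that the residue at that position matches the annotated amino acid, could look like this. The record field names and the second record are hypothetical.

```python
def find_inconsistencies(records):
    """Return (pid, reason) pairs for records whose annotation does not fit.

    Each record is a dict with the keys pid, sequence, position (1-based),
    amino_acid and length. These field names are illustrative assumptions.
    """
    problems = []
    for rec in records:
        pos = rec["position"]
        seq = rec["sequence"]
        if pos < 1 or pos > rec["length"]:
            problems.append((rec["pid"], "position outside protein length"))
        elif pos <= len(seq) and seq[pos - 1] != rec["amino_acid"]:
            # Only checked when the (possibly truncated) sequence covers the site.
            problems.append((rec["pid"], "residue differs from annotation"))
    return problems


records = [
    # First row of the table above (sequence as shown, truncated in the slide).
    {"pid": "O08539", "sequence": "ASTSMNSYTLKSYA", "position": 4,
     "amino_acid": "S", "length": 350},
    # Hypothetical bad record, invented only to show a failing case.
    {"pid": "EXAMPLE1", "sequence": "ASTSMNSYTLKSYA", "position": 5,
     "amino_acid": "S", "length": 350},
]
print(find_inconsistencies(records))
```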
BINS Data Preparation Module
BINS Invert Application
PID      Sequences         Position   Amino Acid   Modification   Length
O08539   ASTSMNSYTLKSYA.   2          S            S              350
O08539   ASTSMNSYTLKSYA.   7          S            S              350
O08539   ASTSMNSYTLKSYA.   12         S            S              350
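The rows above list positions 2, 7 and 12 of O08539, which are the serines in the displayed sequence other than the annotated site at position 4. One plausible reading is that the invert step collects every occurrence of the target residue that is not annotated as modified. The sketch below works under that assumption; `inverted_sites` is a hypothetical helper, not the BINS implementation.

```python
def inverted_sites(sequence, target, modified_positions):
    """Return 1-based positions of `target` residues not annotated as modified."""
    modified = set(modified_positions)
    return [
        index
        for index, residue in enumerate(sequence, start=1)
        if residue == target and index not in modified
    ]


# Sequence as displayed in the slide; position 4 is the annotated serine.
print(inverted_sites("ASTSMNSYTLKSYA", "S", [4]))  # [2, 7, 12]
```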
BINS Data Preparation Module
BINS Peptide Extraction Application
Peptide ID   Extended Sequences                      Class   P-10   P-9   P-8   ...   P-1   P0   P1   ...   P9   P10
O08539-2     -,-,-,A,S,T,S,M,N,S,Y,T,L,K,S           0.1     -      -     -     ...   A     S    T    ...   K    S
O08539-7     -,-,-,A,S,T,S,M,N,S,Y,T,L,K,S,Y,A,-,-   0.1     -      -     -     ...   N     S    Y    ...   -    -
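The peptide extraction step can be sketched as cutting a window from P-10 to P10 around each site and padding with "-" wherever the window runs past either end of the sequence. The function name, the default flank of 10 and the use of the displayed (truncated) sequence as if it were complete are assumptions made for illustration.

```python
def extract_window(sequence, position, flank=10, pad="-"):
    """Return the residues from P-flank to P+flank around a 1-based site.

    Positions that fall outside the sequence are filled with `pad`.
    """
    center = position - 1  # convert to a 0-based index
    window = []
    for index in range(center - flank, center + flank + 1):
        if 0 <= index < len(sequence):
            window.append(sequence[index])
        else:
            window.append(pad)
    return window


window = extract_window("ASTSMNSYTLKSYA", 2)
print(",".join(window))
print("P-1, P0, P1:", window[9], window[10], window[11])  # A S T
```

For site 2 this reproduces the values shown in the table: the leading positions are padding, P-1 is A, P0 is S, P1 is T, and P9 and P10 are K and S.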
BINS Bootstrapping Module
BINS Training Dataset Encoding Manager
BINS Data Table Merging Utility
BINS Boot Strapping Application
BINS Bootstrapping Module
Sparse Encoding Scheme
Amino Acid Coding Scheme
A 10000000000000000000
C 01000000000000000000
D 00100000000000000000
E 00010000000000000000
F 00001000000000000000
G 00000100000000000000
H 00000010000000000000
I 00000001000000000000
.
.
.
.
- 00000000000000000000
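The coding scheme above maps each residue to a 20-bit vector with a single 1, and the gap symbol "-" to all zeros. A minimal sketch follows. The slide shows the order only for A through I, so the rest of the alphabet (alphabetical by one-letter code) is an assumption, as are the function names.

```python
# Order for A..I is taken from the slide; the continuation K..Y is assumed
# to be alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(residue):
    """Return the 20-bit code for one residue; the gap '-' maps to all zeros.

    Raises ValueError for symbols outside the 20 standard amino acids.
    """
    bits = [0] * len(AMINO_ACIDS)
    if residue != "-":
        bits[AMINO_ACIDS.index(residue)] = 1
    return bits


def encode_window(window):
    """Concatenate the codes of all residues in a peptide window."""
    encoded = []
    for residue in window:
        encoded.extend(sparse_encode(residue))
    return encoded


print("".join(str(bit) for bit in sparse_encode("C")))  # 01000000000000000000
print(len(encode_window(["-"] * 9 + list("ASTSMNSYTLKS"))))  # 420
```

With a window of 21 positions, each peptide becomes 21 x 20 = 420 binary inputs.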
BINS Bootstrapping Module
BINS Training Dataset Encoding Manager
Peptide ID   Extended Sequences                      Class   P-10-1 P-10-2 P-10-3 ... P-10-8 P-10-9 P-10-10
O08539-2     -,-,-,A,S,T,S,M,N,S,Y,T,L,K,S           0.1     0 0 0 0 0 0 0 0
O08539-7     -,-,-,A,S,T,S,M,N,S,Y,T,L,K,S,Y,A,-,-   0.1     0 0 0 0 0 0 0 0
BINS Bootstrapping Module
BINS Data Table Merging Utility
Peptide ID   Extended Sequences                      Class   P-10-1 P-10-2 P-10-3 ... P-10-8 P-10-9 P-10-10
O08539-2     -,-,-,A,S,T,S,M,N,S,Y,T,L,K,S           0.1     0 0 0 0 0 0 0 0
O08539-7     -,-,-,A,S,T,S,M,N,S,Y,T,L,K,S,Y,A,-,-   0.9     0 1 0 0 0 0 0 0
BINS Bootstrapping Module
BINS Boot Strapping Application
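The slide for this application carries no text, so how BINS draws its training and validation sets is not stated here. As a generic illustration of bootstrapping only, the sketch below samples rows with replacement for training and keeps the rows that were never drawn for validation. The function name and the fixed seed are assumptions.

```python
import random


def bootstrap_split(rows, seed=0):
    """Draw a bootstrap training sample; rows never drawn form the validation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n = len(rows)
    picked = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
    chosen = set(picked)
    training = [rows[i] for i in picked]
    validation = [rows[i] for i in range(n) if i not in chosen]
    return training, validation


training, validation = bootstrap_split(list(range(100)))
print(len(training), len(validation))
```

Because sampling is with replacement, some rows appear several times in the training set while others are left out entirely, and those left-out rows give a validation set that does not overlap with training.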
BINS ANN Module
Evaluation Strategy
Sn=TP/(TP+FN)
Sp=TN/(TN+FP)
Acc=(Sn+Sp)/2
MCC=(TP*TN-FP*FN)/sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
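The measures on this slide can be computed directly from the four confusion counts. The sketch below uses the slide's definitions of Sn, Sp and Acc, and the standard Matthews correlation coefficient for MCC. Returning None when the MCC denominator is zero is a choice made here; the function name and the example counts are hypothetical.

```python
import math


def evaluate(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc and MCC from confusion counts.

    Assumes at least one modified and one non-modified example, so the
    Sn and Sp denominators are non-zero.
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (sn + sp) / 2  # accuracy as defined on the slide
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else None
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts: a model that predicts every site as non-modified.
print(evaluate(tp=0, tn=50, fp=0, fn=50))
```

A model that never predicts a modification gets Sn = 0 and Sp = 1, and its MCC is undefined because the denominator is zero.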
Evaluation Strategy
PID     Sequence     Position   Target    Classification
O3265   SASNSTSYTS   3          Mod       TP
O3265   SASNSTSYTS   10         Mod       FN
O3265   SASNSTSYTS   1          Non-mod   TN
O3265   SASNSTSYTS   5          Non-mod   FP
O3265   SASNSTSYTS   7          Non-mod   TN
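The outcome column above follows from comparing each site's target label with the model's prediction. A small sketch of that rule is given below. The predictions in the example are inferred from the outcome column, since the slide does not list them, and the function name is hypothetical.

```python
from collections import Counter


def classify_outcome(target_modified, predicted_modified):
    """Label one site as TP, FN, FP or TN."""
    if target_modified:
        return "TP" if predicted_modified else "FN"
    return "FP" if predicted_modified else "TN"


# (position, target is modified, predicted as modified) for the table rows.
# The predictions are inferred from the outcome column, not given in the slides.
rows = [
    (3, True, True),
    (10, True, False),
    (1, False, False),
    (5, False, True),
    (7, False, False),
]
outcomes = Counter(classify_outcome(target, pred) for _, target, pred in rows)
print(dict(outcomes))
```

The resulting counts (1 TP, 1 FN, 2 TN, 1 FP) are the inputs needed for the Sn, Sp, Acc and MCC formulas.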
BINS Serine Result
         Training                        Validation
Sr. No   Acc     Sn      Sp      MCC     Acc     Sn      Sp      MCC
1        0.965   1       0.931   0.932   0.497   0       1       None
2        0.984   0.982   0.987   0.969   0.805   0.612   0.999   0.662
3        0.996   0.996   0.996   0.992   0.807   0.619   0.995   0.663
4        0.995   0.995   0.996   0.991   0.807   0.622   0.995   0.664
5        0.998   0.998   0.998   0.996   0.807   0.616   0.999   0.665
6        0.996   0.996   0.996   0.992   0.809   0.628   0.991   0.663
BINS Threonine Result
         Training                        Validation
Sr. No   Acc     Sn      Sp      MCC     Acc     Sn      Sp      MCC
1        0.972   0.963   0.981   0.946   0.826   0.688   0.965   0.680
2        0.987   0.986   0.989   0.975   0.834   0.737   0.932   0.683
3        0.987   0.986   0.988   0.974   0.825   0.750   0.901   0.658
4        0.987   0.986   0.987   0.974   0.827   0.771   0.884   0.659
5        0.987   0.986   0.987   0.974   0.824   0.774   0.875   0.653
6        0.986   0.986   0.986   0.972   0.822   0.772   0.872   0.648
7        0.988   0.990   0.987   0.977   0.825   0.761   0.890   0.657
8        0.989   0.989   0.990   0.979   0.823   0.768   0.880   0.652
9        0.990   0.990   0.990   0.980   0.822   0.770   0.874   0.647
10       0.990   0.989   0.991   0.980   0.821   0.770   0.871   0.645
BINS Tyrosine Result
         Training                        Validation
Sr. No   Acc     Sn      Sp      MCC     Acc     Sn      Sp      MCC
1        0.966   0.952   0.979   0.933   0.846   0.735   0.951   0.705
2        0.972   0.961   0.983   0.945   0.843   0.741   0.939   0.697
3        0.977   0.975   0.979   0.955   0.836   0.778   0.891   0.675
4        0.976   0.975   0.977   0.953   0.837   0.780   0.890   0.676
5        0.975   0.973   0.976   0.950   0.831   0.779   0.881   0.665
6        0.973   0.970   0.976   0.947   0.829   0.779   0.877   0.661
7        0.974   0.971   0.977   0.948   0.828   0.778   0.876   0.659
8        0.974   0.973   0.975   0.948   0.828   0.779   0.875   0.659
9        0.974   0.974   0.975   0.949   0.826   0.778   0.872   0.654
10       0.977   0.974   0.980   0.955   0.825   0.768   0.879   0.652
BINS Comparison with Other Methods
                 Y                   T                   S
Algorithm        Acc   Sn    Sp      Acc   Sn    Sp      Acc   Sn    Sp
BINS             85%   74%   95%     83%   74%   93%     81%   63%   99%
NetPhos          69%   70%   68%     72%   66%   77%     69%   81%   57%
DISPHOS          83%   NA    NA      81%   NA    NA      76%   NA    NA
BPNN             75%   75%   75%     78%   78%   77%     72%   72%   72%
Neural-genetic   79%   81%   78%     83%   81%   84%     75%   76%   74%
BINS is developed as a desktop application; there is no online WWW support available in the current version. Nevertheless, increasing opportunities over the internet create the need to develop an online version of this application for wider scope and availability to multiple clients in different regions of the world. This effort would not only help us to enhance the embedded capability of BINS for efficient PTM prediction but could also be a major resource for multi-nation research collaborations.
BINS is a sub-module of GEARS, so the next version will learn and optimize the parameters and weights of the ANN with a genetic algorithm. Next, BINS will be integrated with other GEARS modules such as MAPRes and HMM for the best classification of protein data, using the pros and cons of each technique.
Jeffery C.J. Moonlighting proteins. Trends Biochem. Sci., 24:8-11, 1999.
Bork P., Dandekar T., Diaz-Lazcoz Y., Eisenhaber F., Huynen M. and Yuan Y. Predicting function: from genes to genome and back. J. Mol. Biol., 283:707-725, 1998.
Attwood T. The quest to deduce protein function from sequence: the role of pattern databases. Int. J. Biochem. Cell Biol., 32:139-155, 2000.
Mann M., Ong S., Gronborg M., Steen H. et al. Trends Biotechnol., 20:261-268, 2002.
Wu C.H. Comput. Chem., 21:237-256, 1997.
Blom N., Sicheritz-Pontén T., Gupta R., Gammeltoft S. and Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics, 4:1633-1649, 2004.