Bioinformatics for Proteomics Shu-Hui Chen ( 陳淑慧 ) Department of Chemistry National Cheng Kung University


DNA (5’ → 3’) → transcription, splicing → mRNA → translation → polypeptide → folding → protein → function

Protein function also depends on:
• Transport / Localization
• Oligomerization
• PTM (Post-Translational Modification)

How do we find protein coding regions, introns and exons in genomic DNA sequences?

Bioinformatics I


What is Proteomics?

Systematic analysis of:
• All protein sequences
• All protein expression patterns
• All protein interactions

This involves:
• Protein isolation
• Protein separation
• Protein identification
• Functional characterization of all proteins


The tools of Proteomics

Traditional protein chemistry assays struggle to establish protein identity.

Identity requires:
• Specificity of measurement (precision)
  – Mass spectrometry
  – MS-based data acquisition algorithms
• A reference for comparison
  – Protein sequence databases
  – Search algorithms


MS-based Proteomics and Bioinformatics

• So far, MS instruments are not sensitive enough to resolve all the proteins in a biological system based solely on the measured signals.

• MS can, however, acquire enough data to map a protein to a sequence database when computer algorithms are used to analyze the data.

• This is the field of bioinformatics.


Instrumentation

[Figure: block diagram of a mass spectrometer: sample inlet → ion source → mass analyzer → data acquisition, with a vacuum system]


“Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.


MS-based Protein Identification

Mass Mapping

Peptide Sequencing


Conventional Methodology: Expression Proteomics


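Trypsin Digestion

Trypsin cleaves polypeptides on the C-terminal side of the basic amino acids lysine (K) and arginine (R):

-NH-CH(R1)-CO-NH-CH(R2)-CO-  → (trypsin, H2O) →  -NH-CH(R1)-COOH + H2N-CH(R2)-CO-

(R1 = side chain of Lys or Arg)

[Figure: mass spectrum of the resulting peptides, ion intensity vs. m/z]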
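The cleavage rule above can be turned into an in silico digest, which is how theoretical peptide lists are generated for database mapping. This is a minimal sketch, assuming the common textbook rule "cleave after K or R unless the next residue is P"; real trypsin does not follow it perfectly. The function name and the `missed_cleavages` parameter are illustrative, not taken from any particular search engine. It needs Python 3.7 or later, because it splits on zero-width matches.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0):
    """In silico trypsin digest: cut after K/R unless followed by P."""
    # Zero-width split points: right after K or R, but not before P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        # Optionally join up to `missed_cleavages` neighbouring pieces.
        for end in range(start, min(start + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


# First 20 residues of the TRAP-1 match; note K is not cut before P.
print(tryptic_digest("RALRRAPALAAVPGGKPILC"))  # ['R', 'ALR', 'R', 'APALAAVPGGKPILC']
```

Allowing one or two missed cleavages is a common way to account for incomplete digestion, at the cost of a larger candidate list.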


Mass Spectrometry: protein identified by database mapping
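Database mapping compares the observed peptide masses with theoretical peptide masses from an in silico digest of each database entry. The sketch below assumes those theoretical masses are already computed and simply counts how many observed masses fall within a fixed tolerance (in Da) of some theoretical mass; production search engines use probability-based scoring rather than a raw match count. The protein names and masses in the usage are hypothetical.

```python
def count_matches(observed, theoretical, tol=0.2):
    """Number of observed masses within `tol` Da of any theoretical mass."""
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


def rank_proteins(observed, database, tol=0.2):
    """Rank database entries (name -> theoretical peptide masses) by match count."""
    scores = {name: count_matches(observed, masses, tol) for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example data.
database = {"protA": [500.0, 800.3, 1200.6], "protB": [500.1, 950.0]}
print(rank_proteins([500.05, 800.25, 1500.0], database))
```

Widening `tol` lets more observed masses match, including random ones, which is why the tolerance setting affects how confident an identification can be.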


Automated Database Search

Number 1 match: tumor necrosis factor type 1 receptor-associated protein (TRAP-1), Mr 76030.27

1 RALRRAPALA AVPGGKPILC PRRTTAQLGP RRNPAWSLQA GRLFSTQTAE

51 DKEEPLHSII SSTESVQGST SKHEFQAETK KLLDIVARSL YSEKEVFIRE

101 LISNASDALE KLRHKLVSDG QALPEMEIHL QTNAEKGTIT IQDTGIGMTQ

151 EELVSNLGTI ARSGSKAFLD ALQNQAEASS KIIGQFGVGF YSAFMVADRV

201 EVYSRSAAPG SLGYQWLSDG SGVFEIAEAS GVRTGTKIII HLKSDCKEFS

251 SEARVRDVVT KYSNFVSFPL YLNGRRMNTL QAIWMMDPKD VGEWQHEEFY

301 RYVAQAHDKP RYTLHYKTDA PLNIRSIFYV PDMKPSMFDV SRELGSSVAL

351 YSRKVLIQTK ATDILPKWLR FIRGVVDSED IPLNLSRELL QESALIRKLR

401 DVLQQRLIKF FIDQSKKDAE KYAKFFEDYG LFMREGIVTA TEQEVKEDIA

451 KLLRYESSAL PSGQLTSLSE YASRMRAGTR NIYYLCAPNR HLAEHSPYYE

501 AMKKKDTEVL FCFEQFDELT LLHLREFDKK KLISVETDIV VDHYKEEKFE

551 DRSPAAECLS EKETEELMAW MRNVLGSRVT NVKVTLRLDT HPAMVTVLEM

601 GAARHFLRMQ QLAKTQEERA QLLQPTLEIN PRHALIKKLN HCAQASLAWL

651 SCWWIRYTRT P

Total coverage: 33.4%
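Sequence coverage is the fraction of residues in the matched protein that are covered by at least one identified peptide. A minimal sketch, assuming the matched peptides are given as plain sequences; the function name is illustrative. It strips the position numbers and spaces used in listings like the one above before counting.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    # Keep letters only, so listings such as "1 RALRRAPALA ..." can be pasted in.
    protein = "".join(ch for ch in protein if ch.isalpha())
    if not protein:
        raise ValueError("empty protein sequence")
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)
```

For the 661-residue TRAP-1 entry above, a coverage of 33.4% corresponds to roughly 221 covered residues.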


Minimal content of a « protein sequence » database

• Sequences!!
• Accession number (AC)
• Taxonomic data
• References
• ANNOTATION/CURATION
• Keywords
• Cross-references
• Documentation


SWISS-PROT/TrEMBL

• A collaboration between the SIB (Swiss Institute of Bioinformatics, Switzerland) and EMBL-EBI (UK)

• SWISS-PROT: a manually annotated, non-redundant, cross-referenced, and documented protein sequence database.

• TrEMBL: generated automatically from the annotated coding sequences (CDS) in EMBL and annotated using software tools.

http://www.expasy.org/sprot/


ExPASy Web Server

ExPASy = Expert Protein Analysis System


History for MS Searching

• 1993: MOWSE (Molecular Weight Search), by Pappin and Bleasby
• 1994: SEQUEST, by Yates and Eng
• 1996 to 1998: MOWSE II, MOWSE III, and MASCOT (by Matrix Science)


Scoring algorithm

Final score $S = -10\log_{10}(P)$, where P is the absolute probability that the observed match is a random event.

E value (expected value): the number of hits one can expect to see by chance when searching a database of a particular size. A value of zero indicates that no matches would be expected by chance.

Significant hits at the 95% confidence level (p < 0.05): there is less than a 1 in 20 chance that the observed match is a random event.

[Figure annotation: increase mass tolerance]
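The score definition above converts a probability into a number where higher is better: each factor of 10 in P is worth 10 score units. The sketch below implements that conversion and its inverse. The E-value and threshold functions rest on an assumption not stated in the slides: that E is approximated as P multiplied by the number of comparisons N made during the search (a Bonferroni-style correction). The function names are hypothetical.

```python
import math


def score_from_probability(p):
    """Probability-based score S = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must lie in (0, 1]")
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Invert S = -10 * log10(P)."""
    return 10.0 ** (-score / 10.0)


def expect_value(p, n_comparisons):
    """Assumption: E is approximated as P times the number of comparisons."""
    return p * n_comparisons


def significance_threshold(n_comparisons, alpha=0.05):
    """Score above which E < alpha, under the same E = P * N assumption."""
    return score_from_probability(alpha / n_comparisons)


print(round(score_from_probability(0.05), 1))     # 13.0
print(round(significance_threshold(100_000), 1))  # 63.0
```

The second line shows why the significance threshold rises with database size: with 100,000 candidates, a match must be about 2,000,000 times less likely than chance (P < 5 × 10⁻⁷) to stay below an expected 0.05 random hits.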


MS-based Protein Identification

Mass Mapping

Peptide Sequencing


Tandem Mass Spectrometry (MS/MS)

MS/MS acquisition is controlled by software settings.
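One widely used way such software settings work is data-dependent ("top-N") acquisition. After each survey scan, the instrument picks the most intense precursor ions above an intensity threshold for MS/MS and skips ions it fragmented recently (dynamic exclusion). The sketch below assumes this scheme; the parameter names are hypothetical and do not correspond to any vendor's actual settings.

```python
def select_precursors(survey_peaks, top_n=3, min_intensity=1000.0,
                      excluded_mz=(), exclusion_tol=0.01):
    """Pick up to `top_n` precursor m/z values for MS/MS from one survey scan.

    survey_peaks: iterable of (mz, intensity) pairs.
    excluded_mz: m/z values fragmented recently (dynamic exclusion list).
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_peaks
        if intensity >= min_intensity
        and not any(abs(mz - ex) <= exclusion_tol for ex in excluded_mz)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)  # most intense first
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical survey scan.
peaks = [(400.2, 5000.0), (500.3, 20000.0), (600.1, 800.0), (700.4, 15000.0), (800.5, 3000.0)]
print(select_precursors(peaks, top_n=2))
```

Dynamic exclusion lets less abundant peptides reach the top of the list in later cycles instead of the same intense ion being fragmented repeatedly.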


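Protein Identification

Peptide Sequencing using MS/MS

[Figure: the precursor ion of peptide ABCDEF is selected and fragmented by CID into complementary pairs (A + BCDEF, AB + CDEF, ABC + DEF, ABCD + EF, ABCDE + F). In the product-ion spectrum (m/z axis), the N-terminal fragments A, AB, ABC, ABCD and ABCDE form a ladder whose successive mass differences correspond to residues A, B, C, D and E.]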


Nomenclature used for CID peptide fragmentation

Low-energy CID (eV range): quadrupole (Q), TOF, and FT instruments
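In this nomenclature, N-terminal fragments are a, b and c ions and C-terminal fragments are x, y and z ions. The b and y ions dominate low-energy CID spectra. The sketch below computes singly charged b- and y-ion m/z values from monoisotopic residue masses, assuming unmodified residues (for example, no carbamidomethylated cysteine) and charge 1+. The function name is illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ladders(peptide):
    """Singly charged b- and y-ion m/z values for a linear peptide.

    b_i contains the first i residues; y_i contains the last i residues plus water.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:  # b1 .. b(n-1)
        running += m
        b_ions.append(running + PROTON)
    running = WATER
    for m in reversed(masses[1:]):  # y1 .. y(n-1)
        running += m
        y_ions.append(running + PROTON)
    return b_ions, y_ions


b, y = fragment_ladders("GR")
print(round(b[0], 4), round(y[0], 4))  # 58.0287 175.119
```

A useful check is complementarity: for a peptide of n residues, b_i + y_(n-i) equals the singly protonated precursor mass plus one extra proton.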

“Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.


Protein Identification by Database Search


Sequence Tag Approach for Peptide Sequencing

“Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.
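In the sequence tag approach, a short stretch of sequence is read directly from the spectrum. Consecutive peaks in one fragment series differ by the mass of one residue. That tag, together with the masses on either side of it, can then be used to search the database. A minimal sketch, assuming the input peaks all belong to a single ion series and are singly charged. Leucine and isoleucine have the same mass and cannot be told apart this way. The function name and the 0.02 Da tolerance are illustrative choices.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_sequence_tag(peak_mzs, tol=0.02):
    """Translate gaps between consecutive fragment peaks into residue letters."""
    peaks = sorted(peak_mzs)
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol)
        tag.append("/".join(hits) if hits else "?")  # "?" = no single residue fits
    return tag


print(read_sequence_tag([100.0, 157.02146, 228.05857, 315.0906]))  # ['G', 'A', 'S']
```

With a tolerance of 0.02 Da, Q (128.059) and K (128.095) remain distinguishable; a wider tolerance would report both.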


The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.

The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
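BLAST finds local similarity quickly with a "seed and extend" heuristic. It first looks for short word matches between query and database sequence, then extends each hit, and stops once the running score falls too far below its best value (the X-drop rule). The sketch below shows only that idea, under simplifying assumptions. It uses exact word matches with +1/-1 scoring, whereas protein BLAST uses neighbourhood words scored with a substitution matrix such as BLOSUM62, gapped extension, and E-value statistics. The function names are hypothetical.

```python
from collections import defaultdict


def find_word_hits(query, subject, k=3):
    """(query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def extend_hit(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a word hit with an X-drop stop.

    Returns (query_start, query_end, subject_start, score); query_end is exclusive.
    """
    def pair_score(a, b):
        return match if a == b else mismatch

    score = sum(pair_score(query[i + t], subject[j + t]) for t in range(k))

    # Extend to the right, remembering the best running total.
    best_right = run = right_len = 0
    t = 0
    while i + k + t < len(query) and j + k + t < len(subject):
        run += pair_score(query[i + k + t], subject[j + k + t])
        t += 1
        if run > best_right:
            best_right, right_len = run, t
        elif best_right - run > x_drop:
            break

    # Extend to the left in the same way.
    best_left = run = left_len = 0
    t = 1
    while i - t >= 0 and j - t >= 0:
        run += pair_score(query[i - t], subject[j - t])
        if run > best_left:
            best_left, left_len = run, t
        elif best_left - run > x_drop:
            break
        t += 1

    return i - left_len, i + k + right_len, j - left_len, score + best_left + best_right


q, s = "MYTAILORISRICH", "MONTAILLEURESTRICHE"
print(find_word_hits(q, s))  # [(2, 3), (3, 4), (10, 14), (11, 15)]
print(extend_hit(q, s, 2, 3))  # (2, 6, 3, 4): "TAIL" in both sequences
```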


NCBI BLAST: http://www.ncbi.nlm.nih.gov/blast/

BLAST = Basic Local Alignment Search Tool


Sequence alignments and comparison

1: MYTAILORISRICH
2: MONTAILLEURESTRICHE

Global Alignment

1: MY-TAIL--ORIS-RICH-
   ¦x ¦¦¦¦  x¦x¦ ¦¦¦¦
2: MONTAILLEURESTRICHE

Two Local Alignments

1: TAILO   RICH
   ¦¦¦¦x   ¦¦¦¦
2: TAILL   RICH

¦ = Identity, x = Mismatch, - = Insertion / Deletion
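Global and local alignments are classically computed by dynamic programming: Needleman-Wunsch for global and Smith-Waterman for local. The local version differs only in that cell scores are floored at zero and the best cell anywhere in the table is reported. This sketch returns only the optimal scores and uses an assumed simple scheme (match +1, mismatch -1, gap -1) rather than a substitution matrix with affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score for aligning a and b end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap  # leading gaps in b
    for j in range(1, cols):
        f[0][j] = j * gap  # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            diag = f[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            f[i][j] = max(diag, f[i - 1][j] + gap, f[i][j - 1] + gap)
    return f[-1][-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score of any pair of substrings (cells floored at 0)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(global_alignment_score("MYTAILORISRICH", "MONTAILLEURESTRICHE"))
print(local_alignment_score("TAILORISRICH", "TAILLEURESTRICHE"))
```

Under this scheme, the global alignment in the example (11 identities, 3 mismatches, 5 gap positions) scores 11 − 3 − 5 = 3, so the optimum returned for that pair is at least 3.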


Multiple Sequence Alignment (MSA)

HBA_CHICK   VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL 48
HBAD_CHICK  ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL 48
HBPI_CHICK  AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV 48
HBB_CHICK   VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL 48
HBE_CHICK   VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL 48
HBRH_CHICK  VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL 48
MYG_CHICK   GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL 49

HBA_CHICK   SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV 93
HBAD_CHICK  SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV 93
HBPI_CHICK  SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV 93
HBB_CHICK   SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV 98
HBE_CHICK   SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV 98
HBRH_CHICK  SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV 98
MYG_CHICK   KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI 99

HBA_CHICK   DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR-- 141
HBAD_CHICK  DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR-- 141
HBPI_CHICK  DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR-- 141
HBB_CHICK   DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH-- 146
HBE_CHICK   DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH-- 146
HBRH_CHICK  DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH-- 146
MYG_CHICK   PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF 149

HBA_CHICK   ---- 141
HBAD_CHICK  ---- 141
HBPI_CHICK  ---- 141
HBB_CHICK   ---- 146
HBE_CHICK   ---- 146
HBRH_CHICK  ---- 146
MYG_CHICK   GFQG 153

Consensus length: 154; Identity: 19 (12.3%); Similarity: 51 (33.1%)
Character to show that a position in the alignment is perfectly conserved: '*'
Character to show that a position is well conserved: '.'

Programs:

• CLUSTALW

• T_COFFEE

• MULTALIGN
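The "Identity" figure in MSA output such as the globin alignment above counts columns in which every sequence carries the same residue. These are the columns marked '*'. This sketch reproduces that '*' line and the identity percentage for an alignment given as equal-length strings. It assumes a column containing a gap is never counted as identical. It does not attempt the '.' ("well conserved") mark, which needs a grouping of similar amino acids. The function names are illustrative.

```python
def conservation_line(alignment):
    """'*' under columns where all sequences share the same residue (no gaps)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


def identity_summary(alignment):
    """Number and percentage of perfectly conserved columns."""
    identical = conservation_line(alignment).count("*")
    return identical, 100.0 * identical / len(alignment[0])


print(conservation_line(["KTYF", "KSYF", "KTYW"]))  # "* * "
```

Applied to the globin alignment, 19 identical columns out of a consensus length of 154 give the 12.3% reported.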


Searching databases with multiple alignments

PSI-BLAST: Position-Specific Iterative BLAST (Altschul et al., 1997)

1. Starting with a single sequence, PSI-BLAST searches a database using BLAST, builds a multiple sequence alignment from the significant hits, and derives a profile (a position-specific scoring matrix) from it.

2. The profile is then used to search the protein database again.

3. Running the program several times can further refine the profile and increase search sensitivity.
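The profile in steps 1 to 3 is a position-specific scoring matrix (PSSM). It gives each alignment column its own score for each amino acid, based on how often that residue appears there. The sketch below builds a log-odds PSSM from aligned sequences and scans a new sequence with it, which corresponds to one search round. It makes simplifying assumptions: a uniform background frequency (1/20), a flat pseudocount, and no sequence weighting. PSI-BLAST itself uses substitution-matrix-based pseudocounts, sequence weighting, and full BLAST statistics. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Log-odds profile (bits) from aligned sequences; gap characters are not counted."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        total = sum(r in AMINO_ACIDS for r in column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum of position-specific scores for a segment as long as the profile."""
    return sum(position[aa] for position, aa in zip(pssm, segment))


def best_window(pssm, sequence):
    """Start index and score of the best-scoring window of `sequence`."""
    width = len(pssm)
    start = max(range(len(sequence) - width + 1),
                key=lambda i: score_segment(pssm, sequence[i:i + width]))
    return start, score_segment(pssm, sequence[start:start + width])


profile = build_pssm(["KTYF", "KTYF", "RTYF"])
print(best_window(profile, "AAKTYFAA")[0])  # 2
```

Iterating means adding the new significant hits to the alignment, rebuilding the profile, and searching again.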


Error tolerance search

0.2 Da / 0.2 Da: 32

0.05 Da / 0.05 Da: 27

0.5 Da / 0.5 Da: 33


MS/MS Scan Functions

[Figure: triple-quadrupole MS/MS. Q1 → collision chamber (N2 gas) → Q3. Each quadrupole runs either in mass-scan mode or in single-mass-transmission (fixed) mode.]

Scan function                        Q1      Q3
Product Ion Scan (PI)                Fixed   Scan
Multiple Reaction Monitoring (MRM)   Fixed   Fixed
Precursor Ion Scan (PS)              Scan    Fixed
Neutral Loss Scan (NL)               Scan    Scan (offset from Q1 by the neutral-loss mass)
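The four scan functions differ only in which quadrupole is held fixed and which is scanned. The sketch below makes this concrete by filtering a list of fragmentation events (precursor m/z → product m/z) the way each mode would. It assumes singly charged ions, so the m/z difference equals the neutral mass lost. The event values and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    precursor_mz: float  # selected in Q1
    product_mz: float    # measured in Q3 after collision-induced dissociation


def product_ion_scan(events, q1_mz, tol=0.5):
    """Q1 fixed, Q3 scans: all products of one precursor."""
    return sorted(t.product_mz for t in events if abs(t.precursor_mz - q1_mz) <= tol)


def precursor_ion_scan(events, q3_mz, tol=0.5):
    """Q1 scans, Q3 fixed: all precursors that give one product ion."""
    return sorted(t.precursor_mz for t in events if abs(t.product_mz - q3_mz) <= tol)


def neutral_loss_scan(events, loss, tol=0.5):
    """Both scan with a constant offset: precursors that lose a given neutral mass."""
    return sorted(t.precursor_mz for t in events
                  if abs((t.precursor_mz - t.product_mz) - loss) <= tol)


def mrm(events, q1_mz, q3_mz, tol=0.5):
    """Both fixed: is this specific precursor -> product transition present?"""
    return any(abs(t.precursor_mz - q1_mz) <= tol and abs(t.product_mz - q3_mz) <= tol
               for t in events)


# Hypothetical fragmentation events.
events = [Transition(500.0, 402.0), Transition(500.0, 300.0),
          Transition(650.0, 552.0), Transition(700.0, 216.0)]
print(neutral_loss_scan(events, 98.0))  # [500.0, 650.0]
```

MRM gives up spectral information in exchange for selectivity and sensitivity, which is why it is used for targeted quantification rather than identification.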


IP + MS ID: immunoprecipitation (IP) followed by MS identification to find protein interaction complexes
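A common way to interpret IP + MS data is to compare proteins identified in the bait IP with those in a control IP. Proteins enriched in the bait sample are candidate complex members; proteins seen equally in both are likely background binders. This sketch assumes spectral counts (the number of MS/MS spectra matched per protein) are used as a rough abundance measure. The fold-change and count thresholds, the pseudocount, and the protein names are all hypothetical.

```python
def candidate_interactors(bait_counts, control_counts, min_fold=3.0,
                          min_count=2, pseudocount=1.0):
    """Proteins enriched in the bait IP relative to the control IP.

    bait_counts / control_counts: protein name -> spectral count.
    Returns (protein, fold_change) pairs, most enriched first.
    """
    hits = []
    for protein, count in bait_counts.items():
        if count < min_count:
            continue  # too few spectra to judge
        # Pseudocount avoids division by zero for proteins absent from the control.
        fold = (count + pseudocount) / (control_counts.get(protein, 0) + pseudocount)
        if fold >= min_fold:
            hits.append((protein, fold))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical spectral counts.
bait = {"BAIT": 40, "P1": 12, "P2": 5, "ACTB": 30, "P3": 1}
control = {"ACTB": 28, "P2": 4}
print(candidate_interactors(bait, control))  # [('BAIT', 41.0), ('P1', 13.0)]
```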


Conclusions

Protein identification by MS is a key element of proteomics, and the identification process is an informatics-based methodology.

MS combined with sequence databases represents a huge leap for protein biochemistry: a large-scale analysis approach.

Biochemical manipulation combined with protein identification can provide functional information about proteins.

Bioinformatics tools are needed to link proteomics data to protein interaction and biological pathways.