Protein World

12-12-2002 Amsterdam

Tim Hulsen

Genome sequencing

• Since 1995: sequencing of complete ‘genomes’ (DNA), i.e. the order of the bases A/C/G/T:
ACGTCATCGTAGCTAGCTAGTCGTACGTATGTGCAGTAGCATCGATCGATCAGCATGCATAC

• At this moment more than 80 genomes have been sequenced and published, of all kinds of organisms:
– Animals
– Plants
– Fungi
– Bacteria

Genomes -> Proteins

• ‘Transcription’ and ‘translation’ of specific regions of the genome yield proteins, which consist of twenty types of ‘amino acids’:
ATG ACG CTG AGC TGC GGA CGT TGA -> MTLSCGR

• Proteins are responsible for all kinds of life processes

• All the proteins that can be produced in an organism together are called the ‘proteome’

• Sequence comparisons make possible the classification of proteins
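The codon example on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and not part of the Protein World project. Under the standard code, ATG encodes methionine (M) and TGA is a stop codon, so the full translation of the example reads MTLSCGR.

```python
# Standard genetic code, laid out in T, C, A, G order for all three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence codon by codon, stopping at a stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATG ACG CTG AGC TGC GGA CGT TGA"))  # prints MTLSCGR
```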

Protein families

• e.g. the GPCR family

• Sequence comparison helps in predicting the function of new proteins

Determining protein functions

• Function of 40-50% of the new proteins is unknown

• Understanding of protein functions and relationships is important for:
– Study of fundamental biological processes
– Drug design
– Genetic engineering

Sequence comparison

• Smith-Waterman dynamic programming algorithm (1981): calculates similarity/distance between two sequences:
Query   ---PLIT-LETRESV-
Subject NEQPKVTMLETRQTAD

• Results in a SW-score that measures how similar the two sequences are to each other

• Disadvantage: the score depends on sequence length

• After the alignments, the proteins are ‘clustered’ (divided into families) according to their similarity
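The Smith-Waterman recurrence can be sketched as follows. This is an illustrative toy, not the implementation used for Protein World: the slides do not state the scoring scheme, so the match/mismatch/gap values below are assumptions, and real protein comparisons normally use an amino-acid substitution matrix and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best local score, aligned part of a, aligned part of b).

    Assumed scoring: fixed match/mismatch values and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, query, subject = smith_waterman("PLITLETRESV", "NEQPKVTMLETRQTAD")
print(score)
print(query)
print(subject)
```

Because the score is a sum over aligned positions, longer sequences can reach higher scores purely by being longer, which is the length dependence noted on the slide.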

Existing databases

• Domain-based clusterings: Prosite, Pfam, ProDom, Prints, Domo, Blocks

• Protein-based clusterings: ProtoMap, COGs, Systers, PIR, ClusTr

• Structural classifications: SCOP, CATH, FSSP

Why should there be another database?

Another method

• Enhanced Smith-Waterman algorithm: Monte-Carlo evaluation (Lipman et al., 1984)

• How likely is it that two sequences are similar but not related?

• One of the two sequences is randomized and the score is recalculated (200 times). Randomization yields sequences with the same length and the same composition, but a different order

• Method leads to calculation of the Z-value:

$$Z(A,B) = \frac{S(A,B) - \mu}{\sigma}$$
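The Monte Carlo procedure can be sketched as below, reading µ and σ as the mean and standard deviation of the scores obtained against the shuffled sequences. Several details are assumptions made for illustration: the helper names `sw_score` and `z_value`, the simple match/mismatch/gap scoring, shuffling the second sequence rather than the first, the fixed random seed, and the use of the population standard deviation. Only the 200 shuffles and the Z formula come from the slides.

```python
import random
import statistics


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score only (assumed linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for residue_a in a:
        current = [0]
        for j, residue_b in enumerate(b, start=1):
            diag = previous[j - 1] + (match if residue_a == residue_b else mismatch)
            cell = max(0, diag, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


def z_value(a: str, b: str, shuffles: int = 200, seed: int = 0) -> float:
    """Z(A,B) = (S(A,B) - mu) / sigma, with mu and sigma taken over shuffled copies of b."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    residues = list(b)
    random_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # same length and composition, different order
        random_scores.append(sw_score(a, "".join(residues)))
    mu = statistics.mean(random_scores)
    sigma = statistics.pstdev(random_scores)
    if sigma == 0:
        raise ValueError("all shuffled scores are identical; Z-value is undefined")
    return (observed - mu) / sigma


# Arbitrary example string, compared with itself.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(z_value(sequence, sequence))
```

Each Z-value costs 200 extra alignments on top of the original one, which is where the large calculation time of this method comes from.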

Advantages

• The obtained Z-value is a very reliable measure of sequence similarity, compared to the SW-score:
– The SW-score depends on sequence length, the Z-value does not
– Amino acid bias does not affect the Z-value

• Independent of the database size

• Easier updating of the database, without a total recalculation

Disadvantage

• LOTS of calculation time needed, especially when all proteins in all proteomes are compared to each other (“all-against-all”)!

SARA calculation

• Proteomes of 82 organisms compared ‘all-against-all’ with the use of the Monte Carlo algorithm: more than 400,000 proteins!

• 21,600 CPU days (~520,000 CPU hours)

• = 21,600 PCs running in parallel for 24 hours, or 1 PC running for ~60 years

• Using supercomputer TERAS (1024-CPU SGI Origin 3800) at SARA: less than two months!
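The figures on this slide can be checked with some quick arithmetic. The sketch below takes 400,000 proteins as a lower bound (the slide says "more than") and, as an assumption not stated in the slides, estimates an ideal wall-clock time as if all 1024 CPUs were used with perfect scaling.

```python
cpu_days = 21_600
proteins = 400_000  # lower bound; the slide says "more than 400,000"
teras_cpus = 1024

cpu_hours = cpu_days * 24                             # 518,400, i.e. ~520,000 CPU hours
years_on_one_pc = cpu_days / 365                      # roughly 59 years
unordered_pairs = proteins * (proteins - 1) // 2      # distinct protein pairs to compare
ideal_days_on_teras = cpu_days / teras_cpus           # assumes perfect scaling on all CPUs

print(f"CPU hours:            {cpu_hours:,}")
print(f"Years on a single PC: {years_on_one_pc:.1f}")
print(f"Protein pairs:        {unordered_pairs:,}")
print(f"Ideal days on TERAS:  {ideal_days_on_teras:.1f}")
```

The ideal estimate of about three weeks is consistent with the reported "less than two months", since a shared machine is unlikely to be fully available to one project.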

Parties involved

• Gene-IT (Paris, France)

• SARA (Amsterdam, the Netherlands)

• CMBI (Nijmegen, the Netherlands)

• Organon (Oss, the Netherlands)

• EBI (Hinxton, UK)

Supporting parties

• Financed by NCF, foundation in support of supercomputing

• Under the auspices of BioASP, the new Dutch knowledge and service center for Bioinformatics

Results available through BioASP

• http://www.bioasp.nl

• Log in and click on links ‘Research’ and ‘Protein World’

• Organism selection screen:

• Results screen:

• Alignment screen:

Conclusions

• Currently the most comprehensive and most accurate data-set of protein comparisons

• A start for a maintainable and unique database of all proteins currently known

• A rich data-source for clustering, data-mining and orthology determination

Orthology determination

• Orthologs: genes/proteins in different species that derive from a common ancestor

• Orthologs often have the same function

• Interesting! Information from other species could help in annotating a protein

Thank you for your attention

Any questions?
