Virtual modelling of proteins
Jacek Leluk
Interdyscyplinarne Centrum Modelowania Matematycznego i Komputerowego, Uniwersytet Warszawski
Main functions of proteins (selected):
Enzymes
Immunoglobulins
Transport factors (e.g. hemoglobin)
Hormones, neurotransmitters
Structural and storage proteins
Contractile proteins (muscles, flagella)
Protein – a polymer of amino acids.
Proteins consist of one or more chains.
Some proteins contain other components (sugars, lipids, nucleotides, metal ions, other compounds...); these are called proteids (conjugated proteins).
The basic unit of a protein is an amino acid. There are 20 biogenic (genetically encoded) amino acids.
Amino acids
Amino acid – an organic compound that contains an amino group and an acidic group (usually a carboxyl group).
Alanine
General formula
Amino acid – polypeptide – protein
Protein chain folding
Diversity of proteins
Glucagon, ROP protein, insulin
Light-harvesting protein from purple bacteria
Sequence – structure - function
At first, the central dogma of molecular biology assumed a very strict relationship between genetic information, protein structure and function:
1 gene → 1 sequence → 1 structure → 1 function
At present this dogma is still valid, but not in as strict a form as before: these relationships are not strictly one-to-one.
For example, a protein with the same sequence may adopt different secondary and tertiary structures.
All information about a protein's structure (and its function as well) is contained in its amino acid sequence, which is unique for each protein.
To apply these relationships in protein modelling, we first have to learn to read and understand the information "written" in the amino acid sequence.
How well we currently understand this "writing" depends on the complexity of the protein; prediction accuracy ranges between 20% and 80%.
What do we have?
Biomolecular databases (genomic, protein and bibliographic)
Tools for theoretical analysis of biomolecules
Labs for experimental verification of the results
Knowledge (theories, hypotheses, theoretical models)
Regular types of structure (secondary structure)
helix-helix
sheet
β-chain (β-sheet)
3D protein structures: structure-function relationship
Sea anemone - toxin
Snake - toxin
Bacterial RNase
Mammalian RNase
RNase inhibitor (inhibits both RNases)
Errors (mutations) and resulting implications: sickle cell anemia
Sickle cell anemia – a genetic disease caused by a single amino acid substitution in the hemoglobin β-chain (one residue out of 146). Hemoglobin S has Val instead of Glu at position 6 of the β-chain. In homozygotes (HbSS) the disease is severe and can be lethal; heterozygotes (HbAS) are at most mildly anemic, but more resistant to malaria.
Normal hemoglobin – β-chain:
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Hemoglobin S – β-chain:
VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
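The single substitution can be located by comparing the two chains position by position. The following is a minimal sketch in Python, not part of the original slides; the function name `substitutions` is a hypothetical helper, and positions are counted from 1 starting at the first residue listed above.

```python
# Sequences copied from the slide; split into chunks only for readability.
NORMAL_BETA = (
    "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST"
    "PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP"
    "ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
)
SICKLE_BETA = (
    "VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST"
    "PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP"
    "ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
)


def substitutions(reference, variant):
    """Return (position, reference residue, variant residue) for every mismatch.

    Hypothetical helper: it assumes both sequences have equal length and
    contain no insertions or deletions, so no alignment step is needed.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref, var)
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


if __name__ == "__main__":
    for position, ref, var in substitutions(NORMAL_BETA, SICKLE_BETA):
        print(f"position {position}: {ref} -> {var}")
```

The same helper could be applied to the two glucagon variants shown later, since they also have equal length.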
Mutations and resulting implications: sickle cell anemia
Hemoglobin: normal and altered
Glucagon (pig) – hormone, 29 amino acids
HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
Glucagon (synthetic) – hormone, 29 amino acids
HSQGTFTSDYSKYLDSKKAQEFVQWLMNT
Glucagon (pig) – HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
Glucagon (synth.) – HSQGTFTSDYSKYLDSKKAQEFVQWLMNT
Hydrophobic amino acids:
L, I, V, F, M, Y, (W)
„Gluca con” modelling
Gluca con
LAALIAAVAAAIAAVLRRIAEVLAIVAAL
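To see how the designed sequence differs from the natural hormone, one can mark the hydrophobic positions in each sequence. The following is a minimal sketch in Python, not taken from the slides. It assumes the hydrophobic set listed above with W included (the slide puts W in parentheses), and the function names are hypothetical.

```python
# Hydrophobic residues as listed on the slide; including W is an assumption.
HYDROPHOBIC = set("LIVFMYW")

SEQUENCES = {
    "Glucagon (pig)": "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT",
    "Glucagon (synth.)": "HSQGTFTSDYSKYLDSKKAQEFVQWLMNT",
    "Gluca con": "LAALIAAVAAAIAAVLRRIAEVLAIVAAL",
}


def hydrophobic_count(sequence):
    """Number of residues that belong to the HYDROPHOBIC set."""
    return sum(residue in HYDROPHOBIC for residue in sequence)


def hydrophobic_profile(sequence):
    """Mark hydrophobic positions with '#' and all others with '.'."""
    return "".join("#" if residue in HYDROPHOBIC else "." for residue in sequence)


if __name__ == "__main__":
    for name, sequence in SEQUENCES.items():
        count = hydrophobic_count(sequence)
        print(f"{name:18s} {sequence}")
        print(f"{'':18s} {hydrophobic_profile(sequence)}  ({count}/{len(sequence)})")
```

Note that alanine is not in the slide's hydrophobic list, so the many A residues of the designed sequence are not counted here.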
"Gluca con" design - results
Glucagon (pig) – HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
Glucagon (synth.) – HSQGTFTSDYSKYLDSKKAQEFVQWLMNT
Gluca con – LAALIAAVAAAIAAVLRRIXEVLAIVAAL
Can we "improve" Nature at the molecular level?
What for?
Our goal is to understand natural mechanisms and then apply this knowledge to our needs, not to alter the mechanisms that have evolved naturally.
Role and significance of theoretical protein modeling and design
Saves time
Saves money
Saves work and materials
Increases our knowledge
Supports experimental work
The value of virtual protein design
Multiple sequence alignment of 52 Bowman-Birk type proteinase inhibitors, produced with the genetic semihomology algorithm. Conserved and typical residues are shown in white letters on a black background. A grey background marks semihomologous amino acids.

            3      10        20        30        40        50        60
P01055    ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
P01057    ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
P01056    QSSKPCCBHCACTKSIPPQCRCTDLRLDSCHSACKSCICTLSIPAQCV-CBBIBDFCYEP-CKS
P01058    ESSKPCCDQCSCTKSMPPKCRCSDIRLNSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
P01059    ESSKPCCDLCTCTKSIPPQCHCNDMRLNSCHSACKSCICALSEPAQCF-CVDTTDFCYKS-CHN
P01063    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
P17734    QSSKPCCRQCACTKSIPPQCRCSQVRLNSCHSACKSCACTFSIPAQCF-CGBIBBFCYKP-CKS
P81483    -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P81484    -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P16343    ESSKPCCSSC-CTRSRPPQCQCTDVRLNSCHSACKSCMCTFSDPGMCS-CLDVTDFCYKP-CKS
P01064    EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
P82469    -SSGPCCDRCRCTKSEPPQCQCQDVRLNSCHSACEACVCSHSMPGLCS-CLDITHFCHEP-CKS
P01061    ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
P01062    ESSEPCCDSCDCTKSIPPECHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
P01060    QSSPPCCBICVCTASIPPQCVCTBIRLBSCHSACKSCMCTRSMPGKCR-CLBTTBYCYKS-CKS
1BBI:     ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
1D6R:I    ---KPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CK-
1DF9:C    ESSEPCCDSCDCTKSIPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
1PI2:     EYSKPCCDLCMCTRSMPPQCSCED-RINSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
1PBI:A    DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKQ-CHN
AAB4719   ESSKPCCDQCTCTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
TISYC2    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
JC2225    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TIZB2     ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
JC2073    ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
JC2072    ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
0506164   ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
0401177   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
763679A   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TISYD2    EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
0907248   ESSEPCCDSCRCTKSIPPQCHCADIRLNSCHSACKSCMCTRSMPGKCR-CLDTDDFCYKP-CES
1102213   ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
1102213   ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
0404180   EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
TIZB1B    ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
TIMB      ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
TIZB1P    ESSHPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
JC1066    ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCTKP-CES
Q41066    DVKSACCDTCLCTKSDPPTCRCVDVGET-CHSACDSCICALSYPPQCQ-CFDTHKFCYKA-CHN
P80321    STTTACCDFCPCTRSIPPQCQCTDVREK-CHSACKSCLCTLSIPPQCH-CYDITDFCYPS-CR-
Q41065    DVKSACCDTCLCTKSNPPTCRCVDVRET-CHSACDSCICAYSNPPKCQ-CFDTHKFCYKA-CHN
P81705    --TSACCDKCFCTKSNPPICQCRDVGET-CHSACKFCICALSYPAQCH-CLDQNTFCYDK-CDS
P56679    DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKA-CHN
P16346    --TTACCNFCPCTRSIPPQCRCTDIGET-CHSACKTCLCTKSIPPQCH-CADITNFCYPK-CN-
P01065    DVKSACCDTCLCTRSQPPTCRCVDVGER-CHSACNHCVCNYSNPPQCQ-CFDTHKFCYKA-CHS
P24661    DVKSACCDTCLCTKSEPPTCRCVDVGER-CHSACNSCVCRYSNPPKCQ-CFDTHKFCYKS-CHN
P07679    KRPWECCDIAMCTRSIPPICRCVDKVDR-CSDACKDCEETEDN--RHV-CFDTYIGDPGPTCHD
P19860    ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
P22737    ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
220645    ES-EGCCDRCICTKSMPPQCHCHDVRLDSCHSDCETCICTRSYPAQCR-CADTTDFCYKP-C-S
P09864    TRPWKCCDRAICTKSFPPMCRCMDMVEQ-CAATCKKCGPATSDSSRRV-CEDXY-----------
P09863    KRPWKCCDQAVCTRSIPPICRCMDQVFE-CPSTCKACGPSVGDPSRRV-CQDQYV----------
CONSENSUS ESSKPCCDXCXCTKSIPPQCRCXDXRLNSCHSACKSCXCTRSXPXQCX-CXDTXDFCYKP-CKS
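A consensus line of this kind can be approximated by a simple per-column majority vote. The sketch below, in Python, is an illustration only: it is not the genetic semihomology algorithm used to produce the alignment, and the function name `majority_consensus`, the 50% threshold and the use of `X` for undecided columns are assumptions. Only the first three rows of the alignment are used to keep the example short.

```python
from collections import Counter

# First three aligned rows (P01055, P01057, P01056), 64 columns each.
ROWS = [
    "ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP",
    "ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS",
    "QSSKPCCBHCACTKSIPPQCRCTDLRLDSCHSACKSCICTLSIPAQCV-CBBIBDFCYEP-CKS",
]


def majority_consensus(rows, threshold=0.5, unknown="X"):
    """Per-column majority vote over equally long aligned rows.

    A column yields its most frequent symbol only if that symbol occurs in
    more than `threshold` of the rows; otherwise `unknown` is written.
    Gap characters are treated like any other symbol.
    """
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    consensus = []
    for column in zip(*rows):
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol if count / len(rows) > threshold else unknown)
    return "".join(consensus)


if __name__ == "__main__":
    print(majority_consensus(ROWS))
```

With all 52 rows as input, the result need not match the consensus line of the slide, because the semihomology approach also takes relationships between amino acids into account rather than only their frequencies.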