Massive computation and data analysis for the design of enzymes with biotechnological applications. Xevi Biarnés (@xevibiarnes, [email protected]), Department of Bioengineering, IQS School of Engineering. Jornada TAC’2015, Mataró, 30 June 2015.

Massive computation and data analysis for the design of enzymes with applications in the biotechnology industry

TAC’2015 30 June 2015 Xevi Biarnés @xevibiarnes

Massive computation and data analysis for the design of enzymes with biotechnological applications

Xevi Biarnés

@xevibiarnes

[email protected]

Department of Bioengineering

IQS School of Engineering

Jornada TAC’2015

Mataró, 30 June 2015

IQS School of Engineering and IQS School of Management

Via Augusta, 390 Barcelona

Bioengineering Department

• Laboratory of Biochemistry
• Laboratory of Biomaterials
• Laboratory of Tissue Engineering
• Laboratory of Microbiology
• Laboratory of Bioprocesses

• Degree in Biotechnology
• Master of Science in Bioengineering
• PhD in Bioengineering

Laboratory of Biochemistry

Main R&D topics:

• Protein engineering and enzymology of glycosidases and glycosyltransferases
• Therapeutic targets in infectious diseases
• Amyloidogenic proteins in neurodegenerative diseases
• Biocatalysis: enzyme redesign, directed evolution of enzymes
• Metabolic engineering for the production of glycoglycerolipids

Headed by Prof. Antoni Planas

• 5 permanent staff
• 8 PhD students
• 6 MSc students
• 4 undergraduate students
• 2 research assistants

Bioinformatics and Molecular Modelling Unit

Laboratory of Biochemistry

BIOMMIQS

Bioinformatics for comparative analysis of genomic sequences

Protein Structure Prediction

In silico tools to assist in experimental Protein Engineering

Simulation of the conformational dynamics of small molecules and macromolecules

BIOMMIQS @IQS and Anella Científica

Direct access to consortium services:

• CBUC/CCUC
• EDUROAM

BIOMMIQS @IQS and Anella Científica

• Barcelona (BSC-CNS): MareNostrum 3, MinoTauro, Altix
• Madrid (CeSViMa-UPM): Magerit
• Canary Islands (IAC, ITC): LaPalma 2, Atlante
• Cantabria (UC): Altamira 2
• Málaga (UMA): Picasso 2
• València (UV): Tirant 2
• Zaragoza (BIFI-UZ): CaesarAugusta 2

Re-Evolution of Genomes

optimization of the genetic codes of living organisms to adapt to their living environments

Burst of Genomic Data

• Growth of the GenBank database

http://www.ncbi.nlm.nih.gov/genbank/statistics

700 GBytes of raw data

ACTAACCCCTCAGTTTTTGTCAAGCTGTCAGACCCTCCAGCGCAGGTTTCAGTGCCATTCATGTCACCTGCGAGTGCTTATCAATGGTTTTATGACGGATATCCCACATTCGGAGAACACAAACAGGAGAAAGATCTTGAATACGGGGCATGTCCTAATAACATGATGGGCACGTTCTCAGTGCGGACTGTGGGGACCTCCAAGTCCAAGTACCCTTTAGTGGTTAGGATCTACATGAGAATGAAGCACGTCAGGGCGTGGATACCTCGCCCGATGCGTAACCAGAACTACCTATTCAAAGCCAACCCAAATTATGCTGGCAACTCCATTAAGCCAACTGGTGCCAGTCGTACAGCGATCACCACTCTTGGGAAATTTGGACAACAGTCTGGGGCTATTTATGTGGGCAACTTTAGAGTGGTCAACCGACATCTTGCCACTCACAATGATTGGGCAAATCTTGTTTGGGAAGACAGCTCTCGCGACTTGCTCGTGTCATGAACCACCGCCCAAGGCTGTGACACGATTGCTCGTTGCGATTGCCAGACAGGGGTGTACTACTGTAACTCGATGAGAAAACACTACCCAGTCAGTTTTTCAAAACCCAGCCTGATCTATGTAGAGGCTAGCGAGTATTACCCAGCCAGGTACCAATCACATCTCATGCTCGCACAGGGTCACTCAGAACCTGGTGATTGCGGTGGTATCCTTAGATGCCAACATGGCGTCGTCGGCATAGTGTCTACTGGTGGTAATGGGCTCGTTGGCTTTGCAGACGTTAGAGACCTCTTGTGGTTAGATGAAGAAGCTATGGAACAGGGCGTGTCCGACTACATCAAGGGTCTCGGAGATGCTTTTGGAACAGGCTTCACTGACGCAGTCTCAAGGGAGGTTGAAGCTCTCAAGAACTATCTTATAGGGTCTGAAGGAGCAGTTGAGAAAATCTTGAAAAATCTTATTAAACTAATCTCTGCACTGGTATTGTGATCAGAAGTGATTACGACATGGTTA

Where is the information?

We are in the post-genomic era

Torgeir R. Hvidsten

Proteins are the Machinery of the Cells

genes (DNA) only store the information

proteins (amino acids) perform the function

Video: The Inner Life of the Cell (YouTube), XVIVO Scientific Animation
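As an illustration of how the information kept in DNA maps onto amino acids, here is a minimal sketch that translates a DNA coding strand with the standard genetic code. The function name `translate` and the choice of reading frame 0 are assumptions made for this example; they are not part of the talk.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding strand (reading frame 0) into one-letter amino acids.

    Stop codons are written as '*'; codons with unknown letters become 'X'.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # start codon, alanine codon, stop codon
```

Real genomic sequences would also need the right strand and reading frame to be identified first, which this sketch does not attempt.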

Proteins are 3D Objects

nanometer size

>sequence_of_aminoacids
MYCSASTCTCSTTATRHYGCKLMNDSSCRFGH
KLISPRDTEDFSGFRTCSKLIPSCSFACVIPL
PSFACEERERWQSRTNCVISCRTEDPLKISCF
GRSRACGRSTTRSGCSPLYPLREDTSWASDFR

The 3D structure and function of a protein are dictated by its amino acid sequence
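The sequence on this slide follows the FASTA convention: a header line starting with `>` followed by sequence lines. A minimal sketch of a reader for that layout is given below; `parse_fasta` is a hypothetical helper written for this example, not a tool mentioned in the talk.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# First two lines of the slide's example record.
example = """>sequence_of_aminoacids
MYCSASTCTCSTTATRHYGCKLMNDSSCRFGH
KLISPRDTEDFSGFRTCSKLIPSCSFACVIPL
"""
records = parse_fasta(example)
print(len(records["sequence_of_aminoacids"]))  # 64 residues (two lines of 32)
```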

Proteins are Dynamic 3D Objects

hemoglobin: oxygen transporter in blood

eppur si muove ("and yet it moves")

a = F / m

x(t) = x₀ + v₀·t + ½·a·t²

protein motion can be simulated on the computer: Molecular Dynamics (MD)

typical simulation: 1 billion steps!
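The two formulas on this slide are the core of one MD time step. The sketch below applies them repeatedly using the velocity Verlet scheme, a standard integrator chosen here for illustration (the talk does not say which integrator was used). The test system, a single unit mass on a spring, is a hypothetical stand-in for a protein force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D and return (positions, final velocity)."""
    a = force(x) / mass                      # a = F / m
    positions = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update from the slide's formula
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from the averaged acceleration
        a = a_new
        positions.append(x)
    return positions, v


# Hypothetical test system: unit mass on a spring with k = 1 (arbitrary units).
k, mass = 1.0, 1.0
positions, v_end = velocity_verlet(
    1.0, 0.0, lambda x: -k * x, mass, dt=0.01, n_steps=1000
)
energy = 0.5 * mass * v_end ** 2 + 0.5 * k * positions[-1] ** 2
print(positions[-1], math.cos(10.0), energy)
```

For this oscillator the exact position at t = 10 is cos(10) and the total energy is 0.5, so the printed values give a quick check of the integrator. A production MD run repeats the same loop for every atom in 3D, which is where the billion steps come from.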

Computer Simulations of Biological Processes @ BIOMMIQS

Implementation of computational algorithms based on Molecular Dynamics (metadynamics) that enhance the simulation of biologically relevant processes:

• Protein Folding
• Protein Aggregation

Prion protein folding

Prion protein (the causative agent of spongiform encephalopathy) is an unstable protein that can adopt different structures.

One of these structures tends to form precipitates in central nervous system tissue, leading to neurodegeneration.

The structural determinants of prion protein stability were identified in silico by extensive computer simulations.

Benetti F. and Biarnés X. et al., JMB 2014

The simulations consumed 1,000,000 hours of total CPU time and generated 1 TeraByte of data.

Amyloid-β peptide aggregation

Amyloid-β is an intrinsically disordered protein. The final segment of this protein can form aggregates. These aggregates are associated with Alzheimer’s disease.

18 molecules of the final Amyloid-β segment were simulated, and a nascent fibril was detected.

Baftizadeh F. et al, PRL 2013

Technical Details of Computer Simulations

• Highly CPU-demanding

– millions of hours of CPU time → supercomputers

• Big data storage

– 100 GBytes per simulation (3-4 months) → cloud storage?

• Data transfer

– 1 GByte of data generated daily

– Data must be transferred locally for visualization → efficient communications

– Current download rates:

• From CeSViMa (UPM Madrid) to IQS: 5.3 MBytes/s

• From BSC (UPC Barcelona) to IQS: ??

• From SISSA (Trieste, Italy) to IQS: 9.5 MBytes/s
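To put the download rates above in perspective, the following sketch converts them into transfer times for the data volumes quoted on this slide. It assumes decimal units (1 GByte = 1000 MBytes) and a sustained rate, neither of which is stated in the talk; `transfer_hours` is a name chosen for this example.

```python
def transfer_hours(size_gbytes, rate_mbytes_per_s):
    """Hours needed to move size_gbytes at a sustained rate (1 GByte = 1000 MBytes)."""
    seconds = size_gbytes * 1000 / rate_mbytes_per_s
    return seconds / 3600


# Rates quoted on the slide; the BSC rate is unknown ("??") and therefore left out.
for site, rate in [("CeSViMa", 5.3), ("SISSA", 9.5)]:
    full_run = transfer_hours(100, rate)     # one complete simulation
    daily = transfer_hours(1, rate) * 60     # one day's output, in minutes
    print(f"{site}: 100 GBytes in {full_run:.1f} h, 1 GByte in {daily:.1f} min")
```

Under these assumptions a full 100 GByte simulation takes roughly five hours to fetch from CeSViMa and roughly three from SISSA, while the daily gigabyte moves in a few minutes.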

• Visualization

– Renders are done off-line on local computers

• On-line visualization?

– Remote desktops work, but are not a satisfactory solution

– “Streaming” of xyz coordinates could be a solution?

3D renders are generated by specific software from an atomic coordinate file containing the xyz coordinates of each atom in the protein structure.

Minimal atomic coordinates file (30,000-50,000 atoms × 100,000 frames):

atom  x     y     z
C     3.23  2.22  4.34
O     2.31  1.34  3.41
H     2.88  2.35  5.32
C     3.21  2.11  1.22
…
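A rough sketch of what streaming xyz coordinates could involve: parse the minimal `atom x y z` lines, pack one frame of coordinates into a compact binary message, and estimate the size of a whole trajectory. The 32-bit float encoding and all function names here are assumptions made for illustration; the talk does not propose a concrete format.

```python
import struct


def parse_atoms(text):
    """Parse 'atom x y z' lines into (symbol, x, y, z) tuples."""
    atoms = []
    for line in text.strip().splitlines():
        symbol, x, y, z = line.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def pack_frame(atoms):
    """Pack one frame's coordinates as little-endian 32-bit floats (assumed encoding)."""
    coords = [c for _, x, y, z in atoms for c in (x, y, z)]
    return struct.pack(f"<{len(coords)}f", *coords)


def trajectory_bytes(n_atoms, n_frames, bytes_per_coord=4):
    """Raw size of a trajectory: three coordinates per atom per frame."""
    return n_atoms * n_frames * 3 * bytes_per_coord


atoms = parse_atoms("C 3.23 2.22 4.34\nO 2.31 1.34 3.41\nH 2.88 2.35 5.32")
print(len(pack_frame(atoms)))                      # bytes for one 3-atom frame
print(trajectory_bytes(50_000, 100_000) / 1e9)     # GBytes for the largest case quoted
```

Only the coordinates change between frames, so the element symbols would need to be sent once. With this encoding, the largest case on the slide (50,000 atoms × 100,000 frames) comes to about 60 GBytes before any compression.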

Proteins as Chemical Machines: ENZYMES

Chemical transformations rule our life

PHOTOSYNTHESIS: 6·CO₂ + 6·H₂O → C₆H₁₂O₆ + 6·O₂

Enzymes decrease the activation energy required for a chemical transformation: they act as catalysts
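To give a feel for what lowering the activation energy means, the sketch below uses the Arrhenius relation k = A·exp(−Ea/RT), which is textbook kinetics brought in for this example and not something shown in the talk. It assumes the pre-exponential factor A is unchanged by the catalyst, and the 20 kJ/mol reduction is a hypothetical value.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol·K)


def rate_enhancement(delta_ea_kj_per_mol, temperature_k=298.15):
    """Ratio k_catalysed / k_uncatalysed for a given drop in activation energy.

    Assumes Arrhenius behaviour with the same pre-exponential factor in both cases.
    """
    return math.exp(delta_ea_kj_per_mol * 1000 / (R * temperature_k))


# Hypothetical example: the catalyst lowers the barrier by 20 kJ/mol at 25 °C.
print(f"{rate_enhancement(20):.0f}x faster")
```

Because the dependence is exponential, even a modest drop in the barrier speeds the reaction up by orders of magnitude.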

Proteins as “bio”catalysts (enzymes)

Protein structures are tightly tuned to accommodate their natural ligands.

Maximum catalytic efficiency of enzymes is attained, in part, by the binding forces in the active site.

Industrial applications of enzymes

• Amylases: production of sugars from starch (syrup production)
• Glucanases: starch degradation prior to fermentation (beer production)
• Proteases
• Cellulases: cellulose degradation prior to fermentation (bioethanol production)
• Lipases: esterification of lipids (biodiesel production)
• Amylases, Xylanases, Cellulases, Ligninases: starch degradation to lower viscosity, aiding paper sizing and coating

PRODUCTION OF NON-NATURAL ADDED-VALUE COMPOUNDS

• Pharmaceuticals
• Pigments
• Biomaterials
• ···

Complementing the traditional chemical industry:

– Process optimization
– Green chemistry

Enzymes to Produce Added-Value Compounds

natural compound → novel compound

Natural enzymes are not optimized for non-natural compounds: there is room for enzyme optimization by PROTEIN ENGINEERING

The Protein Engineering Dilemma

MSDTAGSPWFSHSLKRNQDFGFYYSDFCNARSDTPQSCWREGQNESDRQTAVWPYRTSCNMLKCSRYTCVPM

Protein Engineering can be guided by Computer Simulations and Genomic-Data Mining
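One way to read the dilemma, offered here as an assumption because the slide does not spell it out, is the size of sequence space: even for the 72-residue example above, the number of possible variants grows far beyond what can be tested experimentally. The sketch counts them for a 20-letter amino acid alphabet; `variant_counts` is a name chosen for this example.

```python
from math import comb


def variant_counts(length, alphabet=20):
    """Count single mutants, double mutants and all sequences of a given length."""
    others = alphabet - 1                          # alternatives at a mutated position
    return {
        "single": length * others,                 # one position changed
        "double": comb(length, 2) * others ** 2,   # two positions changed
        "all": alphabet ** length,                 # every possible sequence
    }


counts = variant_counts(72)  # the example sequence on this slide has 72 residues
print(counts["single"], counts["double"], f"{counts['all']:.2e}")
```

Single mutants can be screened exhaustively, but the count rises steeply from there, which is the motivation for narrowing the search with simulations and data mining.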

Protein Engineering by Data Mining @ BIOMMIQS

Setting up an integrative platform to assist in protein engineering experiments.

The platform is based on the integration of biological data from different database sources, complemented with computer simulations.

• UNIPROT: 50,000,000 protein sequences (http://uniprot.org)
• PDB: 110,000 protein structures (http://pdb.org)
• PFAM: 16,000 protein functions (http://pfam.xfam.org)
• CAZY: 340,000 enzymes active on carbohydrates (http://cazy.org)
• GENBANK: 185,000,000 genomic sequences (http://www.ncbi.nlm.nih.gov/genbank/)

Protein engineering of chitin deacetylases for the biotechnological production of chitosan

http://nano3bio.eu

[Figure: from chitin to chitosan]

Natural chitin deacetylase enzymes producing different chitosans

Andrés E. et al., Angew. Chem. Int. Ed. 2014

Engineering of a non-natural chitin deacetylase to produce new-to-nature chitosans

Technical Details of Data Mining

• Size of public databases

– GENBANK 700 GBytes
– UNIPROT 27 GBytes
– PDB 373 GBytes
– PFAM 195 GBytes
– No local copies! For general purposes, public web services are used:
   • BLAST
   • JMOL
   • HMMSEARCH

• Public databases are updated regularly (weekly)

– Local copies would need regular updates → mirrors?
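A back-of-the-envelope sketch of why local copies are hard to maintain: add up the database sizes listed above and estimate how long one full download would take. The 5.3 MBytes/s rate is borrowed from the simulation-transfer figures earlier in the talk purely as an assumption; no download rate for these databases is given, and decimal units (1 GByte = 1000 MBytes) are assumed.

```python
# Sizes in GBytes, as listed on the slide.
DATABASE_GBYTES = {"GENBANK": 700, "UNIPROT": 27, "PDB": 373, "PFAM": 195}


def mirror_days(sizes_gbytes, rate_mbytes_per_s):
    """Days needed to download every database once at a sustained rate."""
    total_gbytes = sum(sizes_gbytes.values())
    seconds = total_gbytes * 1000 / rate_mbytes_per_s
    return seconds / 86400


total = sum(DATABASE_GBYTES.values())
print(f"{total} GBytes in total, {mirror_days(DATABASE_GBYTES, 5.3):.1f} days per full copy")
```

With weekly updates, a full refresh at that assumed rate would occupy a large share of each week, which is consistent with the slide's preference for public web services or incremental mirrors.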
