Accelerating bioinformatics with GRID computing
Angel Merino
Centro Nacional de Biotecnología, Unidad de Biocomputación
What to cover…
• Electron microscopy
– What EM is.
– What the workflow is.
• What is being solved with the grid: processes/applications that have been "gridified"
– Maximum Likelihood
– CTF estimation
• Overcoming the potential barrier
– Web portal
– Web/Grid Services & Workflows
• Other applications in the field
What is EM (I)
• EM is a structural analysis technique.
• It lets us explore the molecular environment of the particles under study.
What is the workflow
Sample preparation.
Image acquisition.
Image processing and computation of 3D volumes.
Biological material
- High H2O content
- Elevated radiation damage
Negative staining
- Dehydration
- Structural changes / crushing
- Image comes from the metal mold
Cryomicroscopy
- Hydrated / biology-friendly
- Fewer distortions
- Image comes from the biological specimen
What is EM (II)
What is EM (III)
Negative staining
Cryomicroscopy
Aberrations in the microscope optics blur the experimental images. These effects can be described by the contrast transfer function (CTF).
CTF estimation in Xmipp may take up to half a day per micrograph, and a user processes about 100 micrographs per experiment, so grid computing is necessary.
Estimating the CTF makes it possible to correct the blurred images.
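To make the cost argument concrete, here is a rough back-of-the-envelope sketch. It assumes that "half a day" means a 12-hour upper bound and that each micrograph is an independent job that cannot be split further; both are assumptions for illustration, not measurements from the slides.

```python
import math

HOURS_PER_MICROGRAPH = 12          # assumption: "up to half a day" taken as 12 hours
MICROGRAPHS_PER_EXPERIMENT = 100   # "about 100 micrographs" per experiment


def wall_clock_days(n_workers: int) -> float:
    """Upper-bound wall-clock days when micrographs are spread over n_workers."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Each micrograph is one indivisible job, so the busiest worker sets the pace.
    jobs_on_busiest_worker = math.ceil(MICROGRAPHS_PER_EXPERIMENT / n_workers)
    return jobs_on_busiest_worker * HOURS_PER_MICROGRAPH / 24


for workers in (1, 10, 100):
    print(f"{workers:>3} workers: {wall_clock_days(workers):.1f} days")
```

Under these assumptions a single CPU needs up to 50 days per experiment, while 100 workers bring the bound down to half a day.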
CTF estimation (I)
CTF estimation (II)
CTF estimation (III)
Per micrograph
1000x
Maximum-Likelihood
Maximum-Likelihood (I)
"Slow" execution
1 iteration
Maximum-Likelihood (II)
"Fast" execution (MPI)
Maximum-Likelihood development on the EGEE grid vs. the local cluster
Using the EGEE grid
Grid
This past November, 17,160 CPU hours were consumed (almost 2 years!)
23 CPUs full time
Using our local cluster (50%) (jumilla.cnb.uam.es) for the same activity
20 CPUs
Actual usage time = 50% of the total time, because of the development work under way at the time
46 CPUs!!!
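The figures above can be cross-checked with a short calculation. This is only a sketch: it assumes a 30-day month and reads the "50%" as the fraction of time the local cluster was available. The slide quotes rounded values (23 and 46 CPUs); the arithmetic below gives slightly higher numbers of the same order.

```python
CPU_HOURS_USED = 17160        # figure quoted on the slide
HOURS_IN_MONTH = 30 * 24      # assumption: a 30-day month (November)
LOCAL_AVAILABILITY = 0.5      # assumption: cluster usable 50% of the time


def equivalent_cpus(cpu_hours: float, wall_hours: float, availability: float = 1.0) -> float:
    """CPUs needed to deliver cpu_hours within wall_hours at a given availability."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    return cpu_hours / (wall_hours * availability)


years_on_one_cpu = CPU_HOURS_USED / (24 * 365)
grid_cpus = equivalent_cpus(CPU_HOURS_USED, HOURS_IN_MONTH)
local_cpus = equivalent_cpus(CPU_HOURS_USED, HOURS_IN_MONTH, LOCAL_AVAILABILITY)

print(f"{years_on_one_cpu:.2f} years on a single CPU")
print(f"{grid_cpus:.1f} CPUs running full time")
print(f"{local_cpus:.1f} CPUs at 50% availability")
```

Halving the availability doubles the number of CPUs required, which is the point of the comparison with the 20-CPU local cluster.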
[Figure: bar chart. Vertical axis: "Month" (0 to 2.5). Horizontal axis: "Environments" (grid, jumilla cluster).]
Overcoming the potential barrier
4 simple steps to run all the jobs you need for your experiment
1. Select your application
2. Log in to the UI
3. Upload the files you need
4. Submit your experiment, giving a notification e-mail address and your certificate password
Overcoming the potential barrier (I)
Input from Grid portal
C++ Object
Submit job and publish the data (first time)
Checking status
Get output and retrieve the output data.
JDLs
Required scripts (3)
Required input tar files
For each JDL
Aborted or not submitted
Done (success)
First script
Second script
Third script
Run the job and publish the output data when job finishes.
Send e-mail to the notification e-mail address
The portal engine
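The submit / check status / get output / notify cycle in the diagram can be sketched as a small polling loop. Everything here is hypothetical: `FakeGrid` is an in-memory stand-in rather than real grid middleware, the status strings simply mirror the diagram labels, and resubmitting an aborted job is an assumed recovery policy that the slides do not spell out. A real engine would also wait between polls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Status names mirror the diagram labels; the backend below is a stand-in.
ABORTED, NOT_SUBMITTED, RUNNING, DONE = "Aborted", "Not submitted", "Running", "Done (success)"


@dataclass
class FakeGrid:
    """Hypothetical in-memory backend: each job replays a scripted status list."""
    scripts: Dict[str, List[str]]
    submissions: Dict[str, int] = field(default_factory=dict)

    def submit(self, jdl: str) -> None:
        self.submissions[jdl] = self.submissions.get(jdl, 0) + 1

    def status(self, jdl: str) -> str:
        states = self.scripts[jdl]
        # Consume scripted states one by one; the last state is sticky.
        return states.pop(0) if len(states) > 1 else states[0]

    def get_output(self, jdl: str) -> str:
        return f"output of {jdl}"


def run_experiment(jdls: List[str], grid: FakeGrid,
                   notify: Callable[[str], None], max_polls: int = 20) -> Dict[str, str]:
    """Submit every JDL, resubmit failed ones, collect outputs, then notify."""
    outputs: Dict[str, str] = {}
    for jdl in jdls:
        grid.submit(jdl)
        for _ in range(max_polls):
            state = grid.status(jdl)
            if state == DONE:
                outputs[jdl] = grid.get_output(jdl)
                break
            if state in (ABORTED, NOT_SUBMITTED):
                grid.submit(jdl)  # assumed recovery policy: resubmit
    notify(f"{len(outputs)}/{len(jdls)} jobs finished")
    return outputs


grid = FakeGrid({"a.jdl": [RUNNING, DONE], "b.jdl": [ABORTED, RUNNING, DONE]})
messages: List[str] = []
results = run_experiment(["a.jdl", "b.jdl"], grid, messages.append)
print(results, messages, grid.submissions)
```

Passing the notifier as a callable keeps the e-mail step replaceable; here it just appends to a list.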
Overcoming the potential barrier (II)
Workflows & Grid Services
Grid Protein Sequence Analysis
Scientific objectives
Bioinformatic analysis of the data produced by complete genome sequencing projects is one of the major challenges of the coming years. Such analysis clearly requires integrating up-to-date databanks and relevant algorithms. Grid computing, such as the infrastructure provided by the European EGEE project, would be a viable way to distribute data, algorithms, and computing and storage resources for genomics. Giving bioinformaticians a good interface to the grid infrastructure is a further challenge that must be met. The GPS@ web portal (Grid Protein Sequence Analysis) aims to be such a user-friendly interface to these genomic resources on the EGEE grid.
Method
A well-known web interface eases access to the algorithms offered.
Protein databases are stored on grid storage as flat files.
Most protein sequence analysis tools are reference legacy code that is run unchanged. These tools are wrapped in grid jobs to be executed on grid resources.
The algorithms' output is analysed and displayed in graphic format through the web interface.
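One way to picture wrapping an unchanged legacy tool in a grid job is to generate a small job description for it. The sketch below renders a minimal JDL-style text using common JDL attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox). The helper `render_jdl`, the wrapper script `run_tool.sh` and the input file name are hypothetical, and the exact attributes accepted should be checked against the middleware in use.

```python
from typing import Sequence


def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping any embedded quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def render_jdl(executable: str, arguments: Sequence[str], input_files: Sequence[str]) -> str:
    """Render a minimal job description that runs a legacy tool unchanged."""
    sandbox_in = ", ".join(_quote(name) for name in input_files)
    lines = [
        f"Executable = {_quote(executable)};",
        f"Arguments = {_quote(' '.join(arguments))};",
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {{{sandbox_in}}};",
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    return "\n".join(lines)


# Hypothetical wrapper script and input file, for illustration only.
jdl = render_jdl("run_tool.sh", ["query.fasta"], ["run_tool.sh", "query.fasta"])
print(jdl)
```

The tool itself is never modified; only the thin description around it changes from job to job.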
Other applications
Other applications (I)
Scientific objectives
Provide docking information to help the search for new drugs.
Biological goal: propose new inhibitors (drug candidates) targeting neglected diseases.
Bioinformatics goal: in silico virtual screening of drug candidate databases.
Grid goal: demonstrate the relevance of grid infrastructures to the research communities active in drug discovery by deploying a compute-intensive application.
Method
Large-scale molecular docking on malaria, evaluating millions of potential drugs under chosen software and parameter settings. Docking computes the binding energy between a protein target and each compound in a library of potential drugs, using a scoring algorithm.
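Because each compound is scored against the target separately, a screening run of this kind can be cut into independent grid jobs. The following sketch only shows that partitioning step; the helper name, the chunk size and the ligand identifiers are made up for illustration, and no docking software is invoked.

```python
from typing import Iterator, List, Sequence


def split_library(ligands: Sequence[str], ligands_per_job: int) -> Iterator[List[str]]:
    """Yield consecutive slices of the ligand library, one slice per grid job."""
    if ligands_per_job < 1:
        raise ValueError("ligands_per_job must be at least 1")
    for start in range(0, len(ligands), ligands_per_job):
        yield list(ligands[start:start + ligands_per_job])


library = [f"ligand_{i:04d}" for i in range(10)]   # made-up identifiers
jobs = list(split_library(library, ligands_per_job=4))
for job_id, chunk in enumerate(jobs):
    print(job_id, chunk)
```

The chunk size trades job count against job length: smaller chunks mean more jobs to schedule, larger chunks mean more work lost when a job aborts.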
In silico Drug Discovery
Genome evolution modeling
Scientific objectives
Study human evolutionary genetics and answer questions such as the geographic origin of modern human populations, the genetic signature of expanding populations, the genetic contacts between modern humans and Neanderthals, and the expected null distributions of genetic statistics applied to genome-wide data sets.
Method
Simulate the past demography (growth and migrations) of human populations in a geographically realistic landscape, taking into account the spatial and temporal heterogeneity of the environment. Generate the molecular diversity of several samples of genes drawn at any location within the current human range, and compare it with the observed contemporary molecular diversity.
SPLATCHE uses a region sampling Bayesian framework that requires 10^5 independent demographic and genetic simulations.
Other applications (II)
For more information
Xmipp web page: www.cnb.uam.es/~bioinfo
Unit web page: http://biocomp.cnb.uam.es
NA4 EGEE biomed applications home: http://egee-na4.ct.infn.it/biomed/index.php
Thank you