Page 1: 150522 bioinfo gis lr

Bioinformatics in the Degree in Health Engineering (Grado de Ingeniería de la Salud)

M. Gonzalo Claros Díaz, Dept. of Molecular Biology and Biochemistry

Plataforma Andaluza de Bioinformática

Centro de Bioinnovación

http://about.me/mgclaros/

@MGClaros

Page 2: 150522 bioinfo gis lr

Bioinformatics is offered only at the UMA

http://www.uma.es/grado-en-ingenieria-de-la-salud

Page 3: 150522 bioinfo gis lr

What is bioinformatics?

http://everydaylife.globalpost.com/medical-schools-bioinformatics-37686.html

Bioinformatics is a new and very attractive scientific field at the interface between computer science, biology and mathematics, aimed at discovering new information about diseases and the human body

Bioinformatics uses biology and computer science to discover how living beings and their diseases work

Page 5: 150522 bioinfo gis lr

The bioinformatician's competencies are being defined

Message from ISCB

Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies

Lonnie Welch1*, Fran Lewitter2, Russell Schwartz3, Cath Brooksbank4, Predrag Radivojac5, Bruno Gaeta6, Maria Victoria Schneider7

1 School of Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, United States of America, 2 Bioinformatics and Research Computing, Whitehead Institute, Cambridge, Massachusetts, United States of America, 3 Department of Biological Sciences and School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America, 4 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom, 5 School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States of America, 6 School of Computer Science and Engineering, The University of New South Wales, Sydney, New South Wales, Australia, 7 The Genome Analysis Centre, Norwich Research Park, Norwich, United Kingdom

Introduction

Rapid advances in the life sciences and in related information technologies necessitate the ongoing refinement of bioinformatics educational programs in order to maintain their relevance. As the discipline of bioinformatics and computational biology expands and matures, it is important to characterize the elements that contribute to the success of professionals in this field. These individuals work in a wide variety of settings, including bioinformatics core facilities, biological and medical research laboratories, software development organizations, pharmaceutical and instrument development companies, and institutions that provide education, service, and training. In response to this need, the Curriculum Task Force of the International Society for Computational Biology (ISCB) Education Committee seeks to define curricular guidelines for those who train and educate bioinformaticians. The previous report of the task force summarized a survey that was conducted to gather input regarding the skill set needed by bioinformaticians [1]. The current article details a subsequent effort, wherein the task force broadened its perspectives by examining bioinformatics career opportunities, surveying directors of bioinformatics core facilities, and reviewing bioinformatics education programs.

The bioinformatics literature provides valuable perspectives on bioinformatics education by defining skill sets needed by bioinformaticians, presenting approaches for providing informatics training to biologists, and discussing the roles of bioinformatics core facilities in training and education.

The skill sets required for success in the field of bioinformatics are considered by several authors: Altman [2] defines five broad areas of competency and lists key technologies; Ranganathan [3] presents highlights from the Workshops on Education in Bioinformatics, discussing challenges and possible solutions; Yale’s interdepartmental PhD program in computational biology and bioinformatics is described in [4], which lists the general areas of knowledge of bioinformatics; in a related article, a graduate of Yale’s PhD program reflects on the skills needed by a bioinformatician [5]; Altman and Klein [6] describe the Stanford Biomedical Informatics (BMI) Training Program, presenting observed trends among BMI students; the American Medical Informatics Association defines competencies in the related field of biomedical informatics in [7]; and the approaches used in several German universities to implement bioinformatics education are described in [8].

Several approaches to providing bioinformatics training for biologists are described in the literature. Tan et al. [9] report on workshops conducted to identify a minimum skill set for biologists to be able to address the informatics challenges of the “-omics” era. They define a requisite skill set by analyzing responses to questions about the knowledge, skills, and abilities that biologists should possess. The authors in [10] present examples of strategies and methods for incorporating bioinformatics content into undergraduate life sciences curricula. Pevzner and Shamir [11] propose that undergraduate biology curricula should contain an additional course, “Algorithmic, Mathematical, and Statistical Concepts in Biology.” Wingren and Botstein [12] present a graduate course in quantitative biology that is based on original, pathbreaking papers in diverse areas of biology. Johnson and Friedman [13] evaluate the effectiveness of incorporating biological informatics into a clinical informatics program. The results reported are based on interviews of four students and informal assessments of bioinformatics faculty.

The challenges and opportunities relevant to training and education in the context of bioinformatics core facilities are discussed by Lewitter et al. [14]. Relatedly, Lewitter and Rebhan [15] provide guidance regarding the role of a bioinformatics core facility in hiring biologists and in furthering their education in bioinformatics. Richter and Sexton [16] describe a need for highly trained bioinformaticians in core facilities and provide a list of requisite skills. Similarly, Kallioniemi et al. [17] highlight the roles of bioinformatics core units in education and training.

This manuscript expands the body of knowledge pertaining to bioinformatics curriculum guidelines by presenting the results from a broad set of surveys (of core facility directors, of career opportunities, and of existing curricula). Although there is some overlap in the findings of the

Citation: Welch L, Lewitter F, Schwartz R, Brooksbank C, Radivojac P, et al. (2014) Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies. PLoS Comput Biol 10(3): e1003496. doi:10.1371/journal.pcbi.1003496

Published March 6, 2014

Copyright: © 2014 Welch et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No specific funding was received for writing this article.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

PLOS Computational Biology | www.ploscompbiol.org 1 March 2014 | Volume 10 | Issue 3 | e1003496

database management languages (e.g., Oracle, PostgreSQL, and MySQL), and scientific and statistical analysis software (such as R, S-plus, MATLAB, and Mathematica). Additionally, a bioinformatician should be able to incorporate components from open source software repositories into a software system. The ability to effectively utilize distributed and high-performance computing to analyze large data sets is essential, as is knowledge of networking technology and internet protocols. A bioinformatician should be able to utilize web authoring tools, web-based user interface implementation technologies, and version control and build tools (e.g., subversion, Ant, and Netbeans).

While it is important for a bioinformatician to have a suite of computational, mathematical, and statistical skills, this alone is insufficient. Throughout their careers, bioinformaticians usually contribute to a variety of scientific projects, such as variant detection in human exome resequencing; human genetic diversity; genomic and epigenomic mechanisms of gene regulation; viral diversity; neurodegeneration and psychiatric disorders; drug discovery; the role of transcription factors and chromatin structure in global gene expression, development, and differentiation; and cancer/tumor biology. To be a fully integrated member of a research team, a bioinformatician must possess detailed knowledge of molecular biology, genomics, genetics, cell biology, biochemistry, and evolutionary theory. Furthermore, it is necessary to understand related technologies, including next generation sequencing and proteomics/mass spectrometry. It is also desirable for a bioinformatician to have modeling experience or background in one or more specialized domains, such as systems biology, inflammation, immunology, cell signaling, or physiology.

Additionally, a bioinformatician must have a high level of motivation, be independent and dedicated, possess strong interpersonal and managerial skills, and have outstanding analytical ability. A bioinformatician must have excellent teamwork skills and have strong scientific communication skills.

As a bioinformatician progresses through his or her career, it is helpful to develop managerial and programmatic skills, such as staff management and business development; understanding of or experience with grant funding and/or access to finance; awareness of research and development (R&D) and innovation policy and government drivers; the use of modeling and simulation approaches; ability to evaluate the major factors associated with efficacy and safety; and ability to answer regulatory questions related to product approval and risk management. It is also important to have familiarity with presenting biological results in both oral and written forms.

In summary, a senior bioinformatician will benefit from strong analytical reasoning capabilities, as evidenced by a track record of innovation; scientific creativity, collaborative ability, mentoring skills, and independent thought; and a record of outstanding research. Table 1 summarizes the skill sets identified by (1) surveying bioinformatics core facility directors and (2) examining bioinformatics career opportunities.

Preliminary Survey of Existing Curricula

An important step in developing guidelines for bioinformatics education is to gain a comprehensive understanding of current practices in bioinformatics and computational biology education. To this end, the task force surveyed and catalogued existing curricula used in bioinformatics educational programs.

As a first step, the task force began a manual search for educational programs. Due to the large number of education programs, the decision was made to initially restrict the search to programs awarding a degree or certificate and explicitly including “computational biology,” “bioinformatics,” or some close variant in the name of the degree or certificate awarded. The search thus excluded non-degree tracks or options within more traditional programs, non-degree programs of study, or programs in related fields that might have high overlap with bioinformatics (e.g., biostatistics or biomedical informatics). Although this was a controversial decision even within the task force, this narrow scope and definition of programs was intended to keep the search from becoming too unfocused or being sidetracked over questions of which programs should be included as belonging to the field.

A search by committee members produced a preliminary collection of two programs awarding degrees of associate of arts or sciences; 72 awarding bachelor of science, arts, or technology; 38 awarding master of science, research, or biotechnology; 39 awarding doctor of philosophy; and

Table 1. Summary of the skill sets of a bioinformatician, identified by surveying bioinformatics core facility directors and examining bioinformatics career opportunities.

Skill Category: Specific Skills

General: time management, project management, management of multiple projects, independence, curiosity, self-motivation, ability to synthesize information, ability to complete projects, leadership, critical thinking, dedication, ability to communicate scientific concepts, analytical reasoning, scientific creativity, collaborative ability

Computational: programming, software engineering, system administration, algorithm design and analysis, machine learning, data mining, database design and management, scripting languages, ability to use scientific and statistical analysis software packages, open source software repositories, distributed and high-performance computing, networking, web authoring tools, web-based user interface implementation technologies, version control tools

Biology: molecular biology, genomics, genetics, cell biology, biochemistry, evolutionary theory, regulatory genomics, systems biology, next generation sequencing, proteomics/mass spectrometry, specialized knowledge in one or more domains

Statistics and Mathematics: application of statistics in the contexts of molecular biology and genomics, mastery of relevant statistical and mathematical modeling methods (including experimental design, descriptive and inferential statistics, probability theory, differential equations and parameter estimation, graph theory, epidemiological data analysis, analysis of next generation sequencing data using R and Bioconductor)

Bioinformatics: analysis of biological data; working in a production environment managing scientific data; modeling and warehousing of biological data; using and building ontologies; retrieving and manipulating data from public repositories; ability to manage, interpret, and analyze large data sets; broad knowledge of bioinformatics analysis methodologies; familiarity with functional genetic and genomic data; expertise in common bioinformatics software packages, tools, and algorithms

doi:10.1371/journal.pcbi.1003496.t001

http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.1003496

What does a bioinformatician need to know?

What can a bioinformatician do?

06-04-14

Page 6: 150522 bioinfo gis lr

The bioinformatician can work in several ways

• As an engineer and user

• Making difficult or tedious tasks easier

• Workflows and automation

• As a computer scientist

• Improving existing algorithms

• Creating new algorithms

• Sequence assembly

• As a clinician

• Discovering biological information with the computer

• Relating apparently unconnected diseases

Inf

Ing

Clin

Page 7: 150522 bioinfo gis lr

The profile of an Australian bioinformatician

http://www.ebi.edu.au/news/braembl-community-survey-report-2013

Where do they work? Who is the bioinformatician?

This is a user

Another user

This is the bioinformatician. And so is this one

Page 8: 150522 bioinfo gis lr

The bioinformatician has no mobility problems

Page 9: 150522 bioinfo gis lr

The "info" side cannot keep pace with "bio" technology

Page 10: 150522 bioinfo gis lr

If resources do not increase, more people will have to be assigned to analyzing the data

Bioinformaticians are needed

Page 11: 150522 bioinfo gis lr

…and more and more of them are needed

http://www.indeed.com/jobtrends?q=molecular+biology,+bioinformatics,+biomedical+engineering&l=&relative=1

The outbreak of the economic crisis caused large differences

The bioinformatician has the best prospects

Bioinformaticians do not depend on hospitals alone

Page 12: 150522 bioinfo gis lr

Every day there are new job postings for bioinformaticians

30 Dec 2013

Page 14: 150522 bioinfo gis lr

And also in Spain and Europe

http://www.eurosciencejobs.com/jobs/bioinformatics

Page 15: 150522 bioinfo gis lr

If what you want is to make money, that too

[Image: classified-ads page ("Anuncios breves", p. 63) from the magazine La Marea, April 2014, no. 15, www.lamarea.com. Among the ads: Genoma4u, "Knowing your genome and that of your children is the key to personalized medicine." www.genoma4u.com]

Page 16: 150522 bioinfo gis lr

They are well paid, at least abroad

Linux and OS X skills pay better than Windows

http://www.r-bloggers.com/r-skills-attract-the-highest-salaries/

R is taught in the bioinformatics track of GIS (Grado de Ingeniería de la Salud)

http://www.r-users.com

Page 17: 150522 bioinfo gis lr

Studying bioinformatics at the UMA is worthwhile

Page 18: 150522 bioinfo gis lr

Discovering new drugs "was" extremely expensive

Classical method: every compound has to be synthesized and tested in animals

Bioinformatics method: only the candidates are synthesized, saving on synthesis, time and animals

Ligand database
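The contrast between the two methods can be illustrated with a minimal Python sketch. It assumes that each entry in a ligand database already carries a computed score (here a hypothetical `predicted_affinity`, lower meaning stronger predicted binding) and that only the best-scoring entries go on to synthesis. The compound names, the scores and the `select_candidates` helper are invented for illustration; they do not come from the slides or from any real screening tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    name: str
    # Hypothetical in silico score; lower means stronger predicted binding.
    predicted_affinity: float


def select_candidates(ligands, max_candidates):
    """Rank ligands by predicted affinity and keep only the best few."""
    ranked = sorted(ligands, key=lambda lig: lig.predicted_affinity)
    return ranked[:max_candidates]


# Invented placeholder library standing in for a ligand database.
library = [
    Ligand("compound_A", -6.1),
    Ligand("compound_B", -9.4),
    Ligand("compound_C", -4.8),
    Ligand("compound_D", -8.7),
    Ligand("compound_E", -5.5),
]

# Classical method: all five compounds would be synthesized and tested.
# Sketch of the bioinformatics method: only the top two are passed on.
candidates = select_candidates(library, max_candidates=2)
print([lig.name for lig in candidates])  # ['compound_B', 'compound_D']
```

In this toy case three of the five compounds never need to be synthesized, which is the saving the slide refers to; the selected candidates would still require experimental validation.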

Page 19: 150522 bioinfo gis lr

It earned the 2013 Nobel Prize in Chemistry

For the development of computational models to understand and predict chemical processes

Theoretical chemist, biophysicist, biochemist

http://blogs.plos.org/biologue/2013/10/18/the-significance-of-the-2013-nobel-prize-in-chemistry-and-the-challenges-ahead/

Page 20: 150522 bioinfo gis lr

This Nobel Prize is the first given to work in computational biology, indicating that the field has matured and is on a par with experimental biology

The blog of PLOS Computational Biology

Page 21: 150522 bioinfo gis lr

Drug design against compartmentalized targets

Send Orders for Reprints to [email protected]

Current Pharmaceutical Design, 2014, 20, 293-300

Biocomputational Resources Useful For Drug Discovery Against Compartmentalized Targets

Francisca Sánchez-Jiménez*,#, Armando Reyes-Palomares#, Aurelio A. Moya-García, Juan AG Ranea and Miguel Ángel Medina

Department of Molecular Biology and Biochemistry and unit 741 of “Centro de Investigación en Red en Enfermedades Raras” (CIBERER), Faculty of Sciences, University of Malaga, 29071 Malaga, Spain

Abstract: It has been estimated that the cost of bringing a new drug onto the market is 10 years and 0.5-2 billions of dollars, making it a non-profitable project, particularly in the case of low prevalence diseases. The advances in Systems Biology have been absolutely decisive for drug discovery, as iterative rounds of predictions made from in silico models followed by selected experimental validations have resulted in a substantial saving of time and investments. Many diseases have their origins in proteins that are not located in the cytosol but in intracellular compartments (i.e. mitochondria, lysosome, peroxisome and others) or cell membranes. In these cases, biocomputational approaches present limitations to their study. In the present work, we review them and propose new initiatives to advance towards a safer, more efficient and personalized pharmacology. This focus could be especially useful for drug discovery and the reposition of known drugs in rare and emergent diseases associated with compartmentalized proteins.

Keywords: Systems biology, diseasomes, compartmentalized proteins, drug discovery, rare diseases, lysosome, mitochondria, peroxisome.

SYSTEMS PHARMACOLOGY CONCEPTS AND AIMS

During the second half of the 20th century both conceptual and technological developments have made it possible to establish relationships between specific molecules (genes, proteins, metabolites, drugs) related to different human diseases applying reductionist approaches [1]. Following this strategy, the volume of molecular data from the analyses of human samples under different pathophysiological conditions and pharmacological testing was exponentially increasing. Despite these impressive research efforts, the molecular basis of many diseases remains far from being well characterized, since they are complex problems influenced by both genome and environment [2]. Although most genetic diseases are monogenic, around 20% of them are polygenic, as deduced from genetic disorder databases (OMIM, www.ncbi.nlm.nih.gov/omim; and Orphanet, www.orpha.net). In addition, next-generation sequencing is revealing novel causal variants and candidates genes involved in Mendelian disorders [3,4]. The majority of human diseases are the result of interactions between at least two types of overlapped, dynamic and very complex molecular networks at the cellular level (metabolic interaction and signaling networks).

At present, it is well known that the huge amounts of molecular information obtained from fragmented subsystems -studied by reductionist strategies- need to be integrated, organized and even formalized in algorithms in order to be re-analyzed [5]. The idea that it is not possible to reach the full characterization of biological processes from only the sum of the properties of their partial subsystems (the major Systems Biology axiom) has been claimed from almost 100 years ago [6], but the lack of proper technological resources has been blocking its effective application for decades. The evolution of Systems Biology as a scientific revolution has been reviewed recently [1], highlighting how HTP, high-throughput technologies and computational analysis tools are now making it possible to characterize living processes from a holistic perspective [7-9].

*Address correspondence to this author at the Department of Molecular Biology and Biochemistry, Faculty of Sciences, University of Malaga, 29071, Malaga, Spain; Tel: +34 952131674; Fax: +34 952131674; E-mail: [email protected] #Authors contributed equally to this work.

Although there have been significant advances in the construction and analysis of biological networks in different organisms, the current state of the art still remains far from this holistic perspective. The main restrictions are due to the inherent complexity of biological systems, but also by the limitations of computational approaches. The lack of systematic platforms of analysis for researches and the disregarded -or unavailability of- information could produce an unveiled bias in the problem under study [10].

In spite of all these difficulties, network biology has been proposed as an efficient computational tool to identify multi-scale mechanisms related to biomedical processes [9] and drug intervention strategies [11]. The structure and dynamics of these networks for each individual determine the effectiveness of the therapeutic strategies. Thus, pharmacogenomics is considered essential to identify individualized responses to drug treatments (personalized medicine/pharmacology) based on systemic information. Moreover, the success in discovery and characterization of new drugs also depends on the degree of knowledge on the structure and dynamics of these networks. Thus, systems pharmacology is an emerging field that collects all the above mentioned concepts to discover and analyze potential drugs, network based-methods playing an essential role in their development; in fact, network pharmacology is a new scientific field devoted to studying multiple active relationships between drugs and targets, to validate drug combinations and to predict new targets [12,13].

BIOCOMPUTATIONAL TOOLS, AN ESSENTIAL SUPPORT FOR SYSTEMS PHARMACOLOGY

A holistic view of a biological problem involves the integration of an impressive quantity of data (phenotypes, genes, proteins, compounds and their properties, as well as the relationships among them). Consequently, systemic approaches depend on biocomputational resources for storage and dissemination of knowledge (e.g. databases), as well as relational and analytic algorithms (e.g. workflows, predictive and visualization tools). Data integration, management and analysis are only a part of Systems Pharmacology, since the computational models and the predictions made from their simulations should be experimentally validated. The major aim of the development of Systems Pharmacology initiatives is to achieve the safest and most effective way to

1873-4286/14 $58.00+.00 © 2014 Bentham Science Publishers

298 Current Pharmaceutical Design, 2014, Vol. 20, No. 2 Sánchez-Jiménez et al.

small system as a few molecules) can be particularly interesting to discover new preventive protocols and treatments of rare/orphan and neglected diseases, where the research budget is not enough to address the objective only by experimental HTP methods. In the case of compartmentalized proteins and potential targets in organelle-associated diseases, these in silico technologies (followed by experimental validation) can also be especially useful, for instance, for the following applications: i) to identify drugs with different affinities for isozymes located in different intracellular compartments; ii) to achieve differential modulation of different proteolytic/post-translational modified fragments of a given protein differing in their biological activity-associated intracellular location. This is the case for many signaling transduction elements, protein sorting and membrane trafficking.

CONCLUDING REMARKS

In the present review, we have shown that the combination of computational approaches (at different levels of complexity) and experimental approaches appears to be one of the most promising strategies for the development of new and more effective pharmacological solutions, in general. This combined approach requires the existence of multidisciplinary R&D groups/collaborative networks able to manage biology, physical chemistry, organic chemistry, pharmacological and clinical information, and their collaboration in order to develop and apply mathematic and computer science solutions. In our review, we focus on the cases of low prevalence diseases involving compartmentalized molecular elements. We comment (without the aim of being exhaustive) on some of the most popular biocomputational resources for the study of the pharmacology of organelle-related diseases, as well as the major current limitations of in silico approaches to drug discovery for compartmentalized proteins involved in organelle-related diseases. Additional efforts to increase cooperation among bioinformatics, and between biomedicine and clinical specialists would be crucial to achieve translational results efficiently.

CONFLICT OF INTEREST The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS This work has been funded by Grants PS09/02216, SAF2011-26518 and SAF2012-33110 (Spanish Government and FEDER funds), CVI-06585 (Andalusian Government). This work takes part in the activities of the COST Action BM0806 and Platform BIER (CIBERER) and AMER Consorptium (FEDER-Innterconecta, CDTI, Spain). CIBERER is an initiative of Instituto de Salud Car-

Fig. (2). Metabolic network of amine metabolism and their cellular compartments. This scheme illustrates a major re-ordering of metabolic interactions between genes associated with the amine metabolism (gene ontology term, GO:0009308) disregarding (A) or considering their location in cellular compartments (B). Metabolic interactions were retrieved from positive flux correlations (exceeding 0.1) between genes with "met score" greater than zero [78]; the cellular compartments were assigned using Mosaic [79]. It is noteworthy, that some genes can be associated with different organelles making it difficult to identify the true cellular location of these metabolic interactions.
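The selection rule stated in the caption (positive flux correlation above 0.1, both genes with a "met score" above zero) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene identifiers, correlation values, scores and compartment annotations are invented placeholders, and grouping an interaction under every compartment that both genes share is an assumed reading of panel B, not the authors' procedure.

```python
from collections import defaultdict

FLUX_CORRELATION_CUTOFF = 0.1  # threshold quoted in the figure caption


def build_network(correlations, met_scores):
    """Keep gene pairs whose flux correlation exceeds the cutoff and whose
    two genes both have a met score greater than zero (panel A)."""
    edges = []
    for (gene_a, gene_b), corr in correlations.items():
        both_scored = met_scores.get(gene_a, 0) > 0 and met_scores.get(gene_b, 0) > 0
        if corr > FLUX_CORRELATION_CUTOFF and both_scored:
            edges.append((gene_a, gene_b))
    return sorted(edges)


def split_by_compartment(edges, compartments):
    """Assumed reading of panel B: file each interaction under every
    compartment that both genes are annotated to."""
    grouped = defaultdict(list)
    for gene_a, gene_b in edges:
        shared = compartments.get(gene_a, set()) & compartments.get(gene_b, set())
        for compartment in sorted(shared) or ["no shared compartment"]:
            grouped[compartment].append((gene_a, gene_b))
    return dict(grouped)


# Invented placeholder data (G1..G5 are not real gene names).
correlations = {("G1", "G2"): 0.35, ("G1", "G3"): 0.05,
                ("G2", "G4"): 0.22, ("G4", "G5"): 0.60, ("G3", "G4"): -0.40}
met_scores = {"G1": 0.8, "G2": 0.5, "G3": 0.9, "G4": 0.3, "G5": 0.0}
compartments = {"G1": {"mitochondrion"},
                "G2": {"mitochondrion", "cytosol"},
                "G4": {"cytosol"}}

edges = build_network(correlations, met_scores)
by_compartment = split_by_compartment(edges, compartments)
print(edges)
print(by_compartment)
```

The placeholder gene G2 is annotated to two compartments, so its interactions end up in different groups; this mirrors the caption's remark that multi-organelle genes make the true location of an interaction hard to pin down.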

• Structural biochemistry • Systems biology

Inf

Ing

Page 22: 150522 bioinfo gis lr

Diseases and biomarkers

Chen and Wang Journal of Clinical Bioinformatics 2011 1:35 doi:10.1186/2043-9113-1-35

Bioinformatics is needed to discover the candidates

Pure, hard-core bioinformatics

Bioinformatics is used to discover:

Page 23: 150522 bioinfo gis lr

Improving biomarker detection algorithms

• Data mining • Gene expression analysis

Machine learning

[Figure 3 plots: three panels (Leukemia, Lung, Prostate); axes: accuracy (%) and robustness (%); strategies compared: GA and Filter+GA]

Figure 3: Accuracy and robustness obtained for the selected pathways for each considered database (Leukemia, Lung and Prostate). The graphs include the results obtained when using a strategy based only on genetic algorithms (GA) and on genetic algorithms plus the filtering approach (Filter + GA) (see text for more details).

(prostacyclin) synthase, a protein of the cytochrome P450 superfamily of enzymes, involved in the synthesis of prostacyclin, a potent vasodilator and inhibitor of platelet aggregation that is also related to myocardial infarction, stroke, and atherosclerosis, and thus could also be involved in lung cancer.

As an overall conclusion, the results obtained suggest the important role that the incorporation of biological information might play in carrying out a robust feature selection procedure for the diagnosis of cancer (and maybe any other disease). Moreover, this may open the way to using GA for the prognosis of cancer diseases in the near future, a clinical aspect that still concerns most oncologists and cancer patients.

Acknowledgements

The authors acknowledge support through grants TIN2010-16556 from MICINN-SPAIN and P08-TIC-04026 (Junta de Andalucía), all of which include FEDER funds.


Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords

R.M. Luque-Baena a,⇑, D. Urda a,b, M. Gonzalo Claros c, L. Franco a,b, J.M. Jerez a,b

a Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain
b Instituto de Investigación Biomédica de Málaga (IBIMA), Málaga, Spain
c Supercomputing and Bioinformatics Centre, University of Málaga, C/ Severo Ochoa, 34, 29590 Málaga, Spain

Article info

Article history: Received 24 July 2013; Accepted 16 January 2014; Available online 27 January 2014

Keywords: DNA analysis; Evolutionary algorithms; Biological enrichment; Feature selection

Abstract

Genetic algorithms are widely used in the estimation of expression profiles from microarrays data. However, these techniques are unable to produce stable and robust solutions suitable to use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection combining the genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out over public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy of a blind discrimination among relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnostic of cancer diseases in a near future. Additionally, it could also be used for biological knowledge discovery about the studied disease.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

The term cancer encompasses more than 100 potentially life-threatening diseases affecting nearly every part of the body. Cancer is a complex, multifactorial, genetic disease involving structural and expression abnormalities of both coding and non-coding genes. In this sense, gene expression profiling plays an important role in a wide range of areas in biological science for handling cancer diseases [1–4]. The analysis of DNA microarray data requires a selection of features (genes) due to the small number of samples available (mostly less than a hundred) and the large number of features (in the order of thousands). This problem is well-known in the literature as the “large-p-small-n” paradigm or the curse of dimensionality [5].

Evolutionary models have been proposed in several works [6–12] and constitute one of the most widely used techniques for feature selection and prognosis analysis in microarray datasets. Despite all the variety of feature selection techniques proposed in the literature, it still remains a problem intrinsic to the domain of DNA microarrays. Genetic algorithms (GAs) [13–18], as a particular case of evolutionary models, use classification techniques within the algorithm to evaluate and evolve the population. Producing stable or robust solutions is a desired property of feature selection algorithms, in particular for clinical and biomedical studies. Nevertheless, robustness is a property that is difficult to analyze and is often overlooked. In [19–21] different approaches are proposed, addressing the main drawbacks related to overfitting and robustness, through a modified GA that includes an early-stopping criterion and establishing a feature ranking method that leads to more robust solutions. Although some proposals use biological information to analyze DNA microarray data [22], none of them includes it in the mechanisms that guide the searching procedure in the GA. In our opinion, this strategy would, on one hand, produce more robust feature subset selections and, on the other hand, make it possible to obtain signatures more relevant for clinicians and biomedical researchers.

In this approach, a two-stage procedure is proposed in order to obtain robust feature subset selections with good performance rates in test future data. Bootstrap Cross-Validation (BCV) is used since its good behavior related to misclassification error with small samples has been previously demonstrated [23,24], including DNA microarray datasets. A novel feature scoring method within the GA is also proposed, taking into account biological information related to the studied disorders. One widely used source of biological information is the Gene Ontology (GO) database [25] since it

http://dx.doi.org/10.1016/j.jbi.2014.01.006
1532-0464/© 2014 Elsevier Inc. All rights reserved.

⇑ Corresponding author. Address: Department of Computer Languages and Computer Science, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain. Fax: +34 952131397.

E-mail addresses: [email protected] (R.M. Luque-Baena), [email protected] (D. Urda), [email protected] (M. Gonzalo Claros), [email protected] (L. Franco), [email protected] (J.M. Jerez).

Journal of Biomedical Informatics 49 (2014) 32–44

journal homepage: www.elsevier.com/locate/yjbin

• Biological databases • Tools and algorithms • Gene expression analysis

Combining biology and computer science is what gives the best results

Inf

Page 24: 150522 bioinfo gis lr

miRNA biomarkers of breast cancer survival

A microRNA Signature Associated with Early Recurrence in Breast Cancer

Luis G. Perez-Rivas1., Jose M. Jerez2., Rosario Carmona3, Vanessa de Luque1, Luis Vicioso4, M. Gonzalo Claros3,5, Enrique Viguera6, Bella Pajares1, Alfonso Sanchez1, Nuria Ribelles1, Emilio Alba1, Jose Lozano1,5*

1 Laboratorio de Oncología Molecular, Servicio de Oncología Médica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain
2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
3 Plataforma Andaluza de Bioinformática, Universidad de Málaga, Málaga, Spain
4 Servicio de Anatomía Patológica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain
5 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain
6 Departamento de Biología Celular, Genética y Fisiología Animal, Universidad de Málaga, Málaga, Spain

Abstract

Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.

Citation: Perez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS ONE 9(3): e91884. doi:10.1371/journal.pone.0091884

Editor: Sonia Rocha, University of Dundee, United Kingdom

Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014

Copyright: © 2014 Perez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de Economía (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

. These authors contributed equally to this work.

Introduction

Breast cancer comprises a group of heterogeneous diseases that can be classified based on both clinical and molecular features [1–5]. Improvements in the early detection of primary tumors and the development of novel targeted therapies, together with the systematic use of adjuvant chemotherapy, have drastically reduced mortality rates and increased disease-free survival (DFS) in breast cancer. Still, about one third of patients undergoing breast tumor excision will develop metastases, the major life-threatening event which is strongly associated with poor outcome [6,7].

The risk of relapse after tumor resection is not constant over time. A detailed examination of large series of long-term follow-up studies over the last two decades reveals a bimodal hazard function with two peaks of early and late recurrence occurring at 1.5 and 5 years, respectively, followed by a nearly flat plateau in which the risk of relapse tends to zero [8–10]. A causal link between tumor surgery and the bimodal pattern of recurrence has been proposed by some investigators (i.e. an iatrogenic effect) [11]. According to that model, surgical removal of the primary breast tumor would accelerate the growth of dormant metastatic foci by altering the balance between circulating pro- and anti-angiogenic factors [9,11–14]. Such a hypothesis is supported by the fact that the two peaks of relapse are observed regardless of factors other than surgery, such as the axillary nodal status, the type of surgery or the administration of adjuvant therapy. Although estrogen receptor (ER)-negative tumors are commonly associated with a higher risk of early relapse [15], the bimodal distribution pattern is observed independently of the hormone receptor status [16]. Other studies also suggest that the dynamics of tumor relapse may be a

PLOS ONE | www.plosone.org 1 March 2014 | Volume 9 | Issue 3 | e91884


• Structural biochemistry • Molecular biology

with tumors from relapse-free patients (group A, Table 2). MiR-625 was excluded from any further studies since RT-qPCR data showed minimal variation between groups (FC<2). Next, we re-clustered the 71 tumors based on the 5-miRNA signature. As shown in Figure 2, tumors from groups A and B were clearly segregated in two distinct clusters, which included most of the expected samples in each category: 78.8% group A in cluster 1b (low risk) and 70.4% group B in cluster 2b (high risk). Of note, the supervised analysis included most tumors from group C (72.8%) in cluster 1b, indicating that the 5-miRNA signature specifically discriminates tumors with an overall higher risk of early recurrence.

The 5-miRNA signature

MiR-149 was the most significant miRNA downregulated in group B, as determined by microarray hybridization and by RT-qPCR. This miRNA has been described as a TS-miR that regulates the expression of genes associated with cell cycle, invasion or migration and its downregulation has been observed in several tumor diseases, including gastric cancer and breast cancer [70,77–81]. Down-regulation of miR-149 can occur epigenetically, by hypermethylation of the neighbouring CpG island [80] or by impaired processing of the pri-miR-149 precursor, in a polymorphic variant [79]. In a recent work, downregulation of miR-149 has been associated with elevated levels of the transcription factor SP1, increased invasiveness and lower 5-year survival in colorectal cancer [80]. The p53 repressor ZBTB2 is also a target of miR-149 [81], which could explain, at least partially, its function as a TS-miR.

Figure 2. A 5-miRNA signature is associated with early recurrence in breast cancer. Hierarchical clustering of the 71 tumor samples based on expression of the 5-miRNA signature. Note that lower expression levels of the 5-miRNA signature define a distinct cluster 2b which mainly includes tumors from “high risk” patients (group B). On the contrary, most patients with good prognosis (group A) had tumors with normal or higher-than-normal levels of the 5-miRNA signature, defining a different cluster 1b (“low risk”). doi:10.1371/journal.pone.0091884.g002

Figure 3. The 5-miRNA signature discriminates patients with different RFS. A) Kaplan-Meier graph for the whole patient cohort included in this study. B) Those patients whose tumors showed an overall down-regulation of the 5-miRNA signature (i.e. those from cluster 2b in Fig. 2) were classified as “high risk” (red line) and their cumulative RFS was calculated. RFS was also calculated for the remaining patients in the cohort (“low risk”, black line). The Kaplan-Meier plot shows that the 5-miRNA signature specifically discriminates tumors with an overall higher risk of early recurrence. doi:10.1371/journal.pone.0091884.g003

A miRNA Signature Predictive of Early Recurrence

PLOS ONE | www.plosone.org 7 March 2014 | Volume 9 | Issue 3 | e91884

MiR-30a-3p is a member of the miR-30 family, which is associated with mesenchymal and stemness features [82,83] and is downregulated in several types of cancer [84–86]. Recently, Rodriguez-Gonzalez et al. have linked low levels of this miRNA to tamoxifen resistance in ER+ breast tumors. They have also proposed several targets of miR-30a-3p involved in proliferation and apoptosis, such as BCL2, NFkB, MAP2K4, PDGFA, CDK5R1 and CHN1 [87].

Regarding miR-20b, this miRNA is part of the miR-106b-363 cluster, which is frequently deregulated in cancer [88–91]. The levels of miR-20b associate with histological grade in breast cancer [92,93]. This miRNA has been involved in regulating several key proteins such as ESR1, HIF-1α, VEGF or STAT3 [92,94,95]. In particular, because it targets both HIF-1α and VEGF and HIF-1α negatively controls miR-20b levels, it has been defined as an anti-angiogenic miRNA [95].

Both oncogenic and tumor suppressor features have been reported for miR-10a [96]. Thus, reduced expression of miR-10a has been associated with MAP3K7- and βTRC-mediated activation of the proinflammatory NFkB pathway [97]. Also, miR-10a downregulation represses differentiation in part by deregulation of the histone deacetylase HDAC4 [98] and positively affects invasiveness by de-repressing several members of the homeobox family of transcription factors [99].

Regarding miR-342-5p, it appears significantly deregulated only when we compare B vs AC (Table 2). Together with its counterpart (miR-342-3p), it is deregulated in inflammatory breast cancer [74] and its low expression has been associated with lower post-recurrence survival [100], likely because it targets AKT1 mRNA [101].

In sum, the available bibliographic data suggests that down-regulation of miR-149, miR-30a-3p, miR-20b, miR-10a and miR-342-5p in primary breast tumors could confer them enhanced proliferative, angiogenic and invasive potentials.

Prognostic value of the 5-miRNA signature. The relationship between expression of the 5-miRNA signature and RFS was examined by a survival analysis. Figure 3A shows a Kaplan-Meier graph for the whole series of patients included in the study. Due to the intrinsic characteristics of the cohort, decreases in the RFS are only observed in the intervals 0–24 and 50–60 months (corresponding to groups B and C, respectively). We next grouped the tumors according to their 5-miRNA signature status in two different groups. One group included those tumors with all five miRNAs simultaneously downregulated (FC>2 and p<0.05) and a second group included those tumors not having all five miRNAs downregulated. A survival analysis was performed using clinical data from the corresponding patients. As shown in Figure 3B, the Kaplan-Meier graphs for the two groups demonstrate that the 5-miRNA signature defines a “high risk” group of patients with a shorter RFS (Peto-Peto test with p-value = 0.02, when comparing the low vs high risk groups).
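As a rough illustration of how a Kaplan-Meier curve such as the ones in Figure 3 is built, the following minimal sketch implements the standard product-limit estimator in plain Python. It is not the authors' analysis code: the function name `kaplan_meier` and the toy follow-up times are hypothetical, and a real analysis would normally rely on a dedicated survival library.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of relapse-free survival.

    times  -- follow-up time of each patient (e.g. months)
    events -- True if the patient relapsed at that time, False if censored
    Returns one (time, survival probability) step per distinct relapse time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group all patients that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            relapses += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if relapses > 0:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving       # relapsed and censored patients leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical toy cohort: months of follow-up and relapse flags.
    months = [6, 12, 12, 20, 30]
    relapsed = [True, True, False, True, False]
    for t, s in kaplan_meier(months, relapsed):
        print(f"t = {t:>3} months  RFS = {s:.2f}")
```

Computing one curve per group (for instance, "high risk" versus "low risk" tumors) and plotting both as step functions gives a figure of the same kind as Figure 3B; the significance test between curves is a separate step not covered here.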

Using a Cox proportional hazard regression model, we also tested all possible combinations of different covariates (tumor subtype, patient age, tumor size, number of lymph nodes affected and the 5-miRNA signature) with early relapse (≤24 months) to identify the best prognostic factors. The best model according to the AIC criterion included the tumor size and expression of the 5-miRNA signature (data not shown). Only the 5-miRNA signature (all five miRNAs down-regulated) was statistically significant in the Cox model for the high risk group (p-value = 0.02 with HR = 2.73, 95% CI: 1.17–6.36). The 5-miRNA expression data were also used to develop a predictor model through bootstrapping over a Naive Bayes classifier (B = 200 with N = 71, see methods). The prognostic accuracy of the models was assessed by a receiver operating characteristic (ROC) test (Figure 4). Considered individually, miR-30a-3p and miR-10a showed a strikingly high Area Under the Curve (AUC) score (0.890 and 0.875, respectively). This result suggests that mRNA targets regulated by miR-30a-3p and miR-10a could potentially add a greater contribution to the final outcome of the disease. However, the 5-miRNA signature had the strongest predictive value to discriminate tumors from patients that will develop early relapse (group B) from those that will remain free of disease (group A), with an AUC = 0.993 (Figure 4). In summary, the 5-miRNA signature has a good performance as a risk predictor for early breast cancer recurrence.
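To make the AUC figures above more concrete, here is a minimal sketch of how the area under a ROC curve can be computed from predictor scores, using its rank-based interpretation: the probability that a randomly chosen relapsing patient receives a higher score than a randomly chosen relapse-free one. The function name `roc_auc` and the toy scores are hypothetical; the paper's own values come from a bootstrapped Naive Bayes model, which is not reproduced here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    scores -- predicted risk of early relapse (higher means riskier)
    labels -- 1 for early relapse (group B), 0 for disease-free (group A)
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # correctly ranked pair
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical risk scores for four patients. Because the signature is
    # down-regulated in relapsing tumors, a score could for instance be the
    # negative of the mean miRNA expression (an assumption of this sketch).
    risk = [0.9, 0.4, 0.8, 0.3]
    relapse = [1, 1, 0, 0]
    print(f"AUC = {roc_auc(risk, relapse):.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups, which is why a value such as 0.993 is read as a very strong discriminator.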

Candidate targets for the 5-miRNA signature. To extend our set of five miRNAs with regulatory information, we next took advantage of the existing public databases curating predicted and validated miRNA-target interactions (MTIs). In particular, validated targets were obtained from the miRTarBase and miRecords repositories (see methods). First, we created a biological network in Cytoscape [66] containing all the individual miRNAs included in the 5-miRNA signature (miR-149, miR-20b, miR-10a, miR-30a-3p and miR-342-5p). Next, we extended the network by adding H. sapiens MTI data retrieved from the indicated repositories and, finally, extended regulatory interaction networks (RIN) were generated and visualized in Cytoscape. Each regulatory interaction in the network consists of two nodes, a regulatory component (miRNA) and a target biomolecule (mRNA) connected through one directed edge. Figure 5 shows the extended network when the RIN threshold was set to 1 (i.e. each predicted target appears in, at least, one RIN). Thus, at RIN = 1 the network included 14 validated targets assigned to miR-20b (VEGFA, BAMBI, EFNB2, MYLIP, CRIM1, ARID4B, HIF1A, HIPK3, CDKN1A, PPARG, STAT3, MUC17, EPHB4, and ESR1), 7 validated targets assigned to miR-10a (HOXA1, NCOR2, SRSF1, SRSF10/TRA2B, MAP3K7, USF2 and BTRC) and 9 validated targets assigned to miR-30a-3p (THBS1, VEZT, TUBA1A, CDK6, WDR82, TMEM2, KRT7, CYR61 and SLC7A6) (Figure 5). Taking these results into account and considering that i) the extended network was constructed with the 5-miRNA signature as the network nodes and ii) all MTIs depicted in Figure 5 have been experimentally verified, we suggest that at least some of the 30 mRNAs (Figure 5) could be regulated in vivo by the 5-miRNA signature in early-relapsing tumors.

Figure 4. Receiver operating characteristic curve (ROC) for early breast cancer recurrence by the 5-miRNA signature status. ROC curves generated using the prognosis information and expression levels of the 5-miRNA signature can discriminate between patients who will develop early recurrence and those who will remain free of disease. Note that, although miR-30a-3p and miR-10a individually have a high area under the curve (AUC) score, the 5-miRNA signature has the strongest predictive value (AUC = 0.993) to discriminate those patients likely to recur early (group B in our cohort). doi:10.1371/journal.pone.0091884.g004

To gain further insight into the molecular basis of the 5-miRNA signature prognostic value, we investigated the biological pathways associated with the 30 experimentally verified targets from Figure 5. To that end, we searched for Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the 30 targets as a whole set. It should be noted, however, that our restrictive approach (including only experimentally validated miRNA targets) left miR-149 and miR-342-5p out of the GO analysis and therefore, additional biological pathways could be affected by downregulation of the 5-miRNA

Figure 5. Prediction of mRNA targets likely to be regulated by the 5-miRNA signature. Biological networks were created using the Cytoscape software. Each network includes two types of nodes: the five individual miRNAs included in the 5-miRNA signature and their predicted mRNA targets (yellow circles), obtained from two different public databases (miRTarBase and miRecords). The number of databases included in the analysis defines the regulatory interaction network (RIN) threshold. Thus, at RIN = 1 the network includes all mRNA targets that appear in, at least, one database. The databases included in the RIN are identified by the color of the connecting arrows: miRTarBase (blue) and miRecords (red). Although many mRNAs are potential targets for miR-149 and miR-342-5p, the miRTarBase and miRecords versions included in this study did not reveal any targets experimentally validated for the two miRNAs. doi:10.1371/journal.pone.0091884.g005
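The RIN threshold described in the caption can be sketched as a simple filter over miRNA-target pairs collected from several databases. The example below is only an illustration under stated assumptions: the function name `build_network` is hypothetical, and although the gene names are taken from the text, which database reports which pair is invented here for the sake of the example. The authors built their networks in Cytoscape, not with this code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (miRNA, target mRNA)


def build_network(databases: Dict[str, Iterable[Pair]],
                  rin_threshold: int = 1) -> Dict[str, Dict[str, List[str]]]:
    """Keep miRNA -> target edges supported by at least `rin_threshold` databases.

    Returns {miRNA: {target: [supporting database names]}}.
    """
    support = defaultdict(set)
    for db_name, pairs in databases.items():
        for mirna, target in pairs:
            support[(mirna, target)].add(db_name)

    network: Dict[str, Dict[str, List[str]]] = {}
    for (mirna, target), dbs in support.items():
        if len(dbs) >= rin_threshold:
            network.setdefault(mirna, {})[target] = sorted(dbs)
    return network


if __name__ == "__main__":
    # Hypothetical assignment of pairs to databases (illustration only).
    toy_databases = {
        "miRTarBase": [("miR-20b", "VEGFA"), ("miR-20b", "HIF1A")],
        "miRecords": [("miR-20b", "VEGFA"), ("miR-10a", "HOXA1")],
    }
    for threshold in (1, 2):
        print(f"RIN = {threshold}: {build_network(toy_databases, threshold)}")
```

Raising the threshold keeps only the interactions reported by several sources, trading coverage for confidence; at a threshold of 1 every interaction found in any database is retained, as in Figure 5.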


Ing Clin

Page 25: 150522 bioinfo gis lr

Bioinformatics has become indispensable

http://pubs.niaaa.nih.gov/publications/arh311/5-11.htm

Through integration and modeling, these studies would allow us to better exploit the complexity of genomic and functional genomic data and to extract their biological and clinical significance

Page 26: 150522 bioinfo gis lr

Transcriptomics analysis at the UMA

DATABASE Open Access

EuroPineDB: a high-coverage web database for maritime pine transcriptome

Noé Fernández-Pozo1, Javier Canales1, Darío Guerrero-Fernández2, David P Villalobos1, Sara M Díaz-Moreno1, Rocío Bautista2, Arantxa Flores-Monterroso1, M Ángeles Guevara3, Pedro Perdiguero4, Carmen Collada3,4, M Teresa Cervera3,4, Álvaro Soto3,4, Ricardo Ordás5, Francisco R Cantón1, Concepción Avila1, Francisco M Cánovas1 and M Gonzalo Claros1,2*

Abstract

Background: Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases.

Description: EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. It can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.

Conclusions: The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome.

1 Background

Conifers (Coniferales), the most important group of gymnosperms, represent 650 species, some of which are the largest, tallest, and oldest non-clonal terrestrial organisms on Earth. They are of immense ecological importance, dominating many terrestrial landscapes and representing the largest terrestrial carbon sink. Currently present in a large number of ecosystems, they have evolved very efficient physiological adaptation systems.

Given that trees are the great majority of conifers, they provide a different perspective on plant genome biology and evolution taking into account that conifers are separated from angiosperms by more than 300 million years of independent evolution. Studies on the conifer genome are revealing unique information which cannot be inferred from currently sequenced angiosperm genomes (such as poplar, Eucalyptus, Arabidopsis or rice): around 30% of conifer genes have little or no sequence similarity to plant genes of known function [1,2]. Unfortunately, conifer genomics is hindered by the very large genome (e.g. the pine genome is approximately 160 times larger than Arabidopsis and seven times larger

* Correspondence: [email protected] de Biología Molecular y Bioquímica, Facultad de Ciencias, Campus de Teatinos s/n, Universidad de Málaga, 29071 Málaga, Spain. Full list of author information is available at the end of the article

Fernández-Pozo et al. BMC Genomics 2011, 12:366 http://www.biomedcentral.com/1471-2164/12/366

© 2011 Fernández-Pozo et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

• Biological databases
• Tools and algorithms
• Gene expression analysis

• Biotechnology
• Genomics, proteomics, metabolomics

Students from the first GIS-Bioinformatics graduating class

Frontiers in Journal | Original Research | 2015-04-21

ReprOlive: a Database with Linked-Data for the Olive Tree (Olea europaea L.) Reproductive Transcriptome

ReprOlive: an olive tree reproductive transcriptome database

Rosario Carmona1,2,§, A. Zafra1,§, Pedro Seoane3, A. Castro1, Darío Guerrero-Fernández2, Trinidad Castillo-Castillo4, Ana Medina-García4, Francisco M. Cánovas3, José F. Aldana-Montes4, Ismael Navas-Delgado4, Juan D. Alché1, M. Gonzalo Claros2,3,*

1 Department of Biochemistry, Cell and Molecular Biology of Plants. Estación Experimental del Zaidín. CSIC. Granada. Spain.
2 Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de Málaga. Spain
3 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga. Málaga. Spain
4 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga. Spain.
§ These authors contributed equally to this work

* Correspondence: M. Gonzalo Claros, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga. 29071 Málaga. Spain. E-mail: [email protected]

Keywords: Olive, transcriptome, reproduction, database, pollen, stigma, linked-data, annotation, assembly.

Abstract

Plant reproductive transcriptomes have been analysed in different species due to the presence of specific transcripts and the agronomical and biotechnological importance of plant reproduction. Here we presented an olive tree reproductive transcriptome database with samples from pollen and stigma at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads and 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts were also annotated with the corresponding orthologues in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked-Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an orthologue. A minimum of 23,568 different tentative transcripts was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 tentative transcripts for further gene expression studies. Partial transcriptomes from pollen, stigma and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localisation of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures and reactive oxygen species.

1. Introduction

• Biological databases
• Tools and algorithms
• Systems biology

Ing

Page 27: 150522 bioinfo gis lr

They include the design and testing of workflows

24

AutoFlow, a Versatile Workflow Engine Illustrated by Assembling an Optimised de novo Transcriptome for a Non-Model Species, such as Faba Bean (Vicia faba)

Running title: AutoFlow, a versatile workflow engine

Pedro Seoane1, Sara Ocaña2, Rosario Carmona3, Rocío Bautista3, Eva Madrid4, Ana M. Torres2, M. Gonzalo Claros1,3,*

1 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, E-29071, Malaga, Spain

2 Área de Mejora y Biotecnología, IFAPA Centro “Alameda del Obispo”, Apdo 3092, E-14080 Cordoba, Spain

3 Plataforma Andaluza de Bioinformática, Universidad de Málaga, E-29071 Malaga, Spain

4 Institute for Sustainable Agriculture, CSIC, Apdo 4084, E-14080 Cordoba, Spain

* Corresponding author

Manuel Gonzalo Claros Díaz

Departamento de Biología Molecular y Bioquímica,

Facultad de Ciencias, Universidad de Málaga,

E-29071, Malaga (Spain)

Fax: +34 95 213 20 41

Tel: +34 95 213 72 84

E-mail: [email protected]

[Figure: workflow diagrams, panels A and B, producing UNIGENES S. senegalensis v3 and v4. Labelled stages: SeqTrimNext (pre-processing) of long reads and short reads; MIRA, EULER-SR and Oases (pre-assembling; Oases with kmer 23 & 47, paired-end + single); CD-HIT 99%; CAP3 (reconciliation); BOWTIE 2 (mapping test); Full-LengtherNext; miss-assembly rejection. Labelled outputs: contigs, mapped contigs, unmapped contigs, coding contigs, coding unmapped contigs, non-coding, rejected, debris and missassemblies.]
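The stages named in the workflow figure (pre-processing, pre-assembling, reconciliation, mapping test) form a chain in which each step consumes the output of the previous one. As a rough illustration only, that idea can be sketched in Python as an ordered list of named steps. This is not AutoFlow's syntax or API: the step functions, the `run_workflow` helper and the toy reads are all hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

Reads = List[str]
Step = Tuple[str, Callable[[Reads], Reads]]


def trim(reads: Reads) -> Reads:
    """Toy pre-processing: discard reads that contain an ambiguous base (N)."""
    return [r for r in reads if "N" not in r]


def deduplicate(reads: Reads) -> Reads:
    """Toy redundancy removal: keep the first copy of each sequence."""
    return list(dict.fromkeys(reads))


def reject_short(reads: Reads, min_len: int = 4) -> Reads:
    """Toy rejection step: drop sequences shorter than min_len."""
    return [r for r in reads if len(r) >= min_len]


def run_workflow(steps: List[Step], data: Reads) -> Tuple[Reads, List[Tuple[str, int]]]:
    """Run the steps in order and log how many sequences survive each one."""
    log = []
    for name, func in steps:
        data = func(data)
        log.append((name, len(data)))
    return data, log


# Hypothetical input: five toy reads, not real sequencing data.
steps = [("trim", trim), ("deduplicate", deduplicate), ("reject_short", reject_short)]
reads = ["ACGT", "ACGT", "ANGT", "AC", "GGCCA"]
result, log = run_workflow(steps, reads)
print(result)
print(log)
```

Logging the number of sequences that survive each step is one simple way to test a workflow: an unexpected drop points at the step that caused it.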

MOWServ: a web client for integration of bioinformatic resources

Sergio Ramírez1, Antonio Muñoz-Mérida1, Johan Karlsson1, Maximiliano García1, Antonio J. Pérez-Pulido2, M. Gonzalo Claros3 and Oswaldo Trelles1,*

1 Departamento Arquitectura de Computadores, Escuela Técnica Superior de Ingeniería Informática, Universidad de Málaga, Málaga, 2 Centro Andaluz de Biología del Desarrollo (CSIC-UPO), Universidad Pablo de Olavide, Sevilla and 3 Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain

Received February 5, 2010; Revised May 12, 2010; Accepted May 18, 2010

ABSTRACT

The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.

INTRODUCTION

Diversity, heterogeneity and geographical dispersion of biological data constitute problems that hinder the potential integration of such information. Therefore, researcher’s productivity is affected by tedious, time-consuming and prone-to-error tasks such as searching for the appropriate web services, collecting URLs, familiarizing themselves with the different service interfaces, transferring data from one service to another, formatting data for compatibility purposes or copy/paste data in web-forms with different interfaces, to mention a few. The development of systems for interprocess communication has been previously carried out with different goals: gathering multiple services with reliable access (1), providing access to a collection of independent analysis tools (2,3) or enabling the communication between a reduced set of tools (4–7). Standardization of bioinformatics services has also been largely analysed (8–15), standing-up over them the use of web-services designed to support automatic machine-to-machine interaction over a network, representing BioMoby (16) the more successful case. In fact, the development of low-level data-interchange methods based on a specific ontology, together with the ability for wiring services to build powerful bioinformatic machines, has been revealed as the most promising solution (17) as the growing number of web-based services for integrating bioinformatic tools demonstrates. MOWServ (18), the bioinformatic platform offered by the Spanish National Institute of Bioinformatics (INB), provides an integrated access to databases and analytical tools and has strongly contributed to the development of the standard BioMoby protocol (17). In this article, the

*To whom correspondence should be addressed. Tel: +34 952 13 2823; Fax: +34 952 13 2790; Email: [email protected]

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Published online 4 June 2010 Nucleic Acids Research, 2010, Vol. 38, Web Server issue W671–W676 doi:10.1093/nar/gkq497

© The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Algorithmic techniques and models

Inf

Ing

Page 28: 150522 bioinfo gis lr

Relationship between genes, diseases and phenotypes

25

personalized medicine. Many of these genetic variations are located in intragenic regions of DNA and they constitute the basic data to build disease-causing gene networks [10,11]. These networks are useful to find new genetic interactions between diseases, as well as to predict the influence of gene functions in existing pathologies [48–50]. In the present work, we have classified the different patterns of gene-disease associations in four subsets according to two different criteria (MD-MG, MD-PG, PD-MG, PD-PG, as depicted in Figure 1C). This is in contrast to previously published works in which only one criterion was used, either specific and shared genes by diseases [30] or monogenic or polygenic disease-causing genes [31,51]. Our findings indicate that the inferred associations are insufficient to describe properly both interactions among diseases and among genes. This effect can be easily observed when analyzing bipartite graphs composed of gene-to-disease edges. In these networks, more than 30% of the genes participate in "bi-univocal" relationships (that is, genes associated exclusively with a single disease). This specificity can be useful for diagnostics, but it makes it more difficult to establish groups or to identify interactions among diseases. On the other hand, our results have also uncovered an enrichment of metabolic genes in bi-univocal subsets, as well as an enrichment of essential

genes in pleiotropic subsets. The lack of cellular and molecular phenotyping platforms constrains the possibility to detect shared features among pathologies. Consequently, this reduces the possibilities of generating new knowledge on the molecular bases of the pathophenotypic profiles, to distinguish classes and subclasses of a given disease more precisely [7,11,26]. However, medical semantics remains the standard tool to establish the sets of observed clinical features associated with pathologies. In the case of diseases with predominantly genetic origins, pathophenotypes are usually very conserved among patients. We have shown that pathophenotypic similarity gene networks can be a great resource to uncover the molecular mechanisms involved in the responses of organisms to genetic disturbances. For instance, it shows to be useful to merge biomolecular components involved in a same pathological process like MSUD.

In the future, network integration and standardization of molecular and cellular phenotypes could improve the understanding of the evolutionary mechanisms involved in pathological processes. Further experimental and analytical efforts in this direction are warranted.

Figure 8. Maple syrup urine disease pathological and metabolic interactions. In red genes associated with MSUD and in blue pathophenotypic similar genes. (A) Pathophenotypic similarity gene sub-network for MSUD causing genes. It can be noteworthy that there are no inferred relationships between MSUD genes and the rest. (B) Map of branched-chain amino acid degradation pathway from. This map has been extracted from the Kyoto Encyclopedia of Genes and Genomes (KEGG, hsa:00280) developed by Kanehisa Laboratories. Enzymes encoded by human genes are in green. (C) Pathophenotypes shared between genes in the same metabolic module. doi:10.1371/journal.pone.0056653.g008

Using Pathological Phenotypes for Human Diseasomes

PLOS ONE | www.plosone.org 16 February 2013 | Volume 8 | Issue 2 | e56653

Global Analysis of the Human Pathophenotypic Similarity Gene Network Merges Disease Module Components

Armando Reyes-Palomares1,2, Rocío Rodríguez-López1,2, Juan A. G. Ranea1,2, Francisca Sánchez Jiménez1,2, Miguel Ángel Medina1,2*

1 Department of Molecular Biology and Biochemistry, Faculty of Sciences, University of Malaga, Malaga, Spain, 2 CIBER de Enfermedades Raras (CIBERER), Malaga, Spain

Abstract

The molecular complexity of genetic diseases requires novel approaches to break it down into coherent biological modules. For this purpose, many disease network models have been created and analyzed. We highlight two of them, "the human diseases networks" (HDN) and "the orphan disease networks" (ODN). However, in these models, each single node represents one disease or an ambiguous group of diseases. In these cases, the notion of diseases as unique entities reduces the usefulness of network-based methods. We hypothesize that using the clinical features (pathophenotypes) to define pathophenotypic connections between disease-causing genes improve our understanding of the molecular events originated by genetic disturbances. For this, we have built a pathophenotypic similarity gene network (PSGN) and compared it with the unipartite projections (based on gene-to-gene edges) similar to those used in previous network models (HDN and ODN). Unlike these disease network models, the PSGN uses semantic similarities. This pathophenotypic similarity has been calculated by comparing pathophenotypic annotations of genes (human abnormalities of HPO terms) in the "Human Phenotype Ontology". The resulting network contains 1075 genes (nodes) and 26197 significant pathophenotypic similarities (edges). A global analysis of this network reveals: unnoticed pairs of genes showing significant pathophenotypic similarity, a biological meaningful re-arrangement of the pathological relationships between genes, correlations of biochemical interactions with higher similarity scores and functional biases in metabolic and essential genes toward the pathophenotypic specificity and the pleiotropy, respectively. Additionally, pathophenotypic similarities and metabolic interactions of genes associated with maple syrup urine disease (MSUD) have been used to merge into a coherent pathological module. Our results indicate that pathophenotypes contribute to identify underlying co-dependencies among disease-causing genes that are useful to describe disease modularity.

Citation: Reyes-Palomares A, Rodríguez-López R, Ranea JAG, Jiménez FS, Medina MA (2013) Global Analysis of the Human Pathophenotypic Similarity Gene Network Merges Disease Module Components. PLoS ONE 8(2): e56653. doi:10.1371/journal.pone.0056653

Editor: Steve Horvath, University of California Los Angeles, United States of America

Received August 29, 2012; Accepted January 12, 2013; Published February 21, 2013

Copyright: © 2013 Reyes-Palomares et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors’ experimental work is supported by grants SAF2011/26518, SAF2009/09839, PI12/01096 and PS09/02216 (Spanish Ministry of Economy and Competitiveness and FEDER), and PIE P08-CTS-3759, CVI-6585 and funds from group BIO-267 (Andalusian Government and FEDER). JR acknowledges grants SAF2009-09839 and SAF2012-33110 and FSJ acknowledges funds from an INTERCONNECTA-AMER grant (Spanish Ministry of Economy and Competitiveness and FEDER). The "CIBER de Enfermedades Raras" is an initiative from the ISCIII (Spain). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: MAM is a PLOS ONE Editorial board member. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

* E-mail: [email protected]

Introduction

Phenotypes are the result of the expression of specific genetic backgrounds submitted to the influence of changing environmental conditions [1]. Thus, both the development and resulting symptoms of a given pathology are conditioned by interacting elements at multiple interconnected levels (from molecular to social levels) [2]. These complex interactions can be represented as networks to be analyzed using the principles of Network Theory [3–6]. In this sense, Network Medicine emerged as a new field to study the relationships among diseases and disease-causing genes [7]. Generally, data from genetic association studies establish the basic information for these analyses. Most of these data are available from different public repositories, for instance, Online Mendelian Inheritance in Man (OMIM) [8] and Orphanet [9]. This information can be projected onto networks also known as

diseasomes (i.e. "the human disease network" and "the orphan disease networks") [10,11]. These diseasomes open the possibility to work on different types of network projections, treating networks as graphs, which can be used to detect emergent information. For instance, disease-to-gene associations represent bipartite edges (two different types of nodes in every edge) and conform a bipartite graph (as shown in the schematic representation in Figure 1A). On the other hand, projections of gene-to-gene edges and disease-to-disease edges can be inferred from the initial bipartite graph as two different "unipartite" graphs (each with only one type of node). Hence, edges in both inferred unipartite graphs represent either genes associated by a same disease (Figure 1A) or diseases associated through a same gene (these edges were not considered in this study), respectively. The first type of projections (gene-to-gene) are disease-causing gene

PLOS ONE | www.plosone.org 1 February 2013 | Volume 8 | Issue 2 | e56653
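The projection described above, from a bipartite disease-to-gene graph to a unipartite gene-to-gene graph, can be sketched in a few lines of Python. This is an illustrative sketch only: the disease and gene labels are made-up placeholders rather than data from the paper, and `project_genes` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations


def project_genes(associations):
    """Build gene-to-gene edges: two genes are linked if they share a disease.

    Each edge maps a sorted gene pair to the set of diseases supporting it.
    """
    genes_by_disease = defaultdict(set)
    for disease, gene in associations:
        genes_by_disease[disease].add(gene)

    edges = defaultdict(set)
    for disease, genes in genes_by_disease.items():
        for g1, g2 in combinations(sorted(genes), 2):
            edges[(g1, g2)].add(disease)
    return dict(edges)


# Hypothetical bipartite edges (disease, gene); not real annotations.
associations = [
    ("D1", "G1"), ("D1", "G2"),
    ("D2", "G2"), ("D2", "G3"),
    ("D3", "G4"),
]
gene_edges = project_genes(associations)
print(gene_edges)
```

In this toy input, gene G4 is the only gene of disease D3, so it gains no edge in the projection and stays isolated in the gene-to-gene graph.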

Structural biochemistry

• Structural biochemistry
• Systems biology

Clin

Inf

Page 29: 150522 bioinfo gis lr

In other words, the bioinformatician finds the needle in the haystack

26

Page 30: 150522 bioinfo gis lr

Bioinformatics connects unrelated diseases

27

It was known that Alzheimer's patients suffered less cancer than the rest of the population

Molecular Evidence for the Inverse Comorbidity between Central Nervous System Disorders and Cancers Detected by Transcriptomic Meta-analyses

Kristina Ibáñez1., César Boullosa1., Rafael Tabarés-Seisdedos2, Anaïs Baudot3*, Alfonso Valencia1*

1 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain, 2 Department of Medicine, University of Valencia, CIBERSAM, INCLIVA, Valencia, Spain, 3 Aix-Marseille Université, CNRS, I2M, UMR 7373, Marseille, France

Abstract

There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower than expected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity is driven by molecular processes common to CNS disorders and Cancers, and that are deregulated in opposite directions. We conducted transcriptomic meta-analyses of three CNS disorders (Alzheimer’s disease, Parkinson’s disease and Schizophrenia) and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap was observed between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genes downregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in opposite directions at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which could increase the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulation of another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing the Cancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, and reveal potential new candidates, in particular related with protein degradation processes.

Citation: Ibáñez K, Boullosa C, Tabarés-Seisdedos R, Baudot A, Valencia A (2014) Molecular Evidence for the Inverse Comorbidity between Central Nervous System Disorders and Cancers Detected by Transcriptomic Meta-analyses. PLoS Genet 10(2): e1004173. doi:10.1371/journal.pgen.1004173

Editor: Marshall S. Horwitz, University of Washington, United States of America

Received September 16, 2013; Accepted December 30, 2013; Published February 20, 2014

Copyright: © 2014 Ibáñez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by a Fellowship from Obra Social la Caixa grant to KI (http://obrasocial.lacaixa.es/laCaixaFoundation/home_en.html), FPI grant BES-2008-006332 to CB and grant BIO2012 to AV Group. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (AB); [email protected] (AV)

. These authors contributed equally to this work.

Introduction

Epidemiological evidence points to a lower-than-expected probability of developing some types of Cancer in certain CNS disorders, including Alzheimer’s disease (AD), Parkinson’s disease (PD) and Schizophrenia (SCZ) [1–6]. Our current understanding of such inverse comorbidities suggests that this phenomenon is influenced by environmental factors, drug treatments and other aspects related with disease diagnosis. Genetics can additionally contribute to the inverse comorbidity between complex diseases, together with these external factors (for review, see [3–7]). In particular, we propose the deregulation in opposite directions of a common set of genes and pathways as an underlying cause of inverse comorbidities.

To investigate the biological plausibility of this hypothesis, a basic initial step is to establish the existence of inverse gene expression deregulations (i.e., down- versus up-regulations) in CNS disorders and Cancers. Towards this objective, we have performed integrative meta-analyses of collections of gene expression data, publicly available for AD, PD and SCZ, and Lung (LC), Colorectal (CRC) and Prostate (PC) Cancers. Clinical and epidemiological data previously reported inverse comorbidities for these complex disorders, according to population studies assessing the Cancer risks among patients with CNS disorders [8–17].

Results and Discussion

For each CNS disorder and Cancer type independently, we undertook meta-analyses from a large collection of microarray gene expression datasets to identify the genes that are significantly up- and down-regulated in disease when compared with their corresponding healthy control samples (Differentially Expressed Genes, DEGs; FDR corrected p-value (q-value) < 0.05, see Methods and Table S1). Then, the DEGs of the CNS disorders and Cancer types were compared to each other. There were significant overlaps (Fisher’s exact test, corrected p-value (q-value) < 0.05, see Methods) between the DEGs upregulated in CNS disorders and those downregulated in Cancers. Similarly, DEGs downregulated in CNS disorders overlapped significantly with DEGs upregulated in Cancers (Figure 1A). Significant overlaps between DEGs deregulated in opposite directions in CNS disorders and Cancers are still observed while setting more stringent cutoffs for the detection of DEGs (q-values lower than 0.005, 0.0005, 0.00005 and 0.000005, Figure S1). A significant overlap between DEGs deregulated in the same direction was only identified in the case of CRC and PD upregulated genes (Figure 1A).

A molecular interpretation of the inverse comorbidity between CNS disorders and Cancers could be that the downregulation of certain
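The overlap test mentioned in this paragraph can be illustrated with a one-sided Fisher's exact test, computed directly from the hypergeometric distribution. The following is a minimal Python sketch using only the standard library; the gene counts are invented for illustration and are not taken from the paper, the function name `overlap_pvalue` is hypothetical, and no multiple-testing correction (the q-value step) is included.

```python
from math import comb


def overlap_pvalue(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns P(X >= overlap), where X is the number of set_a genes found
    when set_b genes are drawn at random from a universe of genes.
    """
    max_k = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Hypothetical counts: 20000 genes measured, 500 upregulated in one disease,
# 400 downregulated in another, 30 genes in both lists.
p = overlap_pvalue(universe=20000, set_a=500, set_b=400, overlap=30)
print(p)
```

With these made-up numbers, about 10 shared genes would be expected by chance (500 × 400 / 20000), so an overlap of 30 gives a small p-value. In a real analysis the p-values from all pairwise comparisons would still need correcting for multiple testing.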

PLOS Genetics | www.plosgenetics.org 1 February 2014 | Volume 10 | Issue 2 | e1004173

Molecular Evidence for the Inverse Comorbidity betweenCentral Nervous System Disorders and Cancers Detectedby Transcriptomic Meta-analysesKristina Ibanez1., Cesar Boullosa1., Rafael Tabares-Seisdedos2, Anaıs Baudot3*, Alfonso Valencia1*

1 Structural Biology and Biocomputing Programme, Spanish National Cancer, Research Centre (CNIO), Madrid, Spain, 2 Department of Medicine, University of Valencia,

CIBERSAM, INCLIVA, Valencia, Spain, 3 Aix-Marseille Universite, CNRS, I2M, UMR 7373, Marseille, France

Abstract

There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower thanexpected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity isdriven by molecular processes common to CNS disorders and Cancers, and that are deregulated in opposite directions. Weconducted transcriptomic meta-analyses of three CNS disorders (Alzheimer’s disease, Parkinson’s disease and Schizophrenia)and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap wasobserved between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genesdownregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in oppositedirections at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which couldincrease the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulationof another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing theCancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, andreveal potential new candidates, in particular related with protein degradation processes.

Citation: Ibanez K, Boullosa C, Tabares-Seisdedos R, Baudot A, Valencia A (2014) Molecular Evidence for the Inverse Comorbidity between Central NervousSystem Disorders and Cancers Detected by Transcriptomic Meta-analyses. PLoS Genet 10(2): e1004173. doi:10.1371/journal.pgen.1004173

Editor: Marshall S. Horwitz, University of Washington, United States of America

Received September 16, 2013; Accepted December 30, 2013; Published February 20, 2014

Copyright: © 2014 Ibanez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by a Fellowship from Obra Social la Caixa grant to KI (http://obrasocial.lacaixa.es/laCaixaFoundation/home_en.html), FPI grant BES-2008-006332 to CB and grant BIO2012 to AV Group. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (AB); [email protected] (AV)

These authors contributed equally to this work.

Introduction

Epidemiological evidence points to a lower-than-expected probability of developing some types of Cancer in certain CNS disorders, including Alzheimer’s disease (AD), Parkinson’s disease (PD) and Schizophrenia (SCZ) [1–6]. Our current understanding of such inverse comorbidities suggests that this phenomenon is influenced by environmental factors, drug treatments and other aspects related with disease diagnosis. Genetics can additionally contribute to the inverse comorbidity between complex diseases, together with these external factors (for review, see [3–7]). In particular, we propose the deregulation in opposite directions of a common set of genes and pathways as an underlying cause of inverse comorbidities.

To investigate the biological plausibility of this hypothesis, a basic initial step is to establish the existence of inverse gene expression deregulations (i.e., down- versus up-regulations) in CNS disorders and Cancers. Towards this objective, we have performed integrative meta-analyses of collections of gene expression data, publicly available for AD, PD and SCZ, and Lung (LC), Colorectal (CRC) and Prostate (PC) Cancers. Clinical and epidemiological data previously reported inverse comorbidities for these complex disorders, according to population studies assessing the Cancer risks among patients with CNS disorders [8–17].

Results and Discussion

For each CNS disorder and Cancer type independently, we undertook meta-analyses from a large collection of microarray gene expression datasets to identify the genes that are significantly up- and down-regulated in disease when compared with their corresponding healthy control samples (Differentially Expressed Genes, DEGs; FDR corrected p-value (q-value) < 0.05, see Methods and Table S1). Then, the DEGs of the CNS disorders and Cancer types were compared to each other. There were significant overlaps (Fisher’s exact test, corrected p-value (q-value) < 0.05, see Methods) between the DEGs upregulated in CNS disorders and those downregulated in Cancers. Similarly, DEGs downregulated in CNS disorders overlapped significantly with DEGs upregulated in Cancers (Figure 1A). Significant overlaps between DEGs deregulated in opposite directions in CNS disorders and Cancers are still observed when setting more stringent cutoffs for the detection of DEGs (q-values lower than 0.005, 0.0005, 0.00005 and 0.000005, Figure S1). A significant overlap between DEGs deregulated in the same direction was only identified in the case of CRC and PD upregulated genes (Figure 1A).
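The overlap test described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: it assumes a one-sided Fisher's exact test (equivalent to a hypergeometric tail) and the Bonferroni correction mentioned in the Figure 1 caption. The function names, the gene universe size, the set sizes and the number of tests are all hypothetical.

```python
from math import comb


def overlap_p_value(universe: int, size_a: int, size_b: int, shared: int) -> float:
    """One-sided Fisher's exact test for the overlap of two gene lists.

    Returns the probability of seeing at least `shared` common genes when
    `size_b` genes are drawn at random from `universe` genes, of which
    `size_a` belong to the first list (hypergeometric upper tail).
    """
    largest = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(shared, largest + 1)
    )
    return favourable / comb(universe, size_b)


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Hypothetical numbers: 20000 measured genes, 500 DEGs upregulated in one
    # disease, 400 DEGs downregulated in another, 40 genes in common.
    p = overlap_p_value(universe=20000, size_a=500, size_b=400, shared=40)
    print(f"p = {p:.3g}, corrected = {bonferroni(p, n_tests=36):.3g}")
```

Exact integer arithmetic with `math.comb` avoids extra dependencies; for real gene lists a statistics library would normally be used instead.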

A molecular interpretation of the inverse comorbidity between CNS disorders and Cancers could be that the downregulation of certain

PLOS Genetics | www.plosgenetics.org 1 February 2014 | Volume 10 | Issue 2 | e1004173

Figure 1. Comparisons of Differentially Expressed Genes (DEGs). (A) Comparisons of DEGs associated with Central Nervous System (CNS) disorders and Cancers. The DEGs identified as significantly up- and down-regulated (q-value < 0.05) after gene expression meta-analysis in each CNS disorder (Alzheimer’s Disease, AD; Parkinson’s Disease, PD; and Schizophrenia, SCZ) and Cancer type (Lung Cancer, LC; Colorectal Cancer, CRC; and Prostate Cancer, PC) are compared to each other. (B) Comparisons of DEGs between CNS disorders, Cancers and Asthma, HIV, Malaria, Dystrophy, Sarcoidosis. The DEGs identified as significantly up- and down-regulated (q-value < 0.05) after gene expression meta-analysis in each CNS disorder (Alzheimer’s Disease, AD; Parkinson’s Disease, PD; and Schizophrenia, SCZ), Cancer type (Lung Cancer, LC; Colorectal Cancer, CRC; and Prostate Cancer, PC), and in Asthma, HIV, Malaria, Dystrophy and Sarcoidosis, are compared to each other. Cells are coloured according to the significance of the overlaps (Fisher’s exact test, Bonferroni correction for multiple testing, see Methods). Grey cells correspond to non-significant overlaps (q-value > 0.05). doi:10.1371/journal.pgen.1004173.g001

Table 1. DEGs significantly downregulated in the three CNS disorders and upregulated in the three Cancer types (q-value < 0.05).

PPIAP11, IARS, GGCT, NME2, GAPDHP1, CDC123, PSMD8, MRPS33, FIBP, OAZ2, IARS2, SLC35B1, APOO, TMEM189-UBE2V1, VDAC1, TMED3, SMS, DNM1L, PRPS1, SRSF2, TMEM14D, TOMM70A, ATP6V1C1, NUP93, MRPL15, UBA5, PPIH, SMYD3, NIT2, SRD5A1, NUDT21, MRPL12, EEF1E1, MRPS7, TTPAL, BZW1P2, RP11-552M11.4, TSN, MECR, ZWINT, RPRD1A, UCHL5, NHP2P2, TFB2M, FEN1, CGREF1, IMPAD1, ARL1, ACLY, MRPL42, LSM4, KPNA1, TIMM23B, RP11-164O23.5, RP11-762H8.2, FARSA, MRPL4, API5, RP3-425P12.4, RFC3, RANBP9, TFCP2, GMDS, CCNB1, TMEM177, GUF1, HSPA13, NMD3, GCFC2, TUBGCP5, TBCE, YKT6, PHF14, BRCC3

doi:10.1371/journal.pgen.1004173.t001

Inverse Comorbidity among Cancer and CNS Disorders


Comparison of differentially expressed genes

The workflow

[Figure: axis labels "Cancers" and "Mental illnesses"; other labels "Ing" and "Clin"]

Page 31: 150522 bioinfo gis lr

The result is clear to see


AD and PD, and upregulated in CRC (Reactome database; Figure S2).

Aside from the Wnt and p53 pathways, our analysis reveals other pathways related to protein folding and protein degradation displaying patterns of downregulation in CNS disorders and upregulation in Cancers, and that may be relevant for inverse comorbidity. For instance, the Ubiquitin/Proteasome system is consistently downregulated in CNS disorders and upregulated in Cancers according to the three pathway databases analyzed (Figure 2, Figure S2, Table S3). The inverse relationship between the levels of expression deregulations of these pathways possibly suggests opposite roles in CNS disorders and Cancers.

A detailed examination of the KEGG pathways deregulated in opposite directions in CNS disorders and Cancers finally revealed that 89% of the KEGG pathways that were upregulated in Cancers and downregulated in CNS disorders are related to Metabolism and Genetic Information Processing (Figure 2, Figure 3). By contrast, the pathways downregulated in Cancers and upregulated in CNS disorders are related to the cell’s communication with its environment (Environmental Information Processing and Organismal System; Figure 2, Figure 3). Hence, global regulations of cellular activity may account for a protective effect between inversely comorbid diseases.

Table 2. DEGs significantly upregulated in the three CNS disorders and downregulated in the three Cancer types (q-value < 0.05).

MT2A, MT1X, NFKBIA, AC009469.1, DHRS3, CDKN1A, TNFRSF1A, CRYBG3, IL4R, MT1M, FAM107A, ITPKC, MID1, IL11RA, AHNAK, KAT2B, BCL2, PTH1R, NFASC

doi:10.1371/journal.pgen.1004173.t002

Figure 2. KEGG pathways significantly deregulated in Central Nervous System (CNS) disorders and Cancer types. KEGG pathways [24] significantly up- and downregulated in each disease were identified using the GSEA method [34] (q-value < 0.05). The significant pathways were compared between the 6 diseases and combined in a network representation. Node pie charts are coloured according to the pathway status as Cancer upregulated (yellow), Cancer downregulated (blue), CNS disorder upregulated (green) and CNS disorder downregulated (red). The green/blue and yellow/red associations thus correspond to pathways deregulated in opposite directions in CNS disorders and Cancers. Pathway labels are coloured according to their classifications provided by KEGG [24], as: Metabolism (green), Genetic Information Processing (yellow), Cellular Process (pink), Environmental Information Processing (red) and Organismal Systems (dark red). All networks are available at bioinfo.cnio.es/people/cboullosa/validation/cytoscape/Ibanezetal.zip, in cytoscape format (http://www.cytoscape.org/). doi:10.1371/journal.pgen.1004173.g002


Cancer (prostate, colorectal, lung) shares 93 genes with central nervous system diseases (Parkinson's, Alzheimer's, schizophrenia)

74 genes: ↑↑ in cancer, ↓↓ in the diseased CNS

19 genes: ↓↓ in cancer, ↑↑ in the diseased CNS

Genes exclusive to cancer

Genes exclusive to the diseased CNS
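The 74 and 19 gene counts on this slide (93 in total) come from intersecting gene lists across all six diseases. A minimal sketch of that set logic follows, assuming each disease contributes one set of upregulated and one set of downregulated genes. The helper names and the placeholder identifiers `g1` to `g7` are hypothetical and stand in for real DEG lists.

```python
from functools import reduce


def common_genes(gene_sets):
    """Return the genes present in every set (needs at least one set)."""
    return reduce(set.intersection, (set(s) for s in gene_sets))


def inverse_signature(up_sets, down_sets):
    """Genes upregulated in every disease of one group and
    downregulated in every disease of the other group."""
    return common_genes(list(up_sets) + list(down_sets))


# Toy data only: placeholder gene identifiers, not real results.
cancer_up = {
    "LC": {"g1", "g2", "g3"},
    "CRC": {"g1", "g2", "g5"},
    "PC": {"g1", "g2", "g4"},
}
cns_down = {
    "AD": {"g1", "g2", "g7"},
    "PD": {"g1", "g6"},
    "SCZ": {"g1", "g2"},
}

signature = inverse_signature(cancer_up.values(), cns_down.values())
print(sorted(signature))  # prints ['g1']
```

Swapping the two groups (downregulated in cancers, upregulated in CNS disorders) would give the second, smaller list in the same way.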

Page 32: 150522 bioinfo gis lr

Conclusion: a bioinformatician's training keeps them from being daunted by the unknown


Bioinformatician Biotechnologist

Other scientists

Page 33: 150522 bioinfo gis lr

Our small interdisciplinary group


Think & design, Coding, Testing

[Figure: team diagram. Members: Noé, Rocío, Hicham, Rafa, Gonzalo, Darío, David, Isabel, Pedro, Rosario, Marina, Macarena. Tags: B, C, IS. Groups: biologists, physicians and the like; computer engineer; bioinformaticians.]