Valerie De Anda Ecology Institute UNAM México Laboratory of Computational Biology Zaragoza CSIC Spain [email protected] https://github.com/valdeanda @val_deanda The 12th International Conference on Genomics October 26 to 29, 2017 Shenzhen, China

Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Page 1: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Valerie De Anda
Ecology Institute, UNAM, México

Laboratory of Computational Biology, Zaragoza, CSIC, Spain

[email protected]
https://github.com/valdeanda

@val_deanda

The 12th International Conference on Genomics
October 26 to 29, 2017
Shenzhen, China

Page 2: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Revolution in the microbial ecology field

» Genomic reconstruction: microbial dark matter

» Large amount of data

» Diversity, ecology, evolution and functional makeup of the microbial world

The iceberg illusion of metagenomics

The ability to evaluate complex metabolic functions in large data sets remains biologically and computationally challenging.

MOTIVATION GENERAL IDEA RESULTS CONCLUSIONS PERSPECTIVES THANKS

The 12th International Conference on Genomics, October 2017, Shenzhen, China. Valerie de Anda.

» It is really complex to infer and test biological hypotheses in such data

MEBS

Page 3: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

The Iceberg illusion of metagenomics

Microbial ecology-derived ‘omic’ studies

What do we need to improve the efficiency of data processing?

Biological data interpretation (evaluate, compare and analyze complex data at large scale)

Computational efficiency (high performance, accuracy, high speed, data processing, reproducibility)

Metagenomic data:
» Most abundant
» Marker genes
» Statistically different (≠) features

Gomez Cabrero et al. 2014, BMC SB; Reshetova et al. 2013, BMC SB


Page 4: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration

For a given system, multiple sources (and possible types) of data are available, and we want to study them integratively to improve knowledge discovery.

What are the available data that can be used to characterize large-scale metabolic machineries?

How do we integrate them all to improve our understanding of the system?

Gomez Cabrero et al. 2014, BMC SB; Reshetova et al. 2013, BMC SB

Prior knowledge: to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries)

(Targeted) Metabolism

Taxa involved in that particular metabolism

Proteins involved in that particular metabolism

Publicly available genomes?

Mathematical model: relative entropy

Informative Score (MEBS)

$$H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$

≥1: Informative
≤0: Non-Informative


Page 5: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

What are the available data that can be used to characterize large-scale metabolic machineries?

How do we integrate them all to improve our understanding of the system?

Prior knowledge: to reduce the solution space and/or to focus the analysis on biologically meaningful regions (specific metabolic machineries)

(Targeted) Metabolism

Taxa involved in that particular metabolism

Proteins involved in that particular metabolism

Large-scale dataset

Mathematical model: relative entropy

Informative Score (MEBS)

$$H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$

≥1: Informative
≤0: Non-Informative

Does it really work?

Can it capture an entire metabolic machinery? Can it be used to evaluate, compare and analyze complex data at large scale (genomes, metagenomes)?

Computationally efficient? Accurate, high speed in large datasets and reproducible

Data integration

Single value


Page 6: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration: case study

Atmosphere

Solar E°

Redox reactions

Metabolic guilds

Geological processes

An entire biogeochemical cycle

S-cycle

CHONS-P

What are the available data that can be used to characterize large-scale metabolic machineries?

How do we integrate them all to improve our understanding of the system?

Taxa involved in that particular metabolism

Proteins involved in that particular metabolism

Large-scale datasets

Mathematical model: relative entropy

Informative Score (MEBS)

$$H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$

≥1: Informative
≤0: Non-Informative

They really capture the major processes involved in the mobilization and use of S-compounds through Earth's biosphere


Page 7: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration: case study, S-cycle

https://metacyc.org/META/NEW-IMAGE?object=Sulfur-Metabolism

http://www.genome.jp/kegg-bin/show_pathway?map00920

Manually curated reconstruction of the S-metabolic machinery


Page 8: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration: case study, S-cycle

Metabolic machinery (Sucy, N=152)

i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins

Taxa: metabolic guilds (Suli, N=161)

i) CLSB: 24 genera
ii) PSB: 25 genera
iii) GSB: 9 genera
iv) SRB: 40 genera
v) SRM: 19 genera
vi) SO: 4 genera

Complete nr sequenced S-genomes (txt)

GCF_000006985.1 Chlorobium tepidum TLS

GCF_000007005.1 Sulfolobus solfataricus P2

GCF_000007305.1 Pyrococcus furiosus DSM 3638

GCF_000008545.1 Thermotoga maritima MSB8

GCF_000008625.1 Aquifex aeolicus VF5

GCF_000008665.1 Archaeoglobus fulgidus DSM 4304

GCF_000009965.1 Thermococcus kodakarensis KOD1

>Protein1

MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM

MGAGYFSPAGFMNV

>Protein 2

MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID

>Protein 3

MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC

YSCVKACPHNAIDVR

Evidence linking them with the S-cycle (Curated DB and primarily literature)

Evidence suggesting their physiological and biochemical involvement in the use of sulfur compounds.


Page 9: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration: case study, S-cycle

Metabolic machinery (Sucy, N=152)

i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins

Evidence linking them with the S-cycle (curated DB and primarily literature)


Page 10: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration: case study, S-cycle

Table 1. Metabolic pathways of the global biogeochemical S-cycle

Pathway number | Metabolism(a) | Chemical process(b) | Sulfur compound | Type(c) | Chemical formula | Source(d) | Number of Pfam domains(e)
P1 | DS | O | Sulfite | I | SO3^2- | E | 9
P2 | DS | O | Thiosulfate | I | S2O3^2- | E | 10
P3 | DS | O | Tetrathionate | I | S4O6^2- | E | 2
P4 | DS | R | Tetrathionate | I | S4O6^2- | E | 17
P5 | DS | R | Sulfate | I | SO4^2- | E | 20
P6 | DS | R | Elemental sulfur | I | S° | E | 20
P7 | DS | D | Thiosulfate | I | S2O3^2- | E | 9
P8 | DS | O | Carbon disulfide | O | CS2 | E | 1
P9 | A | DE | Alkanesulfonate | O | CH3O3SR | S | 5
P10 | A | R | Sulfate | I | SO4^2- | S | 20
P11 | DS | O | Sulfide | I | H2S | E/S | 29
P12 | A | DE | L-cysteate | O | C3H6NO5S | C/E | 1
P13 | A | DE | Dimethyl sulfone | O | C2H6O2S | C/E | 3
P14 | A | DE | Sulfoacetate | O | C2H2O5S | C/E | 2
P15 | A | DE | Sulfolactate | O | C3H4O6S | C/S | 14
P16 | A | DE | Dimethyl sulfide | O | C2H6S | C/S | 16
P17 | A | DE | Dimethylsulfoniopropionate | O | C5H10O2S | C/S/E | 12
P18 | A | DE | Methylthiopropanoate | O | C4H7O2S | C/S | 7
P19 | A | DE | Sulfoacetaldehyde | O | C2H3O4S | C/S | 7
P20 | DS | O | Elemental sulfur | I | S° | C/S/E | 7
P21 | DS | D | Elemental sulfur | I | S° | C/S/E | 1
P22 | A | DE | Methanesulfonate | O | CH3O3S | C/S/E | 7
P23 | A | DE | Taurine | O | C2H7NO3S | C/S/E | 11
P24 | DS | M | Dimethyl sulfide | O | C2H6S | C | 1
P25 | DS | M | Methylthio-propanoate | O | C4H7O2S | C | 1
P26 | DS | M | Methanethiol | O | CH4S | C | 1
P27 | A | DE | Homotaurine | O | C3H9NO3S | N | 1
P28 | A | B | Sulfolipid | O | SQDG | | 4
P29 | Markers | | Markers | | | | 12


Page 11: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Data integration: case study, S-cycle

Metabolic machinery (Sucy, N=152)

i) Sulfur compounds
ii) Metabolic pathways
iii) Genes
iv) Proteins

Evidence linking them with the S-cycle (curated DB and primarily literature)


Page 12: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Large omic datasets

What are the available data that can be used to characterize large-scale metabolic pathways?

How do we integrate them all to improve our understanding of the system?

Mathematical model: relative entropy

Informative Score (MEBS)

$$H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$

≥1: Informative
≤0: Non-Informative

Taxa involved in that particular metabolism

Proteins involved in that particular metabolism

txt

Gen: 2,107 nr genomes (faa), 1.5 GB

How many genomes were available at the time of analysis?


Number of complete prokaryotic genomes: ≈4,000 (NCBI RefSeq), Dec 2016

Non-redundant: 2,107, Dec 2016

Publicly available and manually curated data


Page 13: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Large omic datasets

What are the available data that can be used to characterize large-scale metabolic machineries?

How do we integrate them all to improve our understanding of the system?

Mathematical model: relative entropy

Informative Score (MEBS)

$$H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)}{Q(i)}$$

≥1: Informative
≤0: Non-Informative

Taxa: Suli Proteins: Sucy

txt

Gen: 2,107 nr genomes (faa), 1.5 GB

Met, GenF: 104 GB, ≈ 500 GB

How many metagenomes were available at the time of analysis?

Metagenomes that:
i) were publicly available
ii) contained associated metadata
iii) had been isolated from well-defined environments (e.g., rivers, soil, biofilms)
iv) were not host-associated microbiome sequences (e.g., human, cow, chicken)


Page 14: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

112 HMMs of S-proteins

txt

GCF_000006985.1 Chlorobium tepidum TLS

GCF_000007005.1 Sulfolobus solfataricus P2

GCF_000007305.1 Pyrococcus furiosus DSM 3638

GCF_000008545.1 Thermotoga maritima MSB8

GCF_000008625.1 Aquifex aeolicus VF5

GCF_000008665.1 Archaeoglobus fulgidus DSM 4304

GCF_000009965.1 Thermococcus kodakarensis KOD1

>Protein1

MIKPVGSDELKPLFVYDPEEHHKLSHEAESLPSVVISSQGPRVSSM

MGAGYFSPAGFMNV

>Protein 2

MAYKTIIEDGIDVLVVGAGLGGTGAAFEARYWGQDKKIVIAEKANID

>Protein 3

MPTFVYMTRCDGCGQCVDICPSDIMHIDTTIRRAYNIEPNMCWEC

YSCVKACPHNAIDVR

2,107 nr genomes (faa)

Gen, GenF

MEBS: GENERAL OVERVIEW

Stage 1: Manual curation and omic datasets (Taxa: Suli; Proteins: Sucy)

Stage 2: Domain composition

Stage 3: Relative entropy

Mathematical model:

$$H' = \sum_{i}^{n} P(i)\,\log_2 \frac{P(i)\ \text{(observed)}}{Q(i)\ \text{(expected)}}$$

P(i) = frequency of protein domain i in S genomes (161)
Q(i) = frequency of protein domain i in Gen (2,107)

≥1: Informative (domains enriched among the microorganisms of interest)
≤0: Non-Informative

Stage 4: Informative Score (single value)

Can it capture the S-metabolic machinery? Can it be used to evaluate, compare and analyze complex data at large scale (genomes, metagenomes)?

Computationally efficient? Accurate, high speed in large datasets and reproducible
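The relative entropy of Stage 3 can be illustrated with a short Python sketch. It is not taken from the MEBS code base: the domain names (`PF_A`, `PF_B`, `PF_C`) and their occurrence counts are hypothetical, and summing the per-domain H' values of the domains found in a sample is an assumed way of collapsing them into a single score. Only the dataset sizes (161 S genomes, 2,107 Gen genomes) come from the slides.

```python
import math


def relative_entropy(p_i: float, q_i: float) -> float:
    """Return H'(i) = P(i) * log2(P(i) / Q(i)) for one protein domain."""
    if p_i == 0.0:
        return 0.0  # a domain absent from the S genomes contributes nothing
    if q_i <= 0.0:
        raise ValueError("Q(i) must be positive when P(i) > 0")
    return p_i * math.log2(p_i / q_i)


def score_sample(domains_in_sample, entropies):
    """Sum H' over the distinct domains detected in a sample (assumed aggregation)."""
    return sum(entropies.get(d, 0.0) for d in set(domains_in_sample))


N_SULI, N_GEN = 161, 2107  # dataset sizes quoted on the slides

# Hypothetical counts: (genomes carrying the domain in Suli, in Gen).
counts = {"PF_A": (58, 75), "PF_B": (158, 2000), "PF_C": (10, 900)}

entropies = {
    pfam: relative_entropy(in_suli / N_SULI, in_gen / N_GEN)
    for pfam, (in_suli, in_gen) in counts.items()
}

for pfam, h in entropies.items():
    label = "informative" if h >= 1 else "non-informative" if h <= 0 else "intermediate"
    print(f"{pfam}: H' = {h:.3f} ({label})")
```

With these made-up counts, a domain that is common in the S genomes but rare in Gen (`PF_A`) scores above 1, a domain present almost everywhere (`PF_B`) scores close to 0, and a domain depleted in the S genomes (`PF_C`) scores below 0.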


Page 15: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

https://github.com/eead-csic-compbio/metagenome_Pfam_score

2,107 genomes 161 Suli +

935 metagenomes


Page 16: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Distribution of Sulfur Score (SS) in 2,107 nr genomes

Sur (N=192)

Metagenomic reconstructions of hard-to-culture taxa

Candidatus Desulforudis audaxviator MP104C

An unnamed endosymbiont of a scaly snail from a black smoker chimney

The archaeon Geoglobus ahangari, sampled from a hydrothermal vent at 2,000 m depth


Page 17: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

ROC curve

• Two-dimensional graph in which the TP rate is plotted on the Y axis and the FP rate is plotted on the X axis.
• Depicts relative tradeoffs between benefits (true positives) and costs (false positives).

Positive instances: Suli (N=161)

Negative instances: Gen (1,946)

Perfect classification
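The ROC construction described above can be sketched in plain Python as follows. This is a minimal illustration, not the evaluation code used for the talk: the scores and labels are invented, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) pairs obtained by lowering a score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented example: 1 = positive instance (e.g. a Suli genome), 0 = negative.
scores = [9, 8, 7, 3, 2, 1]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(scores, labels)), 3))
```

A classifier that ranks every positive above every negative passes through the point (0, 1), the "perfect classification" corner, and has an area of 1.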


Page 18: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Distribution of Sulfur Score (SS) in the metagenomic dataset (935 metagenomes)


Distribution of SS values observed in 935 metagenomes, classified in terms of features (X-axis) and colored according to their particular habitats. Features are sorted according to their median SS values. Green lines indicate the lowest and largest 95th percentiles observed across MSL classes.

Geo-localized metagenomes sampled around the globe are colored according to their SS values


Page 19: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Figure labels: S genes; S genomes; Informative / Non-informative; Markers; Completeness (Comp)


Conclusions

» We present MEBS, a new open-source software to evaluate, quantify, compare, and predict the metabolic machinery of interest in large ‘omic’ datasets using a single value.

» To test the applicability of this approach, we evaluated one of the most complex biogeochemical cycles: the sulfur cycle.

» Using data integration and manual curation, we reconstructed the entire sulfur machinery: Suli and Sucy.

» We show that the mathematical framework of relative entropy can be used to capture complex metabolic machineries in large-scale omic samples.

» MEBS is a powerful and broadly applicable approach to predict and classify microorganisms closely involved in the sulfur cycle, even in hard-to-culture microbial lineages.

» Computationally efficient, accurate (AUC 0.985) and reproducible.

» Not in the presentation: the entropy can be used to detect marker domains, and the completeness of the S-cycle pathways can be benchmarked at large scale.


Page 20: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Figure labels: elements C, N, O, S, Fe, P; application areas: bioremediation, antibiotics, extreme environments, agriculture, ?

Perspectives

• We are currently finishing the analyses to demonstrate the applicability of this approach to other biogeochemical cycles (C, N, O, Fe, P).

• Thereby, we hope that the MEBS pipeline will facilitate the analysis of biogeochemical cycles or complex metabolic networks carried out by specific prokaryotic guilds, such as bioremediation processes (e.g., degradation of hydrocarbons, toxic aromatic compounds, heavy metals, etc.).

• We look forward to collaborating with and helping other researchers by integrating comprehensive databases that might be helpful to the scientific community.

• Furthermore, we are currently working to improve the algorithm by using only a list of sequenced genomes involved in the metabolism of interest, in order to reduce the manual curation effort.

• We are also considering using k-mers instead of peptide Hidden Markov Models to increase the speed of the pipeline.

• We anticipate that our platform will stimulate interest and involvement among the scientific community in exploring uncultured genomes derived from large metagenomic sequences: exploring microbial dark matter.


Page 21: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Icoquih Zapata

Valeria Souza
Luis Eguiarte

Bruno Contreras

Cesar Poot-Hernandez

De Anda et al., 2017. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle. GigaScience, in press.


Page 22: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

LABORATORY OF MOLECULAR AND EXPERIMENTAL EVOLUTION, ECOLOGY INSTITUTE, UNAM, MEXICO

LABORATORY OF COMPUTATIONAL BIOLOGY


Thank you for your attention!


Page 23: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

supplementary files


Page 24: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Completeness

Panels: A) Gen (n=2,107); B) Met (n=935)

Genomes labeled in the figure:
D. acidiphilus; Hydrogenobaculum; A. caldus; A. ferrivorans
T. mobilis
D. aromatica; Thauera sp.; T. humireducens; A. denitrificans
S. tokodaii; A. hospitalis (among 12 other genomes)
P. phaeoclathratiforme; C. chlorochromatii; C. tepidum; T. denitrificans; T. violascens; S. thiotaurini


Page 25: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Table 1. Metabolic pathways of the global biogeochemical S-cycle (same table as on Page 10: pathways P1 to P29 with their number of Pfam domains)

The protein domains currently present in any given sample are divided by the total number of domains in the pre-defined pathway
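The completeness definition above amounts to a simple ratio, sketched here in Python. The grouping of Pfam identifiers into a pathway is hypothetical and only serves the example; `PF99999` is a made-up identifier standing in for a domain outside the pathway.

```python
def pathway_completeness(sample_domains, pathway_domains):
    """Fraction of a pathway's Pfam domains that are present in a sample."""
    pathway = set(pathway_domains)
    if not pathway:
        raise ValueError("the pathway must define at least one domain")
    return len(pathway & set(sample_domains)) / len(pathway)


# Hypothetical pathway definition (four domains) and sample content.
pathway = ["PF01747", "PF12139", "PF04358", "PF02662"]
sample = {"PF01747", "PF12139", "PF99999"}

print(pathway_completeness(sample, pathway))  # 2 of 4 pathway domains found
```

Domains detected in the sample that do not belong to the pathway are ignored, so the value always lies between 0 and 1.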

Completeness


Page 26: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Page 27: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

35 private metagenomes: microbial mats, sediment and lake water

Reads

Processing

ORF prediction

Gene Calling

(aa residues)

Mean Size Length

https://microbiome.wordpress.com/

Counts of prokaryotic genomes in each NCBI category as of July 2017

Non-redundant Redundant

LARGE SCALE


Page 28: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

GenF size category | 5th percentile | 95th percentile
Real | -0.091 | 0.101
30 | -0.086 | 0.105
60 | -0.09 | 0.104
100 | -0.088 | 0.1
150 | -0.09 | 0.103
200 | -0.89 | 0.105
250 | -0.09 | 0.106
300 | -0.09 | 0.1

Completeness


Page 29: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Table 2. Informative Pfam domains with high H’ and low std. Novel proposed molecular marker domains in metagenomic data of variable MSL.

Each entry lists: Pfam ID (Suli occurrences); H’ mean; H’ std; description.

PF12139 (58/161); H’ mean 1.2; H’ std 0.01
Adenosine-5'-phosphosulfate reductase beta subunit: Key protein domain for both sulfur oxidation/reduction metabolic pathways. Has been widely studied in the dissimilatory sulfate reduction metabolism. In all recognized sulfate-reducing prokaryotes, the dissimilatory process is mediated by three key enzymes: Sat, Apr and Dsr. Homologous proteins are also present in the anoxygenic photolithotrophic and chemolithotrophic sulfur-oxidizing bacteria (CLSB, PSB, GSB), in different cluster organization [35].

PF00374 (135/161); H’ mean 1.1; H’ std 0.09
Nickel-dependent hydrogenase: Hydrogenases with S-cluster and selenium containing Cys-x-x-Cys motifs involved in the binding of nickel. Among the homologues of this hydrogenase domain is the alpha subunit of the sulfhydrogenase I complex of Pyrococcus furiosus, which catalyzes the reduction of polysulfide to hydrogen sulfide with NADPH as the electron donor [55].

PF01747 (103/161); H’ mean 1.03; H’ std 0.06
ATP-sulfurylase: Key protein domain for both sulfur oxidation and reduction processes. The enzyme catalyzes the transfer of the adenylyl group from ATP to inorganic sulfate, producing adenosine 5′-phosphosulfate (APS) and pyrophosphate, or the reverse reaction [56].

PF02662 (62/161); H’ mean 0.82; H’ std 0.03
Methyl-viologen-reducing hydrogenase, delta subunit: One of the enzymes involved in methanogenesis and encoded in the mth-flp-mvh-mrt cluster of methane genes in Methanothermobacter thermautotrophicus. No specific functions have been assigned to the delta subunit [48].

PF10418 (122/161); H’ mean 0.78; H’ std 0.06
Iron-sulfur cluster binding domain of dihydroorotate dehydrogenase B: Among the homologous genes in this family are asrA and asrB from Salmonella enterica subsp. enterica serovar Typhimurium, which encode 1) a dissimilatory sulfite reductase, 2) a gamma subunit of the sulfhydrogenase I complex of Pyrococcus furiosus and 3) a gamma subunit of the sulfhydrogenase II complex of the same organism [12].

PF13247 (149/161); H’ mean 0.66; H’ std 0.06
4Fe-4S dicluster domain: Homologues of this family include: 1) DsrO, a ferredoxin-like protein related to the electron transfer subunits of respiratory enzymes, 2) dimethylsulfide dehydrogenase β subunit (ddhB), involved in dimethyl sulfide degradation in Rhodovulum sulfidophilum and 3) sulfur reductase FeS subunit (sreB) of Acidianus ambivalens, involved in sulfur reduction using H2 or organic substrates as electron donors [12].

PF04358 (73/161); H’ mean 0.52; H’ std 0
DsrC-like protein: DsrC is present in all organisms encoding a dsrAB sulfite reductase (sulfate/sulfite reducers or sulfur oxidizers). Physiological studies suggest that sulfate reduction rates are determined by cellular levels of this protein. The dissimilatory sulfate reduction couples the four-electron reduction of the DsrC trisulfide to energy conservation [57]. DsrC was initially described as a subunit of DsrAB, forming a tight complex; however, it is not a subunit, but rather a protein with which DsrAB interacts. DsrC is involved in sulfur-transfer reactions; there is a disulfide bond between the two DsrC cysteines as a redox-active center in the sulfite reduction pathway. Moreover, DsrC is among the most highly expressed sulfur energy metabolism genes in isolated organisms and metatranscriptomes (Santos et al., 2015).

PF01058 (158/161); H’ mean 0.45; H’ std 0.01
NADH ubiquinone oxidoreductase, 20 Kd subunit: Homologous genes are found in the delta subunits of both sulfhydrogenase complexes of Pyrococcus furiosus [12].

PF01568 (156/161); H’ mean 0.4; H’ std 0.05
Molybdopterin dinucleotide binding domain: This domain corresponds to the C-terminal domain IV in dimethyl sulfoxide (DMSO) reductase [48].
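Selecting candidate marker domains by "high H’ and low std" can be illustrated with the values from Table 2. The sketch below is only an example of such a filter: the cutoffs `min_mean=0.5` and `max_std=0.05` are arbitrary assumptions chosen for the illustration, not thresholds stated in the presentation.

```python
from typing import NamedTuple


class DomainStats(NamedTuple):
    pfam_id: str
    mean_h: float  # mean H' as listed in Table 2
    std_h: float   # std of H' as listed in Table 2


TABLE2 = [
    DomainStats("PF12139", 1.2, 0.01),
    DomainStats("PF00374", 1.1, 0.09),
    DomainStats("PF01747", 1.03, 0.06),
    DomainStats("PF02662", 0.82, 0.03),
    DomainStats("PF10418", 0.78, 0.06),
    DomainStats("PF13247", 0.66, 0.06),
    DomainStats("PF04358", 0.52, 0.0),
    DomainStats("PF01058", 0.45, 0.01),
    DomainStats("PF01568", 0.4, 0.05),
]


def select_markers(stats, min_mean, max_std):
    """Keep domains whose mean H' is high and whose std is low."""
    return [s.pfam_id for s in stats if s.mean_h >= min_mean and s.std_h <= max_std]


# Hypothetical cutoffs, used only for this example.
print(select_markers(TABLE2, min_mean=0.5, max_std=0.05))
```

Tightening or loosening either cutoff changes which domains are retained, so any real threshold would need to be justified against the data.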


Page 30: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

https://github.com/eead-csic-compbio/metagenome_Pfam_score

Advanced manual mode

» Biogeochemical cycles (CNOPFe)


Supplementary files

Page 31: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Species | SS | Genus | Guild
Ammonifex degensii KC4 | 12.508 | Moorella group | SRB/SR
Archaeoglobus profundus DSM 5631 | 12.024 | Archaeoglobus | SRB
Candidatus Desulforudis audaxviator MP104C | 11.972 | Candidatus Desulforudis | Sur
Pelodictyon phaeoclathratiforme BU-1 | 11.836 | Chlorobium/Pelodictyon group | GSB
Chlorobium phaeobacteroides BS1 | 11.649 | Chlorobium/Pelodictyon group | GSB
Chlorobium chlorochromatii CaD3 | 11.625 | Chlorobium/Pelodictyon group | GSB
Thiobacillus denitrificans ATCC 25259 | 11.61 | Thiobacillus | CLSB
Desulfohalobium retbaense DSM 5692 | 11.511 | Desulfohalobium | SRB
Desulfovibrio alaskensis G20 | 11.5 | Desulfovibrio | SRB
Desulfovibrio vulgaris DP4 | 11.442 | Desulfovibrio | SRB
Chlorobium tepidum TLS | 11.354 | Chlorobaculum | GSB
endosymbiont of unidentified scaly snail isolate Monju | 11.205 | 0 | Sur
Desulfovibrio vulgaris str. 'Miyazaki F' | 11.093 | Desulfovibrio | SRB
Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774 | 11.034 | Desulfovibrio | SRB


Supplementary files

Page 32: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 33: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 34: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 35: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 36: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Supplementary files

Sulfur: 112 H’
Nitrogen: 176 H’
Methane: 119 H’
Oxygen: 55 H’
Iron: 112 H’

Page 37: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Biogeochemical cycle | Genes | Pfam domains | Genomes | AUC
Sulfur (S) | 152 | 112 | 161 | 0.9855
Nitrogen (N) | 267 | 176 | 144 | 0.791
Methane (C) | 135 | 119 | 90 | 0.988
Oxygenic Photosynthesis (O) | 50 | 55 | 53 | 0.983
Phosphorus (P) | | | |
Iron (Fe) | 36 | 33 | 34 | 0.863


Supplementary files

Page 38: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Oxygen Markers

ID | Description | H’ mean | std
PF00067 | Cytochrome P450 | 0.644 | 0.033785
PF00115 | Cytochrome C and Quinol oxidase polypeptide I | 0.513 | 0.061551
PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.55825 | 0.049936
PF02560 | Cyanate lyase C-terminal domain | 0.93625 | 0.001389
PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.5525 | 0.040324
PF04898 | Glutamate synthase central domain | 0.479 | 0.034699
PF13442 | Cytochrome C oxidase, cbb3-type, subunit III | 0.6565 | 0.047093

python3 plot_entropy.py gen_genF_entropies.oxygen.tab -0.156 0.20625


Supplementary files

Page 39: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Methane

ID | Description | H’ mean | std
PF01913 | Formylmethanofuran-tetrahydromethanopterin formyltransferase | 3.629125 | 0.0227
PF01993 | methylene-5,6,7,8-tetrahydromethanopterin dehydrogenase | 2.876 | 0
PF02240 | Methyl-coenzyme M reductase gamma subunit | 3.168 | 0
PF02241 | Methyl-coenzyme M reductase beta subunit, C-terminal domain | 3.168 | 0
PF02289 | Cyclohydrolase (MCH) | 3.353 | 0
PF02741 | FTR, proximal lobe | 3.63475 | 0.034648
PF02745 | Methyl-coenzyme M reductase alpha subunit, N-terminal domain | 3.168 | 0
PF02783 | Methyl-coenzyme M reductase beta subunit, N-terminal domain | 3.168 | 0
PF04206 | Tetrahydromethanopterin S-methyltransferase, subunit E | 3.032 | 0
PF04207 | Tetrahydromethanopterin S-methyltransferase, subunit D | 3.032 | 0
PF04208 | Tetrahydromethanopterin S-methyltransferase, subunit A | 2.903375 | 0.015203
PF04211 | Tetrahydromethanopterin S-methyltransferase, subunit C | 3.02575 | 0.017678
PF05440 | Tetrahydromethanopterin S-methyltransferase subunit B | 2.980125 | 0.036537

python3 plot_entropy.py gen_genF_entropies.methane.tab -0.121 0.1475

Supplementary files

Page 40: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Nitrogen

ID | Description | H’ mean | std
PF00067 | Cytochrome P450 | 0.57375 | 0.0056
PF00174 | Oxidoreductase molybdopterin binding domain | 0.528125 | 0.006578
PF00355 | Rieske [2Fe-2S] domain | 0.507 | 0.032076
PF00507 | NADH-ubiquinone/plastoquinone oxidoreductase, chain 3 | 0.36975 | 0.010886
PF00547 | Urease, gamma subunit | 0.464 | 0
PF00699 | Urease beta subunit | 0.475125 | 0.001126
PF01077 | Nitrite and sulphite reductase 4Fe-4S domain | 0.47025 | 0.014568
PF02211 | Nitrile hydratase beta subunit | 0.405625 | 0.005041
PF02633 | Creatinine amidohydrolase | 0.58725 | 0.017466
PF03460 | Nitrite/Sulfite reductase ferredoxin-like half domain | 0.48 | 0.032715
PF05899 | Protein of unknown function (DUF861) | 0.52175 | 0.022914
PF09347 | Domain of unknown function (DUF1989) | 0.398875 | 0.007415


Supplementary files

Page 41: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

Iron

ID | Description | H’ mean | std
PF14522 | Cytochrome c7 and related cytochrome c | 1.010 | 0.104
PF00355 | Rieske [2Fe-2S] domain | 0.51912 | 0.02854
PF00033 | Cytochrome b/b6/petB | 0.55875 | 0.04974
PF00034 | Cytochrome c | 0.5061 | 0.1013


Supplementary files

Page 42: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

ROC curve

• Two-dimensional graph in which the true positive (tp) rate is plotted on the Y axis and the false positive (fp) rate is plotted on the X axis.
• Depicts relative tradeoffs between benefits (true positives) and costs (false positives).

Positive instances: Suli (N=161)
Negative instances: Gen (1946)

Annotations on the ROC graph:

• Positive classifications only with strong evidence, so they make few false positive errors.
• Never issuing a positive classification: such a classifier commits no false positive errors but also gains no true positives.
• Perfect classification.
• Random guessing produces the diagonal line between (0,0) and (1,1), which has an area of 0.5; no realistic classifier should have an AUC less than 0.5.
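The ROC description above can be made concrete with a small Python sketch. It is not taken from the MEBS code: the function names `roc_points` and `auc` and the toy scores are hypothetical, and the only assumption is that each genome has a numeric score and a 1/0 label (positive or negative instance). The threshold is swept from the highest score to the lowest, and the area is computed with the trapezoid rule.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (fp_rate, tp_rate) points, sweeping the threshold from high to low.

    labels use 1 for positive instances and 0 for negative instances.
    Tied scores are grouped so that they produce a single point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # never issuing a positive classification
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point each time the threshold moves to a new score value.
        if prev_score is not None and score != prev_score:
            points.append((fp / n_neg, tp / n_pos))
        if label:
            tp += 1
        else:
            fp += 1
        prev_score = score
    points.append((fp / n_neg, tp / n_pos))  # everything called positive: (1, 1)
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (made-up scores): positives score higher than negatives.
print(auc(roc_points([12.5, 11.6, 3.0, 2.0], [1, 1, 0, 0])))  # 1.0
```

A classifier that gives every genome the same score yields only the points (0,0) and (1,1), so its area is 0.5, matching the random-guessing diagonal.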

Page 43: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 44: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 45: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle


Supplementary files

Page 46: Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle

[Figure: relative entropy H’ (Y axis) for four Pfam domains: 4Fe-4S dicluster domain; Molydopterin dinucleotide binding domain; Cytochrome C oxidase, cbb3-type, subunit III; Nitrogenase component 1 type Oxidoreductase]
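For readers unfamiliar with relative entropy, the following Python sketch shows one way a per-domain value could be computed. These slides do not spell out the formula, so this assumes the Kullback-Leibler form p·log2(p/q), where p is the frequency of a Pfam domain in the cycle-specific genomes and q is its frequency in a background genome set. The function name `relative_entropy` and the example frequencies are hypothetical.

```python
import math


def relative_entropy(p_obs: float, q_bg: float) -> float:
    """Kullback-Leibler term p * log2(p / q) for a single domain.

    p_obs: observed frequency in the genomes of interest (assumed input).
    q_bg:  expected frequency in the background set (assumed input).
    Positive values mean the domain is enriched relative to the background,
    negative values mean it is depleted.
    """
    if p_obs < 0.0 or q_bg <= 0.0:
        raise ValueError("frequencies must satisfy p_obs >= 0 and q_bg > 0")
    if p_obs == 0.0:
        return 0.0  # the limit of p * log2(p / q) as p -> 0
    return p_obs * math.log2(p_obs / q_bg)


# Made-up frequencies: a domain twice as frequent as in the background.
print(relative_entropy(0.5, 0.25))  # 0.5
```

Under this assumed definition, a domain that is rarer in the genomes of interest than in the background gets a negative value.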


Supplementary files