Reproducible DNA Sequence Analysis and Supercomputing with DevOps and Cloud

Itoshi Nikaido, Ph.D. <[email protected]>
Unit Leader, Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN
http://bit.accc.riken.jp/

Presented at the NIG Supercomputer Users' Meeting.


Page 2

Results obtained using the National Institute of Genetics (NIG) supercomputer. Thanks!

METHOD Open Access

Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity

Yohei Sasagawa1,7†, Itoshi Nikaido1,7†, Tetsutaro Hayashi2, Hiroki Danno3, Kenichiro D Uno1, Takeshi Imai4,5 and Hiroki R Ueda1,3,6*

Abstract

Development of a highly reproducible and sensitive single-cell RNA sequencing (RNA-seq) method would facilitate the understanding of the biological roles and underlying mechanisms of non-genetic cellular heterogeneity. In this study, we report a novel single-cell RNA-seq method called Quartz-Seq that has a simpler protocol and higher reproducibility and sensitivity than existing methods. We show that single-cell Quartz-Seq can quantitatively detect various kinds of non-genetic cellular heterogeneity, and can detect different cell types and different cell-cycle phases of a single cell type. Moreover, this method can comprehensively reveal gene-expression heterogeneity between single cells of the same cell type in the same cell-cycle phase.

Keywords: Single cell, RNA-seq, Transcriptome, Sequencing, Bioinformatics, Cellular heterogeneity, Cell biology

Background

Non-genetic cellular heterogeneity at the mRNA and protein levels has been observed within cell populations in diverse developmental processes and physiological conditions [1-4]. However, the comprehensive and quantitative analysis of this cellular heterogeneity and its changes in response to perturbations has been extremely challenging. Recently, several researchers reported quantification of gene-expression heterogeneity within genetically identical cell populations, and elucidation of its biological roles and underlying mechanisms [5-8]. Although gene-expression heterogeneities have been quantitatively measured for several target genes using single-molecule imaging or single-cell quantitative (q)PCR, comprehensive studies on the quantification of gene-expression heterogeneity are limited [9] and thus further work is required. Because global gene-expression heterogeneity may provide biological information (for example, on cell fate, culture environment, and drug response), the question of how to comprehensively and quantitatively detect the heterogeneity of mRNA expression in single cells and how to extract biological information from those data remains to be addressed.

Single-cell RNA sequencing (RNA-seq) analysis has been shown to be an effective approach for the comprehensive quantification of gene-expression heterogeneity that reflects the cellular heterogeneity at the single-cell level [10,11]. To understand the biological roles and underlying mechanisms of such heterogeneity, an ideal single-cell transcriptome analysis method would provide a simple, highly reproducible, and sensitive method for measuring the gene-expression heterogeneity of cell populations. In addition, this method should be able to distinguish clearly the gene-expression heterogeneity from experimental errors.

Single-cell transcriptome analyses, which can be achieved through the use of various platforms, such as microarrays, massively parallel sequencers and bead arrays [12-17], are able to identify cell-type markers and/or rare cell types in tissues. These platforms require nanogram quantities of DNA as the starting material. However, a typical single cell has approximately 10 pg of total RNA and often contains only 0.1 pg of polyadenylated RNA, hence, to obtain the amount of DNA starting material that is required by these platforms, it is necessary to perform whole-transcript amplification (WTA).

* Correspondence: [email protected]
† Contributed equally
1Functional Genomics Unit, RIKEN Center for Developmental Biology, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
Full list of author information is available at the end of the article

Sasagawa et al. Genome Biology 2013, 14:R31
http://genomebiology.com/2013/14/4/R31

© 2013 Sasagawa et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Molecular Cell

Article

Context-Dependent Wiring of Sox2 Regulatory Networks for Self-Renewal of Embryonic and Trophoblast Stem Cells

Kenjiro Adachi,1,9,10,* Itoshi Nikaido,2,9,11 Hiroshi Ohta,3,12 Satoshi Ohtsuka,1 Hiroki Ura,1 Mitsutaka Kadota,5 Teruhiko Wakayama,3,6 Hiroki R. Ueda,2,4 and Hitoshi Niwa1,7,8,*

1Laboratory for Pluripotent Stem Cell Studies
2Functional Genomics Unit
3Laboratory for Genome Reprogramming
4Laboratory for Systems Biology
5Genome Resource and Analysis Unit
RIKEN Center for Developmental Biology, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe 6500047, Japan
6Faculty of Life and Environmental Sciences, University of Yamanashi, Yamanashi 4008510, Japan
7Laboratory for Development and Regenerative Medicine, Kobe University Graduate School of Medicine, 7-5-1 Kusunokicho, Chuo-ku, Kobe 6500017, Japan
8JST, CREST, Sanbancho, Chiyoda-ku, Tokyo, 1020075, Japan
9These authors contributed equally to this work
10Present address: Department of Cell and Developmental Biology, Max Planck Institute for Molecular Biomedicine, 48149 Münster, Germany
11Present address: Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN, Wako, Saitama 3510198, Japan
12Present address: Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Sakyo-ku, Kyoto 6068501, Japan
*Correspondence: [email protected] (K.A.), [email protected] (H.N.)
http://dx.doi.org/10.1016/j.molcel.2013.09.002

SUMMARY

Sox2 is a transcription factor required for the maintenance of pluripotency. It also plays an essential role in different types of multipotent stem cells, raising the possibility that Sox2 governs the common stemness phenotype. Here we show that Sox2 is a critical downstream target of fibroblast growth factor (FGF) signaling, which mediates self-renewal of trophoblast stem cells (TSCs). Sustained expression of Sox2 together with Esrrb or Tfap2c can replace FGF dependency. By comparing genome-wide binding sites of Sox2 in embryonic stem cells (ESCs) and TSCs combined with inducible knockout systems, we found that, despite the common role in safeguarding the stem cell state, Sox2 regulates distinct sets of genes with unique functions in these two different yet developmentally related types of stem cells. Our findings provide insights into the functional versatility of transcription factors during embryogenesis, during which they can be recursively utilized in a variable manner within discrete network structures.

INTRODUCTION

The transcriptional output of a given cell type is controlled by unique combinations of transcription factors under the control of extrinsic signals that can modulate the expression and activity of transcription factors, forming a gene regulatory network that dictates a specific cellular phenotype. Tissue-specific transcription factors play deterministic roles in cell-type specification, which is manifested as lineage reprogramming by forced expression of such transcription factors (Graf and Enver, 2009; Zhou and Melton, 2008). Sox2 is one such transcription factor required for the maintenance of pluripotent stem cells in vivo (Avilion et al., 2003) and in vitro (Masui et al., 2007) and for the induction of pluripotency (Takahashi and Yamanaka, 2006). However, it is also preferentially expressed in neural, retinal, and trophoblast stem cells (TSCs) (Avilion et al., 2003; Pevny and Nicolis, 2010), suggesting a possible role for Sox2 in governing a common stemness phenotype.

In embryonic stem cells (ESCs), Sox2 forms a heterodimer with Oct3/4 (also known as Pou5f1) on DNA with the OCT-SOX composite motifs, and these factors cooperatively activate pluripotency-related target genes such as Nanog, Fgf4, Utf1, Lefty1, and Fbxo15, as well as their own expression (Nakatake et al., 2006, and references therein). Oct3/4-knockout ESCs are differentiated along the trophoblast lineage in a highly homogeneous manner (Niwa et al., 2000). In contrast, the loss of Sox2 causes differentiation of ESCs accompanied by upregulation of markers for trophoblast and embryonic germ layers, although artificial maintenance of Oct3/4 from the transgene can sustain self-renewal and pluripotency of Sox2-null ESCs (Masui et al., 2007), suggesting that the unique function of Sox2 may be to maintain Oct3/4 expression. These two core transcription factors, along with Nanog, form an interconnected and hierarchical network downstream of the leukemia inhibitory factor (LIF)-Stat3 and LIF-phosphatidylinositol 3-kinase (PI3K) signaling

Molecular Cell 52, 380–392, November 7, 2013 ©2013 Elsevier Inc.

Development of the world's highest-precision single-cell RNA-Seq

Dynamic changes in transcription factor networks and differentiation

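Page 3

Bioinformatics Research Unit
Advanced Center for Computing and Communication

Informatics

1. Development of data analysis methods and experimental methods for DNA sequencers
2. Promotion of collaborative research with experimental researchers inside and outside RIKEN; bioinformatics education and training

Biology

1. Development of single-cell RNA-Seq and data analysis technology
2. Development of novel epigenome sequencing methods

[Figure: diagram labeled with x_i, θ_i, G, G0, γ, σ; panels a and b]

[Figure labels: 10 pg total RNA; Amplified cDNA; Sequence Library DNA]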

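Page 4

Introduction of Bioinformatics research activity in RIKEN ACCC

Bioinformatics: Research and Engineering

• We want to focus on bioinformatics research

• Building a data analysis environment takes effort

• NGS analysis combines many tools

• Tools are updated quickly

• Many biological databases are used

• Procurement, administration, and maintenance take effort

• Ensuring that analyses are reproducible

• Materials and Methods sections in papers lack detail, so analyses cannot be reproduced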

Page 5

DevOps = Development + Operations

Integration of IT infrastructure and application development

A philosophy of IT, and its supporting technologies, for bringing business ideas to market quickly

[Diagram labels: IT infrastructure; application development and release; business idea; market]

Source: http://ja.wikipedia.org/wiki/DevOps (modified)

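Page 6

BioDevOps = Bioinformatics + Development + Operations

Integration of bioinformatics analysis, IT infrastructure, and application development

A philosophy of bioinformatics, and its supporting technologies, for turning research ideas into published papers quickly

[Diagram labels: setting up PC clusters for data analysis; developing data analysis tools and pipeline systems; Bioinformatics Data analysis; BioDevOps; quality control of data analyses, software, and databases; carrying out data analysis; research idea; paper publication]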

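Page 7

BioDevOps = two technologies

Managing the analysis environment as code: Infrastructure as Code

• We want to focus on bioinformatics research

• Building a data analysis environment takes effort

• NGS analysis combines many tools

• Tools are updated quickly

• Many biological databases are used

• Procurement, administration, and maintenance take effort

• Ensuring that analyses are reproducible

• Materials and Methods sections in papers lack detail, so analyses cannot be reproduced

Infrastructure as Code

Cloud computing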

Page 8

Bioinformatics Analysis Environment as Code

Provide Linux with a complete bioinformatics analysis environment as a virtual machine

http://www.getchef.com/chef/

• All analysis environment setup information is code

• Version-controlled in a source code management system

• The code is tested

• Compute resources are monitored with Zabbix

• Database mirrors

[Diagram labels: User; Zabbix]
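
One consequence of keeping all setup information as code under version control is that two revisions of an environment can be compared mechanically. The sketch below is a hypothetical Python illustration of that idea, not the Chef-based implementation described on this slide: the `EnvSpec` mapping, the `diff_specs` helper, and the tool versions are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical environment spec: tool name -> pinned version string.
EnvSpec = Dict[str, str]


def diff_specs(old: EnvSpec, new: EnvSpec) -> Dict[str, List[Tuple[str, ...]]]:
    """Compare two revisions of an environment spec.

    Returns tools that were added, removed, or whose pinned version changed.
    """
    added = [(name, new[name]) for name in sorted(new.keys() - old.keys())]
    removed = [(name, old[name]) for name in sorted(old.keys() - new.keys())]
    changed = [
        (name, old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    ]
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Illustrative versions only; they are not taken from the slides.
    rev1 = {"blast": "2.2.28", "R": "3.0.2"}
    rev2 = {"blast": "2.2.29", "R": "3.0.2", "samtools": "0.1.19"}
    for kind, entries in diff_specs(rev1, rev2).items():
        print(kind, entries)
```

Recording such a diff alongside an analysis is one way to state exactly which environment revision produced a result.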

Page 9

Chef Recipe and Integration Test

Example: Installing NCBI BLAST with Chef

Page 10

Chef Recipe and Integration Test

Example: Installing NCBI BLAST with Chef

SGE, blast, R, Bioconductor, BioPerl, BioRuby, BioPython…
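
The slide pairs each Chef recipe with an integration test, but the test code itself is not reproduced in this transcript. As a rough Python sketch of what such a test might check, the hypothetical helper `missing_tools` below reports which expected commands are absent from `PATH`. The command names `blastn` and `Rscript` are assumptions chosen for illustration, and the lookup function is injectable so the check can be exercised without the tools installed.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Hypothetical list of commands a provisioned node is expected to have.
    expected = ["blastn", "Rscript"]
    absent = missing_tools(expected)
    if absent:
        print("missing:", ", ".join(absent))
    else:
        print("all expected tools are installed")
```

A check like this only confirms that a command is present; a fuller integration test would also run the tool on a small known input and compare the output.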

Page 11

RIKEN Cloud System

A cloud computing system for NGS analysis

Goal: send in a sample and receive a URL

[Diagram labels: Sequencing center; Bioinformatics Research Unit; Cloud Computer; User; Biological samples; Sequence data; Data; Calc.; Result; Browser & Pipeline; BioDevOps; Consultation; Tutorial]

Page 12

Implementation of the RIKEN Cloud System

A cloud computing system for NGS analysis

node: 15
CPU: 2.6GHz x 16
RAM: 512GB

NFS for VM images

GPFS-cNFS for common storage 556TB

Apache CloudStack™ Open Source Cloud Computing™

Page 13

Future Plans

• Add more recipes

• Web applications such as Galaxy, GBrowse, and RStudio Server

• Testing of tool behavior, continuous integration

• Verify operation on the RIKEN bio cloud and on AWS

• Provisioning as a PC cluster

• Recipes and VMs to be published on Docker Index and GitHub

• Collaboration

• LPM, med-bio, BioCloudLinux, BioUno, Bioconductor, Bio*, …

http://biodevops.org/

Advanced Center for Computing and Communication