By Xianfeng (Jeff) Chen, Ph.D.
Computational and Systems Biologist
Plant Genomics, Bioinformatics, and Systems Biology
Presented at:
Resume of Dr. Xianfeng Chen
We have over 5,000 years of civilization, and now advanced technologies ...
However, do we have, now and in the future, a high-quality ecosystem, enough food and energy, and better, healthier living conditions to sustain life on this planet?
What am I doing at DOD ERDC? ---- Transcriptomics-Based Detection of Gene Signatures of Environmental Perturbations
Environmental Genomics on Sentinel Species
--- Ecotoxicogenomics
Example Project URLs:
(1) http://systemsbiology.usm.edu/BirdGenomics/
(2) http://systemsbiology.usm.edu/EGGT/
Mapping of the Modules onto Plant Drought Responsive Co-Expression Biological Networks
Software: (1) WGCNA, an R package for weighted correlation network analysis. (2) Cytoscape, for network display.
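The core idea behind a weighted correlation network can be sketched in a few lines of Python. This is not the WGCNA R API. It is a minimal illustration that assumes an unsigned network, Pearson correlation, and a soft-threshold power of 6, and the function name and toy expression values are hypothetical.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Build a weighted gene-gene adjacency matrix.

    expr: array of shape (samples, genes).
    beta: soft-threshold power (assumed value; normally chosen from the data).
    """
    corr = np.corrcoef(expr, rowvar=False)   # gene-gene Pearson correlation
    adj = np.abs(corr) ** beta               # unsigned soft thresholding
    np.fill_diagonal(adj, 0.0)               # drop self-connections
    return adj

# Toy data: genes A and B rise together across samples, gene C does not.
expr = np.array([
    [1.0,  2.1, 5.0],
    [2.0,  3.9, 1.0],
    [3.0,  6.2, 4.0],
    [4.0,  7.8, 2.0],
    [5.0, 10.1, 6.0],
    [6.0, 12.0, 3.0],
])
adj = soft_threshold_adjacency(expr)
connectivity = adj.sum(axis=1)               # per-gene weighted degree
```

Raising the correlation to a power keeps strong co-expression links and pushes weak ones toward zero, so modules can be found without choosing a hard cutoff.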
Agenda Today
(1) Cyber-infrastructure and Systems Biology --- Cutting Edge Science and Technology.
(2) Cowpea and common bean genomic gene space sequence clustering, assembly, annotation, and knowledgebase establishment ---- Genome Assembly and Analysis.
(3) Soybean (Glycine max), Medicago (Medicago truncatula), and Brachypodium (Brachypodium distachyon) WRKY transcription factor family classification and gene lead nomination for transgenic studies --- Data Mining and Lead Discovery.
(4) High-level transgenic expression of promoters isolated from soybean transcription factor genes, in lima bean and soybean, for functional testing and validation --- Synthetic Biology.
Section One: Cyber-infrastructure and Systems Biology
Reductionist Approach: One Gene, One Protein
Systems Approach: Multiple Genes, Network Analysis
Cutting Edge Science and Technology
Systems Biology for Environment and Energy
Genomes Determine Ecosystem Behaviors
Status of Technologies in Systems Biology
First Genome Sequenced in 1995
First Human Genome Sequenced at a Cost of $3 Billion
Next Generation Sequencing (NGS) Platform
Popular 2nd Generation Sequencers
PacBio RS
Ion Proton
HiSeq 2500
NextSeq 500
HiSeq X Ten
• cost $10 million
• 125 bp read length
• $1000 per genome
MinION
• cost $900
• 80 Kbp read length
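China has become the region with the third-largest number of second-generation sequencing instruments, after North America and Europe. The numbers marked on the map are counts of second-generation sequencers; China's instruments are concentrated mainly in Shenzhen, Beijing, and Shanghai.
World distribution map of second-generation sequencing instruments
National Human Genome Research Institute (NHGRI)
Moore's Law, named after Gordon Moore, one of the founders of Intel
At constant price, the number of transistors that fit on an integrated circuit doubles roughly every 18 months, and performance doubles as well.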
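The doubling rule stated on this slide can be written as N(t) = N0 * 2^(t / 18), with t in months. A minimal sketch follows; the function and variable names are hypothetical, and the 18-month period is taken from the slide.

```python
def transistor_count(initial_count, months, doubling_period_months=18):
    """Projected count after `months`, doubling once per period."""
    return initial_count * 2 ** (months / doubling_period_months)

# Relative growth factors, starting from a count of 1.
growth_after_3_years = transistor_count(1.0, 36)     # two doublings
growth_after_15_years = transistor_count(1.0, 180)   # ten doublings
```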
Organisms with Sequenced Genomes
Bacteria: about 2300
Archaea: about 50
Protists: 13
Plants: 7 => Arabidopsis, rice, poplar, Chlamydomonas, soybean, Medicago, Brachypodium
Fungi: 14 => including Saccharomyces cerevisiae
Animals: 16 => including C. elegans, Drosophila, mouse, human
Total: about 2600 genomes completely sequenced so far
Developing from Genome Data to Full Cell Simulation
Our Current Effort
Bioinformatics Bottlenecks for Successful Implementation of Latest Systems Biology Technologies
Data Generation Capacity and Bioinformatics: No Marriage Yet, Sad ...
Supporting Cyber-infrastructure and Systems Biology Workflow
Historic strong area
Supporting
http://genomicscience.energy.gov/compbio/#page=news
2011 DOE Systems Biology Knowledge Base (Kbase) Initiative:
Cyber-knowledge System to Enable Genomics-based Predictive Biology
Bioenergy, Biodefense, Environmental Quality Monitoring, and Bioremediation.
Cyber-infrastructure Component: High Performance Computing for Large Sets of Data Analysis
--- Migration of Bio-Computing Capability
Start point: PC computing with 1-2 CPUs (most biology labs)
Step 1: Unix computing with multiple CPUs
Step 2: Cluster computing or supercomputing (5-10 biological labs in the US)
Project website : http://www.igece.org/LegumeSystemsBiology.html
Genomics Knowledgebase System Architecture ---- Legumes as Bioenergy Model Organisms
(1) BMC Bioinformatics 2007, 8:129.
(2) BMC Genomics 2008, 9:103.
Section Two: Genome Assembly and Analysis
--- CGKB – Cowpea Genomics Knowledge Base
Project website: http://cowpeagenomics.med.virginia.edu/CGKB/
We published in:
Genome Annotation Strategy (1): Homology-based Annotation for Cowpea GSS
263,425 total cowpea gene space sequences (GSS).
High level of coding regions detected!
We published in BMC Genomics 9:103, 2008.
Common Bean 415K Genomic Gene Spaces Clustering and Assembly
for Unigene Production
Test Run    No. of Contigs    No. of Singlets    Parameter Setting
One         64,933            1,049              -c 4 -p 95 -l 60 -v 10
Two         62,140            219                -c 4 -p 98 -l 100 -v 5
Three       61,397            288                -c 4 -p 98 -l 40 -v 2
---- Used The Gene Indices Clustering Tools (TGICL), which uses megblast for homology-based clustering and CAP3 for assembly.
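To illustrate how the parameter settings shape the clustering step, here is a minimal Python sketch of threshold-based single-linkage clustering. It is not TGICL itself. It assumes that -p is the minimum percent identity, -l the minimum overlap length, -v the maximum unmatched overhang, and -c the number of CPUs, which should be checked against the TGICL documentation. The function name, sequence IDs, and overlap values are hypothetical, and in the real pipeline the pairwise overlaps would come from megablast.

```python
from collections import defaultdict

def cluster_by_overlap(seq_ids, overlaps, min_identity=95.0,
                       min_overlap=60, max_overhang=10):
    """Single-linkage clustering of sequences from pairwise overlaps.

    overlaps: iterable of (id_a, id_b, percent_identity, overlap_len, overhang).
    Returns (clusters, singletons).
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, length, overhang in overlaps:
        if (identity >= min_identity and length >= min_overlap
                and overhang <= max_overhang):
            parent[find(a)] = find(b)        # merge the two clusters

    groups = defaultdict(list)
    for s in seq_ids:
        groups[find(s)].append(s)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, singletons

# Hypothetical toy input.
seq_ids = ["gss1", "gss2", "gss3", "gss4"]
overlaps = [
    ("gss1", "gss2", 99.0, 120, 2),
    ("gss2", "gss3", 96.0, 70, 8),
    ("gss3", "gss4", 97.0, 50, 3),
]
loose = cluster_by_overlap(seq_ids, overlaps, 95.0, 60, 10)
strict = cluster_by_overlap(seq_ids, overlaps, 98.0, 100, 5)
```

A cluster here is only a candidate group; an assembler such as CAP3 would still have to build contigs from each one, so cluster counts from this sketch are not comparable to the contig counts reported in the test runs.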
Diamond - SGI Altix ICE Supercomputer
Jade - Cray XT4 Supercomputer
Project Website: http://systemsbiology.usm.edu/Legume/CommonBean/
Common Bean Genomics and Bioinformatics
Coding Potential Detection of 415K Common Bean Genomic Gene Spaces
Reference Proteins    No. of Proteins    Non-coding Detected    Coding Detected    Coding Potential
NR.aa                 10 million         207,587                207,412            49.98%
Refseq                6.5 million        216,217                198,782            47.90%
SwissProt             0.5 million        291,412                123,587            29.78%
Uniref100             10 million         207,599                207,400            49.98%
Note: Sequenced at the Washington University Genome Center via Methylation Filtration Techniques
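The coding potential figures can be recomputed from the detected counts. The sketch below assumes that coding potential is the coding-detected count as a share of all sequences queried (coding plus non-coding); the slides do not state the formula, and the function and variable names are hypothetical.

```python
def coding_potential(non_coding, coding):
    """Percentage of sequences with a detected coding region."""
    return 100.0 * coding / (non_coding + coding)

# (non-coding detected, coding detected) per reference protein set.
counts = {
    "NR.aa":     (207_587, 207_412),
    "Refseq":    (216_217, 198_782),
    "SwissProt": (291_412, 123_587),
    "Uniref100": (207_599, 207_400),
}

totals = {db: nc + c for db, (nc, c) in counts.items()}
potentials = {db: round(coding_potential(nc, c), 2)
              for db, (nc, c) in counts.items()}

for db in counts:
    print(f"{db:<10} total={totals[db]:,} coding potential={potentials[db]:.2f}%")
```

Every reference set sums to the same total, which is a quick consistency check on the counts.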
Genome Annotation Strategy (2): Metabolic Pathway Integration for Cowpea GSS
We published in BMC Bioinformatics 2007, 8:129.
Genome Annotation Strategy (3): GO Integration with Distribution of Function Assignments for Cowpea GSS
We published in:
BMC Bioinformatics 2007, 8:129.
BMC Genomics 2008, 9:103.
Genome Annotation Strategy (4): Comparative Genomics at Genome-scale
We published in BMC Genomics 2008, 9:103.
---- Example of Medicago vs cowpea
Genome Annotation Strategy (5): Comparison at Gene Family Level
--- WRKY and CONSTANS (CO) and CO-like Gene Families of Cowpea Transcription Factors
We published the methods in:
(1) BMC Genomics 9:103, 2008.
(2) Plant Physiology 147:280-295, 2008.
(3) BMC Plant Biology 10:237, 2010.
(4) BMC Bioinformatics 9:53, 2008.
(5) Plant Biotechnology Journal 1: 1-10, 2011.
(6) BMC Genomics 13:270, 2012.
(7) Journal of Agriculture Biotechnology 22 (5): 572-579, 2014.
Genome Annotation Strategies: (6) Repeat, (7) Domain, (8) Gene Model for Cowpea GSS
We published in: BMC Bioinformatics 8:129, 2007.
Repeat
Domain
Gene Model
Project website: http://cowpeagenomics.med.virginia.edu/CGKB/
Genome Annotation Strategy (9): Comparative Genomics on Network for Conserved Protein Complexes
Comparative genome analysis
Conserved networks
---- May not have been done in plants yet?
Genome Annotation Strategy (10): Functional Validation through Reverse Genetics Program
My name
2008
Section Three: Data Mining on Data Processed via Computational Approach in Kbase
Knowledge-based Discovery
Data Mining on Promoters and Transcription Factors
---- SURE, Soybean Upstream Regulatory Elements for Ongoing Regulatory Motif Annotation
http://systemsbiology.usm.edu/SystemsBiologyCenter/Soybean/Soybean_files/SURE.html
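As an illustration of regulatory motif annotation on upstream sequences, the sketch below scans both strands of a promoter for the W-box core TTGAC(C/T), the element bound by WRKY transcription factors. The slides do not describe how SURE annotates motifs, so this is only an assumed, simplified approach; the function names and the toy promoter are hypothetical.

```python
import re

# Lookahead so that overlapping sites are also reported.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_w_boxes(promoter):
    """Return sorted (start, strand, site) hits; start is 0-based on the forward strand."""
    promoter = promoter.upper()
    n = len(promoter)
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(promoter)]
    for m in W_BOX.finditer(reverse_complement(promoter)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start_on_forward = n - (m.start() + len(m.group(1)))
        hits.append((start_on_forward, "-", m.group(1)))
    return sorted(hits)

toy_promoter = "AAATTGACCGGGGGTCAATT"   # hypothetical sequence
hits = find_w_boxes(toy_promoter)
```

Reporting minus-strand hits in forward-strand coordinates keeps all annotations for one promoter on a single axis.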
Project URL: http://www.igece.org/Soybean_TF/
We published in BMC Plant Biology 10:237, 2010.
4,452 Predicted Transcription Factors in 76 Families
Project website: http://compsysbio.achs.virginia.edu/tobfac/
We published in:
Plant Physiology 147:280-295, 2008.
BMC Bioinformatics 9:53, 2008.
1,134 Predicted Transcription Factors
115
89
Section Four: Nominating Transcription Factors Involved in Stress Response for Transgenic Studies
Group IX
Red Dot = Soybean ERF genes
Implicated in regulating wounding and jasmonate responses
Soybean Promoters: GmWRKYs, GmERFs, Gmubis, Gmcons, and more ...
10 promoters per month
Promoter
Tested WRKY Promoters for Expression Assays
---- Climate Change Predicts Drought in the Near Future
(Figure panels: 2009, 2039, 2069)
WRKYs Are the Key Proteins Regulating Plant Drought Tolerance
Project website: http://systemsbiology.usm.edu/PhytoTech/WRKY/
WRKYs across Plant Kingdom
Project URL: http://www.igece.org/WRKY/BrachyWRKY/
The WRKY Wide Web: The Database of WRKY Transcription Factors
MEME Analysis of Conserved Domains
We published in BMC Genomics 13:270, 2012.
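MEME discovers conserved motifs de novo. As a much simpler illustration, and not a substitute for it, the sketch below looks for the canonical WRKYGQK heptapeptide that defines the WRKY domain in a set of protein sequences. The function name and the toy sequences are hypothetical, and the pattern can be relaxed to cover reported variants of the heptapeptide.

```python
import re

def find_wrky_motifs(proteins, pattern=r"WRKYGQK"):
    """Map each protein name to the 0-based start positions of the motif."""
    motif = re.compile(pattern)
    return {
        name: [m.start() for m in motif.finditer(seq.upper())]
        for name, seq in proteins.items()
    }

# Hypothetical toy sequences, not real gene models.
toy_proteins = {
    "toy_wrky_1": "MSDEDGYNWRKYGQKVVKGSPNPR",
    "toy_wrky_2": "MADGYKWRKYGQKTTKGWRKYGQKAS",
    "toy_other": "MSTKLLPRAGGHHEE",
}
motif_hits = find_wrky_motifs(toy_proteins)

# Proteins with at least one hit are candidate WRKY family members.
candidates = sorted(name for name, starts in motif_hits.items() if starts)
```

Counting hits per protein gives a rough view of how many WRKY domains each candidate carries.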
Soybean WRKY Promoter Transient Expression at Peak
http://www.igece.org/WRKY/BrachyWRKY/SoySUREWRKY/WRKY.html
More details on methods : BMC Plant Biology 10:237, 2010
Lima Bean Cotyledons
Soybean Hairy Root
Video at:
Where Are the Strong Promoters for WRKY Over-expression?
More details on the techniques we published: BMC Plant Biology 10:237, 2010.
Acknowledgement
DOD ERDC EL: Dr. Victor Medina, Ms. Xia Wang
DOD ERDC - ITL: Mr. Mark Cowan, Mr. Ken Lawrence, Mr. Dave Dumas, Mr. Phillip Bucci
UC, Berkeley & Davis: Dr. Chris Vulpe, Dr. Paul Gepts, Dr. Dawei Lin
University of Virginia: Dr. Mike Timko
The Ohio State University: Dr. John Finer
University of Michigan: Dr. Youqun (Oliver) He
Cornell University: Dr. Zhangjun Fei
University of Southern Mississippi: Dr. Chaoyang (Joe) Zhang; Dr. George Glover, System Admin
IFXworks Corporation: Dr. Keith Steward, Dr. Vladimir Makarov, Dr. Rich Zhang
South Dakota State University: Dr. Paul Rushton
Institute of Green Energy & Clean Environment: Dr. Jason Li, Dr. Anna Ma, Dr. Joe Sung