Squeezing big data into a small organisation
Dan MacLean
The Sainsbury Laboratory
The Sainsbury Lab
TSL Funding
Sources: Gatsby Core; Other (BBSRC, EU etc.)
TSL Research
“The Sainsbury Laboratory is dedicated to making fundamental discoveries about plants and how they interact with microbes and viruses and favours daring, long-term research over work that could be equally well carried out elsewhere”.
Basic and translational research into Plant/Pathogen interactions
genomics: HTGS, effector finding, R gene cloning, de novo assembly, resequencing, SNP detection, annotation pipelines
transcriptomics: RNA seq, ChIP seq, arrays
proteomics: pipeline development, statistical methods for spectrum analysis
high throughput image analysis: image segmentation, object detection, algorithm development
TSL Tech
Illumina GA II
Orbitrap LC-MS
Opera HCA Imaging
5 groups, ~ 80 scientists => 2 bioinformaticians
understand, manage, analyze
[Figure: chart of understanding, analysis and management (scale 0 to 120) for core informatician versus project scientist, annotated "(biological provenance)"]
Where to focus? (“bioinformatics is easy” – do your own &%*@@! BLAST)
Bioinformatics is a sub-discipline of molecular biology*
*Not true everywhere, but for our purposes true enough
Support Outline
Albugo genomics
Hpa genomics
P.infestans genomics
A.thaliana genomics
A.thaliana proteomics
P.infestans transcriptomics
A.thal HT Microscopy
Albugo transcriptomics
analysis deficit
understanding deficit management deficit
Training Workshops:
Perl, Ruby, CMD Line, MySQL, R + Stats, Excel VB, Browsers and Desktop tools…
Resource provision: Workshop notes, Workshop Podcasts, Quick ‘How do I’ podcasts, Library
Integration of best practice: Lab meetings, Journal Clubs
An open dialogue – most important is to have someone approachable, who wants to do it
Systems Development (SOP for common tasks – keeping the house in order)
Common data storage – results and raw data
• Don’t inconvenience
• Make access easy
• Make messing it up hard
Need to get PLs on-side…
Shared Data
Local validation rules for features and sequence
• Annotation (GFF3) consistent with local specs
• Dbxref, IDs, sl_id, species_id
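A minimal sketch of what such a validation rule could look like, assuming each feature line must carry the attributes named on the slide. The required-attribute list, the ID pattern and the function names are illustrative assumptions, not the laboratory's actual specification.

```python
import re

# Hypothetical local rules: these attribute keys and the ID pattern are
# illustrative assumptions, not the laboratory's actual specification.
REQUIRED_ATTRIBUTES = ("ID", "Dbxref", "sl_id", "species_id")
ID_PATTERN = re.compile(r"^[A-Za-z0-9_.:-]+$")


def parse_attributes(column9):
    """Split the GFF3 attributes column into a dict of key -> value."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def validate_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(columns)]
    problems = []
    start, end = columns[3], columns[4]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")
    if columns[6] not in ("+", "-", ".", "?"):
        problems.append("invalid strand: %r" % columns[6])
    attributes = parse_attributes(columns[8])
    for key in REQUIRED_ATTRIBUTES:
        if key not in attributes:
            problems.append("missing attribute: %s" % key)
    if "ID" in attributes and not ID_PATTERN.match(attributes["ID"]):
        problems.append("ID contains unexpected characters")
    return problems
```

Running a check like this at the point of deposit, rather than at analysis time, is one way to make "messing it up hard": a file that returns any problems is simply not accepted into the shared store.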
Systems Development Shared Data
Sequence and feature databases
• version and update tracking – central store
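One possible way to track versions and updates in a central store is a checksum manifest, sketched below. The JSON manifest layout and the names `file_checksum` and `register_version` are assumptions made for illustration; the slides do not say how tracking was actually implemented.

```python
import hashlib
import json
import os
import time


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_version(manifest_path, data_path, note=""):
    """Append a version record for data_path unless its content is unchanged."""
    if os.path.exists(manifest_path):
        with open(manifest_path) as handle:
            manifest = json.load(handle)
    else:
        manifest = {}
    name = os.path.basename(data_path)
    history = manifest.setdefault(name, [])
    checksum = file_checksum(data_path)
    if history and history[-1]["sha256"] == checksum:
        return history[-1]["version"]  # nothing changed, keep current version
    record = {
        "version": len(history) + 1,
        "sha256": checksum,
        "registered": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "note": note,
    }
    history.append(record)
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle, indent=2)
    return record["version"]
```

Registering the same file twice leaves the version unchanged, so the manifest only grows when the content really differs.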
Systems Development: Galaxy
• Workflows – great for sharing ‘vanilla’ analysis protocols
• Customisable – great for running in-house scripts
Supplementary Research
De Bruijn graph representations of polymorphisms in sequence reads
Reference-free SNP detection
New method: reduces the number of steps in SNP finding and makes it possible where no reference is available
Collaboration with Mario Caccamo at TGAC
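To illustrate the general idea only, the toy sketch below builds a De Bruijn graph from reads and reports simple "bubbles", where two paths split at one node and rejoin a few steps later, which is the signature a single-base difference leaves in such a graph. This is not the method from the talk: it assumes error-free reads, a single strand and no repeated (k-1)-mers, and the function names are hypothetical.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def follow(graph, node, max_steps):
    """Walk forward from node along unbranched edges; return visited nodes."""
    path = [node]
    for _ in range(max_steps):
        successors = graph.get(path[-1], ())
        if len(successors) != 1:
            break
        path.append(next(iter(successors)))
    return path


def find_bubbles(reads, k):
    """Return (allele_a, allele_b) pairs for simple two-branch bubbles."""
    graph = build_graph(reads, k)
    bubbles = []
    for node, successors in list(graph.items()):
        if len(successors) != 2:
            continue
        first, second = sorted(successors)
        path_a = follow(graph, first, k)
        path_b = follow(graph, second, k)
        # Look for the first node that the two branches share.
        shared = [n for n in path_a if n in path_b]
        if not shared:
            continue
        join = shared[0]
        branch_a = path_a[:path_a.index(join) + 1]
        branch_b = path_b[:path_b.index(join) + 1]
        # Spell each branch: the branching node, then the last base of each step.
        seq_a = node + "".join(n[-1] for n in branch_a)
        seq_b = node + "".join(n[-1] for n in branch_b)
        bubbles.append((seq_a, seq_b))
    return bubbles


# Two reads that differ at a single base (A versus T).
print(find_bubbles(["GATTACAGGCTT", "GATTACTGGCTT"], 4))
```

No reference sequence is consulted at any point: the two alleles are recovered from the reads alone, which is what makes the approach usable where no reference is available.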
Advantage
Time
• Get analysis know-how into heads with biology
• Reduce workload
• Improve reproducibility
• These things propagate…
Summary
• There is a lot a ‘biologist’ can do themselves
• Start a dialogue
• Get tough, get your house in order
• Lower the barriers to access and capability
Acknowledgements
• Dr Graham Etherington
• Michael Burrell
• Sophien Kamoun
• Jonathan Jones
• Silke Robatzek
• Eric Ward
• Richard Leggett
• Mario Caccamo