Squeezing big data into a small organisation Dan MacLean The Sainsbury Laboratory

5. Dan MacLean- Sainsbury Laboratory

Page 1: 5. Dan MacLean- Sainsbury Laboratory

Squeezing big data into a small organisation

Dan MacLean

The Sainsbury Laboratory

Page 2: 5. Dan MacLean- Sainsbury Laboratory

The Sainsbury Lab

Page 3: 5. Dan MacLean- Sainsbury Laboratory

TSL Funding

Source:

• Gatsby Core

• Other (BBSRC, EU, etc.)

Page 4: 5. Dan MacLean- Sainsbury Laboratory

TSL Research

“The Sainsbury Laboratory is dedicated to making fundamental discoveries about plants and how they interact with microbes and viruses and favours daring, long-term research over work that could be equally well carried out elsewhere”.

Basic and translational research into plant/pathogen interactions

Page 5: 5. Dan MacLean- Sainsbury Laboratory

Genomics: HTGS, effector finding, R gene cloning, assembly de novo, resequencing, SNP detection, annotation pipelines

Transcriptomics: RNA seq, ChIP seq, arrays

Proteomics: pipeline development, statistical methods for spectrum analysis

High throughput image analysis: image segmentation, object detection, algorithm development

Page 6: 5. Dan MacLean- Sainsbury Laboratory

TSL Tech

• Illumina GA II

• Orbitrap LC-MS

• Opera HCA Imaging

5 groups, ~80 scientists => 2 bioinformaticians

understand, manage, analyze

Page 7: 5. Dan MacLean- Sainsbury Laboratory

understand, manage, analyze

[Bar chart, scale 0 to 120. Categories: understanding, analysis, management. Series: core informatician, project scientist. Annotation: (biological provenance)]

Page 8: 5. Dan MacLean- Sainsbury Laboratory

understand, manage, analyze

[Bar chart, scale 0 to 120. Categories: understanding, analysis, management. Series: core informatician, project scientist. Annotation: (biological provenance)]

Page 9: 5. Dan MacLean- Sainsbury Laboratory

?  

Where to focus?

Page 10: 5. Dan MacLean- Sainsbury Laboratory

Where to focus? (“bioinformatics is easy” – do your own &%*@@! BLAST)

Bioinformatics is a sub-discipline of molecular biology*

*Not true everywhere, but for our purposes true enough

Page 11: 5. Dan MacLean- Sainsbury Laboratory

Support Outline

Projects:

• Albugo genomics

• Albugo transcriptomics

• Hpa genomics

• P. infestans genomics

• P. infestans transcriptomics

• A. thaliana genomics

• A. thaliana proteomics

• A. thaliana HT microscopy

Legend: analysis deficit, understanding deficit, management deficit

Page 12: 5. Dan MacLean- Sainsbury Laboratory

Training Workshops: Perl, Ruby, CMD Line, MySQL, R + Stats, Excel VB, Browsers and Desktop tools…

Resource provision: Workshop notes, Workshop Podcasts, Quick ‘How do I’ podcasts, Library

Integration of best practice: Lab meetings, Journal Clubs

An open dialogue – most important is to have someone approachable, who wants to do it

Page 13: 5. Dan MacLean- Sainsbury Laboratory

Systems Development (SOP for common tasks – keeping the house in order)

Shared Data

Common data storage – results and raw data

• Don’t inconvenience

• Make access easy

• Make messing it up hard

Need to get PLs on-side…

Page 14: 5. Dan MacLean- Sainsbury Laboratory

Systems Development: Shared Data

Sequence and feature databases

• Version and update tracking – central store

Local validation rules for features and sequence

• Annotation (GFF3) consistent with local specs

• Dbxref, IDs, sl_id, species_id
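The local validation rules for GFF3 annotation mentioned on this slide could look something like the sketch below. This is a minimal illustration, not the TSL implementation: the slide names the attributes (Dbxref, IDs, sl_id, species_id) but not the actual specification, so the `REQUIRED_ATTRIBUTES` set, the function names and the coordinate checks are all assumptions.

```python
from urllib.parse import unquote

# Hypothetical rule set: the slide names these attributes but not the real local spec.
REQUIRED_ATTRIBUTES = ("ID", "Dbxref", "sl_id", "species_id")


def parse_attributes(column9):
    """Split the ninth GFF3 column into a dict of tag -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        tag, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {pair!r}")
        # GFF3 percent-encodes reserved characters inside tags and values.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


def validate_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]
    problems = []
    start, end = fields[3], fields[4]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be whole numbers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    try:
        attributes = parse_attributes(fields[8])
    except ValueError as error:
        return problems + [str(error)]
    for tag in REQUIRED_ATTRIBUTES:
        # A tag counts as present only if it has at least one non-empty value.
        if not any(attributes.get(tag, [])):
            problems.append(f"missing attribute: {tag}")
    return problems
```

Running a check like this at the point where files enter the central store is one way to "make messing it up hard": a file that fails is rejected with a readable list of problems rather than silently accepted.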

Page 15: 5. Dan MacLean- Sainsbury Laboratory

Systems Development: Galaxy

• Workflows – great for sharing ‘vanilla’ analysis protocols

• Customisable – great for running in-house scripts

Page 16: 5. Dan MacLean- Sainsbury Laboratory

Supplementary Research

De Bruijn graph representations of polymorphisms in sequence reads

Reference-free SNP detection

New method – reduces steps in SNP finding, makes possible where no reference available

Collaboration with Mario Caccamo at TGAC
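To give a feel for how a De Bruijn graph can expose a polymorphism without a reference, here is a toy sketch. It is not the method developed in this collaboration; it only illustrates the general idea that a single-base difference between reads creates a "bubble": two branches of k k-mers that leave one node and rejoin at another. All function names are hypothetical, and real data would also need handling of sequencing errors, coverage and reverse complements.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def follow_unbranched(graph, node, steps):
    """Walk up to `steps` edges from node, stopping early at a branch or dead end."""
    path = [node]
    for _ in range(steps):
        successors = graph.get(path[-1], ())
        if len(successors) != 1:
            break
        path.append(next(iter(successors)))
    return path


def spell(start, path):
    """Turn a starting k-mer plus a path of following k-mers back into a sequence."""
    return start + "".join(kmer[-1] for kmer in path)


def find_snp_bubbles(reads, k):
    """Return (allele_a, allele_b) sequence pairs for simple SNP-like bubbles."""
    graph = build_graph(reads, k)
    bubbles = []
    for node, successors in list(graph.items()):
        if len(successors) != 2:
            continue
        first, second = sorted(successors)
        # A single-base difference gives two unbranched paths that meet after k edges.
        path_a = follow_unbranched(graph, first, k)
        path_b = follow_unbranched(graph, second, k)
        if len(path_a) == k + 1 and len(path_b) == k + 1 and path_a[-1] == path_b[-1]:
            bubbles.append((spell(node, path_a), spell(node, path_b)))
    return bubbles
```

For example, the two reads `GATCGACTTGA` and `GATCGTCTTGA` with `k = 4` produce one bubble whose two sequences differ only at the A/T position, and no reference sequence is consulted at any point.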

Page 17: 5. Dan MacLean- Sainsbury Laboratory

Advantage

Time

• Get analysis know-how into heads with biology

• Reduce workload

• Improve reproducibility

• These things propagate…

Page 18: 5. Dan MacLean- Sainsbury Laboratory

Summary

• There is a lot a ‘biologist’ can do themselves

• Start a dialogue

• Get tough, get your house in order

• Lower the barriers to access and capability

Page 19: 5. Dan MacLean- Sainsbury Laboratory

Acknowledgements

• Dr Graham Etherington

• Michael Burrell

• Sophien Kamoun

• Jonathan Jones

• Silke Robatzek

• Eric Ward

• Richard Leggett

• Mario Caccamo