Fast Genomic Sequence Searches with Symmetric Implementation of Parallel Blast

Bhanu Rekepalli (BioTeam), Eduardo Ponce and Greg Peterson (UTK)

BICoB 2015, March 10, 2015

“The Freedom To Discover”




Moving from Enabling Science to TRANSFORMING Science.

The BioTeam

• Independent consulting firm
• Staffed by scientists forced to learn IT, SW & HPC to get our own research done
• Assess, Design, Implement & Train
• Bridging the “gap” between science, IT & high performance computing since 2002
• Skilled Bio-IT Evolutionary Anthropologists (> 400 studied in the last year)

Outline

• The genomics data problem
• Highly-scalable parallel wrapper
• Parallel BLAST on Xeon Phi
• Optimizations
• Performance evaluation
• Conclusion
• Future work

The genomic data problem

• Advances in next-generation sequencing techniques are producing complete genomes faster than data analysis can process them
• Data is managed by community-centered databases that are updated routinely
  • e.g., GenBank, EMBL, NR, PDB
• Challenge: bioinformatics research requires high-throughput processing and analytic tools to keep pace with the exponential growth in genomic data, and HPC is difficult to utilize
• Solution: modify algorithms and frameworks to allow scalable analytics on modern architectures

Top500 Next-Gen Supercomputers

Intel Xeon Phi

• A Many Integrated Core (MIC) coprocessor for massive parallelism
• Programming models
  • Native – all code on MIC
  • Offload – main code on CPU, other code on MIC
  • Symmetric – both CPU and MIC using message passing
  • Upload – main code on MIC, other code on CPU

Case study: NCBI BLAST

• The “Swiss Army knife” of biologists
• BLAST (Basic Local Alignment Search Tool) aligns nucleotide or amino acid sequences using fast heuristic algorithms to find regions of local similarity
• Compares query sequences to sequence databases and calculates the statistical significance of matches
• BLAST programs: blastp, blastn, blastx, psi-blast
• Input: query file in FASTA format and a database formatted from FASTA sequences
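
As a rough illustration of the FASTA query input mentioned above, the following Python sketch parses FASTA-formatted text into (header, sequence) records. The function name `parse_fasta` and the sample records are invented for illustration; this is not code from NCBI BLAST or HSP-wrap.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    A record starts with a '>' header line; the lines that follow, up to
    the next header, are joined into one sequence string.
    """
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record query file (the sequences are invented).
SAMPLE = """>query1 example protein
MKTAYIAKQR
QISFVKSHFS
>query2 another example
MADEEKLPPG
"""

records = list(parse_fasta(SAMPLE))
```

Each record here corresponds to one query sequence that a BLAST program would compare against the database.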

Highly-scalable parallel wrapper

• HSP-wrap
  • Software framework for scaling life science informatics applications to HPC environments via task parallelism
  • Bioinformatics and chemoinformatics domains
  • Portable: written in C/C++ and MPI
  • Load balance, parallel output, fault-tolerance, checkpointing
• Successfully ported tools
  • BLAST, HMMER, MUSCLE
  • DOCK6, AutoDock Vina, LINUS

HSP-wrap: architecture (the wrapper approach)

[Figure: HSP-wrap architecture. Labeled components: Lustre FS holding the database (NR, Pfam, …), input queries, and Results 1…N; a master node with the database and query blocks 1…M in main memory; worker nodes [1..N], each with a preloaded database and a query block in main memory, tool processes 1…P (BLAST, DOCK6, HMMER, …), an output buffer, compression, and a compressed buffer.]

Bhanu Rekapalli et al., BMC Bioinformatics 2013, 14:S3
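
The architecture labels above suggest a master that splits the input queries into blocks and hands them to workers. The Python sketch below simulates that idea under stated assumptions: the function names, the block size, and the worker speeds are all hypothetical, and the real HSP-wrap uses MPI processes rather than this single-process simulation.

```python
import heapq
from typing import Dict, List, Sequence


def make_blocks(queries: Sequence[str], block_size: int) -> List[List[str]]:
    """Split the query list into consecutive blocks of at most block_size."""
    return [list(queries[i:i + block_size])
            for i in range(0, len(queries), block_size)]


def dispatch(blocks: List[List[str]],
             speeds: Dict[str, float]) -> Dict[str, List[int]]:
    """Simulate a master handing each block to the worker that frees up first.

    speeds maps a worker name to queries processed per time unit
    (hypothetical values). Returns the block indices each worker received.
    """
    # Heap of (time at which the worker becomes idle, worker name).
    idle = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(idle)
    assigned: Dict[str, List[int]] = {name: [] for name in speeds}
    for index, block in enumerate(blocks):
        free_at, name = heapq.heappop(idle)
        assigned[name].append(index)
        heapq.heappush(idle, (free_at + len(block) / speeds[name], name))
    return assigned


queries = [f"query{i}" for i in range(12)]
blocks = make_blocks(queries, block_size=2)   # 6 blocks of 2 queries each
# Assumed relative speeds: one fast worker and one worker half as fast.
plan = dispatch(blocks, {"fast": 2.0, "slow": 1.0})
```

With demand-driven dispatch of this kind, faster workers naturally receive more blocks, which is one way a wrapper can balance load across dissimilar processors.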

HSP-wrap: memory management

• stdiowrap – module for file management
  • Function interposition of standard I/O calls
  • Minimal modification to original code (if any)
• Input file management
  • Files are mapped to main memory on demand
  • Tracks parallel reads
• Output file management
  • Double-buffered parallel support
  • Minimizes number of data transfers
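
To make the double-buffered output idea more concrete, here is a minimal Python sketch. It is not stdiowrap (which is written in C and whose internals are not shown in these slides); the class name `DoubleBufferedWriter`, the capacity, and the sink callback are assumptions chosen for illustration, and the buffer swap is synchronous rather than overlapped with the transfer.

```python
from typing import Callable, List


class DoubleBufferedWriter:
    """Collect small writes in one buffer while the other goes to the sink.

    The swap here is synchronous for simplicity; a real implementation
    would overlap the transfer with further writes.
    """

    def __init__(self, capacity: int, sink: Callable[[bytes], None]) -> None:
        self.capacity = capacity
        self.sink = sink
        self.buffers = [bytearray(), bytearray()]
        self.active = 0  # index of the buffer currently being filled

    def write(self, data: bytes) -> None:
        # Hand off the active buffer before it would overflow.
        if len(self.buffers[self.active]) + len(data) > self.capacity:
            self.flush()
        self.buffers[self.active] += data

    def flush(self) -> None:
        full = self.buffers[self.active]
        self.active = 1 - self.active   # later writes go to the other buffer
        if full:
            self.sink(bytes(full))      # one transfer for many small writes
            full.clear()


transfers: List[bytes] = []
writer = DoubleBufferedWriter(capacity=16, sink=transfers.append)
for i in range(10):
    writer.write(f"hit{i}\n".encode())  # ten 5-byte writes
writer.flush()
```

In this run the ten small writes reach the sink as four larger transfers, which is the effect the "minimizes number of data transfers" bullet describes.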

Symmetric HSPH-BLAST

In symmetric execution, both Xeon and Xeon Phi processors are used as network hosts for distributed processing.

HSPH-BLAST speedup over NCBI BLAST

Weak scaling parameters

Xeon/Phi configuration   Input sequences   Worker nodes [Xeon, Phi]   Physical cores [Xeon, Phi] = Total
3x_8p                    17000             [2, 8]                     [48, 488] = 536
5x_16p                   34000             [4, 16]                    [80, 976] = 1056
9x_32p                   68000             [8, 32]                    [144, 1952] = 2096
17x_64p                  136000            [16, 64]                   [272, 3904] = 4176
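
In a weak-scaling study the work per worker is held constant as workers are added. The short Python check below reads the four configurations from the table and derives the sequences per worker, the cores per Xeon Phi, and the core totals. The derived figures are computed here from the table values and are not stated on the slide; the variable names are arbitrary.

```python
# (configuration, input sequences, [Xeon workers, Phi workers],
#  [Xeon cores, Phi cores]) as listed in the weak scaling table.
ROWS = [
    ("3x_8p", 17000, (2, 8), (48, 488)),
    ("5x_16p", 34000, (4, 16), (80, 976)),
    ("9x_32p", 68000, (8, 32), (144, 1952)),
    ("17x_64p", 136000, (16, 64), (272, 3904)),
]

summary = []
for name, sequences, workers, cores in ROWS:
    total_workers = sum(workers)
    summary.append({
        "config": name,
        "total_cores": sum(cores),
        "sequences_per_worker": sequences / total_workers,
        "cores_per_phi": cores[1] / workers[1],
    })

for row in summary:
    print(row)
```

Every configuration works out to the same number of input sequences per worker node, which is consistent with a weak-scaling setup.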

Results: speedup

Results: load balance

Results: cost per sequence

Conclusions

• Parallel wrappers can be adapted to current informatics applications to greatly improve processing throughput and scalability on supercomputing platforms, because these applications share similar programming models and I/O characteristics.
• The wrapped tools can be used to identify species, perform DNA mapping, infer functional and evolutionary relationships between sequences, and help identify members of gene families.
• Symmetric weak scaling studies showed linear speedup and balanced workload distribution for coarse-grained parallelization of BLAST.
• Finer-grained vectorization of the BLAST process could improve utilization of Xeon Phi processors; we are collaborating with Intel engineers and NCBI BLAST developers to address this.

Future work

• Extend Highly-Scalable Parallel Hybrid Software Wrapper
  • Replication and fragmentation schemes for large data management
  • Input models: dot/cross product and hybrid approaches
• Make tools available as standard software modules on HPC architectures
• Integrate HSP-tools into scientific workflow pipelines to provide fast processing for high-impact scientific discovery
  • Incorporate into web-enabled science gateways
• Optimize NCBI BLAST for Xeon Phi processors and scale it with HSP-wrap
• Adapt parallel wrappers to other primary tools used in informatics fields of the life sciences
• Build data analysis pipelines for novel data mining and large-scale knowledge discovery

Thank you

• Graduate students
  • Eduardo Ponce (EECS)
  • Amit Upadhyay (GST)
  • Paul Giblock (Cisco)