BioTeam Bhanu Rekepalli Presentation at BICoB 2015

Fast Genomic Sequence Searches with Symmetric Implementation of Parallel Blast

Bhanu Rekepalli (BioTeam), Eduardo Ponce and Greg Peterson (UTK)

BICoB 2015, March 10, 2015

“The Freedom To Discover”

Moving from Enabling Science to TRANSFORMING Science.

The BioTeam


BioTeam

•  Independent consulting firm

•  Staffed by scientists forced to learn IT, SW & HPC to get our own research done

•  Assess, Design, Implement & Train

•  Bridging the “gap” between science, IT & high performance computing since 2002

•  Skilled Bio-IT Evolutionary Anthropologists (> 400 studied in the last year)

Outline

•  The genomic data problem
•  Highly-scalable parallel wrapper
•  Parallel BLAST on Xeon Phi
•  Optimizations
•  Performance evaluation
•  Conclusion
•  Future work

The genomic data problem

•  Advances in next-generation sequencing techniques are producing complete genomes at faster rates than data analysis can process

•  Data is managed by community-centered databases (updated routinely)
   •  e.g., GenBank, EMBL, NR, PDB

•  Challenge: Bioinformatics research requires high-throughput processing and analytic tools to keep pace with the exponential growth in genomic data. In addition, HPC is difficult to utilize.

•  Solution: modify algorithms and frameworks to allow scalable analytics on modern architectures

Top500 Next-Gen Supercomputers

Intel Xeon Phi

•  A many integrated core (MIC) for massive parallelism
•  Programming models
   •  Native – all code on MIC
   •  Offload – main code on CPU, other on MIC
   •  Symmetric – both CPU and MIC using message passing
   •  Upload – main code on MIC, other on CPU

Case study: NCBI BLAST

•  The “Swiss Army knife” of biologists
•  BLAST (Basic Local Alignment Search Tool) aligns nucleotide or protein sequences using fast heuristic algorithms to find regions of local similarity.
•  Compares query sequences to sequence databases and calculates the statistical significance of matches.

•  Search programs: blastp, blastn, blastx, psi-blast

•  Query file and formatted database (FASTA format)
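As a rough illustration of the query-file side, the sketch below parses FASTA text and groups the records into fixed-size query blocks of the kind a task-parallel wrapper could hand to workers. The function names `parse_fasta` and `to_blocks`, the block size, and the sample sequences are all hypothetical and are not taken from BLAST or HSP-wrap.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)

# Made-up sample input, for illustration only.
SAMPLE_QUERIES = """>query1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>query2
MSDNGPQNQR
>query3
MALWMRLLPL
"""


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def to_blocks(records: Iterable[Record], block_size: int) -> Iterator[List[Record]]:
    """Group records into blocks of at most block_size queries."""
    block: List[Record] = []
    for record in records:
        block.append(record)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block


records = list(parse_fasta(SAMPLE_QUERIES))
blocks = list(to_blocks(records, block_size=2))
```

Each block is an independent unit of work, which is what makes a coarse-grained, task-parallel distribution of queries possible.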

Highly-scalable parallel wrapper

•  HSP-wrap
   •  Software framework for scaling life science informatics applications to HPC environments via task parallelism
   •  Bioinformatics and chemoinformatics domains
   •  Portable → written in C/C++ and MPI
   •  Load balance, parallel output, fault-tolerance, check-pointing

•  Successfully ported tools
   •  BLAST, HMMER, MUSCLE
   •  DOCK6, AutoDock Vina, LINUS

HSP-wrap: architecture

The Wrapper Approach

[Figure: architecture diagram. Labeled components: Lustre FS with the database (NR, Pfam, …), input queries, and Results 1 … N; master node with main memory holding the database and Query Blocks 1 … M; worker nodes [1..N] with main memory holding the preloaded database, a query block, Tool Processes 1 … P (BLAST, DOCK6, HMMER, …), an output buffer, compression, and a compressed buffer.]

Bhanu Rekapalli et al., BMC Bioinformatics 2013, 14: S3

HSP-wrap: memory management

•  stdiowrap – module for file management
   •  Function interposition to standard I/O calls
   •  Minimal modification to original code (if any)
•  Input file management
   •  Files are mapped to main memory on-demand
   •  Tracks parallel reads

•  Output file management
   •  Double buffered parallel support
   •  Minimizes number of data transfers
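The double-buffering idea for output can be illustrated with a small sketch. This is not stdiowrap's API (stdiowrap interposes on standard I/O calls in the wrapped tool); the class `DoubleBufferedWriter`, its `capacity` parameter, and the `sink` callback are hypothetical names chosen for the example, and the drain here is synchronous for simplicity where a real implementation would overlap it with further writes.

```python
from typing import Callable, List


class DoubleBufferedWriter:
    """Collect many small writes and hand them off in a few large transfers."""

    def __init__(self, capacity: int, sink: Callable[[bytes], None]) -> None:
        self.capacity = capacity          # bytes to accumulate before a transfer
        self.sink = sink                  # hypothetical transfer function
        self.buffers = [bytearray(), bytearray()]
        self.active = 0                   # index of the buffer receiving writes
        self.transfers = 0                # number of transfers issued so far

    def write(self, data: bytes) -> None:
        self.buffers[self.active] += data
        if len(self.buffers[self.active]) >= self.capacity:
            self._swap_and_drain()

    def _swap_and_drain(self) -> None:
        full = self.active
        # Swap first, so new writes would land in the other buffer
        # while the full one is being transferred.
        self.active = 1 - self.active
        self.sink(bytes(self.buffers[full]))
        self.buffers[full].clear()
        self.transfers += 1

    def close(self) -> None:
        # Flush whatever is left in the active buffer.
        if self.buffers[self.active]:
            self._swap_and_drain()


received: List[bytes] = []
writer = DoubleBufferedWriter(capacity=16, sink=received.append)
for i in range(10):
    writer.write(b"hit%d\n" % i)   # ten small 5-byte writes
writer.close()
```

In this run the ten small writes reach the sink in three transfers, which is the effect the "minimizes number of data transfers" bullet refers to.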

Symmetric HSPH-BLAST

In symmetric execution, both Xeon and Xeon Phi processors are used as network hosts for distributed processing.
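One way to picture symmetric execution is a master that hands the next query block to whichever worker becomes idle first, regardless of whether that worker is a Xeon or a Xeon Phi host. The sketch below simulates that policy in plain Python rather than MPI. The worker names and the per-block costs are invented for illustration and say nothing about the measured relative speed of the two processor types.

```python
import heapq
from typing import Dict


def dispatch(num_blocks: int, cost_per_block: Dict[str, float]) -> Dict[str, dict]:
    """Simulate dynamic master/worker dispatch over heterogeneous workers.

    cost_per_block maps a worker name to the (hypothetical) time it needs
    to process one query block.
    """
    stats = {w: {"blocks": 0, "busy_until": 0.0} for w in cost_per_block}
    # Min-heap of (time the worker becomes idle, worker name).
    idle = [(0.0, w) for w in sorted(cost_per_block)]
    heapq.heapify(idle)
    for _ in range(num_blocks):
        now, worker = heapq.heappop(idle)       # earliest idle worker
        finish = now + cost_per_block[worker]
        stats[worker]["blocks"] += 1
        stats[worker]["busy_until"] = finish
        heapq.heappush(idle, (finish, worker))
    return stats


# Assumed costs: one faster host and two slower ones (illustrative only).
workers = {"xeon0": 1.0, "phi0": 2.0, "phi1": 2.0}
stats = dispatch(100, workers)
finish_times = [s["busy_until"] for s in stats.values()]
spread = max(finish_times) - min(finish_times)
```

With this greedy policy the faster worker simply receives more blocks, and the workers finish within one block's processing time of each other.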

HSPH-BLAST speedup over NCBI BLAST

Weak scaling parameters

Xeon/Phi configuration   Input sequences   Worker nodes [Xeon, Phi]   Physical cores [Xeon, Phi] = Total
3x_8p                    17000             [2, 8]                     [48, 488] = 536
5x_16p                   34000             [4, 16]                    [80, 976] = 1056
9x_32p                   68000             [8, 32]                    [144, 1952] = 2096
17x_64p                  136000            [16, 64]                   [272, 3904] = 4176
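The weak scaling parameters above can be cross-checked with a few lines of arithmetic. The sketch below verifies the totals and derives the workload per worker node. The assumption that the Xeon core count covers one host more than the number of Xeon workers (for example, a master node) is an inference from the configuration names such as 3x_8p, not something the slides state.

```python
# Rows transcribed from the weak scaling parameters:
# (configuration, input sequences, (Xeon workers, Phi workers),
#  (Xeon cores, Phi cores), total cores)
ROWS = [
    ("3x_8p", 17000, (2, 8), (48, 488), 536),
    ("5x_16p", 34000, (4, 16), (80, 976), 1056),
    ("9x_32p", 68000, (8, 32), (144, 1952), 2096),
    ("17x_64p", 136000, (16, 64), (272, 3904), 4176),
]


def summarize(rows):
    """Check totals and derive per-node quantities for each configuration."""
    summary = []
    for name, sequences, (xeon_w, phi_w), (xeon_c, phi_c), total in rows:
        if xeon_c + phi_c != total:
            raise ValueError(f"core total mismatch in {name}")
        summary.append({
            "config": name,
            # Weak scaling: work per worker node should stay constant.
            "sequences_per_worker": sequences / (xeon_w + phi_w),
            "cores_per_phi": phi_c / phi_w,
            # Assumption: one extra Xeon host beyond the Xeon workers.
            "cores_per_xeon_host": xeon_c / (xeon_w + 1),
        })
    return summary


summary = summarize(ROWS)
for row in summary:
    print(row)
```

Under these assumptions every configuration assigns the same number of input sequences to each worker node, which is what makes the study a weak scaling experiment.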

Results: speedup

Results: load balance

Results: cost per sequence

Conclusions

•  Parallel wrappers can be adapted to current informatics applications to greatly improve processing throughput and scalability on supercomputing platforms due to similar programming models and I/O characteristics.

•  The wrapped tools can be used to identify species, perform DNA mapping, infer functional and evolutionary relationships between sequences, and help identify members of gene families.

•  Symmetric weak scaling studies showed linear speedup and balanced workload distribution for coarse-grained parallelization of BLAST.

•  Finer-grained vectorization of the BLAST process could improve utilization of Xeon Phi processors; we are collaborating with Intel engineers and NCBI BLAST developers to address this.

Future work

•  Extend Highly-Scalable Parallel Hybrid Software Wrapper
   •  Replication and fragmentation schemes for large data management
   •  Input models: dot/cross product and hybrid approaches

•  Make tools available as standard software modules on HPC architectures

•  Integrate HSP-tools into scientific workflow pipelines to provide fast processing for high-impact scientific discovery
   •  Incorporate with web-enabled science gateways

•  Optimize NCBI BLAST for Xeon Phi processors and scale it with HSP-Wrap

•  Adapt parallel wrappers to other primary tools used in informatics fields of the life sciences

•  Build data analysis pipelines for novel data mining and large-scale knowledge discovery

Thank you

•  Graduate students
   •  Eduardo Ponce (EECS)
   •  Amit Upadhyay (GST)
   •  Paul Giblock (Cisco)