ACCELERATING SPARK GENOME SEQUENCING IN CLOUD – A DATA-DRIVEN APPROACH, CASE STUDIES AND BEYOND
Yingqi (Lucy) Lu
Mulugeta Mammo
Eric Kaczmarek
Intel Corporation
Legal Disclaimer
• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware,
software or service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems. Differences in hardware,
software, or configuration will affect actual performance. Consult other sources of information to evaluate
performance as you consider your purchase. For more complete information about performance and benchmark
results, visit http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Xeon Phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2017 Intel Corporation
Spark Deployment Is Moving to Cloud
On-premises → Cloud
+ Quick deployment
+ Elasticity
+ Manageability/Maintenance
- Don’t expect similar performance
- Limited performance counters available
- Need to re-profile and retune your application
Cloud vs. On-Premises
“Do I need 10 instances with 2 cores per instance and network-attached storage, or a single instance with 20 cores and attached storage?”
It depends.
The performance of your application in a cloud environment will be directly affected by how you partition your resources.
Compute vs. IO
A Spark application run on three setups:
• Setup #1: 36 cores, 9 storage disks – CPU cycles spent waiting on IO; computation wasted
• Setup #2: 12 cores, 9 storage disks – CPU fully utilized, IO underutilized; storage wasted
• Setup #3: 15 cores, 9 storage disks – CPU fully utilized, IO fully utilized; best ROI
Pay attention to the IO vs. core ratio
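The three setups can be read through a simple bottleneck model: throughput is the smaller of what the cores can process and what the disks can deliver. The Java sketch below is ours, not the authors' method; the per-core and per-disk rates (`PER_CORE_RATE`, `PER_DISK_RATE`) are hypothetical values chosen only so that Setup #3 comes out balanced, as on the slide. A real application would need measured rates.

```java
// Minimal bottleneck sketch: throughput is capped by whichever of CPU or IO
// saturates first. PER_CORE_RATE and PER_DISK_RATE are HYPOTHETICAL numbers,
// picked only so that Setup #3 (15 cores, 9 disks) comes out balanced.
class ComputeVsIo {
    enum Bottleneck { IO_BOUND, CPU_BOUND, BALANCED }

    static final int PER_CORE_RATE = 3; // hypothetical work units/s per core
    static final int PER_DISK_RATE = 5; // hypothetical work units/s per disk

    static Bottleneck classify(int cores, int disks) {
        int cpuCapacity = cores * PER_CORE_RATE;
        int ioCapacity = disks * PER_DISK_RATE;
        if (cpuCapacity > ioCapacity) {
            return Bottleneck.IO_BOUND;   // cores sit idle waiting on IO
        } else if (cpuCapacity < ioCapacity) {
            return Bottleneck.CPU_BOUND;  // disks sit idle waiting on CPU
        }
        return Bottleneck.BALANCED;
    }

    // Achieved throughput is the smaller of the two capacities.
    static int throughput(int cores, int disks) {
        return Math.min(cores * PER_CORE_RATE, disks * PER_DISK_RATE);
    }

    public static void main(String[] args) {
        int[][] setups = {{36, 9}, {12, 9}, {15, 9}}; // {cores, disks} from the slide
        for (int i = 0; i < setups.length; i++) {
            int cores = setups[i][0];
            int disks = setups[i][1];
            System.out.printf("Setup #%d: %d cores, %d disks -> %s, throughput %d%n",
                    i + 1, cores, disks, classify(cores, disks), throughput(cores, disks));
        }
    }
}
```

Under these assumed rates, Setup #1 and Setup #3 reach the same throughput (45 units/s), but Setup #3 gets there with 21 fewer cores, which is why it is the best-ROI point.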
Partition Resources in the Cloud
Starting from an on-premises baseline, profile:
• Spark application and Java Virtual Machine
– Hot functions
– Lock contention
– Java garbage collection
• System*
– Processor
– Network and storage
– Memory
* Be aware of which tools and counters are available in the cloud; not all of them will actually work.
Case Study – Genome Analysis Toolkit
Structured programming framework designed to enable rapid development of efficient and robust analysis tools for next-generation DNA sequencers
– Industry standard for analyzing/sequencing human genome data
– Developed by the Broad Institute of MIT and Harvard
Profile Application and Java VM
Java Flight Recorder
– Ships with Oracle JDK
– Thread lock contention
– Hot functions
– Garbage collection
[Figure: Java Flight Recorder views of hot functions, lock contention, and garbage collection]
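Java Flight Recorder is the tool used in the talk. As a lighter complement (our suggestion, not from the slides), the standard `java.lang.management` MXBeans expose cumulative GC time and per-thread time blocked on monitors, which can hint at whether GC or lock contention deserves a closer look in JFR. The class and method names below are hypothetical; the MXBean calls are standard JDK API. Blocked time is only accumulated after contention monitoring is enabled, and only live threads are counted.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Lightweight JVM counters via java.lang.management. This is NOT Java Flight
// Recorder: it only reports cumulative totals, with no per-function breakdown.
class JvmSnapshot {
    // Blocked-time accounting starts only after this call succeeds.
    static boolean enableContentionMonitoring() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadContentionMonitoringSupported()) {
            return false;
        }
        threads.setThreadContentionMonitoringEnabled(true);
        return true;
    }

    // Total GC time in ms across all collectors (a bean reports -1 if undefined).
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    // Total ms that live threads have spent blocked on monitors
    // (each thread reports -1 while contention monitoring is disabled).
    static long totalBlockedTimeMillis() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long total = 0;
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // null if the thread exited in the meantime
                total += Math.max(0, info.getBlockedTime());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        boolean monitoring = enableContentionMonitoring();
        // Allocate short-lived garbage so the young-generation collector likely has work.
        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            allocated += new byte[1024].length;
        }
        System.out.println("allocated bytes: " + allocated);
        System.out.println("GC time (ms): " + totalGcTimeMillis());
        System.out.println("contention monitoring: " + monitoring
                + ", blocked time (ms): " + totalBlockedTimeMillis());
    }
}
```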
Lock Contention Example
• A Spark application using SynchronizedMap suffered heavy lock contention (50+% of time spent waiting on locks)
• Replacing SynchronizedMap with ConcurrentHashMap improved performance by 3.5x
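The slide does not say whether `SynchronizedMap` refers to Scala's deprecated `scala.collection.mutable.SynchronizedMap` trait or a Java wrapper; the sketch below uses Java's `Collections.synchronizedMap` as a stand-in. It updates shared counters from several threads: every call on the synchronized wrapper takes one shared monitor, whereas `ConcurrentHashMap` lets threads update different keys without serializing on a single lock. The helper name and workload sizes are hypothetical, and the printed timings will vary by machine; the sketch does not reproduce the 3.5x figure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LockContentionDemo {
    // Increments per-key counters from several threads; returns elapsed nanoseconds.
    static long countConcurrently(Map<Integer, Long> counts, int threads,
                                  int opsPerThread, int distinctKeys) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    // merge() is atomic for both map types used in main().
                    counts.merge(i % distinctKeys, 1L, Long::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        // Before: every operation acquires the same monitor.
        Map<Integer, Long> synced = Collections.synchronizedMap(new HashMap<>());
        // After: threads updating different keys do not block each other.
        Map<Integer, Long> concurrent = new ConcurrentHashMap<>();
        long syncedNs = countConcurrently(synced, threads, 200_000, 1_024);
        long concurrentNs = countConcurrently(concurrent, threads, 200_000, 1_024);
        System.out.printf("synchronizedMap:   %d ms%n", syncedNs / 1_000_000);
        System.out.printf("ConcurrentHashMap: %d ms%n", concurrentNs / 1_000_000);
    }
}
```

Both maps end up with identical counts; only the amount of time threads spend waiting on the lock differs.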
Uncover a Scala Scalability Issue
• The problem resides in Scala APIs and is caused by highly concurrent instanceof calls in the Java VM
• The problem gets worse as the number of threads in the Java VM increases
Scala API Fix
• Use polymorphism instead of instanceof!
• 1.6x performance improvement in the critical stage and 1.3x across
the entire workload.
• Code changes released in Scala 2.12.0
• https://issues.scala-lang.org/browse/SI-9823
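The slides summarize the fix as replacing `instanceof` checks with polymorphism. The sketch below shows that refactoring pattern on a hypothetical `Shape` hierarchy; it is not the actual Scala 2.12.0 change (see SI-9823 for that). Both versions compute the same result; the second lets the JVM's virtual dispatch select the implementation instead of running explicit type tests on every call.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: the same computation written with instanceof checks
// ("before") and with virtual dispatch ("after"). Shape, Square, and Rect
// are hypothetical types, not taken from Scala or Spark.
class DispatchDemo {
    interface Shape {
        double area(); // each subtype supplies its own implementation
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    static final class Rect implements Shape {
        final double width, height;
        Rect(double width, double height) { this.width = width; this.height = height; }
        @Override public double area() { return width * height; }
    }

    // Before: a runtime type test (and cast) on every call.
    static double areaByInstanceof(Object shape) {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return s.side * s.side;
        } else if (shape instanceof Rect) {
            Rect r = (Rect) shape;
            return r.width * r.height;
        }
        throw new IllegalArgumentException("unknown shape: " + shape);
    }

    // After: the JVM dispatches to the right area() with no explicit type tests.
    static double areaByPolymorphism(Shape shape) {
        return shape.area();
    }

    public static void main(String[] args) {
        List<Shape> shapes = Arrays.asList(new Square(3), new Rect(2, 5));
        double before = 0, after = 0;
        for (Shape s : shapes) {
            before += areaByInstanceof(s);
            after += areaByPolymorphism(s);
        }
        System.out.println(before + " == " + after); // prints "19.0 == 19.0"
    }
}
```

A side benefit of the polymorphic version is that adding a new subtype cannot silently fall through to the "unknown shape" branch.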
Beyond Scala and Spark
• The instanceof scalability issue also affects other Java applications
– Apache Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-12787
– A similar fix results in 61% better throughput and a 15% reduction in 99th-percentile latency
Garbage Collection Example
• The hottest GC function is PSPromotionManager::copy_to_survivor_space
• Tuning the following parameters improves performance by 10%:
-XX:SurvivorRatio
-XX:InitialTenuringThreshold
-XX:MaxTenuringThreshold
[Diagram: Java heap with Eden, Survivor Space #1, Survivor Space #2, and Old Generation, showing an object]
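The slides name the tuned flags but not their values. The sketch below shows the sizing arithmetic behind `-XX:SurvivorRatio`, assuming HotSpot's definition (ratio of eden to one survivor space, with the young generation made up of eden plus two survivor spaces), and builds a flag string that respects the tenuring-threshold bounds HotSpot enforces (0 to 15, initial not above max). The 1000 MB young generation and the values 8, 7, and 15 are hypothetical. The Parallel collector, whose `PSPromotionManager` appears on the slide, resizes survivor spaces adaptively by default, so treat these as starting sizes.

```java
// Sizing arithmetic for the slide's flags, ASSUMING HotSpot's definitions:
// -XX:SurvivorRatio=N means eden : one survivor space = N : 1, and the young
// generation is eden plus two survivor spaces. All example values are hypothetical.
class YoungGenSizing {
    // Each survivor space is young / (N + 2).
    static long survivorSize(long youngSize, int survivorRatio) {
        if (survivorRatio < 1) {
            throw new IllegalArgumentException("SurvivorRatio must be >= 1");
        }
        return youngSize / (survivorRatio + 2);
    }

    // Eden is whatever remains after the two survivor spaces.
    static long edenSize(long youngSize, int survivorRatio) {
        return youngSize - 2 * survivorSize(youngSize, survivorRatio);
    }

    // Object ages are tracked in 4 bits, so thresholds are limited to 0..15,
    // and the initial threshold may not exceed the maximum.
    static String gcFlags(int survivorRatio, int initialTenuring, int maxTenuring) {
        if (initialTenuring < 0 || maxTenuring > 15 || initialTenuring > maxTenuring) {
            throw new IllegalArgumentException("need 0 <= initial <= max <= 15");
        }
        return "-XX:SurvivorRatio=" + survivorRatio
                + " -XX:InitialTenuringThreshold=" + initialTenuring
                + " -XX:MaxTenuringThreshold=" + maxTenuring;
    }

    public static void main(String[] args) {
        long youngMb = 1000; // hypothetical young-generation size in MB
        int ratio = 8;       // hypothetical; the slides do not give tuned values
        System.out.println("each survivor space: " + survivorSize(youngMb, ratio) + " MB");
        System.out.println("eden: " + edenSize(youngMb, ratio) + " MB");
        System.out.println(gcFlags(ratio, 7, 15));
    }
}
```

With these inputs each survivor space is 100 MB and eden is 800 MB. In a Spark deployment, such flags would typically be passed to executors through the `spark.executor.extraJavaOptions` configuration property.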
Profile System
• The baseline shows up to 40% of CPU cycles spent waiting on IO
• With the same total number of cores, changing the core-to-storage ratio from 32:1 to 4:1 provides a 1.4x performance improvement
[Chart: normalized throughput with 1 storage disk/VM; 6 VMs with 32 vCPU/VM = 1.0, 48 VMs with 4 vCPU/VM = 1.4]
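The "same total number of cores" claim can be checked directly: both configurations on the chart have 192 vCPUs and one disk per VM, so only the vCPU-to-disk ratio changes. The small helper below is illustrative and its names are hypothetical; the 1.4x throughput gain is the slide's measurement, not something this arithmetic predicts.

```java
// Checks the arithmetic behind the slide's two configurations: the same total
// vCPU count, but very different vCPU-per-disk ratios. Names are hypothetical.
class CoreStorageRatio {
    static int totalVcpus(int vms, int vcpusPerVm) {
        return vms * vcpusPerVm;
    }

    static int totalDisks(int vms, int disksPerVm) {
        return vms * disksPerVm;
    }

    // vCPUs per storage disk across the cluster (integer division; exact here).
    static int vcpusPerDisk(int vms, int vcpusPerVm, int disksPerVm) {
        return totalVcpus(vms, vcpusPerVm) / totalDisks(vms, disksPerVm);
    }

    public static void main(String[] args) {
        int[][] configs = {{6, 32}, {48, 4}}; // {VMs, vCPUs per VM}; 1 disk per VM
        for (int[] c : configs) {
            System.out.printf("%d VMs x %d vCPU: %d vCPUs, %d disks, %d:1%n",
                    c[0], c[1], totalVcpus(c[0], c[1]), totalDisks(c[0], 1),
                    vcpusPerDisk(c[0], c[1], 1));
        }
    }
}
```

This prints 192 vCPUs for both configurations, with 6 disks (32:1) versus 48 disks (4:1).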
Summary
• Spark deployment is moving from on-premises to cloud
• The cloud environment provides elastic deployment, but at the same time brings the challenge of repartitioning resources
• Profiling applications and understanding their behavior leads to substantial performance improvements (from 10% to 3.5x in these case studies)
Thank You.
Yingqi (Lucy) Lu
Mulugeta Mammo
Eric Kaczmarek