On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms

Marat Zhanikeev
Tokyo Univ. of Science

WebDB Forum 2017 @ Ochanomizu University
PDF → bit.do/170920
Background on Hadoop
• Hadoop performance measurement
  ◦ creators on performance limits [09]
  ◦ superlinear effect [08]
  ◦ various benchmarks on Hadoop vs Spark [07]
  ◦ inconsistencies in measurements [11]
• Hadoop/MapReduce optimization in [14] and a ton of other papers
• the "Do We (actually) Need Hadoop?" argument in [10] and a few recent papers
[09] K.Shvachko+0 "HDFS scalability: the limits to growth" USENIX ;login: (2010)
[08] N.Gunther+2 "Hadoop Superlinear Scalability" ACM Queue (2015)
[07] J.Shi+6 "Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics" Very Large Data Bases (2015)
[11] M.Xia+3 "Performance Inconsistency in Large Scale Data Processing Clusters" 10th USENIX ICAC (2013)
[14] A.Rasooli+1 "COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop Systems" Future Gen. Comp. Sys. (2014)
[10] A.Rowstron+1 "Nobody ever got fired for using Hadoop on a cluster" 1st HotCDP (2012)
M.Zhanikeev – [email protected] On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms – bit.do/170920 2/12
Modeling Hadoop Bottlenecks
[Figure: bottleneck model contrasting Big Data Processing with HPC/simulators/modeling (small data). Resources compared: Network (NW), Bulk Storage (BS), On-Chip Shared Memory (hSM), and RAM-based Shared Memory (sSM), each rated by number of parallel accesses, ability to isolate, bottleneck (pipe width), and core output.]
Hadoop’s Answer: Rack Awareness
[Figure: physical view of racks, each with a switch and several datanodes, joined by a core switch serving clients; the logical view from a client distinguishes its own rack switch from the other rack switches in front of the datanodes.]
• official Hadoop feature (not a bug) [12]
• some dynamics: goes off-rack when local nodes have too many jobs
• sadly, manual configuration of rack affiliation (much potential here for research on virtual network coordinates – Meridian, Vivaldi, ...)
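The off-rack fallback above can be sketched as a tiny placement rule. This is an illustrative sketch only, not Hadoop's actual scheduler code; the node structure and the `MAX_LOCAL_JOBS` threshold are assumptions:

```python
MAX_LOCAL_JOBS = 4  # assumed threshold; the real policy is configurable in Hadoop

def pick_datanode(client_rack, datanodes):
    """Prefer datanodes on the client's own rack; go off-rack only when
    every local node already has too many jobs (the dynamic from the slide)."""
    local = [d for d in datanodes if d["rack"] == client_rack]
    free_local = [d for d in local if d["jobs"] < MAX_LOCAL_JOBS]
    candidates = free_local or [d for d in datanodes if d["rack"] != client_rack]
    return min(candidates, key=lambda d: d["jobs"])  # least-loaded candidate

nodes = [
    {"name": "dn1", "rack": "r1", "jobs": 4},
    {"name": "dn2", "rack": "r1", "jobs": 4},
    {"name": "dn3", "rack": "r2", "jobs": 1},
]
chosen = pick_datanode("r1", nodes)  # both local nodes are full -> off-rack dn3
```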
[12] "Hadoop: Rack Awareness" https://hadoop.apache.org (2017)
Hadoop vs Bigdata Replay Method
• basic idea similar to [10] but uses circuits [02] to transfer shards and multicore [01] to parallel-process them
[Figure: Hadoop space versus the replay method. Hadoop side: name server(s), many storage nodes (shards) holding files A/B/C, and a client machine where you deploy your code as a MapReduce job via the Hadoop client, which finds and reads/parses the data. Replay side: time-aware sub-store(s) on the storage nodes, a manager that schedules multicore replay on replay nodes, and your sketcher running on the client machine.]
[10] A.Rowstron+1 "Nobody ever got fired for using Hadoop on a cluster" 1st HotCDP (2012)
[02] myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, CRC (2015)
[01] myself+0 "Streaming Algorithms for Big Data Processing on Multicore" Big Data: Algorithms, Analytics, and Applications, CRC (2015)
Replay Environment is Highly Flexible!

• replay is time-aligned, so jobs can pick any spot on the timeline
• similar to Spark in going beyond the key-value datatype but more – the full scope of streaming algorithms [01]
• massively multicore environments [04] with 100+ cores, dynamic re-packing of job batches, etc.
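The time-aligned aspect can be shown in a minimal sketch. The `Record`/`replay_window` names are hypothetical, not from the paper; the point is that timestamped records let a job select any [start, end) window on the timeline rather than consuming from the head:

```python
from dataclasses import dataclass

@dataclass
class Record:
    t: float       # timestamp of the record in the replayed trace
    payload: int

def replay_window(trace, start, end, job):
    """Apply a job only to records whose timestamps fall in [start, end)."""
    return [job(r) for r in trace if start <= r.t < end]

# a toy trace: records at t = 0..9 carrying squared values
trace = [Record(t=float(i), payload=i * i) for i in range(10)]
out = replay_window(trace, 3.0, 6.0, lambda r: r.payload)  # -> [9, 16, 25]
```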
[Figure: replay at a scale. A time-aligned big data cursor moves along the time direction; a replay manager reads/prepares data into shared memory buffers (one buffer per replay batch, buffer head at "now", down to the buffer tail) and packs jobs onto cores (Core 1 ... Core X), each job tracking its own position; a controller reports on jobs, manages them in realtime, and can kill them.]
[01] myself+0 "Streaming Algorithms for Big Data Processing on Multicore" Big Data: Algorithms, Analytics, and Applications, CRC (2015)
[04] myself+0 "Volume and Irregularity Effects on Massively Multicore Packet Processors" APNOMS (2016)
Performance under hotspots
The Hotspot Distribution
[Figure: hotspot distributions for Classes A–E; log(value) versus rank 0–100 in decreasing order.]

• models flash / hotspot / killer-app / black-swan events using extreme variance in popularity
• generation method: stick-breaking process, Dirichlet distribution with parallel beta sources [05]
• final step: classify based on the number of hot/flash items
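A stick-breaking popularity generator can be sketched in a few lines. This is a generic GEM-style construction (Beta(1, alpha) breaks), not the authors' exact generator from [05]; the `alpha` value is an illustrative assumption — smaller alpha concentrates mass on a few hot items, larger alpha spreads it out:

```python
import random

def stick_breaking_popularity(n, alpha):
    """Break a unit stick n times; each piece is an item's popularity weight."""
    weights = []
    remaining = 1.0
    for _ in range(n - 1):
        frac = random.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * frac)
        remaining *= (1.0 - frac)
    weights.append(remaining)                  # last item gets the leftover
    return sorted(weights, reverse=True)       # decreasing order, as plotted

random.seed(1)
pop = stick_breaking_popularity(100, alpha=5.0)
```

Classifying a trace into Classes A–E would then amount to counting how many of these weights exceed a hotness threshold.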
[05] myself+1 "Popularity-Based Modeling of Flash Events in Synthetic Packet Traces" CQ研 (2012)
The Binary "Till Contention" Metric

• not a common, but a very realistic way to model performance under load
• note: even more applicable under hotspot-y input
[Figure: a client pulling data shards across the rack border (switch); traffic volume grows until it crosses the contention-free to contention-ful threshold.]
• example: server response time as a function of load can be expressed as:

  T = ½ [ (L − n) + √( (L − n)² + k/(1 − L) ) ]

• ... where T is response time, L is load, and k is the knee = contention point!
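A minimal sketch of this curve, under the reading of the formula above (the values of n and k here are illustrative assumptions, not from the paper):

```python
import math

def response_time(L, n=1.0, k=0.1):
    """T = 1/2 * [(L - n) + sqrt((L - n)^2 + k/(1 - L))], for load L in (0, 1).

    The k/(1 - L) term dominates as load approaches saturation, producing
    the knee: flat response at low load, then a sharp rise near contention.
    """
    return 0.5 * ((L - n) + math.sqrt((L - n) ** 2 + k / (1.0 - L)))

low = response_time(0.2)    # well below the knee
high = response_time(0.95)  # past the knee, response time shoots up
```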
Performance Models

• shard size as S and in-job traffic to shard size ratio r
  ◦ so, Hadoop jobs generate rS versus always strictly S under Replay
• contention threshold as C (for both contention and/or capacity)
• list of shard hotness (popularity) {h1, h2, h3, ..., hn} and sizes {S1, S2, S3, ..., Sn}
• then we have (job/traffic) volume for Hadoop:

  V_hadoop = Σ_{i=1..n} r · hi · Si

• ... and for the Replay method:

  V_replay = Σ_{i=1..n} Si    (1)
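The two volume models are a one-liner each; the shard numbers below are illustrative, not from the paper:

```python
def hadoop_volume(r, hotness, sizes):
    # V_hadoop = sum over i of r * h_i * S_i: hot shards are fetched repeatedly,
    # scaled by the in-job traffic ratio r
    return sum(r * h * s for h, s in zip(hotness, sizes))

def replay_volume(sizes):
    # V_replay = sum over i of S_i: each shard crosses the network exactly once
    # per replay, regardless of how hot it is
    return sum(sizes)

# toy example: 3 equal-size shards, one of them 5x hotter than the others
hotness, sizes, r = [5.0, 1.0, 1.0], [10.0, 10.0, 10.0], 1.5
vh = hadoop_volume(r, hotness, sizes)   # 1.5 * (50 + 10 + 10) = 105.0
vr = replay_volume(sizes)               # 30.0
```

This makes the hotspot effect visible directly: Hadoop's volume grows with hotness, while the replay volume depends only on total shard size.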
Results
[Figure: results. log(1 + time till contention) for hadoop versus replay, one panel per hotspot class (A–E) and contention threshold (0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5), x-axis from 10 to 10000. Three figure variants: replay period (step) of 10, 50, and 200.]
That’s all, thank you ...