On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms

Marat Zhanikeev
Tokyo Univ. of Science

WebDB Forum 2017 @ Ochanomizu University
PDF → bit.do/170920
Background on Hadoop
• Hadoop performance measurement
  ◦ creators on performance limits [09]
  ◦ superlinear effect [08]
  ◦ various benchmarks on Hadoop vs Spark [07]
  ◦ inconsistencies in measurements [11]
• Hadoop/MapReduce optimization in [14] and a ton of other papers
• the "Do We (actually) Need Hadoop?" argument in [10] and a few recent papers
[09] K.Shvachko+0 "HDFS scalability: the limits to growth" USENIX ;login: (2010)
[08] N.Gunther+2 "Hadoop Superlinear Scalability" ACM Queue (2015)
[07] J.Shi+6 "Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics" Very Large Data Bases (2015)
[11] M.Xia+3 "Performance Inconsistency in Large Scale Data Processing Clusters" 10th USENIX ICAC (2013)
[14] A.Rasooli+1 "COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop Systems" Future Gen. Comp. Sys. (2014)
[10] A.Rowstron+1 "Nobody ever got fired for using Hadoop on a cluster" 1st HotCDP (2012)
M.Zhanikeev – [email protected] On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms – bit.do/170920 2/12
Modeling Hadoop Bottlenecks
[Figure: bottleneck model contrasting Big Data Processing with HPC/simulators/modeling (small data). Resources compared: Network (NW), Bulk Storage (BS), On-Chip Shared Memory (hSM), and RAM-based Shared Memory (sSM), each rated by number of parallel accesses, ability to isolate, bottleneck (pipe width), and core output.]
Hadoop’s Answer: Rack Awareness
[Figure: physical view of racks, each with a switch and several datanodes, joined by a core switch serving clients; the logical view from a client distinguishes its own rack switch from the other rack switches in front of the datanodes.]
• official Hadoop feature (not a bug) [12]
• some dynamics: goes off-rack when local nodes have too many jobs
• sadly, manual configuration of rack affiliation (much potential here for research on virtual network coordinates – Meridian, Vivaldi, ...)
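The off-rack fallback above can be sketched as a tiny placement rule. This is an illustrative sketch only, not Hadoop's actual scheduler code; the node structure and the `MAX_LOCAL_JOBS` threshold are assumptions:

```python
MAX_LOCAL_JOBS = 4  # assumed threshold; the real policy is configurable in Hadoop

def pick_datanode(client_rack, datanodes):
    """Prefer datanodes on the client's own rack; go off-rack only when
    every local node already has too many jobs (the dynamic from the slide)."""
    local = [d for d in datanodes if d["rack"] == client_rack]
    free_local = [d for d in local if d["jobs"] < MAX_LOCAL_JOBS]
    candidates = free_local or [d for d in datanodes if d["rack"] != client_rack]
    return min(candidates, key=lambda d: d["jobs"])  # least-loaded candidate

nodes = [
    {"name": "dn1", "rack": "r1", "jobs": 4},
    {"name": "dn2", "rack": "r1", "jobs": 4},
    {"name": "dn3", "rack": "r2", "jobs": 1},
]
chosen = pick_datanode("r1", nodes)  # both local nodes are full -> off-rack dn3
```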
[12] "Hadoop: Rack Awareness" https://hadoop.apache.org (2017)
Hadoop vs Bigdata Replay Method
• basic idea similar to [10] but uses circuits [02] to transfer shards and multicore [01] to parallel-process them
[Figure: Hadoop space versus the replay method. Hadoop side: name server(s), many storage nodes (shards) holding files A/B/C, and a client machine where you deploy your code as a MapReduce job via the Hadoop client, which finds and reads/parses the data. Replay side: time-aware sub-store(s) on the storage nodes, a manager that schedules multicore replay on replay nodes, and your sketcher running on the client machine.]
[10] A.Rowstron+1 "Nobody ever got fired for using Hadoop on a cluster" 1st HotCDP (2012)
[02] myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, CRC (2015)
[01] myself+0 "Streaming Algorithms for Big Data Processing on Multicore" Big Data: Algorithms, Analytics, and Applications, CRC (2015)
Replay Environment is Highly Flexible!

• replay is time-aligned, so jobs can pick any spot on the timeline
• similar to Spark in going beyond the key-value datatype but more – the full scope of streaming algorithms [01]
• massively multicore environments [04] with 100+ cores, dynamic re-packing of job batches, etc.
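The time-aligned aspect can be shown in a minimal sketch. The `Record`/`replay_window` names are hypothetical, not from the paper; the point is that timestamped records let a job select any [start, end) window on the timeline rather than consuming from the head:

```python
from dataclasses import dataclass

@dataclass
class Record:
    t: float       # timestamp of the record in the replayed trace
    payload: int

def replay_window(trace, start, end, job):
    """Apply a job only to records whose timestamps fall in [start, end)."""
    return [job(r) for r in trace if start <= r.t < end]

# a toy trace: records at t = 0..9 carrying squared values
trace = [Record(t=float(i), payload=i * i) for i in range(10)]
out = replay_window(trace, 3.0, 6.0, lambda r: r.payload)  # -> [9, 16, 25]
```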
[Figure: replay at a scale. A time-aligned big data cursor moves along the time direction; a replay manager reads/prepares data into shared memory buffers (one buffer per replay batch, buffer head at "now", down to the buffer tail) and packs jobs onto cores (Core 1 ... Core X), each job tracking its own position; a controller reports on jobs, manages them in realtime, and can kill them.]
[01] myself+0 "Streaming Algorithms for Big Data Processing on Multicore" Big Data: Algorithms, Analytics, and Applications, CRC (2015)
[04] myself+0 "Volume and Irregularity Effects on Massively Multicore Packet Processors" APNOMS (2016)
Performance under hotspots
The Hotspot Distribution
[Figure: hotspot distributions for Classes A–E; log(value) versus rank 0–100 in decreasing order.]

• models flash / hotspot / killer-app / black-swan events using extreme variance in popularity
• generation method: stick-breaking process, Dirichlet distribution with parallel beta sources [05]
• final step: classify based on the number of hot/flash items
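A stick-breaking popularity generator can be sketched in a few lines. This is a generic GEM-style construction (Beta(1, alpha) breaks), not the authors' exact generator from [05]; the `alpha` value is an illustrative assumption — smaller alpha concentrates mass on a few hot items, larger alpha spreads it out:

```python
import random

def stick_breaking_popularity(n, alpha):
    """Break a unit stick n times; each piece is an item's popularity weight."""
    weights = []
    remaining = 1.0
    for _ in range(n - 1):
        frac = random.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * frac)
        remaining *= (1.0 - frac)
    weights.append(remaining)                  # last item gets the leftover
    return sorted(weights, reverse=True)       # decreasing order, as plotted

random.seed(1)
pop = stick_breaking_popularity(100, alpha=5.0)
```

Classifying a trace into Classes A–E would then amount to counting how many of these weights exceed a hotness threshold.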
[05] myself+1 "Popularity-Based Modeling of Flash Events in Synthetic Packet Traces" CQ研 (2012)
The Binary "Till Contention" Metric

• not a common, but a very realistic way to model performance under load
• note: even more applicable under hotspot-y input
[Figure: a client pulling data shards across the rack border (switch); traffic volume grows until it crosses the contention-free to contention-ful threshold.]
• example: server response time as a function of load can be expressed as:

  T = ½ [ (L − n) + √( (L − n)² + k/(1 − L) ) ]

• ... where T is response time, L is load, and k is the knee = contention point!
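A minimal sketch of this curve, under the reading of the formula above (the values of n and k here are illustrative assumptions, not from the paper):

```python
import math

def response_time(L, n=1.0, k=0.1):
    """T = 1/2 * [(L - n) + sqrt((L - n)^2 + k/(1 - L))], for load L in (0, 1).

    The k/(1 - L) term dominates as load approaches saturation, producing
    the knee: flat response at low load, then a sharp rise near contention.
    """
    return 0.5 * ((L - n) + math.sqrt((L - n) ** 2 + k / (1.0 - L)))

low = response_time(0.2)    # well below the knee
high = response_time(0.95)  # past the knee, response time shoots up
```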
Performance Models

• shard size as S and in-job traffic to shard size ratio r
  ◦ so, Hadoop jobs generate rS versus always strictly S under Replay
• contention threshold as C (for both contention and/or capacity)
• list of shard hotness (popularity) {h1, h2, h3, ..., hn} and sizes {S1, S2, S3, ..., Sn}
• then we have (job/traffic) volume for Hadoop:

  V_hadoop = Σ_{i=1..n} r · hi · Si

• ... and for the Replay method:

  V_replay = Σ_{i=1..n} Si    (1)
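The two volume models are a one-liner each; the shard numbers below are illustrative, not from the paper:

```python
def hadoop_volume(r, hotness, sizes):
    # V_hadoop = sum over i of r * h_i * S_i: hot shards are fetched repeatedly,
    # scaled by the in-job traffic ratio r
    return sum(r * h * s for h, s in zip(hotness, sizes))

def replay_volume(sizes):
    # V_replay = sum over i of S_i: each shard crosses the network exactly once
    # per replay, regardless of how hot it is
    return sum(sizes)

# toy example: 3 equal-size shards, one of them 5x hotter than the others
hotness, sizes, r = [5.0, 1.0, 1.0], [10.0, 10.0, 10.0], 1.5
vh = hadoop_volume(r, hotness, sizes)   # 1.5 * (50 + 10 + 10) = 105.0
vr = replay_volume(sizes)               # 30.0
```

This makes the hotspot effect visible directly: Hadoop's volume grows with hotness, while the replay volume depends only on total shard size.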
Results
[Figure: results. log(1 + time till contention) for hadoop versus replay, one panel per hotspot class (A–E) and contention threshold (0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5), x-axis from 10 to 10000. Three figure variants: replay period (step) of 10, 50, and 200.]
That’s all, thank you ...