Denial-of-Service Threat to Hadoop (YARN) Clusters with Multi-Tenancy
Jingwei Huang, David Nicol, and Roy Campbell

Agenda
• Background: Hadoop/YARN
• Vulnerability
• Threat Model
• Simulating DoS attacks to Hadoop
• Concluding Remarks

Background
• Hadoop: a widely used cloud platform for Big Data
• A Hadoop cluster is shared by a dynamic set of clients;
• Each client has a certain amount of assigned capacity;
• Computing resources in a cluster are allocated by a scheduler.
• Hadoop/MapReduce -> Hadoop/YARN
– Map slot / reduce slot -> container, a unified computing resource unit for a task
– JobTracker -> ResourceManager, which employs a scheduler to manage resources in the cluster
– TaskTracker -> NodeManager; per-application task management moves to an ApplicationMaster (AM)
– More flexible: supports not only MapReduce but also other parallel programming models, such as Spark
– More scalable: supports up to 10,000 nodes
– More secure: e.g., Kerberos-based authentication, access control on data, encrypted shuffle

A Vulnerability of Hadoop
• Vulnerability: a Hadoop cluster has shared but unconstrained computing resources, such as CPU, GPU, disk read/write, and network bandwidth.
– Previously, schedulers considered only memory when sharing resources among clients.
– More recently, the Fair Scheduler can be configured to consider both memory and CPU, based on "Dominant Resource Fairness" [Ghodsi et al., NSDI 2011], but by default it still considers only memory.

• Question 1: Why is this a vulnerability?
• Threat: malicious users can launch DoS attacks by exhausting the unconstrained resources of the hosts that run their DoS "tasks".

• Question 2: How bad could it be? How can it be modeled? How can the impact be quantified?
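To make the vulnerability concrete, the sketch below contrasts what a memory-only scheduler accounts for with a user's dominant share, the quantity that Dominant Resource Fairness equalizes across users. It is a minimal, hypothetical illustration of DRF's progressive-filling idea, not the Fair Scheduler's code; the resource names (`cpu`, `mem`), the 9-CPU/18-GB cluster, and the per-task demands are illustrative assumptions.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources) -> float:
    """Largest fraction of any single resource type that a user holds."""
    return max(usage[r] / capacity[r] for r in capacity)


def memory_only_share(usage: Resources, capacity: Resources) -> float:
    """The part of a user's footprint that a memory-only scheduler accounts for."""
    return usage["mem"] / capacity["mem"]


def drf_allocate(demands: Dict[str, Resources], capacity: Resources) -> Dict[str, int]:
    """Progressive filling in the spirit of DRF.

    Repeatedly launch one task for the user with the lowest dominant share and
    stop as soon as that user's next task no longer fits in the cluster.
    """
    consumed = {r: 0.0 for r in capacity}
    usage = {u: {r: 0.0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    while True:
        user = min(demands, key=lambda u: dominant_share(usage[u], capacity))
        demand = demands[user]
        if any(consumed[r] + demand[r] > capacity[r] for r in capacity):
            return tasks
        for r in capacity:
            consumed[r] += demand[r]
            usage[user][r] += demand[r]
        tasks[user] += 1


capacity = {"cpu": 9.0, "mem": 18.0}       # illustrative: 9 CPUs, 18 GB
demands = {"A": {"cpu": 1.0, "mem": 4.0},  # memory-heavy tasks
           "B": {"cpu": 3.0, "mem": 1.0}}  # CPU-heavy tasks
print(drf_allocate(demands, capacity))     # {'A': 3, 'B': 2}

# B's two tasks hold 6 of 9 CPUs, yet a memory-only view sees 2 of 18 GB.
b_usage = {"cpu": 6.0, "mem": 2.0}
print(dominant_share(b_usage, capacity), memory_only_share(b_usage, capacity))
```

Under DRF the CPU-heavy user B ends with a dominant share of 2/3, whereas memory-only accounting charges it only 1/9 of the cluster, so its CPU consumption goes unaccounted for.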

Threat Model
• Assumption 1: Attackers are rational. Specifically, to maximize the impact of a DoS attack, an attacker will use the compromised capacity to hold as many containers as possible, and use those containers to launch DoS "tasks" on as many nodes as possible.

• Assumption 2: In a DoS attack, a single attack task on a node can use unconstrained resources (e.g., CPU, GPU, disk I/O, bandwidth) excessively enough to degrade that node's service.

• Question: Given the compromised capacity, how many nodes can be infected by DoS tasks?
– Lower bound: c × n (as if every container occupied a whole node)
  • c: compromised capacity; n: number of nodes in the Hadoop cluster
  e.g., 1% × 1000 = 10; 2% -> 20
– Upper bound: min(n, c × (n × c_nd) / c_min) (as many minimum-size containers as possible, each on a different node)
  • c_nd: the amount of resource (e.g., memory) each node has; c_min: the minimum resource a container can have
  e.g., 1% × 1000 × 64 GB / 1 GB = 640; 2% -> 1280, capped at n = 1000
– [lb, ub] is a very broad interval. What is the most likely number?

• Model this as the "k ping-pong balls to n boxes" problem.
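The two bounds can be computed directly. A minimal sketch, assuming c_nd and c_min are given in the same unit (GB of memory, as in the example) and reading the "2% -> 1000" example as the upper bound being capped at n, since a node can be infected only once. The function name is illustrative.

```python
from typing import Tuple


def infected_node_bounds(c: float, n: int, c_nd: float, c_min: float) -> Tuple[float, float]:
    """Lower and upper bounds on the number of nodes DoS tasks can reach.

    c: compromised fraction of cluster capacity (0..1); n: number of nodes;
    c_nd: resource per node; c_min: minimum container size (same unit as c_nd).
    """
    lower = c * n                          # as if each container filled a whole node
    max_containers = c * n * c_nd / c_min  # as many minimum-size containers as possible
    upper = min(n, max_containers)         # a node counts at most once
    return lower, upper


for c in (0.01, 0.02):
    lb, ub = infected_node_bounds(c, n=1000, c_nd=64, c_min=1)
    print(f"c={c:.0%}: lower={lb:.0f}, upper={ub:.0f}")
# c=1%: lower=10, upper=640
# c=2%: lower=20, upper=1000
```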

K ping-pong balls to n boxes problem
• Container (task slot) -> ball; node -> box
• The game
– A player has k ping-pong balls, and there are n boxes.
– Goal: throw the k balls into as many boxes as possible, so the player always targets an empty box (the player can see whether a box holds balls).
– Whether a ball falls into the targeted box is uncertain (e.g., on a windy day).
– In general, a ball may bounce out of all the boxes; in our context of requesting a container from a host in Hadoop, we assume this probability is 0.
– So a ball falls either into the targeted box or into one of the others.
• Questions:
– Given k balls and n boxes, how many boxes contain balls after the game?
– How many throws guarantee that every box has at least one ball?
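The game can be played directly by Monte Carlo to estimate both quantities the questions ask about: the expected number of occupied boxes after k throws, and how often k throws fill every box. A minimal sketch; `p_t` is the per-throw probability of hitting the targeted box and `p_o` the probability of bouncing out. It assumes (the slide does not say this) that a ball which misses lands uniformly at random in one of the other n - 1 boxes. The function names and the `p_t` value in the usage lines are illustrative.

```python
import random
from typing import List, Optional, Tuple


def occupancy_path(n: int, k: int, p_t: float, p_o: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Play one game; path[j] is the number of occupied boxes after j throws.

    Assumption: a ball that neither hits its target nor bounces out lands
    uniformly at random in one of the other n - 1 boxes.
    """
    if n < 2:
        raise ValueError("need at least two boxes")
    if rng is None:
        rng = random.Random()
    occupied = [False] * n
    empty = list(range(n))  # indices of empty boxes
    pos = list(range(n))    # pos[b] = index of box b inside `empty`
    path = [0]

    def fill(box: int) -> None:
        if occupied[box]:
            return
        occupied[box] = True
        i, last = pos[box], empty[-1]
        empty[i], pos[last] = last, i  # O(1) swap-remove of `box` from `empty`
        empty.pop()

    for _ in range(k):
        # Always aim at an empty box; once all are full, the aim no longer matters.
        target = rng.choice(empty) if empty else rng.randrange(n)
        u = rng.random()
        if u < p_t:
            fill(target)                  # lands in the targeted box
        elif u >= p_t + p_o:              # missed, but did not bounce out
            other = rng.randrange(n - 1)  # one of the other n - 1 boxes
            fill(other if other < target else other + 1)
        path.append(n - len(empty))
    return path


def estimate(n: int, k: int, p_t: float, p_o: float = 0.0,
             trials: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Return (estimated E(m) after k throws, fraction of games filling all n boxes)."""
    rng = random.Random(seed)
    finals = [occupancy_path(n, k, p_t, p_o, rng)[-1] for _ in range(trials)]
    return sum(finals) / trials, sum(m == n for m in finals) / trials


# p_t = 0.5 is an arbitrary illustrative value; the slides do not state one.
print(estimate(n=1000, k=1000, p_t=0.5))
print(estimate(n=1000, k=1500, p_t=0.5))
```

Raising k until the second value approaches 1 gives an empirical answer to the second question.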

Simulation
• Solve the k ping-pong balls to n boxes problem by simulation, using Mobius (mobius.illinois.edu)
– a stochastic discrete-event system modeling and simulation tool
– Stochastic Activity Network: an extended Petri net

• Notation
– P_t: probability that a ball falls into the targeted box
– P_o: probability that a ball bounces out of all the boxes
– m: number of boxes that have balls
– P_e: probability that a ball falls into an empty box

• P_e = P_t + (1 - P_t - P_o)(n - m - 1)/n
• Success rate
S = E(m)/k when k ≤ n; S = E(m)/n when k ≥ n (the two agree at k = n)
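Because P_e depends only on m, the expected occupancy implied by the formula above can also be computed exactly, without sampling, by propagating the probability distribution of m throw by throw. The sketch below does this and applies the success-rate definition. It only evaluates the formulas as written, it is not the authors' Mobius model, and the `p_t` value in the usage line is an arbitrary assumption because the slides do not give the value used.

```python
from typing import List


def p_empty(n: int, m: int, p_t: float, p_o: float = 0.0) -> float:
    """P_e = P_t + (1 - P_t - P_o)(n - m - 1)/n, as written above, for m < n."""
    if m >= n:
        return 0.0  # every box is already occupied
    return p_t + (1.0 - p_t - p_o) * (n - m - 1) / n


def expected_occupied(n: int, k: int, p_t: float, p_o: float = 0.0) -> float:
    """Exact E(m) after k throws: m is a Markov chain that grows by one
    with probability P_e(m) on each throw and otherwise stays put."""
    dist: List[float] = [1.0] + [0.0] * n  # dist[m] = P(m boxes occupied)
    for _ in range(k):
        nxt = [0.0] * (n + 1)
        for m, prob in enumerate(dist):
            if prob == 0.0:
                continue
            q = p_empty(n, m, p_t, p_o)
            nxt[m] += prob * (1.0 - q)
            if m < n:
                nxt[m + 1] += prob * q
        dist = nxt
    return sum(m * prob for m, prob in enumerate(dist))


def success_rate(e_m: float, k: int, n: int) -> float:
    """S = E(m)/k when k <= n and E(m)/n when k >= n (the two agree at k = n)."""
    return e_m / min(k, n)


e_m = expected_occupied(n=1000, k=1000, p_t=0.5)  # p_t is an illustrative value
print(e_m, success_rate(e_m, k=1000, n=1000))
```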

Simulation results

| Number of boxes (n) | Number of balls (k) | Expected number of occupied boxes E(m) | Confidence interval (95% level) | Success rate E(m)/k | Success rate E(m)/n |
|---|---|---|---|---|---|
| 1000 | 100 | 98.03 | 0.023 | 98.03% | |
| 1000 | 500 | 440.64 | 0.084 | 88.13% | |
| 1000 | 700 | 586.65 | 0.111 | 83.81% | |
| 1000 | 900 | 719.93 | 0.135 | 79.99% | |
| 1000 | 1000 | 781.60 | 0.1 | 78.16% | 78.16% |
| 1000 | 1100 | 840.31 | 0.152 | | 84.03% |
| 1000 | 1200 | 895.80 | 0.163 | | 89.58% |
| 1000 | 1300 | 948.71 | 0.173 | | 94.87% |
| 1000 | 1400 | 995.80 | 0.113 | | 99.58% |
| 1000 | 1460 | 999.996 | 0.003 | | 100.00% |
| 10000 | 14600 | 10000 | 0.000 | | 100.00% |
| 10000 | 10000 | 7824.13 | 0.455 | 78.24% | 78.24% |
| 20000 | 20000 | 15648.07 | 0.646 | 78.24% | 78.24% |

Simulating DoS Attack to Hadoop

• The problem is solved by discrete-event system modeling and simulation, using Mobius.
• DoS attack model
– Deploy attack tasks to a number of nodes
– Simulate the k ping-pong balls to n boxes problem

• Mapper model
– Task completion time: Gamma distribution
– Maximum time allowed

• Reducer model
– Similar to the mapper model, but with a shuffle phase

• Q: How can the impact be quantified?

Task completion time
• Assume that task completion time follows a Gamma distribution with shape α and scale β
– Mean: αβ
– Mode: (α - 1)β (for α ≥ 1), the peak of the PDF, i.e., the most frequently seen value
• Applied to map, shuffle, and reduce
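The two moments above are enough to pin down the distribution: subtracting mode from mean gives β = mean - mode, and then α = mean/β. A minimal sketch, assuming one has estimates of a task's mean and most common completion time (the 60 s / 45 s figures are hypothetical) and using Python's `random.gammavariate`, whose parameters are shape α and scale β. The helper names are illustrative.

```python
import random
from typing import List, Tuple


def gamma_from_mean_mode(mean: float, mode: float) -> Tuple[float, float]:
    """Solve mean = alpha*beta and mode = (alpha - 1)*beta for (alpha, beta).

    Subtracting the two equations gives beta = mean - mode, so alpha = mean/beta.
    Requires mean > mode >= 0 (i.e. alpha >= 1, where the mode formula holds).
    """
    if not mean > mode >= 0:
        raise ValueError("need mean > mode >= 0")
    beta = mean - mode
    return mean / beta, beta


def sample_task_times(alpha: float, beta: float, count: int, seed: int = 0) -> List[float]:
    """Draw task completion times; random.gammavariate takes shape alpha, scale beta."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, beta) for _ in range(count)]


alpha, beta = gamma_from_mean_mode(mean=60.0, mode=45.0)  # hypothetical, in seconds
print(alpha, beta)                                         # 4.0 15.0
times = sample_task_times(alpha, beta, 100_000)
print(sum(times) / len(times))                             # sample mean, approximately 60
```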

Simulation (Reducer) Using Mobius

Measure of DoS Attacks
• c: compromised capacity in a Hadoop cluster
• m/n: attack broadness, i.e., the rate of compromised nodes
– n: number of nodes in the Hadoop cluster
– m: number of compromised nodes, which can be estimated by simulating the k ping-pong balls to n boxes game (a container -> a ball; a node -> a box)

• d: attack strength, i.e., the degree of degraded service: the percentage of the resources occupied by the DoS attack on an infected node
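To see how broadness m/n and strength d might combine with the Gamma task times and the maximum time allowed, here is a deliberately simplified toy model of one phase. Everything about it is a hypothetical assumption rather than the authors' Mobius model: attempts run in parallel on random nodes, an infected node leaves a task only a (1 - d) share of the contested resources (so its duration is divided by 1 - d), and an attempt exceeding `t_max` is killed and retried. The Gamma parameters, `t_max`, and `max_attempts` in the usage lines are made up; 1024 tasks and m/n = 0.782 come from the scenarios that follow.

```python
import math
import random


def phase_completion_time(num_tasks: int, infected_frac: float, d: float,
                          alpha: float, beta: float, t_max: float,
                          max_attempts: int, rng: random.Random) -> float:
    """Completion time of one map (or reduce) phase in a toy attack model.

    Hypothetical assumptions (not the authors' Mobius model):
      * all tasks run in parallel, and each attempt lands on a random node;
      * that node is infected with probability infected_frac (m/n);
      * on an infected node a task gets only a (1 - d) share of the contested
        resources, so its Gamma(alpha, beta) duration is divided by (1 - d);
      * an attempt longer than t_max is killed at t_max and retried.
    Requires 0 <= d < 1. Returns math.inf if a task fails all max_attempts.
    """
    finish = 0.0
    for _ in range(num_tasks):
        elapsed = 0.0
        for _ in range(max_attempts):
            t = rng.gammavariate(alpha, beta)
            if rng.random() < infected_frac:
                t /= 1.0 - d   # slowed down by the co-located DoS task
            if t <= t_max:
                elapsed += t
                break
            elapsed += t_max   # time wasted before the kill
        else:
            return math.inf    # every attempt hit the time limit
        finish = max(finish, elapsed)
    return finish


def completion_cdf(t: float, trials: int, seed: int = 0, **model) -> float:
    """Monte Carlo estimate of P(phase completes by time t)."""
    rng = random.Random(seed)
    done = sum(phase_completion_time(rng=rng, **model) <= t for _ in range(trials))
    return done / trials


base = dict(num_tasks=1024, alpha=4.0, beta=15.0, t_max=300.0, max_attempts=4)
for d in (0.0, 0.5, 0.8):
    p = completion_cdf(600.0, trials=50, infected_frac=0.782, d=d, **base)
    print(f"d={d:.0%}: P(map phase done by t=600) ~ {p:.2f}")
```

Evaluating `completion_cdf` over a range of t yields a CDF curve comparable in spirit to the job-completion plots that follow.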

Simulation Results
• Hadoop cluster: 1000 nodes × 16 GB; n = 1000
• Job: 1024 mappers, 256 reducers; allocated capacity: 5%
• Scenarios (c: compromised capacity; m/n: attack broadness; d: attack strength):
– S1: c = 0 (no compromised capacity)
– S2: c = 6.25% (m/n = 78.2%), d = 50%
– S3: c = 6.25% (m/n = 78.2%), d = 80%
– S4: c = 9.375% (m/n = 100%), d = 80%
– S5: c = 9.375% (m/n = 100%), d = 90%: the CDF curve stays almost at 0
– S6: c = 6.25% (m/n = 78.2%), d = 80%, with a longer maximum time limit for map/reduce tasks
• Simulations of larger clusters, such as n = 10,000, give similar results.

[Figure: Cumulative distribution function of job completion before time t, one curve per scenario S1 to S6; x-axis: time t (1 to 100); y-axis: probability (0 to 1)]
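The compromised capacities in these scenarios line up with rows of the simulation-results table. Assuming a 1 GB minimum container (the c_min used in the threat-model example; these slides do not state it for this experiment), c = 6.25% of 1000 × 16 GB is 1000 GB, i.e. k = 1000 containers, where the table gives E(m) = 781.60 (m/n ≈ 78.2%); c = 9.375% is 1500 GB, i.e. k = 1500 ≥ 1460, where the table already shows every node infected. A small sketch of that arithmetic; the helper names are illustrative.

```python
import math

# E(m) for n = 1000 boxes, copied from the simulation-results table (k -> E(m)).
TABLE_N1000 = {100: 98.03, 500: 440.64, 700: 586.65, 900: 719.93, 1000: 781.60,
               1100: 840.31, 1200: 895.80, 1300: 948.71, 1400: 995.80, 1460: 999.996}


def containers(c: float, n: int, c_nd: float, c_min: float) -> int:
    """Minimum-size containers the compromised capacity can hold (k balls)."""
    return math.floor(c * n * c_nd / c_min + 1e-9)  # epsilon absorbs float round-off


def attack_broadness(k: int) -> float:
    """m/n from the largest tabulated k' <= k; E(m) grows with k, so this is a
    conservative reading (valid for n = 1000 only)."""
    rows = [kk for kk in TABLE_N1000 if kk <= k]
    if not rows:
        raise ValueError("k is below the smallest tabulated value")
    return TABLE_N1000[max(rows)] / 1000


for label, c in (("S2/S3/S6", 0.0625), ("S4/S5", 0.09375)):
    k = containers(c, n=1000, c_nd=16, c_min=1)  # 16 GB nodes, 1 GB containers (assumed)
    print(f"{label}: c={c:.3%}, k={k}, m/n ~ {attack_broadness(k):.1%}")
# S2/S3/S6: c=6.250%, k=1000, m/n ~ 78.2%
# S4/S5: c=9.375%, k=1500, m/n ~ 100.0%
```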

Conclusion

• A small amount of compromised capacity in a multi-tenant Hadoop cluster can be used to launch DoS attacks that significantly degrade the performance of that cluster.

Discussion: How to solve the problem?
• Should schedulers consider all types of resources?
– Pro: resolves the issue completely
– Con: complexity and resource-management cost

• Can the ResourceManager (RM) avoid telling the ApplicationMaster (AM) where a container is?
– In the current YARN design, the RM needs to tell the AM on which node an assigned container is, because the AM must work with the NodeManager on that node to manage the task running in the container.

• Know your neighbors in the shared cluster
– In a cluster with multi-tenancy, it could be important to know with whom you are sharing the cluster and what their resource usage patterns are.
– However, this is not always possible to know, and privacy issues exist.

• Future work
– Metrics to characterize the status of a node, a rack, and a cluster, and the resource consumption patterns of a job and a client
– Use the metrics as a new mechanism of cloud "transparency"
– Enable YARN and applications to use the metrics to optimize performance

• Full paper can be found at:

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6906760

J. Huang, D.M. Nicol, R. Campbell, "Denial-of-Service Threat to Hadoop/YARN Clusters with Multi-Tenancy", IEEE BigData 2014, June 27 - July 2, 2014, Anchorage, Alaska, USA.

Acknowledgement

This work was conducted in the Assured Cloud Computing Center at UIUC, funded by AFOSR/AFRL.

Thank You!
