40
A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers Yousi Zheng Dept. of ECE, The Ohio State University E-mail: [email protected] Joint work with Ness B. Shroff Prasun Sinha

A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

  • Upload
    sandro

  • View
    47

  • Download
    0

Embed Size (px)

DESCRIPTION

A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers. Yousi Zheng Dept. of ECE, The Ohio State University E-mail: [email protected]. J oint work with Ness B. Shroff Prasun Sinha. MapReduce. Data Centers Facility containing a very large number of machines - PowerPoint PPT Presentation

Citation preview

Page 1: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

A New Analytical Technique for Designing Provably Efficient

MapReduce Schedulers

Yousi ZhengDept. of ECE, The Ohio State UniversityE-mail: [email protected]

Joint work with Ness B. ShroffPrasun Sinha

Page 2: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

2

MapReduceData Centers

Facility containing a very large number of machinesThey process very large datasets: MapReduce

MapReduceFramework for processing highly parallel problems Developed (popularized) by Google Nearly ubiquitous (Google, IBM, Facebook, Microsoft,

Yahoo, Amazon, eBay, twitter…)Used in a variety of different applications

Distributed Grep, distributed sort, AI, scientific computation, Large scale pdf generation, Geographical data, Image processing…

Page 3: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

3

MapReduceMap phase:

Takes an input job and divides into many small sub-problems

Operations can run in parallel on potentially different machines

Reduce phaseCombines the output of MapOccurs after the Map phase

is completedRuns on parallel machines…

Goal: Schedule these Map and Reduce tasks in order to minimize the total flow time in the system

Page 4: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

4

Time is slottedN “Machines”: Each machine runs 1 unit of workload per slot

Scheduling Constraint: Reduce job startsafter all tasks in the Map job are completed.

Reduce

…Data Center with N machines

n jobs over T slots

Each job i consists of Map tasks and Reduce tasks

Model

Each Map task has 1 unit of workloadTotal Mi tasks for Map job i

Each Reduce task can have multiple units of workload

Ri (k)

units of workload for task k for job i

units of workload for Reduce job i

1

MapMi

Ri (k)

job i

Page 5: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

5

Preemptive Reduce tasks can be interrupted

at the end of a slot Remaining workload in the task can be

executed on any machine(s) Reasonable if overhead

of data-migration is small Non-preemptive

Types of Schedulers: Treatment of Reduce Tasks

R1 R1

R2

Machines

Time

Machine C

Machine A

Machine B

0 1 2 3 4 5 6 7

Page 6: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

6

Preemptive Reduce tasks can be interrupted

at the end of a slot Remaining workload in the task can be

executed on any machine(s) Reasonable if overhead

of data-migration is small Non-preemptive

Once a Reduce task is started, it can’t be interrupted till the end of the task

An individual Reduce task cannot be completed on different machines But different tasks from same job can be assigned to different

machines Note: Since Map tasks have unit workload, they cannot be

interrupted. In practice Map tasks may be > 1 unit, but small

Types of Schedulers: Treatment of Reduce Tasks

R1 R1

R2

Machines

Time

Machine C

Machine A

Machine BR2

0 1 2 3 4 5 6 7

Page 7: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

7

Flow-Time Minimization Problem: Preemptive Scenario

Total # machines is N

Total map workload is Mi for job i

Total reduce workload is Ri for job i

Minimize flow-time

Scheduling Constraint

Page 8: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

8

Flow-Time Minimization ProblemNon-Preemptive Scenario

Total # machines is N

Total map workload is Mi for job i

Total reduce workload is R(k)

i for task k in job i

Non-preemptive

Minimize flow-time

Both Problems are NP-hard in the strong sense

Page 9: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

9

Competitive Ratio

TheoremFor any constant c0 and any online scheduler S, there are sequences of arrivals and workloads in the non-preemptive case, such that the competitive ratio c is greater than c0

No policy gives a finite competitive ratio

: set of all possible schedulers (including non-causal) : Total Flow time of scheduler (sum of FT of all jobs)Define

Competitive ratio: A given online policy S is said to have a competitive ratio of c if for any arrival pattern

Page 10: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

10

Efficiency Ratio AnalysisEfficiency ratio: A given online policy S is said to have

an efficiency ratio of if (over some probability space of arrivals)

Theorem● If the workload of Reduce tasks of each job is

light-tailed distributed, then Any work-conserving scheduler has a constant

efficiency ratio, for both preemptive and non-preemptive scenarios.

Page 11: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

11

So FarNo scheduler has a finite competitive ratio in non-

preemptive scenarioAn evaluation metric efficiency ratio is introduced

All work conserving schedulers have a finite efficiency ratio () for both preemptive & non-preemptive scenariosBut the performance of these schedulers could still be

quite poor ( could be very large)Key Question: Can we design a simple scheduler

with a provably small efficiency ratio in the MapReduce paradigm?

Preemptive scenario: Yes! (ASRPT)

Page 12: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

12

ASRPT: Available Shortest Remaining Processing Time

SRPT: An infeasible schedulerPriority given to jobs with smallest remaining workload In SRPT, Map and Reduce for a given job can be

scheduled in the same time-slot (infeasible in reality)SRPT results in a lower flow time than any feasible

(including non-causal) schedulerASRPT

Map tasks are scheduled according to SRPT (or sooner) By running SRPT in a virtual manner

Reduce tasks greedily fill up the remaining machines In the order determined by the shortest remaining processing

time

Page 13: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

13

ASRPT: Available Shortest Remaining Processing Time

R1

M2M1

R21

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s, N

Time: Total Flow-time 4 units

R1R1

M2M1

R21

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s, N

Time

R1

: Total Flow-time 5 unitsSRPT ASRPT

R1

Theorem:If the job arrivals and workload are i.i.d., then ASRPT has an efficiency ratio of 2 in the preemptive scenario.

Page 14: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

14

Simulation• N = 100 machines, total time slots T = 800 • # Reduce tasks in each job is 10• Job arrival process: Poisson; λ = 2 jobs per time slot.• Preemptive Scenario• Schedulers

• ASRPT• Fair• FIFO

• ASRPT has smallerefficiency ratio thanFair and FIFO.

Page 15: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

15

Conclusion• Investigated the flow time minimization problem in MapReduce

schedulers

• No online policy has a finite competitive ratio in non-preemptive scenarios

• All work-conserving schedulers have constant efficiency ratios • Preemptive and non-preemptive scenarios.

• A specific algorithm (ASRPT) can guarantee an efficiency ratio of 2 in the preemptive scenario. • Is efficiency ratio of ASRPT similarly small in non-pre-emptive case?

• Efficiency Ratio based analysis provides worse case (w.p.1) guarantees (compared to competitive ratio), but gives more design flexibility.• Has the potential to be applied to other problems/domains

Page 16: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

16

Consider cost of data migration Combine preemptive and non-preemptive scenarios

Consider multiple phases With phase precedence (for complex processing)

Develop Green Energy solutions Combine with sleep-wake scheduling Consider renewable energy fluctuations

Future Work

Consider Shuffle Phase Introduce more information (e.g.,

dependency Graph) to the scheduler So that all reduce tasks don’t wait

until Map tasks are completed

Page 17: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

17

Thank You!

Page 18: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

18

Backup Slides

Page 19: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

19

Main Results Introduced a metric Efficiency Ratio to evaluate the

performance of schedulers in MapReduce Proved that no online scheduler has a finite competitive ratio Guarantee with probability 1 May have greater modeling implications for analysis of algorithms

Proved that Efficiency Ratio is bounded by a constant for any work-conserving scheduler For both bounded and light-tailed workload for Reduce phase For both preemptive and non-preemptive scenarios

Developed a specific work-conserving algorithm (ASRPT) that has an efficiency ratio of 2 for preemptive scenario.

Page 20: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

20

For any work-conserving scheduling policy: Preemptive:

Non-preemptive:

Similar result holds but with different constants ( )

Efficiency Ratio: Preemptive & Non-preemptive

where p0 = Prob (a slot has no job arrivals)

Page 21: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

21

MapReduce: FIFO scheduler

R1M2

M1R2

1

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s

Time

Flow-time of a job: time spent by job in the system

FT of Job 1: 3 time slots FT of Job 2: 3 time slots Total FT : 6 time slots

FCFS scheduler with 4 machines, 2 jobs

Map must finish before Reduce can start

R1

M1

R1

M2 R2

Page 22: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

22

MapReduce: Smarter Scheduler

R1

M2M1

R2

1

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s

Time

M1

R1

M2 R2

Flow-time: time in the system

FT of Job 1: 3 time slotsFT of Job 2: 2 time slotsTotal FT: 5 time slots

Goal: Minimize total flow-time

R1

“Smarter Scheduler”

Page 23: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

23

Consider cost of data migrationCombine preemptive and non-preemptive scenarios

Consider multiple phases With phase precedence (for complex processing)

Develop Green Energy solutionsCombine with sleep-wake schedulingConsider renewable energy fluctuations

Future Work

Consider Shuffle Phase Introduce more information (e.g.,

dependency Graph) to the scheduler

So that all reduce tasks don’t wait until Map tasks are completed

Page 24: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

24

Analysis Outline

Divide up time into repeating intervals Interval corresponds to the consecutive times slots until a

time-slot occurs with an idle machine(s)

Bound the moments of these intervals with constants

Bound the efficiency ratio using the moments

Analysis applies to: Preemptive and non-preemptive scenariosAllows jobs to have infinite but light-tailed support

Page 25: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

25

Preemptive Scenario: An ExampleComplete time-slot (all machines used)Incomplete time-slot (all machines not used)Kj: Length of j-th interval (ends in incomplete slot)

Observations for a job arriving in interval j (work-conserving)

Map task finishes in interval j (otherwise last time-slot would be complete)

Reduce tasks finish in interval j or interval j+1

Flow time per job is bounded, wp1, if the interval sizes (moments) are bounded

g is bounded wp1.

Page 26: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

26

Non-preemptive Scenario: An Example

Observation for reduce tasks: Unlike preemptive scenario, in the non-

preemptive scenario, each job arriving in interval j will start (not finish) all its reduce tasks in interval j or interval j+1

Reduce tasks must be finished by the end of interval j+1 plus time Ri-1.

Hence, if (all moments) of the interval are bounded then the flow times (per job) are bounded wp1 g is bounded wp1.

Page 27: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

27

Lemma: If the scheduler is work-conserving, then for any given number , there exists a constant , such that , for all j.

where

Bounds on the Moments

Rough idea: Since the traffic intensity ρ<1, and the workload in each

slot is light-tailed, the number of consecutive complete time slots will decay exponentially (Chernoff’s Theorem)

The moments of interval K are bounded

Page 28: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

28

Many unanswered questions

Competitive Ratio Efficiency Ratio

Preemptive ?Constant for any work-conserving scheduler

2 for ASRPT

Non-preemptive

No constant for any scheduler

Constant for any work-conserving scheduler

? for ASRPT

Page 29: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

29

Main schedulers proposed for MapReduce framework

FIFO scheduler:Default in HadoopSuffers from the well known head-of-line blocking

problemFair scheduler [Zaharia’09]

Mitigate head-of-line blocking problem of FIFOJobs stick to the machines on which they are initially

scheduledCoupling scheduler [Tan’12]

Mitigate the problem that jobs stick to machines

Page 30: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

30

Consider machines with speed-up [Moseley’11] O(1/ ε5) competitive algorithm with (1 + ε) speed for

the online case, where 0 < ε ≤ 1Speed-up of machines is necessary in the algorithm (if

ε decreases to 0, the competitive ratio will increase to infinity correspondingly).

Consider minimizing total completion time [Chang’11, Chen’12]Minimizing total completion time = minimizing total

flow time. Competitive ratio of “minimizing total completion time”

≠ competitive ratio of “minimizing total flow time”.

Main schedulers proposed for MapReduce framework

Page 31: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

31

ASRPT runs SRPT in a virtual fashion and keeps track of how SRPT would have scheduled jobs in any given slot.

ASRPT assigns machines to the tasks in the following priority order (from high to low): (Only in the non-preemptive scenario) The previously

scheduled Reduce tasks which are not yet finished,The Map tasks which are scheduled in the SRPT, The available Reduce tasks, The available Map tasks.

For each priority group, the algorithm greedily assign machines to the corresponding tasks through the sorted available workload.

ASRPT

Page 32: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

32

Light-tailed Distributed Workload

where

Page 33: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

33

SimulationsPreemptive Non-preemptive

Page 34: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

34

Main Results Competitive Ratio: Deterministic guarantee under adversarial arrivals

Unbounded for all non-preemptive scheduler

Efficiency Ratio Asymptotic Guarantee with probability 1 May have greater modeling implications for analysis of algorithms

Efficiency Ratio is bounded by a constant for any work-conserving scheduler For bounded workload for Reduce phase (Ri < Rmax) For light-tailed workload for Reduce phase

An Algorithm (ASRPT) with efficiency ratio of 2 for preemptive scenario.

Page 35: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

35

Non-preemptive One task can only be allocated to one machine (preserves

data locality) Once started cannot be interrupted till the end of the task

Types of Schedulers: Treatment of Reduce Tasks

Page 36: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

36

Preemptive Every task can be interrupted at the end of a slot Remaining workload can be executed on any machine

Types of Schedulers: Treatment of Reduce Tasks

Page 37: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

37

More in Practice

Shuffle Phase

Page 38: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

38

ASRPT: Available Shortest Remaining Processing Time [Example]

R1

M2M1

R21

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s, N

Time: Total Flow-time 4 units

R1R1

M2M1

R2

1

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s, N

Time

R1

: Total Flow-time 5 unitsSRPT ASRPT

Page 39: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

39

ASRPT: Available Shortest Remaining Processing Time [Example]

R1M2

M1R2

1

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s, N

Time: Total Flow-time 6 units

R1

R1

M2M1

R2

1

2

3

4

1 2 3 4

Num

ber o

f mac

hine

s, N

Time

R1

: Total Flow-time 5 unitsFIFO ASRPT

Page 40: A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

40

Lemma: If the scheduler is work-conserving, then for any given number , there exists a constant , such that , for all j.

where

Bounds on the Moments