Scaling and scheduling to maximize application performance within budget constraints in cloud workflows IPDPS 2013
Ming Mao, Marty Humphrey
CS Department, UVa
Scaling and Scheduling to Maximize Application Performance within Budget Constraints in Cloud Workflows
IPDPS 2013 (May 21st 2013)
1
2
Why do you move into the cloud?
Dynamic scalability and cost saving are two of the most important factors when considering cloud adoption
Two major benefits: dynamic scalability and cost saving (a survey of 39 major technology companies [1])
Cloud benefits:
On-demand self-service
Broad network access
Resource pooling
Rapid elasticity
Measured service
Cheaper maintenance
…
3
Cloud dynamic scalability
Dynamic scalability: the ability to acquire and release resources dynamically in response to demand
Dynamic scalability challenge → it relies on users to specify the size of the resource pool
Over-provisioning → costs more than necessary and offsets the cloud's advantages
Under-provisioning → hurts application performance, fails to meet service level agreements, and loses application customers
Figure: over-provisioning vs. under-provisioning
4
Problem statement
Problem: What resources should be acquired or released in the cloud, and how should the computing activities be mapped to the cloud resources, so that application performance is maximized within the budget constraints?
In this paper, we discuss the limited-budget case
The unlimited-budget case was discussed in our SC11 paper
Solution: This paper argues that an automatic resource provisioning and allocation mechanism, i.e., an auto-scaling solution, is the key to successful cloud adoption. Essentially, an auto-scaling solution needs to answer the following two questions:
Capacity determination (or resource provisioning): what types of resources, how many, and for how long
Job scheduling (or resource allocation): how to map computing activities onto the cloud resources
5
Service oriented architecture (SOA) & workflow jobs
An application consists of service components. A workflow passes through different service components and therefore consists of multiple connected tasks
The workload is a stream of workflow jobs that is not known in advance
Task precedence constraints must be preserved
Jobs have individual priorities
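A minimal sketch of how task precedence constraints can be preserved, assuming a hypothetical four-task workflow (the task names T1 to T4 and their edges are illustrative and not taken from the slides). It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which only emits a task after all of its predecessors.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow DAG: each task maps to the set of tasks it depends on.
workflow = {
    "T1": set(),
    "T2": {"T1"},
    "T3": {"T1"},
    "T4": {"T2", "T3"},
}

def execution_order(dag):
    """Return one task order that respects every precedence constraint."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(workflow)
```

Any order returned this way starts with T1 and ends with T4; T2 and T3 have no mutual dependency, so a scheduler is free to run them in parallel on different VMs.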
6
Minimize job turnaround time within budget constraints
Problem formulation
Problem terminology
Cloud application: $app = \{S_i\}$
Job class: $J = \{DAG(S_i),\ priority_J \mid S_i \in app\}$
Cloud VM: $VM_v = \{[J^{S_i}]_v,\ c_v,\ lag_v\}$
Workload: $W_t = \sum_{job_J} \sum_{S_i} job_J^{S_i}$
Scaling plan: $Scaling_t = \{VM_v \to N_v\}$
Scheduling plan: $Schedule_t = \{j_J^{S_i} \to VM_v\}$
Goal: $\min\left(\sum_{job} (turnaround \times priority) \,/\, \sum_{job} priority\right)$ subject to $Cost(app) \le B$ (budget, dollars/hour)
Target: the service provider has a limited budget and aims to maximize the application performance.
Solution idea: a monitor-control loop that makes scaling and scheduling decisions based on updated workload and VM information
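The goal in the formulation above can be sketched in Python as a priority-weighted average turnaround time plus a budget check. The `Job` class, the field names, and the sample numbers are hypothetical and serve only to illustrate the objective.

```python
from dataclasses import dataclass

@dataclass
class Job:
    turnaround_hours: float  # completion time minus submission time
    priority: float          # larger means more important

def weighted_turnaround(jobs):
    """Priority-weighted average turnaround: sum(t * p) / sum(p)."""
    total_priority = sum(j.priority for j in jobs)
    if total_priority == 0:
        raise ValueError("at least one job must have a positive priority")
    return sum(j.turnaround_hours * j.priority for j in jobs) / total_priority

def within_budget(vm_counts, hourly_price, budget_per_hour):
    """Check Cost(app) <= B for a scaling plan {vm_type: count}."""
    cost = sum(hourly_price[v] * n for v, n in vm_counts.items())
    return cost <= budget_per_hour + 1e-9  # small tolerance for float rounding

# Hypothetical jobs: a high-priority job that finished in 2 h, a low-priority one in 4 h.
jobs = [Job(2.0, 3), Job(4.0, 1)]
objective = weighted_turnaround(jobs)
```

Because the high-priority job carries three times the weight, the objective (2.5 hours here) sits closer to its turnaround than a plain average would.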
7
Solution: scheduling-first
Idea: allocate the application budget to individual jobs based on their priorities, and schedule tasks within each job's budget
Step 1 (distribute budget): $B_j = B \times p_j / \sum_j p_j$
Step 2 (schedule tasks): for each job, schedule as many tasks as possible on their fastest machines
Step 3 (consolidate budget): each job returns its unused budget to the application; the application uses the remaining budget collected from individual jobs to schedule high-priority tasks
Step 4 (acquire instances): acquire instances and execute tasks based on the determined schedule plans
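Steps 1 and 3 of scheduling-first can be sketched in Python as follows. The function names, job names, and spend figures are hypothetical; only the proportional split $B_j = B \times p_j / \sum_j p_j$ comes from the slides.

```python
def distribute_budget(total_budget, priorities):
    """Step 1: split the hourly budget in proportion to job priority."""
    total_priority = sum(priorities.values())
    if total_priority == 0:
        raise ValueError("at least one job must have a positive priority")
    return {job: total_budget * p / total_priority
            for job, p in priorities.items()}

def consolidate(job_budgets, job_spend):
    """Step 3: collect whatever each job did not spend on its own tasks."""
    return sum(job_budgets[j] - job_spend.get(j, 0.0) for j in job_budgets)

# Two equal-priority jobs sharing $1/h (hypothetical spend figures).
budgets = distribute_budget(1.0, {"job1": 1, "job2": 1})
leftover = consolidate(budgets, {"job1": 0.4, "job2": 0.5})
```

With equal priorities each job receives $0.5/h; job1 spends only $0.4/h, so about $0.1/h flows back to the application for high-priority tasks.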
8
Solution: scheduling-first
Step 1: distribute budget, $B_j = B \times p_j / \sum_j p_j$
Step 2: schedule tasks
Step 3: consolidate budget
Step 4: acquire instances
e.g. Budget (B) = $1/h; Large (L) = $0.5/h; Medium (M) = $0.3/h; Small (S) = $0.1/h
Step 1: job1 and job2 have the same priority, so job1 → $0.5/h and job2 → $0.5/h
Step 2: job1 (T1) → $0.5 (L); job2 (T5) → $0.5 (L)
Step 3: job1 (T2+T3) → $0.5 (S+M); job2 (T6) → $0.5 (L); job1 returns $0.1 to the system; job2 (T7) → $0.1 (S)
Step 4: acquire instances when necessary
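The arithmetic of the example above can be reproduced with a small Python sketch. The upgrade heuristic below (start every ready task on the cheapest VM type, then upgrade while the job budget allows) is an assumption made for illustration, not the algorithm from the paper; prices are in cents per hour to avoid floating-point rounding.

```python
PRICE_CENTS = {"S": 10, "M": 30, "L": 50}  # Small, Medium, Large, per hour
ORDER = ["S", "M", "L"]                    # slowest to fastest (assumed ranking)

def assign_vm_types(tasks, budget_cents):
    """Pick a VM type per task within the budget.

    Returns (assignment, leftover_cents). Hypothetical heuristic:
    every task starts on the cheapest type and is upgraded one level
    at a time for as long as the budget covers the price difference.
    """
    spent = len(tasks) * PRICE_CENTS[ORDER[0]]
    if spent > budget_cents:
        raise ValueError("budget cannot cover even the cheapest VM type")
    level = {t: 0 for t in tasks}
    upgraded = True
    while upgraded:
        upgraded = False
        for t in tasks:
            if level[t] + 1 < len(ORDER):
                extra = (PRICE_CENTS[ORDER[level[t] + 1]]
                         - PRICE_CENTS[ORDER[level[t]]])
                if spent + extra <= budget_cents:
                    level[t] += 1
                    spent += extra
                    upgraded = True
    return {t: ORDER[level[t]] for t in tasks}, budget_cents - spent

step2, _ = assign_vm_types(["T1"], 50)                 # job1, first task
step3, returned = assign_vm_types(["T2", "T3"], 50)    # job1, next two tasks
extra_task, _ = assign_vm_types(["T7"], returned)      # job2 uses the returned budget
```

Under these assumptions T1 lands on a Large instance, T2 and T3 share one Medium and one Small instance, and the 10 cents returned by job1 are just enough to run T7 on a Small instance.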
9
Solution: scaling-first
Idea: determine the computing capacity by looking at the overall workload, and schedule tasks based on priority
Step 1 (determine the VMs): assume tasks run on their fastest machines and calculate the cost $C_{fast}$ for the next hour; acquire VMs proportionally based on $Budget / C_{fast}$
Step 2 (consolidate budget): use the remaining budget to purchase new machines
Step 3 (schedule tasks): schedule tasks based on task priority
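Steps 1 and 2 of scaling-first can be sketched in Python as follows. The rounding rule (floor), the order in which leftover budget is spent (most expensive affordable type first), and the sample plan are all assumptions for illustration; the slides only state that VMs are acquired proportionally to $Budget / C_{fast}$ and that the remainder buys new machines. Prices are positive integers in cents per hour.

```python
import math

def scale_first(fast_plan, price_cents, budget_cents):
    """fast_plan: {vm_type: count if every task ran on its fastest type}.

    Returns (scaled plan, leftover budget in cents).
    """
    # Step 1: cost of the ideal plan, then scale it down to fit the budget.
    c_fast = sum(price_cents[v] * n for v, n in fast_plan.items())
    ratio = min(1.0, budget_cents / c_fast) if c_fast > 0 else 1.0
    plan = {v: math.floor(n * ratio) for v, n in fast_plan.items()}
    leftover = budget_cents - sum(price_cents[v] * n for v, n in plan.items())
    # Step 2: spend what rounding left over, most expensive type first (assumption).
    for v in sorted(price_cents, key=price_cents.get, reverse=True):
        while price_cents[v] <= leftover:
            plan[v] = plan.get(v, 0) + 1
            leftover -= price_cents[v]
    return plan, leftover

prices = {"S": 10, "M": 30, "L": 50}
# Hypothetical ideal plan: 3 Large + 1 Medium would cost 180 cents/h; budget is 100.
plan, leftover = scale_first({"L": 3, "M": 1}, prices, 100)
```

In this hypothetical case proportional scaling keeps one Large instance, and the remaining 50 cents buy a second one.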
10
Solution: scaling-first
Step 1: determine the VMs. Assume tasks run on their fastest machines, calculate $C_{fast}$, and acquire VMs proportionally based on $B / C_{fast}$
Step 2: consolidate budget. The remaining $0.5 can be used to purchase 1 L machine
Step 3: schedule tasks. Tasks are scheduled based on their priorities
11
Other considerations
Instance consolidation
Schedule tasks on different VM types to save partial instance-hour cost
Budget allocation schemes
Evenly distributed, e.g. daily x/365, hourly x/8760
Based on workload, e.g. high during busy times, low during non-busy times
Workload prediction: $/hour → $/job
Workload patterns
Application models
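The first two budget allocation schemes listed above can be sketched in Python as follows. The function names and the load figures are hypothetical; the expected load per hour is assumed to be given, for example by a workload predictor.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def even_hourly_budget(annual_budget):
    """Evenly distributed scheme: the same budget for every hour of the year."""
    return annual_budget / HOURS_PER_YEAR

def workload_weighted_budget(period_budget, expected_load):
    """Workload-based scheme: more budget in busy hours, less in quiet ones."""
    total = sum(expected_load)
    if total == 0:
        # No load information: fall back to an even split.
        return [period_budget / len(expected_load)] * len(expected_load)
    return [period_budget * load / total for load in expected_load]

hourly = even_hourly_budget(8760.0)
weighted = workload_weighted_budget(10.0, [1, 3, 1])
```

An annual budget of $8760 yields $1 per hour under the even scheme, while the weighted scheme gives the busy middle hour three times the budget of its neighbours.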
12
Evaluation: experiment setup
Time: 72 hours
Task execution: randomly generated
VM lag: 5 min
Baseline: Standard

VM type        Price
Micro          $0.02/hour
Standard       $0.08/hour
High-CPU       $0.66/hour
High-Memory    $0.45/hour
Extra-Large    $1.30/hour
13
Evaluation: job turnaround time
Figure: weighted average job turnaround time for the hybrid application and cycle workload pattern
Scheduling-first and scaling-first can save 9.8% to 45.2% in cost compared to the standard machine choice.
Scaling-first works better under small budget ranges, while scheduling-first works better under large budget ranges.
14
Evaluation: sensitivity to inaccurate parameters
Figure (left): scheduling-first's sensitivity to inaccurate parameters (hybrid application + cycle workload pattern)
Figure (right): scaling-first's sensitivity to inaccurate parameters (hybrid application + cycle workload pattern)
When the estimation error is within ±20%, the job turnaround time differs by -10.2% to 16.7%.
When the task estimation error reaches ±60%, the performance of both algorithms degrades significantly (more than ±25% difference).
15
Evaluation: instance consolidation
Figure (left): job turnaround time / resource utilization of scheduling-first's instance consolidation (hybrid application + cycle workload pattern)
Figure (right): job turnaround time / resource utilization of scaling-first's instance consolidation (hybrid application + cycle workload pattern)
When the budget is low or high, the improvement is small. When the budget is in between, the improvement is more significant (e.g. the utilization rate improves by 2.2% to 19.9% when the budget is between $15/hour and $25/hour).
Scaling-first benefits more from the instance consolidation process than scheduling-first.
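Why instance consolidation improves utilization can be illustrated with a small Python sketch, assuming per-hour billing in which any started instance hour is charged in full. The busy times are hypothetical, and the sketch ignores that a task may run at a different speed on a different VM type.

```python
import math

def billed_hours(busy_minutes):
    """Hourly billing: any started hour is charged in full."""
    return math.ceil(busy_minutes / 60)

def utilization(busy_minutes_per_vm):
    """Fraction of the billed time that is actually used by tasks."""
    billed = sum(billed_hours(m) for m in busy_minutes_per_vm) * 60
    return sum(busy_minutes_per_vm) / billed if billed else 0.0

# Two VMs busy for 20 and 30 minutes versus the same work on one VM.
before = utilization([20, 30])
after = utilization([50])
```

Two partially used instance hours are billed before consolidation and only one afterwards, so utilization roughly doubles for the same amount of work.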
16
Conclusion and future work
Conclusions
Choose appropriate VM types based on the workload.
Scheduling-first and scaling-first are trade-offs between task execution time and waiting time.
As long as VM performance can be correctly ranked, the proposed mechanisms tolerate inaccurate parameters well.
Instance consolidation is an efficient strategy to save partial instance hours and improve resource utilization.
Future work
Other billing models: reserved instances, spot instances, $/min
Maximize application performance within budget constraints for data-intensive applications
Hybrid and federated cloud environments
Develop evaluation benchmarks and simulation platforms
17
Thanks!