Automated Control for Elastic Storage

Page 1: Automated Control for Elastic Storage

Harold C. Lim, Shivnath Babu and Jeffrey S. Chase, Duke University

AUTOMATED CONTROL FOR ELASTIC STORAGE

Presented by: Yonggang Liu, Department of Electrical and Computer Engineering, University of Florida

Page 2: Automated Control for Elastic Storage

Outline

  • Introduction
  • System overview
  • System architecture and modeling methodologies
  • Evaluation
  • Contribution and related work
  • Discussions and future work

Page 4: Automated Control for Elastic Storage

Introduction - Popularity of highly dynamic workloads

  • Many web-based services (especially Web 2.0) often experience rapid load surges and drops.
      • One Facebook application saw an increase from 25,000 to 250,000 users in 3 days, with up to 20,000 new users signing up per hour during peak times.
  • Elastic services offered by cloud computing are one solution.
      • Grow/shrink service capacity dynamically as the load changes.

Page 5: Automated Control for Elastic Storage

Introduction - Elasticity in cloud computing

Elasticity is one of cloud computing’s greatest features: systems acquire and release resources in response to users’ dynamic workloads, and users pay only for what they need.

[Figure: cloud stack diagram with the labels "SLAs", "Web Services" and "Virtualization". Picture provided by Dr. Andy Li from UF.]

Page 6: Automated Control for Elastic Storage

Introduction - Topic of this paper

  • This paper addresses the challenges of controlling elastic storage in a data-intensive service in a cloud computing environment.
  • Intuitively, the controller does the following:
      • If performance cannot meet the Service Level Objective (SLO) → grow storage capacity.
      • If performance meets the SLO and system utilization is low → shrink storage capacity.
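
The intuitive rule above can be sketched in a few lines of Python. This is only an illustration, not the paper's controller: the function name `decide`, the `Action` enum and the 10% "low utilization" cutoff are assumptions introduced here, while the 3-second response-time target is the one quoted later in the evaluation slides.

```python
from enum import Enum


class Action(Enum):
    GROW = "grow"
    SHRINK = "shrink"
    HOLD = "hold"


def decide(response_time_s: float, utilization: float,
           slo_response_time_s: float = 3.0,
           low_utilization: float = 0.10) -> Action:
    """Return the capacity action for one control interval.

    low_utilization is a hypothetical cutoff chosen for this sketch.
    """
    if response_time_s > slo_response_time_s:
        return Action.GROW          # SLO violated: add storage capacity
    if utilization < low_utilization:
        return Action.SHRINK        # SLO met and nodes mostly idle
    return Action.HOLD              # SLO met, utilization reasonable
```

The rest of the deck refines this rule: it replaces the response-time check with a CPU-utilization sensor and accounts for the delay and interference of rebalancing.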

Page 7: Automated Control for Elastic Storage

Introduction - Topic of this paper

  • In this paper, the Hadoop Distributed File System (HDFS) is employed as the storage system.
  • When the controller increases the storage size:
      • Create new storage instances.
      • Move stored data to the new instances (data rebalancing).
  • When the controller reduces the storage size:
      • Remove a certain number of storage instances.
      • Some data on the remaining nodes is re-replicated because its replica count has fallen below the replication degree N. HDFS does this automatically.
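
To make the shrink case concrete, the following toy model counts which blocks fall below the replication degree when nodes are removed. It is a sketch under stated assumptions: the block-to-node map, the node names and the function `under_replicated` are invented for illustration and are not part of any HDFS API.

```python
from typing import Dict, Set


def under_replicated(block_locations: Dict[str, Set[str]],
                     removed_nodes: Set[str],
                     replication_degree: int) -> Dict[str, int]:
    """Map each block that needs re-replication to its number of missing replicas."""
    missing = {}
    for block, nodes in block_locations.items():
        live = nodes - removed_nodes            # replicas that survive the removal
        deficit = replication_degree - len(live)
        if deficit > 0:
            missing[block] = deficit
    return missing


# Hypothetical layout: three blocks, replication degree 3, four nodes.
layout = {
    "blk_1": {"n1", "n2", "n3"},
    "blk_2": {"n2", "n3", "n4"},
    "blk_3": {"n1", "n3", "n4"},
}
print(under_replicated(layout, removed_nodes={"n4"}, replication_degree=3))
```

In this toy layout, removing node n4 leaves blk_2 and blk_3 one replica short each, which is the data the storage system then has to copy to the remaining nodes.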

Page 9: Automated Control for Elastic Storage

System overview - What is the big picture

Suppose we are hosting the Facebook server on Amazon EC2 instances, with the proposed control techniques.

[Figure: control loop. Clients send requests to the Elastic Service, which runs on the Cloud Provider (Amazon EC2) and consists of a Web Tier (Apache server), an Application Tier (Facebook core) and a Storage Tier (Hadoop DFS). The Controller gathers measurements through a Sensor and manages instances through an Actuator.]

  • Sensor detects higher system load → create more storage instances and rebalance data.
  • Sensor detects lower system load → remove some storage nodes.

Page 10: Automated Control for Elastic Storage

System overview - Challenges in elastic storage control

Controlling elastic storage involves many challenges:

  • Data Rebalancing. Newly added storage nodes are not effective until data rebalancing is done.
  • Interference to Guest Service. Data rebalancing also consumes system resources.
  • Actuator Delay. The controller must account for the delay of its control operations; otherwise it may respond too late or become unstable.

Page 12: Automated Control for Elastic Storage

System architecture

The controller is composed of:

  • Horizontal Scale Controller (HSC): responsible for growing and shrinking the number of storage nodes.
  • Data Rebalance Controller (DRC): controls the data transfers that rebalance the storage tier after it grows or shrinks.
  • State machine: coordinates the actions of the HSC and the DRC.

Page 13: Automated Control for Elastic Storage

System architecture - Horizontal Scale Controller (HSC)

  • Actuator: The HSC uses cloud APIs to change the number of active server instances.
  • Sensor: The paper uses CPU utilization on the storage nodes as the sensor feedback metric.
      • It is easy to measure and strongly correlated with the overall response time of the CloudStone benchmark when the bottleneck is in the storage tier.

Page 14: Automated Control for Elastic Storage

Modeling methodology - System model without controller

The system without a controller can be described by this block diagram:

[Figure: block diagram. U(z) and D(z) enter a summing junction (+, +) whose output V(z) feeds the block G(z), which produces Y(z).]

  • U(z): input to the system, the number of storage instances.
  • D(z): the effect of client workload variance, expressed as a change in the number of storage instances.
  • V(z): the effective number of storage instances.
  • Y(z): the output of the system, the CPU utilization on the storage nodes.
  • G(z): the transfer function of the storage system.
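
Read as equations, the summing junction and the plant block of this open-loop model give the relations below. They follow directly from the signal definitions on this slide; no particular form of G(z) is assumed.

```latex
V(z) = U(z) + D(z), \qquad
Y(z) = G(z)\,V(z) = G(z)\,\bigl(U(z) + D(z)\bigr)
```

In words: a workload disturbance acts as if instances had been added or taken away, and the utilization responds to that effective instance count through G(z).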

Page 15: Automated Control for Elastic Storage

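Modeling methodology - Controller - Integral control

Control Policy (K): Integral control

[Equation: integral control law; the formula and its symbols are not preserved in this transcript.]

The slide defines three quantities in the law:

  • the integral gain parameter;
  • the current sensor measurement;
  • the desired reference sensor measurement, which is 20% CPU utilization for a 3-second average response time.

[Figure: closed-loop block diagram. R(z) and the fed-back Y(z) enter a summing junction (+, -) producing E(z); K(z) maps E(z) to U(z); U(z) and D(z) are summed (+, +) into V(z), which feeds G(z) to produce Y(z).]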
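
For readers unfamiliar with integral control, here is a minimal textbook-style sketch. The names `u_k`, `y_k`, `y_ref` and `k_i`, the gain value and the sign convention (error measured as utilization above the reference, so that a busy tier gains instances) are assumptions made for this example; the slide's own notation did not survive, and the 20% reference is the only value taken from it.

```python
def integral_step(u_k: float, y_k: float, y_ref: float = 0.20,
                  k_i: float = 10.0) -> float:
    """One step of discrete-time integral control.

    u_k   : current actuator value (number of storage instances)
    y_k   : current sensor measurement (CPU utilization, 0..1)
    y_ref : reference utilization (20% per the slide)
    k_i   : integral gain (hypothetical value)
    """
    error = y_k - y_ref            # positive when the storage tier is too busy
    return u_k + k_i * error       # accumulate the scaled error into the actuator


# With 4 instances at 50% utilization the controller asks for about 7.
print(integral_step(4.0, 0.50))
```

Because the output is a real number while instances are discrete, a practical controller has to round and to avoid reacting to small errors, which is what the next slides address.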

Page 16: Automated Control for Elastic Storage

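Modeling methodology - Controller - Discrete control functions

Because the system uses a discrete actuator (the number of instances), the paper derives the following discrete control functions:

[Equation: piecewise control function with a higher and a lower threshold; not preserved in this transcript.]

  • The control functions use a higher and a lower threshold for CPU utilization.
  • Only when utilization is above the higher threshold (under-provisioned) or below the lower threshold (over-provisioned) does the controller output change, i.e., the controller adds or removes storage instances.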
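
A thresholded controller of this kind might look like the sketch below. It is an illustration under assumptions, not the paper's exact function: the names `y_low`, `y_high` and `k_i`, the gain value, the rounding with `math.ceil` and the floor of one instance are all choices made here.

```python
import math


def discrete_step(u_k: int, y_k: float, y_low: float, y_high: float,
                  k_i: float = 10.0) -> int:
    """Return the next instance count.

    The count is unchanged while y_k stays inside [y_low, y_high].
    """
    if y_k > y_high:                       # under-provisioned: add nodes
        return u_k + math.ceil(k_i * (y_k - y_high))
    if y_k < y_low:                        # over-provisioned: remove nodes
        return max(1, u_k - math.ceil(k_i * (y_low - y_k)))
    return u_k                             # inside the band: do nothing
```

For example, with 4 instances and a band of 15% to 20%, a reading of 18% leaves the cluster alone, while 33% adds instances and 7% removes one.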

Page 17: Automated Control for Elastic Storage

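Modeling methodology - Proportional thresholding

How should the higher and lower thresholds be set?

  • They cannot be static, because for a cluster of size N, adding or removing a node changes the total capacity by 1/N.
  • "Proportional thresholding" mechanism:
      • Set [expression not preserved], and vary [symbol not preserved] to vary the range.
      • Suppose "workload" is the per-node workload and we have N instances. We get [equation not preserved].
      • Suppose [condition not preserved]; we get [equation not preserved].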
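
One plausible reading of proportional thresholding can be worked out from the 1/N argument. Assume utilization scales linearly with per-node workload and that the higher threshold equals the reference utilization. If N nodes run at utilization y, removing one node raises utilization to y·N/(N−1); requiring that this not exceed the higher threshold gives a lower threshold of y_high·(N−1)/N. The sketch below encodes that derivation; the formula and all names are assumptions and may differ from the equations on the original slide.

```python
def proportional_thresholds(n_nodes: int, y_ref: float = 0.20):
    """Return (y_low, y_high) for a cluster of n_nodes instances.

    Assumes y_high = y_ref and linear scaling of utilization with load.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes to consider removing one")
    y_high = y_ref
    # Removing one of n nodes multiplies utilization by n / (n - 1),
    # so only shrink when the result would still be at or below y_high.
    y_low = y_high * (n_nodes - 1) / n_nodes
    return y_low, y_high


for n in (2, 4, 20):
    print(n, proportional_thresholds(n))
```

Under these assumptions the band is wide for small clusters (10% to 20% for 2 nodes) and narrows as the cluster grows (19% to 20% for 20 nodes), matching the observation that a single node matters less in a larger cluster.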

Page 18: Automated Control for Elastic Storage

System architecture - Data Rebalance Controller (DRC)

  • The DRC rebalances the layout of data in the system after the number of storage nodes grows or shrinks.
  • Rebalancing is a cause of actuator delay and interference.
  • Tuning knob of the HDFS rebalancer:
      • Bandwidth b allocated to the rebalancer.
  • Select b to control the tradeoff between lag and interference.
      • Large b: fast rebalance, serious impact on normal service.
      • Small b: slow rebalance, not very disruptive to normal service.

Page 19: Automated Control for Elastic Storage

Modeling methodology - Modeling the impacts of b

The paper employs multivariate regression to choose b:

  • The time to completion of rebalancing (Time) as a function of the bandwidth throttle (b) and the size of the data to be moved (s): Time = f_t(b, s).
  • The impact of rebalancing on service response time (Impact) as a function of the bandwidth throttle (b) and the per-node workload (l): Impact = f_i(b, l).
  • The values of s and l are measured by sensors in the DRC.
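
As an illustration of fitting such a model, the sketch below runs ordinary least squares in pure Python on synthetic data. Everything specific here is an assumption: the slides do not give the regression form, so the example supposes Time is linear in the single feature s/b plus an intercept, and the measurements are generated from a made-up law rather than taken from the paper.

```python
from typing import List, Sequence


def fit_linear(features: Sequence[Sequence[float]],
               targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each feature row gets an implicit trailing 1.0 for the intercept.
    Assumes X^T X is non-singular (features are not collinear).
    """
    rows = [list(f) + [1.0] for f in features]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Synthetic measurements, made up for illustration: (b in MB/s, s in MB).
samples = [(1.0, 1000.0), (5.0, 1000.0), (10.0, 2000.0), (20.0, 2000.0)]
features = [[s / b] for b, s in samples]
times = [2.0 * s / b + 5.0 for b, s in samples]   # generated from a known law
slope, intercept = fit_linear(features, times)
print(slope, intercept)
```

Because the synthetic data follow the assumed law exactly, the fit recovers a slope near 2 and an intercept near 5. With real measurements one would also fit Impact against b and l in the same way and check the residuals before trusting either model.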

Page 20: Automated Control for Elastic Storage

Modeling methodology - Balancing between lag and interference

The Data Rebalance Controller poses the choice of b as a cost-based optimization problem:

[Equation: cost function combining Time and Impact; not preserved in this transcript.]

The ratio of the weights on Time and Impact can be specified by the guest based on its relative preference for Time over Impact.
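
A simple way to solve such a problem is to evaluate the cost over a set of candidate bandwidths and keep the cheapest. The sketch below assumes the cost is a weighted sum, alpha × Time + beta × Impact, which is consistent with the slide's mention of a guest-specified ratio but is not confirmed by it. The two model functions are hypothetical stand-ins for the fitted regression models.

```python
def choose_bandwidth(time_model, impact_model, s, l, candidates,
                     alpha=1.0, beta=1.0):
    """Return the candidate b minimizing alpha*Time(b, s) + beta*Impact(b, l)."""
    def cost(b):
        return alpha * time_model(b, s) + beta * impact_model(b, l)
    return min(candidates, key=cost)


# Hypothetical fitted models: Time falls with b, Impact rises with b.
def time_model(b, s):
    return s / b


def impact_model(b, l):
    return 0.5 * b * l


candidates = [1, 2, 5, 10, 20, 50]
best = choose_bandwidth(time_model, impact_model, s=1000.0, l=4.0,
                        candidates=candidates)
print(best)
```

Raising alpha relative to beta expresses a preference for finishing quickly and pushes the choice toward larger b; raising beta protects the guest service and pushes it toward smaller b.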

Page 21: Automated Control for Elastic Storage

System architecture - State machine

Recall that:

  • The Horizontal Scale Controller (HSC) grows or shrinks the number of storage nodes.
  • The Data Rebalance Controller (DRC) rebalances the storage after the number of storage nodes changes.

They have mutual dependencies:

  • After the HSC adds a new storage node, the system cannot obtain its full service until the DRC completes rebalancing.
  • When one component is taking action, it introduces noise into the sensor measurements of the other.

To preserve stability during adjustments, a state machine coordinates the HSC and the DRC and manages their mutual dependencies.

Page 22: Automated Control for Elastic Storage

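System architecture - State machine

The following diagram shows the internal state machine of the elasticity controller in the storage tier.

[Figure: state machine of the Elasticity Controller, which sits above the Storage Tier. States: Init (start), Horizontal Scale state, Rebalance state.]

  • Horizontal Scale state → Horizontal Scale state: storage tier configuration changed? No.
  • Horizontal Scale state → Rebalance state: storage tier configuration changed? Yes.
  • Rebalance state → Rebalance state: rebalancing done? No.
  • Rebalance state → Horizontal Scale state: rebalancing done? Yes.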
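
The transitions in the diagram can be written as a small transition function. This is a sketch of the coordination logic only: the enum and function names are invented here, and the unlabeled step from Init to the Horizontal Scale state is an assumption about the diagram.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    HORIZONTAL_SCALE = auto()
    REBALANCE = auto()


def next_state(state: State, config_changed: bool = False,
               rebalancing_done: bool = False) -> State:
    """Transition function for the elasticity controller's coordination logic."""
    if state is State.INIT:
        return State.HORIZONTAL_SCALE      # assumed: Init hands over to the HSC
    if state is State.HORIZONTAL_SCALE:
        # Stay with the HSC until it actually changes the storage tier.
        return State.REBALANCE if config_changed else State.HORIZONTAL_SCALE
    # REBALANCE: hold until the DRC reports completion.
    return State.HORIZONTAL_SCALE if rebalancing_done else State.REBALANCE
```

Only one controller is active in each state, so the HSC never acts on sensor readings that are distorted by an ongoing rebalance.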

Page 24: Automated Control for Elastic Storage

Evaluation - Experimental Testbed

  • The paper runs CloudStone with GlassFish as the front-end application server tier.
      • CloudStone: a flexible Web 2.0 benchmark generator.
      • GlassFish: an open source application server project.
  • HDFS is used for the storage.
      • HDFS is modified to expose the rebalancer’s bandwidth throttle b as an actuator to the external controller.
  • The paper uses a local ORCA cluster as the cloud infrastructure provider.
      • ORCA: a resource control framework that provides a resource leasing service; guests can lease resources from a substrate resource provider, such as a cloud provider.

Page 25: Automated Control for Elastic Storage

Evaluation - Experimental Testbed

  • The experimental service cluster:
      • A group of servers running on a local network.
  • To fully explore the effects of the storage tier:
      • Other tiers are statically over-provisioned.
  • The storage tier nodes:
      • Dynamically allocated virtual machine instances.
      • They all have fixed resource configurations: 30 GB disk space; 512 MB RAM; single disk arm; 2.8 GHz CPU.
  • HDFS is preloaded with at least 36 GB of data.

Page 26: Automated Control for Elastic Storage

Evaluation - Controller Effectiveness

Static and dynamic resource provisioning in response to a 10-fold load burst at [time not preserved].

[Figure panels: a1. CPU utilization - static; b1. Response time - static; a2. CPU utilization - dynamic; b2. Response time - dynamic.]

  • Target response time: 3 seconds.
  • Target CPU utilization: 20%.

Observations from the figures:

  1. Dynamic provisioning is able to adapt to the load burst.
  2. Instance creation and data rebalancing incur cost and delay before they take effect.

Page 27: Automated Control for Elastic Storage

Evaluation - Controller Effectiveness

Static and dynamic resource provisioning in response to a small load increase of 35% at [time not preserved].

[Figure panels: a1. CPU utilization - static; b1. Response time - static; a2. CPU utilization - dynamic; b2. Response time - dynamic.]

  • Target response time: 3 seconds.
  • Target CPU utilization: 20%.

Observations from the figures:

  1. Dynamic provisioning is responsive enough to adapt to the small load increase.
  2. The cost and delay of node creation and rebalancing are smaller than in the previous experiment.

Page 28: Automated Control for Elastic Storage

Evaluation - Resource Efficiency

Static and dynamic resource provisioning in response to a load decrease of 30% at [time not preserved].

[Figure panels: a1. CPU utilization - static; b1. Response time - static; a2. CPU utilization - dynamic; b2. Response time - dynamic.]

  • Target response time: 3 seconds.
  • Target CPU utilization: 20%.

Observations from the figures:

  1. Shrinking the storage tier has much lower cost and delay than growing it.
  2. During the resizing process, there are almost no SLO violations.

Page 29: Automated Control for Elastic Storage

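Evaluation - Comparison of Rebalance Policies

Recall that:

  • Time = f_t(b, s), a monotone decreasing function of b.
  • Impact = f_i(b, l), a monotone increasing function of b.

And we want to choose b to optimize the cost function:

[Equation: cost function; not preserved in this transcript.]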

Page 31: Automated Control for Elastic Storage

Contribution and related work

  • This paper is the first to address the problem of automated control for elastic storage in cloud computing.
  • SCADS is related work on dynamically scaling a storage system. It uses machine learning to predict resource requirements.
  • Padala et al. proposed a decoupled architecture (between guest and cloud provider) for cloud computing. They did not consider the actuator constraints.
  • Aqueduct uses a feedback controller to throttle the bandwidth used by rebalancing so that SLOs are not violated. As a result, the rebalancing may get very little bandwidth.

Page 33: Automated Control for Elastic Storage

Discussions and future work

  • The proposed modeling method cannot correctly handle workloads with transient noise, which are common in practice.
  • Adding a filter module solves the problem:

[Figure: closed-loop block diagram with blocks K(z) and G(z) and an added filter block H(z); signals R(z), E(z), U(z), D(z), V(z), Y(z) and W(z).]
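
The slide does not say which filter H(z) is meant. As one common choice, the sketch below uses an exponentially weighted moving average, a first-order low-pass filter; the function name and the smoothing value are assumptions for illustration.

```python
def ema_filter(samples, smoothing=0.3):
    """Exponentially weighted moving average: w_k = a*y_k + (1 - a)*w_{k-1}."""
    if not 0.0 < smoothing <= 1.0:
        raise ValueError("smoothing must be in (0, 1]")
    filtered = []
    w = None
    for y in samples:
        # The first sample initializes the filter state.
        w = y if w is None else smoothing * y + (1.0 - smoothing) * w
        filtered.append(w)
    return filtered


# Hypothetical utilization trace with a one-interval spike to 90%.
raw = [0.20, 0.20, 0.90, 0.20, 0.20]
smooth = ema_filter(raw)
print(smooth)
```

The spike is attenuated in the filtered signal, so a controller that reads the filtered value is less likely to add nodes in response to a single noisy sample. The price is extra lag in reacting to genuine load changes.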

Page 34: Automated Control for Elastic Storage

Discussions and future work

  • The proposed model allocates resources tightly. A small change in system load often triggers adding or removing storage instances, which is very disruptive.
  • Recall the proposed control function:

[Equation: piecewise control function; not preserved in this transcript.]

  • By setting the lower threshold lower or the higher threshold higher (without exceeding [bound not preserved]), we prevent the system from changing frequently.
  • The drawback of this approach: the system will be under-provisioned to some extent.

Page 35: Automated Control for Elastic Storage

Discussions and future work

  • Make the resource configuration of newly created storage instances tunable.
      • Resize the storage tier by adding or removing storage instances with flexible resource configurations.
      • Optimize the system by exploring the capacity and efficiency of individual storage instances, rather than only the number of storage instances.
      • This requires investigating the performance of storage nodes under different setups: disk size, CPU frequency, RAM size, etc.

Page 36: Automated Control for Elastic Storage

THANK YOU!