Power-Saving in Large-Scale Storage Systems with Data Migration


Koji Hasebe, Tatsuya Niwa, Akiyoshi Sugiki, and Kazuhiko Kato (University of Tsukuba, Japan)

Background

IT systems consume 1-2% of the total energy in the world. (Green IT: A New Industry Shock Wave, Gartner Symp/ITxpo, 2007)

In large data centers, storage systems consume up to 40% of the total power. (Greg Schulz, StorageIO)

Power-saving in storage systems is a central issue.

Previous Studies

[Figure: daily workload curve; during off-peak time, part of the system can be put into a low-power mode]

In the literature:
• MAID [Colarelli-Grunwald '02], PDC [Pinheiro-Bianchini '04]
• DIV [Pinheiro et al. '06], Pergamum [Storer et al. '08]
• RIMAC [Yao-Wang '06], eRAID [Wang-Zhu-Li '08]
• Hibernator [Zhu et al. '05], PARAID [Weddle et al. '07], etc.

Commonly-observed technique: skew the workload onto a subset of disks so that the remaining disks can be placed in a low-power mode during off-peak time.

Previous Studies

Limitations:
• Central controller to manage data accesses
• Relatively small number of disks (up to several dozen)

Harnik et al. [IPDPS '09] propose an efficient allocation of replicated data.

[Figure: replicas allocated across disks d1, d2, d3 so that some disks can enter a low-power mode]


Motivation and Objective

• Apply the skewing technique to large-scale storage systems.
• Explore an efficient technique based on data migration, instead of the replication approach.

[Figure: data migrated onto a subset of nodes; the remaining nodes enter a low-power mode]

Central Idea (1): Underlying System

[Figure: 12 physical nodes P1-P12 grouped into four blocks (Block 1 to Block 4); neighbouring blocks are connected by parent-child relations]

• Assume that 3 physical nodes are required at off-peak time, and that the workload may increase up to four-fold at peak time.

Central Idea (1): Underlying System

[Figure: virtual nodes V1-V9 placed on the physical nodes of Block 1 (P1-P3); P4-P12 in Blocks 2-4 hold no active virtual nodes at off-peak time]

• The virtual nodes are managed by a distributed hash table (DHT). Cf. Chord [Stoica et al. '01]

Central Idea (1): Underlying System

[Figure: data items 1-9 distributed across the virtual nodes V1-V9, which are hosted on P1-P3, covering the data of Blocks 1-4]
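As a rough illustration of the virtual-node layer (the deck only names Chord as the reference DHT), the following Python sketch shows how data items could be looked up through virtual nodes on a consistent-hashing ring and then resolved to physical nodes. All identifiers and the placement function are hypothetical, not taken from the paper.

import hashlib
from bisect import bisect_right

def h(key, space=2**32):
    """Hash a key onto a fixed identifier space, Chord-style."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % space

class VirtualNodeRing:
    """Data item -> virtual node -> physical node lookup on a hash ring."""
    def __init__(self, placement):
        self.placement = placement                        # e.g. {"V1": "P1", ...}
        self.ring = sorted((h(v), v) for v in placement)  # virtual nodes on the ring
        self.keys = [k for k, _ in self.ring]

    def virtual_node_for(self, item):
        idx = bisect_right(self.keys, h(item)) % len(self.ring)  # successor, with wrap-around
        return self.ring[idx][1]

    def physical_node_for(self, item):
        return self.placement[self.virtual_node_for(item)]

# Nine virtual nodes packed onto the three physical nodes of Block 1 (off-peak layout).
placement = {f"V{i}": f"P{(i - 1) // 3 + 1}" for i in range(1, 10)}
ring = VirtualNodeRing(placement)
print(ring.physical_node_for("item-42"))   # one of P1, P2, P3

At off-peak time the placement maps all nine virtual nodes onto Block 1, which is exactly the skewed layout shown in the figure above.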

Central Idea (2): Migration of Virtual Nodes

• When a physical node in the parent block becomes overloaded, one of its virtual nodes is divided into two (e.g. V1 into V1 1/2 and V1 2/2) and one half is migrated to a node in the child block.
• As the workload keeps rising, further virtual nodes (V4, V7, and eventually V2, V3, V5, V6, V8, V9) are split and migrated in the same way, gradually activating the child block P4-P6.

[Figure sequence: P1-P3 become overloaded; V1 is divided into two and one half moves to the child block; V4 and V7 follow; finally every virtual node V1-V9 has one half on Block 1 and the other half on Block 2]

Central Idea (2): Migration of Virtual Nodes

[Figure: the same migration scheme over the full system of 12 physical nodes and four blocks; data (d) move along the parent-child relations between blocks as the workload varies]

Power-Saving Algorithms

• Short-term optimization: Extension and Reduction
• Long-term optimization

Power-Saving Algorithm 1: Short-term Optimization (Extension)

Procedure:
1. Each physical node checks its own workload.
2. If the workload exceeds its capacity, one of its virtual nodes is split and migrated to its child block.

(A minimal code sketch of this step is given below the figure.)

[Figure: each overloaded node P1-P3 splits one virtual node (V1, V4, V7 into halves) and migrates one half, together with its data items, to the child block P4-P6]
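The extension step can be sketched in a few lines of Python. This is only an illustration under assumed data structures; the "hottest virtual node first" choice and the "least-loaded child" target are hypothetical, not necessarily the paper's exact policy.

from dataclasses import dataclass, field

@dataclass
class VirtualNode:
    name: str
    load: float          # current workload attributed to this virtual node

@dataclass
class PhysicalNode:
    name: str
    capacity: float
    vnodes: list = field(default_factory=list)

    @property
    def load(self):
        return sum(v.load for v in self.vnodes)

def extend(parent_block, child_block):
    """Extension step: every overloaded parent node splits its hottest virtual
    node and migrates one half (with half the load) to the least-loaded child node."""
    for p in parent_block:
        if p.load <= p.capacity:
            continue                                   # not overloaded, nothing to do
        victim = max(p.vnodes, key=lambda v: v.load)   # pick the hottest virtual node
        victim.load /= 2
        half = VirtualNode(victim.name + " (2/2)", victim.load)
        victim.name += " (1/2)"
        target = min(child_block, key=lambda c: c.load)
        target.vnodes.append(half)                     # in the real system: copy only the data diff

In the real system the migration copies only the difference from the previously stored data, as noted on the next slide.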

Power-Saving Algorithm 1: Short-term Optimization (Extension)

Notes:
• Reusing the data stored on the previous day allows a migration to copy only the difference.
• The mapping of virtual nodes effectively skews the workload.


Power-Saving Algorithm 2: Short-term Optimization (Reduction)

Problem: when the workload falls, migrated virtual nodes should be merged back into the parent block so that child nodes can be powered down, but the choice of migrations matters.

[Figure: the remaining capacities of the parent nodes P1-P3 are (1), (1), (2); the workload of each virtual node is 1; the child nodes P4-P6 host the migrated halves]

Power-Saving Algorithm 2: Short-term Optimization (Reduction)

Wrong migration:

[Figure: a greedy merge fills up two parent nodes (remaining capacities (0), (0)) while virtual-node halves are still stranded on the child block, so the child nodes cannot all be powered down]

Power-Saving Algorithm 2: Short-term Optimization (Reduction)

The solution:

[Figure: with the right choice of migrations, the parents' remaining capacities are used up exactly ((0), (0), (0)) and the emptied child nodes can be powered down]

Power-Saving Algorithm 2: Short-term Optimization (Reduction)

Procedure:
1. C → P: the child-block nodes send the parent nodes the workload information for every virtual node they host.
2. Each parent node P lists all combinations of subsets of child physical nodes whose virtual nodes P can absorb.
3. P → C: each parent returns the result of Step 2.
4. C calculates the intersection over all possible combinations of the results, yielding candidate sets of child nodes that can be emptied.

Example (from the figure):
• P1: {P4, P5}, {P4, P6}
• P2: {P4, P5}, {P5, P6}
• P3: {P4, P5}
• Candidates: {P4, P5}, {P5}, {P4}
• Solution: {P4, P5}, the candidate that lets the most child nodes be powered down.

(A brute-force sketch of this search is given below.)
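Collapsing the message exchange of Steps 1-4 into a single centralized function, a brute-force Python sketch of the reduction search could look as follows. The per-half "home parent" assignments in the example are hypothetical, chosen only to produce candidate sets similar to those in the figure.

from itertools import combinations
from collections import defaultdict

def can_absorb(parent_capacity, halves):
    """Each migrated half must merge back onto its home parent without
    exceeding that parent's remaining capacity."""
    demand = defaultdict(float)
    for home, load in halves:
        demand[home] += load
    return all(demand[p] <= parent_capacity.get(p, 0.0) for p in demand)

def reduction_candidates(parent_capacity, child_halves):
    """Subsets of child nodes that can be emptied and powered down
    (largest subsets first; brute force, centralized sketch)."""
    children = list(child_halves)
    candidates = []
    for size in range(len(children), 0, -1):
        for subset in combinations(children, size):
            halves = [h for c in subset for h in child_halves[c]]
            if can_absorb(parent_capacity, halves):
                candidates.append(subset)
    return candidates

# Toy numbers in the spirit of the slides: parents' remaining capacities are 1, 1, 2
# and every migrated half has workload 1. Home-parent assignments are hypothetical.
parents = {"P1": 1.0, "P2": 1.0, "P3": 2.0}
children = {
    "P4": [("P1", 1.0), ("P3", 1.0)],
    "P5": [("P2", 1.0), ("P3", 1.0)],
    "P6": [("P1", 1.0), ("P2", 1.0)],
}
print(reduction_candidates(parents, children))
# -> [('P4', 'P5'), ('P4',), ('P5',), ('P6',)]

The feasibility test mirrors Step 2 (which child subsets the parents can absorb), and iterating subsets from largest to smallest plays the role of Step 4's intersection and final choice.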

Power-Saving Algorithm 3: Long-term Optimization

To maintain effective power-saving, the workload must also be balanced within each block.

Example:
• If virtual nodes with high workloads are concentrated on a few physical nodes, one node stays highly loaded while the others are underused, which wastes power-saving potential at low workload.
• Periodically re-mapping the virtual nodes (e.g. grouping V1, V5, V9 on one physical node instead of V1, V2, V3) balances the load across P1-P3.

[Figure sequence: initial placement with one highly loaded node; the same placement when the workload is low; after re-mapping, the load is balanced]
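One simple way to realize such a re-mapping is a greedy longest-processing-time assignment: sort the virtual nodes by observed load and repeatedly give the next-heaviest one to the currently least-loaded physical node. The Python sketch below is only an illustration of this idea, with hypothetical load values; it is not the paper's long-term optimization algorithm.

def rebalance(vnode_load, physical_nodes):
    """Greedy long-term re-mapping: give the next-heaviest virtual node to the
    physical node with the smallest total load assigned so far."""
    placement = {p: [] for p in physical_nodes}
    totals = {p: 0.0 for p in physical_nodes}
    for v, load in sorted(vnode_load.items(), key=lambda kv: -kv[1]):
        target = min(totals, key=totals.get)     # currently least-loaded physical node
        placement[target].append(v)
        totals[target] += load
    return placement

# Nine virtual nodes with hypothetical loads 1..9, spread over Block 1.
loads = {f"V{i}": float(i) for i in range(1, 10)}
print(rebalance(loads, ["P1", "P2", "P3"]))
# -> totals of roughly 16 / 15 / 14: the load is nearly balanced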

Evaluation (Simulation)

Purposes:
• Evaluate the efficiency of skewing the workload.
• Evaluate the validity of the long-term optimization.

Simulation environment:
• Number of physical nodes: 800
• Number of virtual nodes: 10,000
• Term of simulation: 1 day
• Migration condition: split at more than 90% load, merge at less than 70%
• Workload of all virtual nodes: starts at its lowest and increases until the middle of the day; the peak-to-trough gap is sixfold
• Virtual node groups: two groups whose loads differ by a factor of two
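For reference, the daily workload pattern described above (lowest at the start of the day, peaking in the middle, with a sixfold gap) can be modeled with a simple cosine ramp; this is an assumed illustrative curve, not the exact one used in the paper.

import math

def workload(hour, low=1.0, gap=6.0):
    """Illustrative daily workload: lowest at hour 0, peaking at 'gap' times
    the minimum at hour 12, then falling back (assumed cosine shape)."""
    return low + (gap - 1.0) * low * (1.0 - math.cos(2.0 * math.pi * hour / 24.0)) / 2.0

print(workload(0), workload(12))   # 1.0 at the start of the day, 6.0 at midday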

Simulation Results (Average load)

• Without long-term optimization: 57-69%
• With long-term optimization: 67-74%

The long-term optimization algorithm improves the average load as expected.
Physical nodes run effectively, coping with the daily variation of the workload.

[Figure: average load of active physical nodes (%) over time (hours)]

Simulation Results (Active nodes)

• Long-term optimization saves, on average, 7-14% and up to 17-39%.

The optimization reduces power consumption consistently and continually.

[Figure: number of active physical nodes over time (hours)]

Evaluation (Prototype Implementation)

Purposes:
• Verify the efficiency of workload skewing on real machines.
• Verify whether the response time stays below the desired bound.

Response time: measured from sending a request until the data are loaded into memory on the server.

Experiment environment:
• Number of physical nodes: 40 (each with two Xeon 3.60 GHz CPUs, about 2 GB of memory, and a 36 GB SCSI HDD)
• Number of files: 60,000 x 1 MB (60 GB in total)
• Term of experiment: 1 day
• Migration condition: split at over 90% load, merge at under 70%
• Workload of all virtual nodes: starts at its lowest and increases until the middle of the day; the gap is sixfold
• Virtual node groups: loads differ by a factor of two between the two groups
• Amount of each migration: 10% of all the data

Response Time

• Average response time: 80 ms
• Maximum response time: 534 ms

Our algorithms keep the response time below the desired bound almost all of the time.

[Figure: response time per request (ms) over time (hours)]

Average Load

• Overall average load: 67% of capacity

The prototype also skews the workload effectively, as in the simulation.

[Figure: load of active physical nodes (%) over time (hours)]

Number of Active Physical Nodes

• Migrations involve 0.14 virtual nodes on average, and at most 20 virtual nodes.

Our system adjusts the number of physical nodes to the variation of the workload and reduces power consumption effectively.

[Figure: number of active physical nodes and number of migrations over time (hours)]

Conclusions

• A power-saving method for large-scale distributed storage systems.
• Short- and long-term optimization algorithms for reducing power consumption.
• Performance evaluation:
  - Simulation: our method kept the average load of active nodes at 67-74%.
  - Prototype implementation: the overall average load was 67%, and the preferred response time was maintained.
• Future work:
  - Implement a replication mechanism to improve reliability.
  - Improve the long-term optimization algorithm.
