Transcript
Page 1: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Shu Yin

Advisor: Dr. Xiao Qin
Committee Members: Dr. Sanjeev Baskiyar, Dr. Alvin Lim
University Reader: Dr. Shiwen Mao

Page 2: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 3: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Motivation

Data-Intensive Applications

Streaming Multimedia
Bioinformatics
3D Graphics
Weather Forecasting

Page 4: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Data Intensive Computing Application

Cluster System

Page 5: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Problem: Energy Dissipation

EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Page 6: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Problem: Energy Dissipation (cont.)

Using 2010 Historical Trends Scenario

Data Centers Consume 110 Billion kWh per Year;

Assume Average Commercial End User Is Charged 9.46¢ per kWh

Disk System Can Account for 27% of the Computing Energy Cost of Data Centers.

Disk System: 27%, Other: 73%

Disk System May Have An Electrical Cost of 2.8 Billion Dollars!
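The 2.8 billion dollar figure follows directly from the three numbers quoted on this slide. A minimal sketch of the arithmetic (the constant and function names are illustrative, not from the presentation):

```python
# Figures quoted on the slide (2010 Historical Trends Scenario).
ANNUAL_ENERGY_KWH = 110e9      # data center consumption per year
PRICE_USD_PER_KWH = 0.0946     # 9.46 cents per kWh
DISK_SHARE = 0.27              # disk system share of the computing energy cost


def disk_energy_cost_usd(energy_kwh: float, price: float, share: float) -> float:
    """Annual electricity cost attributable to the disk system."""
    return energy_kwh * price * share


total_cost = ANNUAL_ENERGY_KWH * PRICE_USD_PER_KWH
disk_cost = disk_energy_cost_usd(ANNUAL_ENERGY_KWH, PRICE_USD_PER_KWH, DISK_SHARE)
print(f"Total: ${total_cost / 1e9:.2f} billion, disks: ${disk_cost / 1e9:.2f} billion")
```

The total comes to roughly 10.4 billion dollars per year, of which the 27% disk share is about 2.8 billion.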

Page 7: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Existing Energy Conservation Techniques

Software-Directed Power Management
Dynamic Power Management
Redundancy Technique
Multi-speed Setting

How Reliable Are They?

Page 8: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Contradiction Between Energy Efficiency and Reliability

Example: Disk Spin Up and Down

Energy Efficiency

Reliability

Page 9: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 10: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (MATHEMATICAL RELIABILITY MODELS FOR ENERGY-EFFICIENT PARALLEL DISK SYSTEMS)

Energy Conservation Techniques

Single Disk Reliability Model

System-Level Reliability Model

Page 11: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

Inputs: Frequency, Utilization, Disk Age, Temperature

Single Disk Reliability Model

Output: Reliability of Single Disk

Page 12: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

R = α * BaseValue [1] * TemperatureFactor + β * FrequencyAdder [2]

α and β are the two coefficients of R

Assumption: α = β = 1 in our research

[1] E. Pinheiro, W.-D. Weber, and L.A. Barroso. Failure trends in a large disk drive population. Proc. USENIX Conf. File and Storage Tech., February 2007.

[2] IDEMA Standards. Specification of hard disk drive reliability.

Page 13: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

R = α * BaseValue * TemperatureFactor + β * FrequencyAdder

Utilization Impact on AFR

Temperature Impact on Temperature Factor

Transition Frequency Impact on Frequency Adder
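The single-disk formula can be written as a small function. This is a minimal sketch: the function name is illustrative, and the input values below are hypothetical placeholders, not numbers read from the three curves on this slide or from the cited reports.

```python
def mint_single_disk_r(base_value: float, temperature_factor: float,
                       frequency_adder: float,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """R = alpha * BaseValue * TemperatureFactor + beta * FrequencyAdder.

    alpha and beta default to 1, the assumption stated in the presentation.
    """
    return alpha * base_value * temperature_factor + beta * frequency_adder


# Hypothetical inputs for illustration only.
example_r = mint_single_disk_r(base_value=4.0, temperature_factor=1.2,
                               frequency_adder=0.5)
print(example_r)
```

In practice the three inputs would be looked up from the utilization, temperature, and transition-frequency curves before calling the function.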

Page 14: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

R = α * BaseValue * TemperatureFactor + β * FrequencyAdder

Single Disk Reliability

Frequency=250/Month, T=35°C
Frequency=250/Month, T=40°C
Frequency=350/Month, T=35°C
Frequency=350/Month, T=40°C

Base Value from Google Report [1]

[1] E. Pinheiro, W.-D. Weber, and L.A. Barroso. Failure trends in a large disk drive population. Proc. USENIX Conf. File and Storage Tech., February 2007.

Page 15: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques - PDC)

Popular Data Concentration (PDC) [3]

System Structure

Legend: hot data, cold data

[3] E. Pinheiro and R. Bianchini. Energy conservation techniques for disk array-based servers. Int’l Conf. on Supercomputing, pages 68–78, June 2004.

Page 16: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques - PDC)

More Popular Disk, Less Popular Disk

Access Rate < MIN(Access Rate)

Access Rate > MAX(Access Rate)

Legend: hot data, cold data

Page 17: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques - PDC)

Popular Data Concentration (PDC) [3]

System Structure

(Optimal Result for Certain Time Phases)

Legend: hot data, cold data
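The idea behind PDC is to keep the most frequently accessed data on a few disks so the remaining disks can stay idle. A minimal sketch of such a placement, assuming each disk holds a fixed number of files (the function name, the `files_per_disk` capacity measure, and the access rates are all hypothetical):

```python
from typing import Dict, List


def pdc_placement(access_rates: Dict[str, float], num_disks: int,
                  files_per_disk: int) -> List[List[str]]:
    """Place the most frequently accessed files on the lowest-numbered disks."""
    if len(access_rates) > num_disks * files_per_disk:
        raise ValueError("not enough disk capacity for all files")
    # Rank files from most to least popular.
    ranked = sorted(access_rates, key=lambda name: access_rates[name], reverse=True)
    disks: List[List[str]] = [[] for _ in range(num_disks)]
    for rank, name in enumerate(ranked):
        disks[rank // files_per_disk].append(name)
    return disks


rates = {"a": 900.0, "b": 20.0, "c": 450.0, "d": 5.0}  # hypothetical accesses per month
layout = pdc_placement(rates, num_disks=2, files_per_disk=2)
print(layout)  # [['a', 'c'], ['b', 'd']]
```

Disk 0 ends up with the hot files and disk 1 with the cold ones, which is the situation the later slides on disk swapping start from.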

Page 18: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques - MAID)

Massive Array of Idle Disks (MAID) [4]

System Structure

Legend: hot data, cold data

[4] Dennis Colarelli and Dirk Grunwald. Massive arrays of idle disks for storage archives. Supercomputing ’02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1–11, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

Page 19: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques - MAID)

Massive Array of Idle Disks (MAID) [4]

System Structure

Cache Disk, Data Disk

Access Rate > MAX(Access Rate)

Legend: hot data, cold data

Page 20: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (System-Level)

Inputs: Access Pattern, Temperature, Disk Age, Frequency, Utilization

Energy Conservation Techniques

Single Disk Reliability Model

Reliability of Disk 1, ..., Reliability of Disk n

System-Level Reliability Model

Reliability of A Parallel Disk System
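The slide does not state how the per-disk results are combined into a system-level figure. One common choice, shown here only as an assumption and not as the rule MINT actually uses, treats disk failures as independent and reports the probability that at least one disk fails within a year:

```python
from math import prod
from typing import Iterable


def system_afr_independent(disk_afrs: Iterable[float]) -> float:
    """Probability that at least one disk fails in a year.

    Assumes independent failures and no redundancy; each AFR is a
    fraction between 0 and 1. This aggregation rule is an assumption.
    """
    return 1.0 - prod(1.0 - afr for afr in disk_afrs)


# Hypothetical per-disk AFRs for a two-disk system.
two_disks = system_afr_independent([0.10, 0.10])
print(round(two_disks, 4))  # 0.19
```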

Page 21: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Results (Experimental Setting)

Energy-efficiency Scheme | Number of Disks                 | File Access Rate (No. per month) | File Size (KB)
PDC                      | 20 data (20 in total)           | 0~10^6                           | 300
MAID-1                   | 15 data + 5 cache (20 in total) | 0~10^6                           | 300
MAID-2                   | 20 data + 5 cache (25 in total) | 0~10^6                           | 300

Read-only Disks

Page 22: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Result: Comparison Between PDC and MAID

AFR Comparison of PDC and MAID
Access Rate (×10^4) Impacts on AFR (T=35°C)

Page 23: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Result: Comparison Between PDC and MAID

AFR Comparison of PDC and MAID
Access Rate (×10^4) Impacts on AFR (T=35°C)

Legend: MAID, PDC

Page 24: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MAID under High Access Rate

AFR Comparison of PDC and MAID
Access Rate (×10^4) Impacts on AFR (T=35°C)

Legend: MAID-1, MAID-2

Page 25: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MAID under High Access Rate

AFR Comparison of PDC and MAID
Access Rate (×10^4) Impacts on AFR (T=35°C)

Legend: MAID-1, MAID-2

Page 26: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Conclusion)

Mathematical Model for Disk Systems
MINT Study on PDC and MAID
But ...

What about RAID?
Data Striping Mechanism
Energy Consumption Issues
Reliability Issues
Complexity

Page 27: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 28: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MREED Model (MATHEMATICAL RELIABILITY MODELS FOR ENERGY-EFFICIENT RAID SYSTEMS)

Access Pattern Temperature

Energy Conservation Techniques

Frequency

Utilization

Annual Failure Rate

Weibull Analysis

Page 29: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Weibull Analysis

A Leading Method for Fitting Life Data

Advantages:
Accurate
Small Samples
Widely Used
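For reference, the two-parameter Weibull distribution gives reliability R(t) = exp(-(t/η)^β) and hazard rate h(t) = (β/η)(t/η)^(β-1). A minimal sketch of both functions; the shape and scale values used below are hypothetical and are not the parameters fitted in MREED:

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving past time t (t >= 0)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at time t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters: shape 1.2, scale 100,000 hours.
one_year = weibull_reliability(8760.0, shape=1.2, scale=100_000.0)
print(f"Survival after one year of power-on time: {one_year:.4f}")
```

A shape below 1 models early-life failures (falling hazard), a shape of 1 reduces to a constant failure rate, and a shape above 1 models wear-out.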

Page 30: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MREED Model (Energy Conservation Techniques - PARAID)

Power-Aware RAID (PARAID) [5]

System Structure

Soft State, RAID, Gears 3, 2, 1

[5] Charles Weddle, Mathew Oldham, Jin Qian, An-I Andy Wang. PARAID: A Gear-Shifting Power-Aware RAID. USENIX FAST 2007.
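Gear shifting can be pictured as a small state machine driven by disk utilization. The sketch below is a rough illustration, assuming that a higher gear means more active disks and that shifts are triggered by two utilization thresholds; the threshold values and the function name are hypothetical and are not taken from PARAID [5].

```python
def next_gear(current_gear: int, utilization: float, max_gear: int = 3,
              up_threshold: float = 0.8, down_threshold: float = 0.3) -> int:
    """Pick the gear for the next time phase from the current utilization."""
    if utilization > up_threshold and current_gear < max_gear:
        return current_gear + 1   # shift up: spin up more disks
    if utilization < down_threshold and current_gear > 1:
        return current_gear - 1   # shift down: let disks spin down
    return current_gear


gear = 1
for load in [0.9, 0.9, 0.5, 0.1]:   # hypothetical utilization per time phase
    gear = next_gear(gear, load)
print(gear)  # 2
```

Each shift implies a spin-up or spin-down, which is the transition frequency that feeds the reliability model.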

Page 31: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (Experiment Setup)

Disk Type                    | Seagate ST3146855FC
Capacity                     | 146 GB
Cache Size                   | SATA 16 MB
Buffer to Host Transfer Rate | 4 Gb/s (Max)
Total Number of Disks        | 5
File Size                    | 100 MB
Number of Files              | 1000
Synthetic Trace              | Poisson Distribution
Time Period                  | 24 Hours
Interval Time (Time Phase)   | 1 Hour
Power-On Hours Per Year      | 8760 Hours
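A synthetic trace with Poisson arrivals over 24 one-hour phases could be generated along the following lines. This is a sketch, not the generator used in the experiments: the function names and the seed are illustrative, and only the 24-hour period, the 1-hour phase, and the 20/hr and 80/hr rates come from the slides.

```python
import math
import random
from typing import List, Optional


def poisson_arrivals(rate_per_hour: float, duration_hours: float,
                     seed: Optional[int] = None) -> List[float]:
    """Arrival times (in hours) of a Poisson process over [0, duration_hours)."""
    rng = random.Random(seed)
    arrivals: List[float] = []
    t = rng.expovariate(rate_per_hour)   # exponential inter-arrival gaps
    while t < duration_hours:
        arrivals.append(t)
        t += rng.expovariate(rate_per_hour)
    return arrivals


def requests_per_phase(arrivals: List[float], duration_hours: float,
                       phase_hours: float = 1.0) -> List[int]:
    """Count the requests that fall into each time phase."""
    counts = [0] * math.ceil(duration_hours / phase_hours)
    for t in arrivals:
        counts[int(t // phase_hours)] += 1
    return counts


low = poisson_arrivals(rate_per_hour=20, duration_hours=24, seed=1)
high = poisson_arrivals(rate_per_hour=80, duration_hours=24, seed=1)
print(len(low), len(high))
print(requests_per_phase(low, duration_hours=24)[:5])
```

The per-phase counts are what a utilization model would consume, one value per 1-hour time phase.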

Page 32: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (Disk Utilization Comparison)

Disk Utilization Comparison Between PARAID-0 and RAID-0 at A Low Access Rate (20/hr)

Page 33: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (Disk Utilization Comparison)

Disk Utilization Comparison Between PARAID-0 and RAID-0 at A High Access Rate (80/hr)

Page 34: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (AFR Comparison)

AFR Comparison Between PARAID-0 and RAID-0 at A Low Access Rate (20/hr)

Page 35: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (AFR Comparison)

AFR Comparison Between PARAID-0 and RAID-0 at A High Access Rate (80/hr)

Page 36: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 37: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation

Techniques
Run the Systems for A Couple of Decades
The Event Validity Validation Technique [6]

[6] R.G. Sargent, “Verification and Validation of Simulation Models”, in Proceedings of the 37th Conference on Winter Simulation (WSC ’05), 2005.

Page 38: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation

Challenges
Unable to Monitor PARAID Running for Years
Sample Size is Small from A Validation Perspective (e.g. 100 Disks for Five Years)

Page 39: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (DiskSim [7] Simulation)

File To Block Level Converter

[7] J.S. Bucy, J. Schindler, S.W. Schlosser, and G.R. Ganger, “The DiskSim Simulation Environment Version 4.0 Reference Manual”, 2008.

Page 40: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (DiskSim Simulation)

Diagram of the Storage System Corresponding to the DiskSim RAID-0

Page 41: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (Result)

Utilization Comparison Between MREED and DiskSim Simulator

Page 42: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (Result)

Gear Shifting Comparison Between MREED and DiskSim Simulator

Page 43: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 44: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Recall PDC

Popular Data Concentration (PDC) System Structure

(Optimal Result for Certain Time Phases)

Legend: hot data, cold data

Page 45: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Problem of PDC

The Most Popular Disk:
High AFR
No Replica

Page 46: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Improvement of PDC

Methods of Improving Reliability

Mirroring
Extra Disks for Replication -> More Energy Consumption

Disk Swapping
Swap Existing Disks

Page 47: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Disk Swapping Scheme: PDC

Swap the Most Popular Disk with the Least Popular Disk

Page 48: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Disk Swapping Scheme: PDC

Swap the Highest AFR Disk with the Lowest AFR Disk
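Both PDC swapping schemes follow the same pattern: rank the disks by a metric (access rate or AFR) and exchange the roles of the highest and lowest ones. A minimal sketch, assuming a swap is triggered once the highest value reaches a threshold; the function names and sample values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pick_swap(metric: Sequence[float],
              threshold: float) -> Optional[Tuple[int, int]]:
    """Return (highest, lowest) disk indices to swap, or None if no swap is due."""
    if len(metric) < 2:
        return None
    highest = max(range(len(metric)), key=lambda i: metric[i])
    lowest = min(range(len(metric)), key=lambda i: metric[i])
    if metric[highest] < threshold or highest == lowest:
        return None
    return highest, lowest


def swap_roles(role_of_disk: List[str], pair: Tuple[int, int]) -> List[str]:
    """Exchange the data sets held by two physical disks."""
    a, b = pair
    swapped = list(role_of_disk)
    swapped[a], swapped[b] = swapped[b], swapped[a]
    return swapped


access_rates = [5e5, 1e4, 2e5]            # hypothetical accesses per month
pair = pick_swap(access_rates, threshold=2e5)
roles = swap_roles(["most popular", "least popular", "medium"], pair)
print(pair, roles)  # (0, 1) ['least popular', 'most popular', 'medium']
```

Passing per-disk AFR values instead of access rates gives the second scheme with the same code.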

Page 49: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Disk Swapping Scheme: MAID

Swap the Cache Disks with the Data Disks

Page 50: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Results (Experimental Setting)

Energy-efficiency Scheme | Number of Disks                 | File Access Rate (No. per month) | File Size (KB)
PDC                      | 20 data (20 in total)           | 0~10^6                           | 300
MAID-1                   | 15 data + 5 cache (20 in total) | 0~10^6                           | 300
MAID-2                   | 20 data + 5 cache (25 in total) | 0~10^6                           | 300

Read-only Disks

Mean Time to Data Loss (MTTDL)

Swapping Thresholds (2×10^5, 5×10^5, 8×10^5 No./Month)

Single Swapping

Page 51: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison of Disk Swap: PDC

AFR Comparison of PDC
Access Rate (×10^4) Impacts on AFR (T=35°C)
Threshold = 2×10^5 No./Month

Page 52: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison of Disk Swap: PDC

AFR: Swap2 < Swap1 < No Swap

AFR Comparison of PDC
Access Rate (×10^4) Impacts on AFR (T=35°C)
Threshold = 2×10^5 No./Month

Page 53: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

AFR Comparison of PDC
Access Rate (×10^4) Impacts on AFR (T=35°C)
Threshold = 2×10^5 No./Month

Page 54: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

AFR Comparison of PDC
Access Rate (×10^4) Impacts on AFR (T=35°C)
Threshold = 5×10^5 No./Month

Page 55: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

AFR Comparison of PDC
Access Rate (×10^4) Impacts on AFR (T=35°C)
Threshold = 8×10^5 No./Month

Page 56: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

AFR Comparison of PDC
Access Rate (×10^4) Impacts on AFR (T=35°C)
Threshold = 2×10^5 No./Month, 5×10^5 No./Month, 8×10^5 No./Month

AFR: Higher Threshold -> Lower AFR

Page 57: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Limitations

Read Only Disk Scenario

Data Migration within Certain Time Phases

Simple File Access Patterns

Page 58: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Future Work

Extend the models to investigate mixed read/write workloads;

Research the trade-offs between reliability and energy efficiency;

Extend schemes to a real-world based environment;

Develop a multi-swapping mechanism balancing the utilization and lowering the failure rate;

Evaluate more control groups.

Page 59: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Conclusion

Generic Models coupled with power management optimization policies;

Two reliability models for the three well-known energy-saving schemes -- PDC, MAID and PARAID;

Disk swapping strategies to improve disk reliability for PDC.

Page 60: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Thanks

Page 61: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Questions?

