Reliability Modeling and Analysis of Energy-Efficient Storage Systems, by Shu Yin. Advisor: Dr. Xiao Qin. Committee Members: Dr. Sanjeev Baskiyar, Dr. Alvin Lim. University Reader: Dr. Shiwen Mao.


DESCRIPTION

With the rapid growth in the production and storage of large-scale data sets, it is important to investigate methods that drive down the cost of storage systems. Many energy conservation techniques have been proposed to achieve high energy efficiency in disk systems. Unfortunately, growing evidence shows that energy-saving schemes in disk drives usually have negative impacts on the reliability of storage systems. Existing reliability models are inadequate for estimating the reliability of parallel disk systems equipped with energy conservation techniques. To solve this problem, we first propose a mathematical model, called MINT, to evaluate the reliability of a parallel disk system in which energy-saving mechanisms are implemented. In this dissertation, MINT focuses on modeling the reliability impacts of two well-known energy-saving techniques: Popular Data Concentration (PDC) and the Massive Array of Idle Disks (MAID). Unlike MAID and PDC, which store a complete file on a single disk, the Redundant Array of Inexpensive Disks (RAID) stripes a file into several parts and stores them on different disks to achieve higher parallelism and hence higher I/O performance. However, RAID faces more challenges in energy efficiency and reliability. To evaluate the reliability of power-aware RAID, we then develop a Weibull-based model, MREED. In this dissertation, we use MREED to model the reliability impacts of a well-known energy-efficient storage mechanism, Power-Aware RAID (PARAID). Third, we focus on validating the two models, MINT and MREED. Validating the accuracy of reliability models is challenging because observing an energy-efficient system for a couple of decades is prohibitively time-consuming and costly. We introduce a validated storage system simulator, DiskSim, to determine whether our model and DiskSim agree with one another. In our validation process, we compare the two using a file access trace from a real-world file system.
The last part of this dissertation focuses on improving energy-efficient parallel storage systems. We propose a strategy, disk swapping, that improves disk reliability by alternating disks that store frequently accessed data with disks that hold less frequently accessed data. In this part, we focus on the reliability improvement of PDC and MAID. Finally, we further improve disk reliability by introducing a multiple disk-swapping strategy.


Page 1: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Shu Yin

Advisor: Dr. Xiao Qin

Committee Members: Dr. Sanjeev Baskiyar, Dr. Alvin Lim

University Reader: Dr. Shiwen Mao

Page 2: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 3: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Motivation

Data Intensive Applications

Stream Multimedia
Bioinformatics
3D Graphics
Weather Forecast

Page 4: Reliability Modeling and Analysis of Energy-Efficient Storage Systems


Data Intensive Computing Application

Cluster System

Page 5: Reliability Modeling and Analysis of Energy-Efficient Storage Systems


Problem: Energy Dissipation

EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Page 6: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Problem: Energy Dissipation (cont.)

Using 2010 Historical Trends Scenario

Data Centers consume 110 Billion kWh per Year;

Assume Average Commercial End User Is Charged 9.46¢ per kWh

Disk System Can Account for 27% of the Computing Energy Cost of Data Centers.

Pie chart: Disk System 27%, Other 73%

Disk System May Have An Electrical Cost of

2.8 Billion Dollars!
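
The 2.8 billion dollar figure follows directly from the three numbers on this slide. A minimal sketch of the arithmetic is shown here; the variable and function names are illustrative and do not come from the talk.

```python
# Figures taken from the slide; names are illustrative only.
ANNUAL_ENERGY_KWH = 110e9     # data-center consumption per year (kWh)
PRICE_USD_PER_KWH = 0.0946    # 9.46 cents per kWh
DISK_SHARE = 0.27             # share of the energy cost attributed to disks


def disk_energy_cost_usd(energy_kwh, price_usd_per_kwh, disk_share):
    """Return the yearly electricity cost attributable to the disk system."""
    return energy_kwh * price_usd_per_kwh * disk_share


total_cost = ANNUAL_ENERGY_KWH * PRICE_USD_PER_KWH
disk_cost = disk_energy_cost_usd(ANNUAL_ENERGY_KWH, PRICE_USD_PER_KWH, DISK_SHARE)
print(f"Total: ${total_cost / 1e9:.2f} billion, disks: ${disk_cost / 1e9:.2f} billion")
```

The total comes to roughly 10.4 billion dollars per year, of which 27% is about 2.8 billion dollars.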

Page 7: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Existing Energy Conservation Techniques

Software-Directed Power Management
Dynamic Power Management
Redundancy Technique
Multi-speed Setting

How Reliable Are They?

Page 8: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Conflict Between Energy Efficiency and Reliability

Example: Disk Spin Up and Down

Energy Efficiency

Reliability

Page 9: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 10: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Mathematical Reliability Models for Energy-Efficient Parallel Disk Systems)

Energy Conservation Techniques

Single Disk Reliability Model

System-Level Reliability Model

Page 11: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

Single Disk Reliability Model

Inputs: Frequency, Utilization, Disk Age, Temperature

Output: Reliability of Single Disk

Page 12: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

R = α * BaseValue [1] * TemperatureFactor + β * FrequencyAdder [2]

α and β are the two coefficients of R

Assumption: α = β = 1 in our research

[1] E. Pinheiro, W.-D. Weber, and L.A. Barroso. Failure trends in a large disk drive population. Proc. USENIX Conf. File and Storage Tech., February 2007.

[2] IDEMA Standards. Specification of hard disk drive reliability.
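
The single-disk formula can be written as a small function. This is only a sketch of the expression on the slide: the function name is hypothetical, and the input numbers are placeholders chosen for illustration, not values from the talk or from the cited sources.

```python
def mint_single_disk_r(base_value, temperature_factor, frequency_adder,
                       alpha=1.0, beta=1.0):
    """Evaluate R = alpha * BaseValue * TemperatureFactor + beta * FrequencyAdder.

    alpha and beta default to 1, matching the assumption stated on the slide.
    """
    return alpha * base_value * temperature_factor + beta * frequency_adder


# Hypothetical inputs, NOT taken from the slides or the cited reports.
r = mint_single_disk_r(base_value=0.05, temperature_factor=1.2, frequency_adder=0.01)
print(f"R = {r:.3f}")
```

With α = β = 1 the temperature factor scales the base value multiplicatively, while the frequency adder contributes additively, so frequent power-state transitions raise R even when utilization and temperature are unchanged.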

Page 13: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

R = α * BaseValue * TemperatureFactor + β * FrequencyAdder

Figure panels: Utilization Impact on AFR; Temperature Impact on Temperature Factor; Transition Frequency Impact on Frequency Adder

Page 14: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Single Disk)

R = α * BaseValue * TemperatureFactor + β * FrequencyAdder

Figure: Single Disk Reliability, with Base Value from Google Report [3]

Plotted series:
Frequency = 250/Month, T = 35°C
Frequency = 250/Month, T = 40°C
Frequency = 350/Month, T = 35°C
Frequency = 350/Month, T = 40°C

[3] E. Pinheiro, W.-D. Weber, and L.A. Barroso. Failure trends in a large disk drive population. Proc. USENIX Conf. File and Storage Tech., February 2007.

Page 15: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques: PDC)

Popular Data Concentration (PDC) [3]

System Structure

Legend: hot data, cold data

[3] E. Pinheiro and R. Bianchini. Energy conservation techniques for disk array-based servers. Int’l Conf. on Supercomputing, pages 68–78, June 2004.

Page 16: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques: PDC)

Diagram labels: More Popular Disk; Less Popular Disk; Access Rate < MIN(Access Rate); Access Rate > MAX(Access Rate)

Legend: hot data, cold data

Page 17: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques: PDC)

Popular Data Concentration (PDC) [3]

System Structure

(Optimal Result for Certain Time Phases)

Legend: hot data, cold data

Page 18: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques: MAID)

Massive Array of Idle Disks (MAID) [4]

System Structure

Legend: hot data, cold data

[4] Dennis Colarelli and Dirk Grunwald. Massive arrays of idle disks for storage archives. Supercomputing ’02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1–11, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.

Page 19: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Energy Conservation Techniques: MAID)

Massive Array of Idle Disks (MAID) [4]

System Structure

Diagram labels: Cache Disk; Data Disk; Access Rate > MAX(Access Rate)

Legend: hot data, cold data

Page 20: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (System-Level)

Energy Conservation Techniques

Single Disk Reliability Model

System-Level Reliability Model

Diagram labels: Access Pattern; Frequency; Utilization; Temperature; Disk Age; Reliability of Disk 1 through Reliability of Disk n; Reliability of A Parallel Disk System

Page 21: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Results (Experimental Setting)

| Energy-efficiency Scheme | Number of Disks | File Access Rate (No. per month) | File Size (KB) |
|---|---|---|---|
| PDC | 20 data (20 in total) | 0~10^6 | 300 |
| MAID-1 | 15 data + 5 cache (20 in total) | 0~10^6 | 300 |
| MAID-2 | 20 data + 5 cache (25 in total) | 0~10^6 | 300 |

Read-only Disks

Page 22: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Result: Comparison Between PDC and MAID

Figure: AFR Comparison of PDC and MAID. Access Rate (×10^4) Impacts on AFR (T = 35°C)

Page 23: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Result: Comparison Between PDC and MAID

Figure: AFR Comparison of PDC and MAID. Access Rate (×10^4) Impacts on AFR (T = 35°C)

Legend: MAID, PDC

Page 24: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MAID under High Access Rate

Figure: AFR Comparison of PDC and MAID. Access Rate (×10^4) Impacts on AFR (T = 35°C)

Plotted series: MAID-1, MAID-2

Page 25: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MAID under High Access Rate

Figure: AFR Comparison of PDC and MAID. Access Rate (×10^4) Impacts on AFR (T = 35°C)

Plotted series: MAID-1, MAID-2

Page 26: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MINT (Conclusion)

Mathematical Model for Disk Systems
MINT Study on PDC and MAID

But ...

What about RAID?
Data Striping Mechanism
Energy Consumption Issues
Reliability Issues
Complexity

Page 27: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 28: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MREED Model (Mathematical Reliability Models for Energy-Efficient RAID Systems)

Diagram labels: Access Pattern; Temperature; Energy Conservation Techniques; Frequency; Utilization; Annual Failure Rate; Weibull Analysis

Page 29: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Weibull Analysis

A Leading Method for Fitting Life Data

Advantages:
Accurate
Small Samples
Widely Used
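
For readers unfamiliar with Weibull analysis, the standard two-parameter Weibull reliability and hazard functions are sketched here. The shape and scale values are hypothetical and are not fitted to any data from the talk; how MREED parameterizes its distribution is not stated on the slides.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a unit survives past time t: exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at t: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters in hours; NOT taken from the talk.
shape, scale = 1.2, 500_000.0
one_year = 8760.0  # power-on hours in a year
survival = weibull_reliability(one_year, shape, scale)
print(f"survival after one year: {survival:.4f}")
print(f"failure probability in the first year: {1 - survival:.4f}")
```

A shape parameter above 1 gives a failure rate that grows with age, a value below 1 gives a decreasing rate, and a value of exactly 1 reduces the model to the constant-rate exponential distribution.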


Page 30: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

MREED Model (Energy Conservation Techniques: PARAID)

Power-Aware RAID (PARAID) [5]

System Structure

Diagram labels: Soft State; RAID; Gears 3, 2, 1

[5] Charles Weddle, Mathew Oldham, Jin Qian, An-I Andy Wang. PARAID: A Gear-Shifting Power-Aware RAID. USENIX FAST 2007.

Page 31: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (Experiment Setup)

| Parameter | Value |
|---|---|
| Disk Type | Seagate ST3146855FC |
| Capacity | 146 GB |
| Cache Size | SATA 16 MB |
| Buffer to Host Transfer Rate | 4 Gb/s (Max) |
| Total Number of Disks | 5 |
| File Size | 100 MB |
| Number of Files | 1000 |
| Synthetic Trace | Poisson Distribution |
| Time Period | 24 Hours |
| Interval Time (Time Phase) | 1 Hour |
| Power-on Hours per Year | 8760 Hours |
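
A synthetic Poisson trace like the one in this setup could be generated roughly as follows. This is a sketch under stated assumptions, not the generator used in the study: the function name is hypothetical, and choosing the target file uniformly at random is an assumption of this example.

```python
import random


def poisson_trace(rate_per_hour, hours, num_files, seed=0):
    """Return (time_in_hours, file_id) pairs for a Poisson arrival process.

    Inter-arrival gaps are exponential with mean 1 / rate_per_hour. The file
    for each request is drawn uniformly, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    t, trace = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= hours:
            return trace
        trace.append((t, rng.randrange(num_files)))


# 24-hour period, 1000 files, 20 requests per hour on average.
trace = poisson_trace(rate_per_hour=20, hours=24, num_files=1000)

# Bucket the requests into the 1-hour time phases used in the setup.
per_phase = [0] * 24
for t, _ in trace:
    per_phase[int(t)] += 1
print(len(trace), per_phase[:4])
```

The per-phase counts are what a model working in 1-hour time phases would consume; changing `rate_per_hour` to 80 would give a high access rate workload.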

Page 32: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (Disk Utilization Comparison)

Figure: Disk Utilization Comparison Between PARAID-0 and RAID-0 at a Low Access Rate (20/hr)

Page 33: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (Disk Utilization Comparison)

Figure: Disk Utilization Comparison Between PARAID-0 and RAID-0 at a High Access Rate (80/hr)

Page 34: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (AFR Comparison)

Figure: AFR Comparison Between PARAID-0 and RAID-0 at a Low Access Rate (20/hr)

Page 35: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Evaluation (AFR Comparison)

Figure: AFR Comparison Between PARAID-0 and RAID-0 at a High Access Rate (80/hr)

Page 36: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 37: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation

Techniques

Run the Systems for a Couple of Decades
The Event Validity Validation Technique [6]

[6] R.G. Sargent, "Verification and Validation of Simulation Models", in Proceedings of the 37th Conference on Winter Simulation (WSC '05), Winter Simulation Conference, 2005.

Page 38: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation

Challenges

Unable to Monitor PARAID Running for Years
Sample Size is Small from a Validation Perspective (e.g. 100 Disks for Five Years)

Page 39: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (DiskSim [7] Simulation)

Diagram label: File To Block Level Converter

[7] J.S. Bucy, J. Schindler, S.W. Schlosser, and G.R. Ganger, "The DiskSim Simulation Environment Version 4.0 Reference Manual", 2008.

Page 40: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (DiskSim Simulation)

Diagram of the Storage System Corresponding to the DiskSim RAID-0

Page 41: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (Result)

Utilization Comparison Between MREED and DiskSim Simulator

Page 42: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Model Validation (Result)

Gear Shifting Comparison Between MREED and DiskSim Simulator

Page 43: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Presentation Outline

Motivation
MINT Model
MREED Model
Models Validation
Reliability Improvement
Conclusion and Future Work

Page 44: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Recall PDC

Popular Data Concentration (PDC)

System Structure

(Optimal Result for Certain Time Phases)

Legend: hot data, cold data

Page 45: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Problem of PDC

The Most Popular Disk:
High AFR
No Replica

Page 46: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Reliability Improvement of PDC

Methods of Improving Reliability

Mirroring: Extra Disks for Replication -> More Energy Consumption

Disk Swapping: Swap Existing Disks

Page 47: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Disk Swapping Scheme: PDC

Swap the Most Popular Disk with the Least Popular Disk

Page 48: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Disk Swapping Scheme: PDC

Swap the Highest AFR Disk with the Lowest AFR Disk
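
The swapping idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the slot-to-disk mapping, the function name, and the use of a cumulative access count as the swap trigger are not taken from the talk, which does not describe how data is migrated or how the threshold is measured.

```python
def swap_if_needed(slot_to_disk, accesses, threshold):
    """Swap the disks in the most and least popular slots once the hot disk
    has served at least `threshold` accesses. Returns True if a swap happened.

    slot_to_disk[0] is the slot holding the most popular data and
    slot_to_disk[-1] is the slot holding the least popular data.
    """
    hot_disk = slot_to_disk[0]
    if accesses[hot_disk] < threshold:
        return False
    slot_to_disk[0], slot_to_disk[-1] = slot_to_disk[-1], slot_to_disk[0]
    return True


# Hypothetical 4-disk array: physical disk ids 0..3 with cumulative accesses.
slot_to_disk = [0, 1, 2, 3]
accesses = {0: 250_000, 1: 90_000, 2: 20_000, 3: 1_000}
swapped = swap_if_needed(slot_to_disk, accesses, threshold=200_000)
print(swapped, slot_to_disk)  # True [3, 1, 2, 0]
```

After the swap, the lightly used physical disk takes over the hot slot and the heavily used one moves to the cold slot, which spreads wear without adding disks.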

Page 49: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Disk Swapping Scheme: MAID

Swap the Cache Disks with the Data Disks

Page 50: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Preliminary Results (Experimental Setting)

| Energy-efficiency Scheme | Number of Disks | File Access Rate (No. per month) | File Size (KB) |
|---|---|---|---|
| PDC | 20 data (20 in total) | 0~10^6 | 300 |
| MAID-1 | 15 data + 5 cache (20 in total) | 0~10^6 | 300 |
| MAID-2 | 20 data + 5 cache (25 in total) | 0~10^6 | 300 |

Read-only Disks

Mean Time to Data Loss (MTTDL)

Swapping Thresholds (2×10^5, 5×10^5, 8×10^5 No./Month)

Single Swapping

Page 51: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison of Disk Swap: PDC

Figure: AFR Comparison of PDC. Access Rate (×10^4) Impacts on AFR (T = 35°C), Threshold = 2×10^5 No./Month

Page 52: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison of Disk Swap: PDC

AFR: Swap2 < Swap1 < No Swap

Figure: AFR Comparison of PDC. Access Rate (×10^4) Impacts on AFR (T = 35°C), Threshold = 2×10^5 No./Month

Page 53: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

Figure: AFR Comparison of PDC. Access Rate (×10^4) Impacts on AFR (T = 35°C), Threshold = 2×10^5 No./Month

Page 54: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

Figure: AFR Comparison of PDC. Access Rate (×10^4) Impacts on AFR (T = 35°C), Threshold = 5×10^5 No./Month

Page 55: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

Figure: AFR Comparison of PDC. Access Rate (×10^4) Impacts on AFR (T = 35°C), Threshold = 8×10^5 No./Month

Page 56: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Comparison Between Different Thresholds: PDC

Figure: AFR Comparison of PDC. Access Rate (×10^4) Impacts on AFR (T = 35°C), Thresholds = 2×10^5, 5×10^5, and 8×10^5 No./Month

AFR: Higher Threshold -> Lower AFR

Page 57: Reliability Modeling and Analysis of Energy-Efficient Storage Systems


Limitations

Read Only Disk Scenario

Data Migration within Certain Time Phases

Simple File Access Patterns

Page 58: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Future Work

Extend the models to investigate mixed read/write workloads;

Research the trade-offs between reliability and energy efficiency;

Extend the schemes to a real-world environment;

Develop a multi-swapping mechanism that balances utilization and lowers the failure rate;

Evaluate more control groups.

Page 59: Reliability Modeling and Analysis of Energy-Efficient Storage Systems


Conclusion

Generic Models coupled with power management optimization policies;

Two reliability models for the three well-known energy-saving schemes -- PDC, MAID and PARAID;

Disk swapping strategies to improve disk reliability for PDC.

Page 60: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Thanks

Page 61: Reliability Modeling and Analysis of Energy-Efficient Storage Systems

Questions?