Metering Energy Consumption in Data Centres - Michael Rudgyard


Commercial in Confidence

Energy Usage Optimisation in the Data Centre

Michael Rudgyard (CTO)

Concurrent Thinking Ltd


• A spin-out company of a well-established UK systems integrator (SI)

• Technology was developed for High Performance Computing (HPC)

– Management of HPC resources needs to be ‘system-wide’

– Scalability (of both the architecture and the GUI) is paramount

• New company formed in March 2010

– Took on the product IP and existing HPC customer base

– Notable investment from the UK Carbon Trust

• Currently in ‘semi-stealth’ mode; product launch: Nov 2011

– Have developed new features for the Data Centre market

– … that leverage the infrastructure of an existing product


• The average data centre has a PUE of 1.9 (Koomey, 2010)

• The majority of DCs operate at temperatures more than 3-4 °C below (old) ASHRAE recommendations (Paterson et al., 2009)

• A 1 °C increase in DC temperature equates to a 2-4% reduction in energy costs (UK financial institution)

• In a typical DC, 10% of running servers are not in use at all (Green Grid Survey, 2010)

• Average IT utilisation is between 5% and 10% for an un-virtualised DC, rising to 10-20% for a fully virtualised DC


• Traditionally, the focus has been to optimise PUE

– For a 2 MW DC with a PUE of 2, this would imply a maximum energy saving of 50% (1 MW), reached only if PUE fell to the ideal value of 1.0

• But what if (average) IT utilisation were only 10%?

– We could theoretically save 900 kW of IT power

– We could theoretically optimise the cooling overhead to zero

• In this fully optimised DC, we would use only 100 kW

• An energy saving of 95%
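The arithmetic on this slide can be checked with a short calculation. This is a minimal sketch, assuming PUE is total facility power divided by IT equipment power, and that the ‘fully optimised’ case means IT power shrinks to the 10% that is actually used while the cooling and power overhead disappears (a PUE of 1.0). The function names are hypothetical.

```python
def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power, using PUE = total facility power / IT power."""
    return it_power_kw * pue


def optimised_scenario(total_kw: float, pue: float, utilisation: float) -> dict:
    """Compare today's facility with a hypothetical 'fully optimised' one.

    Assumptions (illustrative only):
    - IT power shrinks to the fraction of capacity actually used;
    - the overhead is optimised away entirely, i.e. PUE becomes 1.0.
    """
    it_kw = total_kw / pue                                    # IT share of the load
    pue_only_saving_kw = total_kw - facility_power_kw(it_kw, 1.0)
    optimised_it_kw = it_kw * utilisation                     # only the used capacity
    optimised_total_kw = facility_power_kw(optimised_it_kw, 1.0)
    return {
        "it_kw": it_kw,
        "pue_only_saving_kw": pue_only_saving_kw,
        "it_saving_kw": it_kw - optimised_it_kw,
        "optimised_total_kw": optimised_total_kw,
        "saving_fraction": 1 - optimised_total_kw / total_kw,
    }


result = optimised_scenario(total_kw=2000, pue=2.0, utilisation=0.10)
for key, value in result.items():
    print(f"{key}: {value:g}")
```

Separating `pue_only_saving_kw` from `it_saving_kw` makes the slide's point explicit: PUE work alone caps the saving at the overhead, whereas utilisation attacks the IT load itself.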


• Assume a modern (dual-socket, 12-core) server is 6x faster than a 3-year-old dual-socket, dual-core server

• Assume it draws the same power

• We would require only 1/6 (about 17%) of the number of servers to deliver the same IT load, i.e. about 17 kW

• We have saved >99% of our energy bill (and have potentially reclaimed about 83% of the DC floor space!)

• ‘Sweating’ the assets may not be so smart after all!
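The refresh figures follow from dividing by the speed-up. This sketch assumes, as the slide appears to, that the starting point is the 100 kW fully optimised IT load, that each new server draws the same power as an old one, that the load spreads perfectly across the new servers, and that the saving is measured against the original 2 MW facility. The function is hypothetical.

```python
def refresh_scenario(it_kw: float, speedup: float, baseline_total_kw: float) -> dict:
    """Replace old servers with faster ones that each draw the same power.

    With the IT load unchanged, the server count (and so IT power and
    floor space) scales by 1 / speedup.
    """
    server_fraction = 1 / speedup
    new_it_kw = it_kw * server_fraction
    return {
        "server_fraction": server_fraction,
        "new_it_kw": new_it_kw,
        "bill_saving": 1 - new_it_kw / baseline_total_kw,
        "floor_space_reclaimed": 1 - server_fraction,
    }


r = refresh_scenario(it_kw=100, speedup=6, baseline_total_kw=2000)
print(f"servers needed:        {r['server_fraction']:.1%}")
print(f"new IT load:           {r['new_it_kw']:.1f} kW")
print(f"energy bill saving:    {r['bill_saving']:.1%}")
print(f"floor space reclaimed: {r['floor_space_reclaimed']:.1%}")
```

A 6x speed-up means 1/6 of the servers (about 16.7%) and frees 5/6 (about 83.3%) of the space they occupied.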


• A: It is part of the answer

• Typical human behaviour is:

– A customer replaces a 3-year-old (then state-of-the-art) server with a new state-of-the-art server

– He puts 12 VMs on his new (6x faster) 12-core server rather than the single OS instance on his old 4-core server

– His IT efficiency goes from 10% to 20%

• This demonstrates the need to specify new equipment accurately, based on real application and user requirements

• This is also driving a new market for larger numbers of less powerful, less power-hungry servers in the DC
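The jump from 10% to 20% utilisation can be reproduced if each of the 12 VMs carries the same absolute load as the old server's single OS instance. That reading of the example is an assumption; the helper below is hypothetical and measures capacity in ‘old server’ units.

```python
def utilisation_after_consolidation(old_utilisation: float, speedup: float,
                                    vms: int) -> float:
    """Utilisation of a new host running `vms` copies of an old workload.

    Each VM is assumed to impose the same absolute load as the single OS
    instance did on the old server (whose capacity is 1 unit).
    """
    old_load = old_utilisation * 1.0    # load in old-server units
    new_capacity = speedup * 1.0        # the new server is `speedup` units
    return vms * old_load / new_capacity


print(utilisation_after_consolidation(old_utilisation=0.10, speedup=6, vms=12))
```

Under the same assumption, 60 such VMs would be needed before the new server was fully used, which is why sizing from real requirements matters more than simply buying the newest box.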


• With few exceptions, the most successful methodology for improving energy conservation across all sectors is:

– Step 1: Identify who or what is responsible for significant energy waste

– Step 2: Drive behaviour to ‘encourage’ change

• What is the implication for the Data Centre?

• We need to monitor and report IT Usage Effectiveness (ITUE) metrics by customer, department or end-user

– Which users, applications or services are the worst offenders?

– Management can use the data to drive better practice (charge-back?)

– Help guide virtualisation strategy
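A per-customer report is one way to carry out Step 1. The sketch below assumes a hypothetical record format (customer, metered energy, mean utilisation per interval) and ranks customers by an illustrative ‘idle energy’ proxy, energy × (1 − utilisation); it is not a standard ITUE definition, and the sample readings are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ServerSample:
    """One metering interval for one server (hypothetical record format)."""
    customer: str
    energy_kwh: float      # metered energy over the interval
    utilisation: float     # mean IT utilisation over the interval, 0..1


def worst_offenders(samples: list[ServerSample]) -> list[tuple[str, float, float]]:
    """Return (customer, total_kwh, idle_kwh), sorted worst first by idle_kwh."""
    total: dict[str, float] = defaultdict(float)
    idle: dict[str, float] = defaultdict(float)
    for s in samples:
        total[s.customer] += s.energy_kwh
        idle[s.customer] += s.energy_kwh * (1.0 - s.utilisation)
    return sorted(((c, total[c], idle[c]) for c in total),
                  key=lambda row: row[2], reverse=True)


samples = [                                   # made-up sample readings
    ServerSample("finance", 120.0, 0.08),
    ServerSample("finance", 110.0, 0.12),
    ServerSample("research", 300.0, 0.65),
    ServerSample("web", 90.0, 0.05),
]
ranked = worst_offenders(samples)
for customer, kwh, idle_kwh in ranked:
    print(f"{customer:10s} {kwh:7.1f} kWh  {idle_kwh:7.1f} kWh idle")
```

In this toy data the largest consumer ("research") is not the worst offender; ranking by idle energy points at "finance", which is the kind of distinction a charge-back or virtualisation discussion needs.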


• The potential for savings comes from multiple sources. Start with a PUE of 2:

– Optimised environmental management to improve PUE (& ITUE): perhaps reduce PUE from 2.0 to 1.8?

– Identification of unused, under-used, inefficient or over-spec’ed IT equipment: perhaps 5% of servers?

– Using active power management during low utilisation periods

– Identification of poor equipment usage (ITUE) by end-users: say a 10% improvement from these two steps together?

– Size and virtualise: say a 20% improvement?

– Replace old servers: say a 10% saving?

– Dynamic orchestration of virtual machines based on environmental, power and IT usage constraints: potential saving not yet quantified (??????)

[Chart: IT Equipment and Cooling and Power Overhead, with the Saving growing as each source is added]
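Applying the indicative figures from these slides in sequence gives a rough sense of the combined effect. This sketch assumes each percentage reduces the remaining IT power, that removing 5% of servers removes 5% of IT power, and that PUE stays at 1.8 once the environmental step is applied; the unquantified orchestration step is left out. Powers are in units of the initial IT load, and the function is hypothetical.

```python
def savings_waterfall(start_pue: float, new_pue: float,
                      it_reductions: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Facility power after each step, in units of the initial IT power.

    Facility power = IT power * PUE; each reduction cuts the remaining
    IT power multiplicatively (an assumption, not a measured model).
    """
    it_power = 1.0
    steps = [("start", it_power * start_pue), ("improve PUE", it_power * new_pue)]
    for label, reduction in it_reductions:
        it_power *= 1.0 - reduction
        steps.append((label, it_power * new_pue))
    return steps


steps = savings_waterfall(
    start_pue=2.0,
    new_pue=1.8,
    it_reductions=[
        ("remove unused servers", 0.05),
        ("power management + user ITUE", 0.10),
        ("size and virtualise", 0.20),
        ("replace old servers", 0.10),
    ],
)
initial = steps[0][1]
for label, power in steps:
    print(f"{label:30s} {power:.3f}  (saving {1 - power / initial:.1%})")
```

Under these assumptions the facility ends at about 1.11 units against the original 2.0, a combined saving of roughly 45%, before any benefit from dynamic orchestration.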


Concurrent Thinking’s Products


Environmental Monitoring

Power Management

Integration with Data Centre Systems

Server Health Monitoring

OS & VM Monitoring and Management

Environmental Monitoring

o Low-cost 5V temperature, humidity… sensors

o 3rd party SNMP sensors
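Rack sensors are what make it safe to act on the temperature figures quoted earlier: the hottest inlet, not the average, bounds how far the room can be warmed. This sketch uses 27 °C, the upper end of the ASHRAE (2008) recommended inlet range, as an assumed limit; the sensor names and readings are made up and the function is hypothetical.

```python
# Assumed limit: upper bound of the ASHRAE (2008) recommended inlet range.
# Substitute whatever envelope your equipment vendors specify.
RECOMMENDED_MAX_INLET_C = 27.0


def setpoint_headroom_c(inlet_temps_c: dict[str, float],
                        limit_c: float = RECOMMENDED_MAX_INLET_C) -> tuple[str, float]:
    """Return the hottest inlet sensor and how far it sits below `limit_c`."""
    hottest = max(inlet_temps_c, key=inlet_temps_c.get)
    return hottest, limit_c - inlet_temps_c[hottest]


readings = {"rack-A1": 19.5, "rack-A2": 21.0, "rack-B1": 23.5}   # made-up values
sensor, headroom = setpoint_headroom_c(readings)
print(f"{sensor} limits any increase to {headroom:.1f} °C")
```

Using the hottest reading is deliberately conservative; a real controller would also track hotspots over time rather than a single snapshot.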

Power Management

o 3rd party PDU control

o Server PSUs (PMBus)

o Device association

o Power charge-back

o Scheduled actions
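Device association and power charge-back fit together naturally: outlet readings are mapped to devices, devices to customers, and each customer is billed for its IT energy plus a share of the facility overhead. The sketch below is hypothetical throughout (the mapping tables, the PUE-proportional overhead policy and the tariff in unspecified currency units per kWh); it is one simple policy, not the product's method.

```python
from collections import defaultdict

# Hypothetical device association: which PDU outlet feeds which device,
# and which customer owns each device.
OUTLET_TO_DEVICE = {"pdu1/1": "srv-01", "pdu1/2": "srv-02", "pdu2/1": "srv-03"}
DEVICE_TO_CUSTOMER = {"srv-01": "finance", "srv-02": "finance", "srv-03": "web"}


def charge_back(outlet_kwh: dict[str, float], pue: float,
                tariff_per_kwh: float) -> dict[str, float]:
    """Bill each customer for metered IT energy scaled up by PUE.

    Multiplying by PUE shares cooling and distribution losses in proportion
    to each customer's IT energy.
    """
    bill: dict[str, float] = defaultdict(float)
    for outlet, kwh in outlet_kwh.items():
        customer = DEVICE_TO_CUSTOMER[OUTLET_TO_DEVICE[outlet]]
        bill[customer] += kwh * pue * tariff_per_kwh
    return dict(bill)


readings = {"pdu1/1": 100.0, "pdu1/2": 50.0, "pdu2/1": 200.0}   # made-up kWh
bills = charge_back(readings, pue=1.8, tariff_per_kwh=0.10)
print(bills)
```

Keeping the association tables separate from the billing logic means a server moved to another outlet or customer only changes data, not code.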

Integration with Data Centre Systems

o 3rd party SNMP devices

o Modbus (and others) via SNMP bridge

Server Health Monitoring

o IPMI / DCMI support

o Power capping via Intel Node Manager

o Scheduled actions
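Power capping needs a policy for dividing a rack budget into per-server caps before those caps are enforced on each node. The allocation below is a hypothetical illustration, not Intel Node Manager's own algorithm: every server gets a minimum cap, and the rest of the budget is shared in proportion to demand above that minimum. The demand figures are made up.

```python
def allocate_caps(budget_w: float, demand_w: dict[str, float],
                  floor_w: float) -> dict[str, float]:
    """Split a rack power budget into per-server caps (illustrative policy).

    If total demand fits, each server is capped at its own demand. Otherwise
    each gets `floor_w` plus a share of the remaining budget proportional to
    its demand above the floor, so the caps sum exactly to the budget.
    """
    if floor_w * len(demand_w) > budget_w:
        raise ValueError("budget cannot cover the minimum cap for every server")
    if sum(demand_w.values()) <= budget_w:
        return dict(demand_w)
    spare = budget_w - floor_w * len(demand_w)
    excess = {s: max(d - floor_w, 0.0) for s, d in demand_w.items()}
    total_excess = sum(excess.values())   # > 0 whenever demand exceeds budget
    return {s: floor_w + spare * excess[s] / total_excess for s in demand_w}


demand = {"srv-a": 500.0, "srv-b": 300.0, "srv-c": 200.0}     # made-up watts
caps = allocate_caps(budget_w=800.0, demand_w=demand, floor_w=150.0)
for server, cap in caps.items():
    print(f"{server}: {cap:.1f} W (demand {demand[server]:.0f} W)")
```

Because the shared budget is smaller than the total excess demand, no server is capped above what it asks for; a scheduled action could simply re-run the allocation with a lower budget during off-peak hours.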

OS & VM Management

o OS monitoring

o Script repository

o OS deployment

o VM migration (TBA)

Comprehensive Data Centre Management and Orchestration

Environmental Monitoring

Power Management

Integration with Data Centre Systems

Server Health Monitoring

OS & VM Monitoring and Management

o Optimise real-time DC facilities efficiency (PUE)

o Optimise combined facilities & IT efficiency (ITUE)

o Power & efficiency metrics by rack / user / customer / application etc.

o Identify unused, under-used, inefficient or cost-ineffective IT equipment

o Active power management during low utilisation periods

o Active environmental management & VM migration
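Consolidating VMs so that idle hosts can be powered down is, at its core, a bin-packing problem. The sketch below uses first-fit decreasing, a standard bin-packing heuristic swapped in here for illustration; it is not the product's orchestration logic, and it ignores the environmental and power constraints a real orchestrator would add. Loads, capacity and the 80% headroom limit are hypothetical and share one unit (e.g. cores).

```python
def consolidate(vm_loads: dict[str, float], host_capacity: float,
                max_utilisation: float = 0.8) -> list[list[str]]:
    """Pack VMs onto as few hosts as possible using first-fit decreasing.

    `max_utilisation` keeps headroom on each host. Any host not returned
    here could be a candidate for powering down.
    """
    limit = host_capacity * max_utilisation
    hosts: list[list[str]] = []
    used: list[float] = []
    # Place the largest VMs first; each goes on the first host with room.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > limit:
            raise ValueError(f"{vm} does not fit on any host")
        for i, current in enumerate(used):
            if current + load <= limit:
                hosts[i].append(vm)
                used[i] += load
                break
        else:
            hosts.append([vm])
            used.append(load)
    return hosts


vms = {"vm1": 4.0, "vm2": 3.0, "vm3": 2.5, "vm4": 2.0, "vm5": 1.5, "vm6": 1.0}
placement = consolidate(vms, host_capacity=12.0)
for i, host in enumerate(placement):
    print(f"host {i}: {host}")
```

Here the six VMs total 14 cores against a usable 9.6 cores per host, so two hosts is the minimum possible and the heuristic reaches it; in general first-fit decreasing is fast but not guaranteed optimal.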


Pretty Pictures…
