Power consumption and efficiency
of cooling in a Data Center
Satoshi Itoh, Yuetsu Kodama, Toshiyuki Shimizu, and Satoshi Sekiguchi (AIST),
Hiroshi Nakamura (The University of Tokyo),
Naohiko Mori (NTT Communications Corporation)
This research was partially supported by the New Energy and Industrial Technology
Development Organization (NEDO) research project entitled “Research and
Development Project for Green Network/System Technology (Green IT Project)”
Motivation and Purpose
• There are many different ways to save energy in a data center:
– Lower-power servers
– More efficient power supply and cooling facilities
– More efficient server operation
• How much does each improvement contribute to energy efficiency? → A green metric is needed.
• Existing green metrics: PUE and server green metrics
– PUE is too macroscopic
– Both have pitfalls
• We develop a model and metrics for data centers and server systems.
• In this paper, we measure various temperatures and power consumptions in our testing laboratory and discuss mainly the cooling facility and fans.
Examples           | Data center A                | Data center B
Server performance | 50 GFlops                    | 50 GFlops
Performance/Watt   | 0.2 GFlops/W                 | 0.3 GFlops/W
Power facility     | Alternating current, no UPS  | Direct current with UPS
Cooling facility   | Low airflow                  | Heat removal by air pressure
Total power / year | 30 GWh                       | 30 GWh
PUE*               | 1.2                          | 1.8

(Figure: breakdown of facility losses; PSU loss, UPS loss, and large fans on one side versus DC/DC conversion, no UPS, and no fans on the other.)

*PUE = total power consumption / power consumption by IT equipment
Model of Power Consumption for Data Center

Pdc = Pit + Ppu + Pcf
Pit = Pns + Pst + Psv
Ep = Pit / (Pit + Ppu)
Pcf = Pit / Ec

Pdc : total power consumption of the data center
Pit : power consumption of IT equipment
Pns : network switches
Pst : storage
Psv : servers
Ppu : power loss in the power unit
Pcf : power consumption of the cooling facility
Ep : power efficiency of the power unit
Ec : cooling efficiency (cf. COP, coefficient of performance)

(Figure: block diagram of the data-center power Pdc split into IT equipment Pit (network Pns, storage Pst, servers Psv), power-unit loss Ppu, and cooling facility Pcf.)
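A minimal numeric sketch of this model, using the 27 kW IT load and 14 kW cooling power measured later in the deck; the network/storage/server split and the power-unit loss are assumed values for illustration:

```python
# Data-center power model from the slides, evaluated numerically.
# The 27 kW IT load and 14 kW cooling power come from the measurements
# later in the deck; the IT split and power-unit loss are assumed.
p_ns, p_st, p_sv = 1.0, 2.0, 24.0   # kW (assumed split of the IT load)
p_it = p_ns + p_st + p_sv           # Pit = Pns + Pst + Psv  -> 27 kW
p_pu = 3.0                          # kW power-unit loss (assumed)
p_cf = 14.0                         # kW cooling facility (3 CRACs)

p_dc = p_it + p_pu + p_cf           # Pdc = Pit + Ppu + Pcf
e_p = p_it / (p_it + p_pu)          # Ep, power-unit efficiency
e_c = p_it / p_cf                   # Ec, cooling efficiency
pue = p_dc / p_it                   # PUE = total power / IT power

print(f"Pdc = {p_dc:.1f} kW, Ep = {e_p:.2f}, Ec = {e_c:.2f}, PUE = {pue:.2f}")
```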
Model of Power Consumption for Server

Pt = Psv : total server power consumption
Pt = Pb + Pd + Pm + Pc + Pf + Pp

Epsu = Po / (Po + Pp)
Pf = Pf_0 × rpm³
rpm = rpm_min + f(server temperature)
Pc = Pc_idle + load × (Pc_max − Pc_idle)
Pm = Pm_no-access + Pm_access
Pd = Pd_no-access + Pd_access

Pb : motherboard (almost constant)
Pc : CPU (depends on CPU load)
Pm : memory (depends on memory access)
Pd : disk (depends on disk access)
Pf : fan (depends on fan rpm)
Pp : power loss in the PSU (Po : PSU output power; Epsu : PSU efficiency)
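The component equations above can be sketched as a toy function. All coefficients below are illustrative assumptions, except the cubic fan fit, which is the whole-enclosure fit measured later in the deck:

```python
# Toy instance of the server power model above. All coefficients are
# illustrative assumptions except the cubic fan fit 22.1(rpm/10000)^3 + 8.2,
# which was measured for a whole blade enclosure later in the deck.
def server_power(load, rpm,
                 pb=30.0,            # motherboard, W (assumed, ~constant)
                 pd=8.0, pm=6.0,     # disk and memory, W (assumed)
                 pc_idle=20.0, pc_max=80.0):  # CPU idle/max, W (assumed)
    """Total server power in W for CPU load in [0, 1] and fan speed in rpm."""
    pc = pc_idle + load * (pc_max - pc_idle)   # Pc: linear in CPU load
    pf = 22.1 * (rpm / 10000) ** 3 + 8.2       # Pf: constant + cube of speed
    return pb + pd + pm + pc + pf

idle = server_power(0.0, 10000)
full = server_power(1.0, 10000)    # only Pc differs: +60 W at full load
```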
Overview of the testing laboratory

(Figure: floor layout of the testing laboratory; four CRACs (#13 to #16) along one wall, seven racks (#1 to #7), 600 mm × 600 mm louvers (grills), flow control panels under the raised floor, a booth with vinyl curtains on the floor, and ★ temperature measurement points. A photo of a CRAC is also shown.)

Flow control panel under the floor
• The space under the raised floor is separated using cardboard panels.

(Figure: side view of the testing laboratory; CRAC, wall, flow panel under the floor, booth and vinyl curtain, floor, ceiling, rack, and ★ temperature measurement points.)
IT equipment
• 1U servers
– IBM x3250, Xeon 3040 1.86 GHz, 90 nodes
– NEC Express5800/i120Rg-1, Xeon 2.33 GHz, 20 nodes
• Storage
– IBM System Storage DS3400 Dual FC 42X
– IBM System Storage DS3200 Dual SAS 22X
– HP StorageWorks Modular Smart Array 2012FC
• Blade servers: HP BladeSystem c7000 ×2 / BL460c
– Xeon 5350 quad-core 2.66 GHz, 3 nodes
– Xeon 5160 dual-core 3.00 GHz, 13 nodes
– Xeon 5160 dual-core 3.00 GHz, 2 CPUs, 16 nodes

(Figure: rack elevation diagrams; racks #1 to #4 and #6 hold the web servers and DB servers with filler panels, with the management and data network switches at the top (37U/36U); rack #5 holds the two blade enclosures, storage, power meters, and a KVM switch.)
Power meters at the power unit and CRACs
• Yokogawa clamp-on power meter CW120
• Power consumption per rack, blade chassis, and CRAC is measured every 2 seconds

(Photos: power unit and CRAC)
Power meter for individual equipment
• Ohsaki Electric Watt Checker MWC-01
• Power consumption of 128 pieces of equipment is measured every second
• Measured data is recorded through a USB network

Item                  | Range                                | Accuracy
Voltage (RMS) (V)     | AC 100 V ±10%, AC 200 V ±10%         | ±1%
Current (RMS) (A)     | 0.00–20.00                           | ±1%
Power (W)             | 0–2200 (AC 100 V), 0–4400 (AC 200 V) | ±2%
Frequency (Hz)        | 47.0–63.0                            | ±2%
Power factor          | 0.00–1.00                            | ±0.03
Electric energy (kWh) | 0.00–9999                            | ±2%
Output interval       | 1 second                             |
Temperature sensors
• GRAPHTEC midi LOGGER GL800 with type-K thermocouples
• Temperature at 87 points is measured every second
LINPACK
• The IT equipment as a whole consumes at most 27 kW.

Rack         | Server       | CPU                    | Freq. (GHz) | # of nodes | Standby (W) | Idle power (W) | LINPACK GFlops | Power (W) | GFlops/W
Rack 1,2,3,4 | IBM x3250    | Xeon 3040 (2 cores)    | 1.86        | 88         | 7.4         | 83             | 11.7           | 120       | 0.098
Rack 6       | NEC Express  | Xeon 5148 ×2           | 2.33        | 20         | 15.9        | 191            | 31.8           | 259       | 0.123
Rack 5 bdcb1 | HP BL460 (a) | Xeon 5355 ×1 (4 cores) | 2.66        | 3          | 19.7        | 115            | 32.8           | 207       | 0.158
Rack 5 bdcb1 | HP BL460 (b) | Xeon 5160 ×1           | 3.00        | 13         | 15.5        | 115            | 19.6           | 179       | 0.109
Rack 5 bdcb2 | HP BL460 (c) | Xeon 5160 ×2           | 3.00        | 16         | 16.4        | 125            | 36.4           | 225       | 0.162

(Standby, idle, and LINPACK figures are per node. Figure: stacked bar chart of power (W) per rack/chassis under LINPACK and at idle; racks 1–4 and 6 plus blade chassis bdcb1/bdcb2, with per-rack labels of 5.9, 4.9, 3.7, 3.2, 3.2, 3.1, and 3.1 kW.)
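The GFlops/W column is simply the per-node LINPACK performance divided by the per-node power; recomputing it from the table:

```python
# Per-node LINPACK results (GFlops, power in W) from the table above.
nodes = {
    "IBM x3250":    (11.7, 120),
    "NEC Express":  (31.8, 259),
    "HP BL460 (a)": (32.8, 207),
    "HP BL460 (b)": (19.6, 179),
    "HP BL460 (c)": (36.4, 225),
}
for name, (gflops, watts) in nodes.items():
    print(f"{name}: {gflops / watts:.3f} GFlops/W")
```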
Power consumption of blade fans
• The HP BladeSystem c7000 has many built-in sensors.
• We measured the power consumption and speed of the fans under different kinds of load: idle, LINPACK, and SPECpower.
• Fan speed increases linearly with CPU temperature above 57℃.
• Fan power consumption can be represented as a constant plus a term cubic in fan speed: Power = 22.1 × (rpm/10000)³ + 8.2
• The fans consume roughly 0.8–1 kW, which is 16–20% of system power.
(Left figure: fan power (W) vs. fan speed (rpm), fitted by power = 22.1 (rpm/10000)³ + 8.2. Right figure: virtual fan speed (%) vs. processor temperature (℃), fitted by y = 3.9103x − 189.8 above about 57℃.)
Cooling capability of the CRAC
• CRAC: GV-15 (manufactured 2003/08)
– Maker: SINKO INDUSTRIES LTD.
– Maximum airflow: 14,000 CMH
– Rated power: 7.5 kW
• Typical (default) setting
– Air temperature: 15℃
– Airflow volume: roughly half
• Ways to change the capability of the CRAC
– Airflow volume
– Air temperature
– Number of CRACs
Estimation of heat removal
• The heat that a CRAC can remove is estimated by:
Quantity of heat (kW)
= air density × airflow × specific heat × ΔT
= 1.2 kg/m³ × 14,000 CMH × 1.0 kJ/(kg·℃) × 5℃
= 84,000 kJ/h = 23.3 kW
• The maximum airflow produced by the CRAC we used is 14,000 CMH (cubic meters per hour).
• ΔT is the difference in temperature between the inlet and outlet of the airflow.
• When we run LINPACK on all nodes, the return-air temperature is about 20℃, so ΔT is 5℃.
• Thus one CRAC can remove 23 kW of heat.
• Because the IT equipment consumes roughly 27 kW under LINPACK, one CRAC is not enough to remove all the heat.
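The estimate can be sketched in code; the air density of 1.2 kg/m³ is the value consistent with the 84,000 kJ/h figure:

```python
# Heat a CRAC can remove: density x airflow x specific heat x delta-T.
def crac_heat_removal_kw(airflow_cmh, delta_t_c,
                         density=1.2,   # kg/m^3, air near room temperature
                         cp=1.0):       # kJ/(kg*degC), specific heat of air
    kj_per_hour = density * airflow_cmh * cp * delta_t_c
    return kj_per_hour / 3600.0        # kJ/h -> kW

q = crac_heat_removal_kw(14000, 5)     # full airflow, 5 degC delta-T
print(f"{q:.1f} kW")                   # ~23.3 kW, below the 27 kW IT load
```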
Volume of airflow
• We modified the openness of the damper to change the volume of airflow.
• Because the damper graduations are not accurate, we measured the actual airflow at the air duct with an anemometer.
• The power consumption of the CRAC increases linearly with the volume of airflow.
(Figure: CRAC power consumption (W) over time as the damper openness is stepped through 50% → 100% → 25% → 10% → 50%; at 100% the power is close to the rated 7.5 kW.)

Damper graduation | Air speed (m/s) | Power (W)
100%              | 11.52           | 7184
50%               | 6.35            | 4622
50%               | 4.69            | 4542
25%               | 4.02            | 3659
10%               | 2.44            | 2107

(Figure: power consumption (W) vs. air speed (m/s), showing a roughly linear relationship up to the rated power of 7.5 kW.)
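The linear power-airflow relationship can be quantified with a simple least-squares fit to the damper table above (a sketch; the slides do not give fit coefficients for this plot):

```python
# Least-squares line through the damper table above: CRAC power vs. air speed.
speeds = [11.52, 6.35, 4.69, 4.02, 2.44]   # m/s (anemometer at the duct)
powers = [7184, 4622, 4542, 3659, 2107]    # W

n = len(speeds)
mx = sum(speeds) / n
my = sum(powers) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(speeds, powers))
         / sum((x - mx) ** 2 for x in speeds))   # W per (m/s)
intercept = my - slope * mx                      # extrapolated W at zero flow
```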
Temperature of air
• We modified the mixture ratio of cold and hot water to change the temperature of the supplied air.
• The volume of airflow was fixed at almost half of full speed.
• The power consumption of the CRAC decreases by about 14 W per 1℃ increase in air temperature.
(Figure: CRAC power consumption (W) over time as the supply air temperature is stepped through 15℃ → 17℃ → 19℃ → 21℃ → 15℃; the y-axis spans 4400–4800 W. Inset: power consumption (W) vs. air temperature (℃), decreasing roughly linearly over 12–24℃.)
Power consumption of CRACs in operation
• The air temperature is set to 15℃.
• The volume of airflow is fixed at almost half of full speed.
• The number of active CRACs is changed from 1 through 4, and then to 2.
• LINPACK runs on all of the equipment, but is started gradually.
• Because the parameters are set manually, the power consumptions of the CRACs are not identical.
(Figure: IT power (W, left axis, up to 57 kW) and cooling power of each CRAC (AC13–AC16; W, right axis, 4000–5200 W) over time as the number of active CRACs changes; the IT power plateaus at about 27 kW.)

• The CRAC powers spread from 4500 W to 5000 W, a deviation of about 10%.
• The same CRAC (for example #14) shows different values when the number of active CRACs differs.
Number of CRACs and room temperature
• The number of CRACs in operation is changed from 1 to 3.
• We measure the room temperature at the front, inside, and rear of the rack.
• It is difficult to judge the sufficiency of the cooling capability from room temperature alone.
(Figure: panels of temperature (15–40℃) measured at the front, inside, and rear of the rack, at the top, middle, and bottom heights, for idle and LINPACK loads with 1, 2, and 3 CRACs.)
Number of CRACs and CPU temperature
• The horizontal axis represents the height of the IBM servers in the rack (in units of U).
• With 1 CRAC, lower positions are cooler.
• As the number of CRACs increases, temperatures rise at lower positions and fall at higher positions.
• 3 CRACs are necessary to keep all CPUs under the critical temperature Tc = 60℃.
(Figure: three panels of CPU temperature (35–65℃) vs. rack position (0–40U) for racks 1–4, with 1, 2, and 3 CRACs; the critical temperature Tc is marked.)
• With 1 CRAC the flow is too weak to reach the top of the rack, and the cold air is consumed in the lower part of the rack.
• With 3 CRACs the flow is so strong that the cold air cannot be caught in the lower part.
• We guess that the total capability of the fans in the IBM x3250 is not very large.
Summary of changing the number of CRACs
• It is difficult to judge the sufficiency of cooling by monitoring room temperature.
• Monitoring CPU temperature is a possible way to judge it.
• To keep the CPU temperature below 60℃, 3 CRACs are necessary.
• The power consumed by 3 CRACs (14 kW) is necessary to remove the heat of the IT equipment (27 kW).
• The maximum airflow of 1 CRAC is 14,000 CMH and can remove 23 kW.
• 1 CRAC at half airflow can remove roughly 12 kW.
• The need for 3 CRACs therefore seems reasonable.
• The cooling efficiency is Ec = 27 kW / 14 kW = 1.9
(Figure: temperature (℃) vs. number of CRACs; room temperature at the front, inside, and rear of the rack, plus average and maximum CPU temperature with the critical temperature Tc marked.)
Actual airflow
• We measured the air speed at the CRAC air duct, the floor louvers, and the front/rear of the racks with an anemometer, and estimated the airflow.
• We found that the quantity of air drawn into the racks is less than 50% of the total CRAC flow in the case of 2 CRACs.
• The remaining air seems to go directly to the ceiling.
• Clearly, a lot of energy is being lost.
(Figure: side view of the room showing airflow from the CRAC through the under-floor flow panels and louvers (grills) to the rack and ceiling.)

<Notice> The figure does not accurately represent the actual environment; the rack is rotated 90 degrees from its actual position.
Improvement of cooling efficiency
• We constructed a front cover so that all of the air is led into the racks.
• CPU temperature with 1 CRAC and the front cover, air temperature 15℃, airflow volume 50%:
• CPU temperatures are at least 5℃ lower than those with 3 CRACs.
• The power consumption of 1 CRAC is about 4.5 kW.
• The cooling efficiency is Ec = 27 kW / 4.5 kW = 6.0
(Figure: CPU temperature (35–65℃) vs. rack position (0–40U) for racks 1–4 with 1 CRAC and the front cover; all temperatures stay below Tc.)
More improvement of cooling efficiency
• Power consumption can be reduced further by decreasing the volume of airflow.
• With the front cover, only 20% of full airflow is enough for these IBM servers.
• The cooling power is then 2.1 kW.
• The cooling efficiency is Ec = 27 kW / 2.1 kW = 12.9
(Figure: two panels of CPU temperature (℃) vs. rack position (bottom to top, in U) for racks 1–4, at 20% and 30% of full airflow; the critical temperature Tc is marked.)
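Collecting the cooling-efficiency figures from the last few slides, the improvement can be recomputed directly:

```python
# Cooling efficiency Ec = IT power / cooling power for the three measured
# configurations; 27 kW is the IT load under LINPACK.
P_IT = 27.0  # kW

setups = {
    "3 CRACs, no cover":             14.0,  # kW cooling power
    "1 CRAC + front cover, 50% air":  4.5,
    "1 CRAC + front cover, 20% air":  2.1,
}
for name, p_cool in setups.items():
    print(f"{name}: Ec = {P_IT / p_cool:.1f}")
```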
Summary
• We constructed a testing laboratory to monitor the temperature and power consumption of IT equipment and the cooling facility.
• Fan power consumption behaves as the cube of fan speed.
• Cooling power depends not only on the power consumption of the IT equipment, but also on the configuration of the facility, the racks, and the environment.
• Cooling power can be reduced by leading cool air directly to the racks.
• Monitoring CPU temperature and controlling the number of CRACs and the volume of airflow are useful methods to reduce cooling power while keeping the CPU temperature below a threshold.