One Team – One Culture – One Purpose – One SSC Environment and Climate Change Canada HPC Renewal Project: Procurement Results 17th Workshop on HPC in meteorology ECMWF, Reading, UK Alain St-Denis & Luc Corbeil October 2016

Page 1: Environment and Climate Change Canada HPC Renewal Project

One Team – One Culture – One Purpose – One SSC

Environment and Climate Change Canada HPC Renewal Project: Procurement Results

17th Workshop on HPC in meteorology

ECMWF, Reading, UK

Alain St-Denis & Luc Corbeil

October 2016

Page 2: Environment and Climate Change Canada HPC Renewal Project

Outline

• Background

• History

• Scope

• RFP

• Outcome

Page 3: Environment and Climate Change Canada HPC Renewal Project

HPC Renewal for ECCC Background

• Environment Canada is highly dependent on HPC to deliver its mandate: simulation of environmental forecasts for the health, safety, security and economic well-being of Canadians.

• Contract with IBM expiring with few remaining options to extend

• Linked to Meteorological Service of Canada (MSC) Renewal Treasury Board Submission

Component 1: Monitoring Networks

Component 2: Supercomputing capacity

Component 3: Weather Warnings and Forecast System

• Joint ECCC-SSC submission for Supercomputing Capacity

Page 4: Environment and Climate Change Canada HPC Renewal Project

New player: Shared Services Canada

• Created in 2012 to take responsibility for email, networks and data centres for the whole Government of Canada.

• Supercomputing IT people working for ECCC transferred to SSC.

• Scope of the HPC team expanded to all science departments

• As in any reorganization, there are challenges and opportunities!

Page 5: Environment and Climate Change Canada HPC Renewal Project

Shared Services Canada – Our Mandate

Shared Services Canada was formed to consolidate and streamline the delivery of IT infrastructure services, specifically email, data centre and network services. Our mandate is to do this so that federal organizations and their stakeholders have access to reliable, efficient and secure IT infrastructure services at the best possible value.

SSC will innovate, ensure full value for money and achieve service excellence!

[Diagram labels: Service to Canadians; Departmental Programs; SSC Services]

Page 6: Environment and Climate Change Canada HPC Renewal Project

A Bit of History

• ECCC has been using a supercomputer for weather forecasting and atmospheric science for more than half a century

[Figure: peak and sustained performance of ECCC supercomputers by year, in millions of floating point operations per second (log scale, 1.E-02 to 1.E+09). Systems labelled: Bendix G20, IBM 360/65, CDC 7600, CDC 176, Cray 1, Cray X-MP 28, Cray X-MP 4-16, NEC SX-3/44, SX-3/44R, SX-4/16, SX-4/80M3, SX-5/32M2, SX-6/80M10, IBM Power4, Power5, Power7]

Page 7: Environment and Climate Change Canada HPC Renewal Project

A Bit of (More Recent) History

• Request for Information (Fall 2012,

• Invitation to Qualify (Fall 2013, 4 bidders qualified)

• Review and Refine Requirements (Summer 2014)

• Requests for Proposal (November 2014 – June 2015)

• Treasury Board Approval (April 2016)

• Contract Award (May 27, 2016)

Page 8: Environment and Climate Change Canada HPC Renewal Project

Scope

Scope items, each with the current system it replaces:

• Supercomputer clusters (replacing two 8192-core P7 clusters)

• Pre/Post-Processing clusters (PPP) (replacing two 640-core x86 custom clusters)

• Global Parallel Storage (Site-Store) (replacing CNFS and ESS clusters)

• Near-Line Storage (HP-NLS) (replacing a StorNext-based archiving cluster)

• Home directories (replacing NetApp home directories)

As well as

• Hosting of the Solution

• High Performance Interconnects

• Software & tools

• Maintenance & Support

• Training & Conversion support

• On-going High Availability

Page 9: Environment and Climate Change Canada HPC Renewal Project

ECCC Supercomputing Procurement Requirements

• Contract for Hosted HPC Solution: 8.5 years + one 2.5 year option (Transition year + two upgrades + one optional)

• Connectivity between HPC Solution Data Halls and Dorval

• No more than 70 km between Hall A, Hall B & Dorval

• Flexible Options for additional needs
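The contract durations above add up as follows. This is only a rough sketch: the slides give the durations (8.5 years plus one 2.5-year option) and a contract award date of May 27 2016, but they do not say when the term actually starts, so counting from the award date is an assumption. The helper `add_months` and all variable names are illustrative.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`, clamping the day."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Durations from the slide, expressed in months.
BASE_TERM_MONTHS = 102   # 8.5 years
OPTION_TERM_MONTHS = 30  # one 2.5-year option

# ASSUMPTION: the term is counted from the contract award date.
assumed_start = date(2016, 5, 27)

base_end = add_months(assumed_start, BASE_TERM_MONTHS)
option_end = add_months(base_end, OPTION_TERM_MONTHS)
max_term_years = (BASE_TERM_MONTHS + OPTION_TERM_MONTHS) / 12

print(f"Maximum term: {max_term_years} years")
print(f"Base term would end around {base_end}")
print(f"With the option exercised: {option_end}")
```

If the term really starts at a later point, such as acceptance of the initial system, only `assumed_start` needs to change; the 11-year maximum is independent of the start date.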

[Diagram: Solution Data Hall A (Hall A), Solution Data Hall B (Hall B) and NCF, joined by Inter-Hall Links (x2 each). Label: On-going Availability]

Page 10: Environment and Climate Change Canada HPC Renewal Project

High Level Architecture

[Diagram: SCF Data Flow – Logical View (2014-10-07, LPT, HPN/DADS, SSC). Solution Data Hall A and Solution Data Hall B each show a Supercomputer (A, B), Pre/Post Processing (A, B), Scratch, Site Store, Home, Cache and HP-NLS. Other labels: NCF, DATA Feeds, HPN Data Transfer, Storage Synchronization, Out-of-Band Management]

Page 11: Environment and Climate Change Canada HPC Renewal Project

Outcome

• IBM was awarded the contract

Evaluation based on benchmark performance on a fixed budget

• IBM's Proposal for initial system

Supercomputer: Cray XC-40, Intel Broadwell, Sonexion Lustre Storage

PPP: Cray CS-400, Intel Broadwell

Site-Store and Homes: IBM Elastic Storage Server (ESS, GPFS-based)

HP-NLS: based on IBM High Performance Storage System (HPSS)

Page 12: Environment and Climate Change Canada HPC Renewal Project

Sizing

• Computing

About 35,000 Intel Broadwell cores per Data Hall

♦ Supercomputer and PPP combined

• More than 40 PB of disk storage

2.5 PB scratch storage per supercomputer (one per data hall)

18 PB site store per data hall

1.1 PB disk cache to the archive per data hall

• More than 230 petabytes of tape storage (two copies)
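The "more than 40 PB" disk figure can be cross-checked from the per-hall numbers above. The sketch below sums only the three items the slide itemizes and assumes two data halls (Hall A and Hall B); home directory storage is not itemized on the slide and is left out, so the real total may be somewhat higher. Variable names are illustrative.

```python
# Per-data-hall disk figures quoted on the slide, in petabytes.
PER_HALL_PB = {
    "scratch": 2.5,        # scratch storage per supercomputer
    "site_store": 18.0,    # site store
    "archive_cache": 1.1,  # disk cache to the archive
}

# ASSUMPTION: two data halls (Hall A and Hall B), each with the same figures.
DATA_HALLS = 2

per_hall_total = sum(PER_HALL_PB.values())
total_disk_pb = per_hall_total * DATA_HALLS

print(f"Disk per data hall: {per_hall_total:.1f} PB")
print(f"Disk across both halls: {total_disk_pb:.1f} PB")
```

The itemized figures alone come to a little over 43 PB, which is consistent with the slide's "more than 40 PB".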

Page 13: Environment and Climate Change Canada HPC Renewal Project

Comparison

[Figure: Increase Factors (axis 0 to 6) of the new solution over the current systems. Bars: HP-NLS storage (vs current tape capacity), petabytes; Site-Store, homes storage (vs current), petabytes; Sustained TFlops, Supercomputer and PPP (vs P7, current PPP); Peak TFlops, Supercomputer and PPP (vs P7, current PPP); Core count, Supercomputer and PPP (vs P7, current PPP); Scratch storage (vs P7), petabytes]

Page 14: Environment and Climate Change Canada HPC Renewal Project

The Newest Addition to a Long History

[Figure: Historical Performance, EC Supercomputers (Flops), sustained and peak, on a log scale from 0.01 to 10000000000.00. Systems labelled: Bendix G20, IBM 360/65, CDC 7600, CDC 176, Cray 1S, Cray XMP-28, Cray XMP 416, NEC SX-3/44, NEC SX-3/44R, NEC SX-4/16, NEC SX-4/80M3, NEC SX-5/32/M2, NEC SX-6/80M10, IBM P4, IBM P5, IBM P7, IBM/XC-40]

Page 15: Environment and Climate Change Canada HPC Renewal Project

Resulting Architecture

Page 16: Environment and Climate Change Canada HPC Renewal Project

HPC Implementation Milestones: Delivery to Acceptance

• Data Hall and Hosting Site Certification

• Functionality Testing (IT infra)

• Security Accreditation

• Performance testing

• Conversion of operational codes (Automated Environmental Analysis & Production (AEAPPS))

• Meeting the above triggers a 30-day availability test
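The gating rule above (all milestones met, then a 30-day availability test) can be sketched as a simple checklist. This is an illustration only: the `Milestone` class, the function names and the `met` values are hypothetical and do not reflect the project's actual tracking tools or status.

```python
from dataclasses import dataclass

AVAILABILITY_TEST_DAYS = 30  # duration stated on the slide


@dataclass
class Milestone:
    name: str
    met: bool = False


def availability_test_can_start(milestones):
    """The availability test is triggered only once every milestone is met."""
    return all(m.met for m in milestones)


def outstanding(milestones):
    """Names of the milestones still blocking the availability test."""
    return [m.name for m in milestones if not m.met]


# Milestone names come from the slide; the met/not-met values are made up.
milestones = [
    Milestone("Data Hall and Hosting Site Certification", met=True),
    Milestone("Functionality Testing (IT infra)", met=True),
    Milestone("Security Accreditation", met=True),
    Milestone("Performance testing", met=False),
    Milestone("Conversion of operational codes", met=False),
]

if availability_test_can_start(milestones):
    print(f"Start the {AVAILABILITY_TEST_DAYS}-day availability test")
else:
    print("Still outstanding:", ", ".join(outstanding(milestones)))
```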

[Diagram stages: Inspection, Functionality Testing, Performance Testing, Conversion, RFU, Acceptance]

Page 17: Environment and Climate Change Canada HPC Renewal Project

Challenge

• Change the Supercomputer clusters, PPP clusters, archiving system and homes, all at once. This has never been done before.

A lot of preparation work has been done ahead of time

♦ Most codes have already been ported to Intel architecture

♦ Our General Purpose Science Clusters available for PPP migration work

– Linux containers are being leveraged to smooth the transition

Page 18: Environment and Climate Change Canada HPC Renewal Project

Thank you!

Questions?