SC13 BoF: RDA and HPC

5 minute presentation during the SC13 Birds of a Feather Session on the relationship between the Research Data Alliance and High Performance Computing.

Page 1

Research Data Alliance (RDA) for HPC
SC13 Birds of a Feather session
November 20, 2013, 17:30-19:00 MST
Colorado Convention Center, Denver, Colorado

Contribution of John W. Cobb, Oak Ridge National Laboratory, DataONE Project

Page 2

Why am I here? From what perspectives do I speak?

• Discipline scientist
• HPC application evangelist
• Cyberinfrastructure leverage for experimental facilities
• Cyberinfrastructure/HPC center operations
• Cyberinfrastructure for data-intensive science

Without data there is no science

Page 3

HPC centers and data archives have different service objectives

Cycles not used are lost

Data management involves a long-term commitment of resources

Page 4

Comparing HPC centers and data archives

Simulations
• Generate data at will
• Can programmatically control data quality
• Can be reproduced more easily
• ⇒ Can be copious
• Weaker tradition of metadata and data quality

Experiment/Observation
• Collect data from physical events
• Data quality may be limited by collection methods
• May be difficult, expensive, or impossible to reproduce
• ⇒ May be more limited
• Long-term focus on metadata and data quality

Page 5

Consequently, they face different challenges

• HPC centers excel at:
  – Volume and velocity
  – Analysis at scale
• Archives excel at:
  – Variety
  – Metadata capture
  – Data quality

Page 6

Convergence of data and HPC: some DataONE experience

Page 7

eBird pilot project: exploration and visualization

Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration

Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using High Performance Computing resources

Environmental data layers: land cover, meteorology, MODIS remote-sensing data

• Examine patterns of migration
• Infer how climate change may affect bird migration

[Figure: model results showing occurrence of Indigo Bunting (2008) in Jan, Apr, Jun, Sep, and Dec]

Page 8

Page 9

Exploration, Visualization, and Analysis

[Figure: benchmark observations, terrestrial biosphere model output, and model structure information]

Provenance framework

Workflows for hypothesis development, testing, and exploration

Interactive maps and plots for multi-dimensional data exploration and analysis

Page 10

DataONE experience

• Cyberinfrastructure (CI) created: interoperable data service functional interfaces
• 4 reference interface implementations completed
• 8 client-side “investigator toolkit” tools released, 4 more in development
• 16 collaborating Member Node repositories (internationally)
• More than 100,000 data objects published
• Conducted 81 workshops on data management
• Published 65 data management “best practices”
• Completed several baseline and follow-up surveys on the state of data management with scientists, libraries, librarians, …

Page 11

DataONE experience (cont.)

About half the effort has been on education, training, and outreach about data management practices

Page 12

“Data = Human” (Genevieve Bell, SC13 keynote)