
20140429 egu


Page 1: 20140429 egu


Challenge to the Data-intensive Science

in Upper Atmospheric Research in Japan

Y. Koyama*1, K. Kurakawa, Y. Sato, Y. Tanaka, S. Abe, D. Ikeda, M. Nose, A. Shinbori,

N. Umemura, T. Iyemori, S. UeNo, M. Yagi, and A. Yatagai

*1 Graduate School of Science, Kyoto University & WDC for Geomag., Kyoto, Japan

This is the final year of this project, and my contract will expire.

Page 2: 20140429 egu

My viewpoint

Here I position my talk in this session as shown in the figure on the right.

Many presenters will talk about the bottom layer, so I would like to emphasize the need to connect these three layers on the Internet.


Tony Hey, Stewart Tansley, & Kristin Tolle (Eds.). (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. Retrieved from http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx

Redefined by Y. KOYAMA.


Page 3: 20140429 egu

CAUTION

To avoid confusion, it is important to clarify the positions of the presenter and the listeners.

Candidate stakeholders for this topic are Researcher,

Data Publisher,

Journal Publisher,

Funding Agency,

President of Institute,

Voter,

Taxpayer,

and so on.

Today, I speak from a Researcher's position.


Page 4: 20140429 egu

Problems of Scholarly Communication

From the beginning of journal history:

It is difficult to reach data.

Tables in the body text and in appendices are not enough.

Repositories are not enough for Big Data or for data already released on the Internet.

It is difficult to reproduce results.

Reproducibility has been poor from the beginning.

In addition:

We are now facing Big Data.

The number of papers has been increasing dramatically since 2000.

Can we ignore this situation?

doi:10.1098/rstl.1665.0007

R. Boyle

Page 5: 20140429 egu

Linkage between paper and dataset

in Japan

Japan Link Center (JaLC) was established in 2012 as

the 9th Digital Object Identifier (DOI) Registration

Agency in the world.

JaLC basically has the same rank as CrossRef and DataCite, but it is also under the umbrella of both CrossRef and DataCite at the same time. (This situation is complicated.)

JaLC will have a function for assigning DOIs to datasets, and JaLC's metadata format is compatible with DataCite's.
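A minimal sketch of what linking a dataset to a paper through DOIs could look like. The property names follow the DataCite Metadata Schema (identifier, creators, titles, publisher, publicationYear, resourceType, relatedIdentifiers), but the flat dictionary layout, the helper names `dataset_record` and `resolver_url`, and every DOI and name below are assumptions made only for illustration. This is not JaLC's actual registration interface.

```python
import json


def dataset_record(doi, title, creators, publisher, year, paper_doi):
    """Build a DataCite-style record that links a dataset to the paper citing it."""
    return {
        "identifier": {"identifier": doi, "identifierType": "DOI"},
        "creators": [{"creatorName": name} for name in creators],
        "titles": [{"title": title}],
        "publisher": publisher,
        "publicationYear": str(year),
        "resourceType": {"resourceTypeGeneral": "Dataset"},
        # The link between paper and dataset is expressed as a related identifier.
        "relatedIdentifiers": [
            {
                "relatedIdentifier": paper_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsCitedBy",
            }
        ],
    }


def resolver_url(doi):
    """Return the URL at which the public DOI resolver redirects to the landing page."""
    return "https://doi.org/" + doi


# All values below are hypothetical placeholders, not real identifiers.
record = dataset_record(
    doi="10.5555/example-dataset",
    title="Example geomagnetic observation dataset",
    creators=["Example, Author"],
    publisher="Example Data Center",
    year=2014,
    paper_doi="10.5555/example-paper",
)

print(json.dumps(record, indent=2))
print(resolver_url(record["identifier"]["identifier"]))
```

The point of the sketch is that the paper-to-dataset link lives in the metadata itself, so a machine can follow it in either direction.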

Page 6: 20140429 egu

[Figure labels: DOI, ORCID]

Linkage between paper and dataset

in the near future

• Europe, the U.S., and Australia are further ahead than Japan.

• In the near future, Data Publication & Citation will spread all over the world.

Page 7: 20140429 egu

How to reproduce the

contents of the paper?

Here, I would like to ask you a question again.

[Q] “If you reach the dataset by Data Publication and

Citation, can you reproduce the result of the paper?”

My answer is NO.

Reaching the data is just the first step toward reproduction.

Metadata that explains the dataset is needed.

General metadata such as that of DataCite/JaLC is insufficient for this; domain-specific metadata, such as IUGONET metadata, is needed.
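The gap between general and domain-specific metadata can be sketched as a simple check. The general field names below follow DataCite's mandatory properties; the domain-specific field names are hypothetical examples chosen for illustration and are not the actual IUGONET metadata schema.

```python
# Citation-level fields (names follow DataCite's mandatory properties).
GENERAL_FIELDS = {"identifier", "creators", "titles", "publisher", "publicationYear"}

# Hypothetical domain-specific fields; NOT the actual IUGONET schema.
DOMAIN_FIELDS = {
    "observatory",
    "instrument",
    "time_resolution",
    "coordinate_system",
    "units",
}


def missing_for_reproduction(metadata):
    """Return, sorted, the domain-specific fields that a record does not provide."""
    return sorted(DOMAIN_FIELDS - set(metadata))


# A record that is complete for citation purposes (placeholder values only).
general_only = {field: "EXAMPLE" for field in GENERAL_FIELDS}

# The same record enriched with domain-specific descriptions.
enriched = dict(general_only, **{field: "EXAMPLE" for field in DOMAIN_FIELDS})

print(missing_for_reproduction(general_only))
print(missing_for_reproduction(enriched))
```

A record can be perfectly citable and still lack everything a researcher needs to interpret the numbers, which is the situation described above.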


Page 8: 20140429 egu

Lack of Quick Look

Moreover, at the very least, Quick Look plots and data-analysis software code should be shared, and they should be freely distributed.

Code Citation is needed.

Is that all?


Page 9: 20140429 egu

Cliffs of Intermediate Data Layer

[Figure labels: 1st, 2nd, 3rd, 4th; DOI, DOI]

• The data-analysis procedure written in natural language in the literature is insufficient because information is missing.

• The literature and the published data need to be connected on the Internet through intermediate data.

• Code that outputs and understands the data-analysis procedure in a machine-readable format is needed.

This figure expresses the situation in which the Intermediate Data Layer is not shared on the Internet.
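One possible shape for a machine-readable data-analysis procedure is a log in which every step records its name, its parameters, and hashes of its input and output. The sketch below is an assumption made for illustration: the helper names (`fingerprint`, `run_step`), the two analysis steps, and the input values are all made up, and this is not an existing IUGONET tool.

```python
import hashlib
import json


def fingerprint(values):
    """Hash a list of numbers so an intermediate product can be identified."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_step(log, name, func, data, **params):
    """Apply one analysis step and append a machine-readable record of it."""
    result = func(data, **params)
    log.append(
        {
            "step": name,
            "parameters": params,
            "input_sha256": fingerprint(data),
            "output_sha256": fingerprint(result),
        }
    )
    return result


def remove_mean(data):
    """Subtract the mean value from every sample."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def running_mean(data, window):
    """Smooth the series with a simple running mean of the given window length."""
    return [sum(data[i:i + window]) / window for i in range(len(data) - window + 1)]


raw = [1.0, 2.0, 4.0, 3.0, 5.0]  # made-up values, not real observations
log = []
detrended = run_step(log, "remove_mean", remove_mean, raw)
smoothed = run_step(log, "running_mean", running_mean, detrended, window=3)

# The log is plain JSON, so it can be published next to the paper and the data.
print(json.dumps(log, indent=2))
```

Because the output hash of one step equals the input hash of the next, the log traces an unbroken chain from the published data to the values behind a figure, which is the connection the intermediate data layer is meant to provide.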

Page 10: 20140429 egu

Which field is the best testbed to realize this?

Upper Atmospheric Research is the best.

It is far from ELSI (Ethical, Legal, and Social Issues).

It is far from the Big Data generated by mobile sensor devices with GPS.

Open Data culture is rooted in the International Geophysical Year (IGY: 1957-1958), and most data have already been shared on the Internet. This is a very good starting point.

IUGONET, ESPAS, and V*Os already exist. Their products and communities become the base.


Page 11: 20140429 egu

If Literature, Intermediate Data, and Published Data are connected, the inheritance of knowledge becomes more certain.

The reproduction of knowledge accelerates.


Page 12: 20140429 egu

Conclusion

I pointed out the importance of cooperation among Literature, Intermediate Data, and Published Data on the Internet.

To realize these connections, the following items need to be shared:

Data,

Metadata,

Persistent Identifiers to specify Dataset, Subset, and Granule,

Code to understand metadata, to generate and understand data-analysis procedures in a machine-readable format, and to visualize and analyze data,

Data Sharing Infrastructure.

The human resources to realize this!
