Consideration of the scholarly information infrastructure on upper atmospheric research field as a test bed. Yukinobu KOYAMA, orcid:0000-0001-5363-3870, Transdisciplinary Research Integration Center / National Institute of Informatics, Research Organization of Information and Systems.


Page 1: 20151028koyama

Consideration of the scholarly information infrastructure, with upper atmospheric physics as a test bed

Yukinobu KOYAMA

orcid:0000-0001-5363-3870

Transdisciplinary Research Integration Center / National Institute of Informatics, Research Organization of Information and Systems

Page 2: 20151028koyama

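Self Introduction

2015 to present: Transdisciplinary Research Integration Center

2009 to 2015: Kyoto University, Graduate School of Science, geomagnetism center; IUGONET (DB, etc.)

2007 to 2009: NAOJ, ALMA (Software Integration, Test & Support)

2006 to 2007: Kyoto University, Graduate School of Energy Science; Energy Economics (DB + Linear Programming Model)

1998 to 2000 to 2006: NAIST; Phys., Phys. + HPC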

Page 3: 20151028koyama

Since April of this year.

Page 4: 20151028koyama

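Inter-University Research Institute Corporations: National Institutes for the Humanities; National Institutes of Natural Sciences; High Energy Accelerator Research Organization; Research Organization of Information and Systems

National Institute of Polar Research; National Institute of Genetics; The Institute of Statistical Mathematics; National Institute of Informatics

Focus on methods. The organization was set up this way at the time of incorporation in 2004, before the era of Big Data and the arrival of data-intensive science.

Focus on research subjects.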

Page 5: 20151028koyama

THE FOURTH PARADIGM

October 2009, edited by Tony Hey

Jim Gray, Peter Fox, and others suggest the arrival of data-intensive science, the fourth paradigm following experiment, theory, and numerical computation.

Page 6: 20151028koyama

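The world's oldest(!?) data-centric science

Tycho Brahe (Denmark, 1546-1601): compiled a database of stars by naked-eye observation.

Johannes Kepler (Germany, 1571-1630): derived Kepler's laws (1609) from Brahe's star database. (Method) hypothesis → examine Tycho's VO → blablabla

Hans Lipperhey (Netherlands): invented the telescope in 1608.

Galileo Galilei (Italy): pointed a telescope at the sky in 1609.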

Page 7: 20151028koyama

Main Part

Page 8: 20151028koyama

Origin of Journal Culture

The Royal Society of London started publishing Philosophical Transactions in 1665.

Basically, the format has not changed for 350 years!

Incompleteness:

Data citation, metadata of datasets, description of the derivation process, and the problem of sharing data visualization and analysis software.

R. Boyle, doi:10.1098/rstl.1665.0007

Page 9: 20151028koyama

Introduction 1: Number of articles and quantity of data

NISTEP, 2013

http:-reports/idc-digital-universe-2014.pdf

Total storage capacity in 2013: 4.4 ZB (kilo, mega, giga, tera, peta, exa, zetta, yotta). It is increasing by 40 percent a year.
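To make the growth figure concrete, here is a minimal sketch of the compound-growth arithmetic. It assumes the 40 percent annual rate stays constant, which the slide does not claim; the class and method names are only illustrative. Under that assumption, 4.4 ZB in 2013 grows to roughly 46 ZB by 2020 (4.4 × 1.4^7).

```java
// Sketch: compound growth of total storage capacity.
// Assumption: the 40 percent annual growth rate stays constant over the period.
final class StorageGrowthSketch {

    /** Capacity after the given number of years of compound growth. */
    static double project(double baseZettabytes, double annualRate, int years) {
        return baseZettabytes * Math.pow(1.0 + annualRate, years);
    }

    public static void main(String[] args) {
        double base2013 = 4.4; // ZB in 2013, from the slide
        double rate = 0.40;    // 40 percent a year, from the slide
        for (int year = 2013; year <= 2020; year++) {
            System.out.printf("%d: %.1f ZB%n", year, project(base2013, rate, year - 2013));
        }
    }
}
```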

[Q] Articles and data are increasing rapidly. Papers that cannot be reproduced are being produced. Is the current scholarly communication infrastructure sufficient?

Unable to validate the relevant preclinical research for almost two-thirds [Wadman, 2013]

Is science something that only national civil servants do?

Page 10: 20151028koyama

To simplify the issue

We consider the upper atmospheric research field in order to stay away from ethical, legal, and social issues.

http://www.nipr.ac.jp/jare/now/20150901.html

Page 11: 20151028koyama

Overview of Scholarly Information

Page 12: 20151028koyama

Japan Link Center (JaLC)

JaLC is the 9th DOI registration agency in the world.

Koyama is a member of an external committee of JaLC.

JaLC started minting DOIs for research data in 2014.

Page 13: 20151028koyama

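The DOI registration intermediary system running at NICT, a JaLC member. It serves the Japanese WDS/WDC members.

Based on Drupal. The prototype was built by Koyama.

After that, development was handed over to a contractor.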

Page 14: 20151028koyama

Japanese Use Case: Landing Page of DOI

Our WDS/WDC group in Japan minted a DOI for mesospheric wind velocity data observed by NICT.

This is the first case in the “DOI REGISTRATION EXPERIMENTAL PROJECT TO RESEARCH DATA” by JaLC.

This DOI has already been cited by a JGR paper (doi:10.1002/2014JD022647).

doi:10.17591/55838dbd6c0ad
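A DOI written in the `doi:` form above can be turned into a landing-page link by prefixing it with the public resolver address `https://doi.org/`. The sketch below shows only that string handling; the class and method names are hypothetical, and it does no percent-encoding, so it suits simple DOIs such as the two on this slide.

```java
import java.net.URI;

// Sketch: turn a "doi:..." string into a resolver link.
// The helper name is hypothetical; no network access is performed.
final class DoiLinkSketch {

    static URI toResolverUri(String doi) {
        String name = doi.trim();
        // Drop an optional, case-insensitive "doi:" prefix.
        if (name.regionMatches(true, 0, "doi:", 0, 4)) {
            name = name.substring(4);
        }
        // Every DOI name starts with "10." and has a prefix/suffix separator.
        if (!name.startsWith("10.") || !name.contains("/")) {
            throw new IllegalArgumentException("Not a DOI name: " + doi);
        }
        // Assumption: the suffix contains no characters that need percent-encoding.
        return URI.create("https://doi.org/" + name);
    }

    public static void main(String[] args) {
        System.out.println(toResolverUri("doi:10.17591/55838dbd6c0ad"));
        System.out.println(toResolverUri("doi:10.1002/2014JD022647"));
    }
}
```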

Page 15: 20151028koyama


Ecosystem of scholarly communications

https://theresearchwhisperer.wordpress.com/2013/04/23/data-citation/

Page 16: 20151028koyama

Overview of Scholarly Information

Page 17: 20151028koyama

Upper Atmospheric Domain-Specific Metadata Database (IUGONET Metadata DB)

http://search.iugonet.org/ (customized DSpace 1.7.2)

Instantiation

Insert into DB

Page 18: 20151028koyama

Data Handling in Upper Atmospheric Research

Upper atmospheric field: the variety issue of Big Data.

Data formats are not unified, and unifying them is too difficult.

The data analysis software absorbs the differences between data formats.

W3C CSV on the Web Working Group.
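One way to picture analysis software absorbing format differences is a small reader interface with one implementation per file format, all producing the same in-memory form. The sketch below is only an illustration of that idea, not the design of any software named in these slides; every type name, file extension, and value in it is a hypothetical placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: all type and method names here are hypothetical.
final class FormatAbsorbingSketch {

    /** Common in-memory form that every reader produces. */
    static final class Sample {
        final String time;
        final double value;
        Sample(String time, double value) { this.time = time; this.value = value; }
    }

    /** One implementation per on-disk format. */
    interface FormatReader {
        boolean accepts(String fileName);
        List<Sample> read(List<String> lines);
    }

    /** Reads "time,value" lines. */
    static final class CsvReader implements FormatReader {
        public boolean accepts(String fileName) { return fileName.endsWith(".csv"); }
        public List<Sample> read(List<String> lines) { return parse(lines, ","); }
    }

    /** Reads "time value" lines separated by whitespace. */
    static final class ColumnTextReader implements FormatReader {
        public boolean accepts(String fileName) { return fileName.endsWith(".txt"); }
        public List<Sample> read(List<String> lines) { return parse(lines, "\\s+"); }
    }

    private static List<Sample> parse(List<String> lines, String separatorRegex) {
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] fields = line.trim().split(separatorRegex);
            samples.add(new Sample(fields[0].trim(), Double.parseDouble(fields[1].trim())));
        }
        return samples;
    }

    static List<FormatReader> defaultReaders() {
        return Arrays.asList(new CsvReader(), new ColumnTextReader());
    }

    /** Picks the first reader that accepts the file; analysis code only ever sees Sample. */
    static List<Sample> load(String fileName, List<String> lines) {
        for (FormatReader reader : defaultReaders()) {
            if (reader.accepts(fileName)) return reader.read(lines);
        }
        throw new IllegalArgumentException("No reader for " + fileName);
    }

    public static void main(String[] args) {
        // Placeholder values, not real observations.
        List<Sample> a = load("a.csv", Arrays.asList("t0,1.5", "t1,2.5"));
        List<Sample> b = load("b.txt", Arrays.asList("t0   1.5", "t1   2.5"));
        System.out.println(a.size() + " and " + b.size() + " samples in the same form");
    }
}
```

Supporting a further format then means adding one more reader, while the analysis and plotting code stays unchanged.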

Page 19: 20151028koyama

5-Star Open Data

⭐️ Make your stuff available on the Web (whatever format) under an open license.

⭐️⭐️ Make it available as structured data (e.g., Excel instead of an image scan of a table).

⭐️⭐️⭐️ Make it available in a non-proprietary open format (e.g., CSV instead of Excel).

⭐️⭐️⭐️⭐️ Use URIs to denote things, so that people can point at your stuff.

⭐️⭐️⭐️⭐️⭐️ Link your data to other data to provide context.

Page 20: 20151028koyama

Upper Atmospheric Domain-Specific Data Visualization & Analysis Software (SPEDAS)

IDL is needed:

$2,500 per license in Japan.

The CLI cannot be used on the free VM.

IDL is popular software in astronomy. However, SPEDAS conflicts with SolarSoft, which is used in astronomy, because IDL has no namespaces.

It is not enough for Big Data analysis on many cores, because the number of licenses is limited.

Mainly for domain researchers. Not a good choice for scientists in neighboring fields, data scientists, scientists in developing countries, or citizens?

SPEDAS

Page 21: 20151028koyama

Problems that even domain researchers face.

Namespace problem: conflicts with SolarSoft.

Many-core problem: bound by licenses, so analysis using many cores is not possible.

Page 22: 20151028koyama

The Open Definition, by opendefinition.org

Open means anyone can freely access, use, modify, and share for any purpose.

Open data and content can be freely used, modified, and shared by anyone for any purpose.

Open Format: Specifically, data should be machine-readable, available in bulk, and provided in an open format or, at the very least, be processable with at least one free/libre/open-source software tool.

Page 23: 20151028koyama


Basic Concept

Page 24: 20151028koyama


Deployment Diagram

Page 25: 20151028koyama

Class Diagram (labels: GeoTools by OSGeo, OpenCV, Inherit, Dst Index)

Page 26: 20151028koyama


JavaFX-based iUgonet Data Analysis Software

1984/10/1 Dst Index

Page 27: 20151028koyama

The bytecode runs on multiple platforms:

Solaris

Windows 10

Linux (SL7)

Mac OS X (El Capitan)

Page 28: 20151028koyama

Possibility of the JudasFX (data-intensive science using many cores)
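Because the JVM carries no per-core license, standard-library parallel streams can spread independent chunks of data over all available cores. The following is a minimal sketch of that pattern, not JudasFX code: the class and method names are hypothetical, each row stands for one independent chunk (for example one day of data), and the values are placeholders.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Sketch: per-chunk statistics computed on all available cores.
// Names and values are hypothetical; this is not taken from JudasFX.
final class ManyCoreSketch {

    /** Mean of each row, computed in parallel; an empty row yields NaN. */
    static double[] rowMeans(double[][] chunks) {
        return IntStream.range(0, chunks.length)
                .parallel() // runs on the common fork-join pool
                .mapToDouble(i -> Arrays.stream(chunks[i]).average().orElse(Double.NaN))
                .toArray(); // results keep the original row order
    }

    public static void main(String[] args) {
        System.out.println("Available cores: " + Runtime.getRuntime().availableProcessors());
        double[][] chunks = { {1.0, 2.0, 3.0}, {10.0, 20.0}, {} };
        System.out.println(Arrays.toString(rowMeans(chunks)));
    }
}
```

The same code runs unchanged on a laptop or a many-core server; only the number of worker threads differs.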

Page 29: 20151028koyama

Possibility of the JudasFX (distributed computing using BOINC)

This is essential for numerical models that have many parameters.

Page 30: 20151028koyama

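Addendum: besides domain researchers, the intended users are researchers in neighboring fields, data scientists, researchers in developing countries, and citizens.

JavaFX + JAXB + IUGONET Metadata: interpret the metadata (rendering hints, etc.).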

JavaFX + Jython + (JyNI + scipy)

Page 31: 20151028koyama

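What I am doing now (partly because there are inquiries from overseas)

Porting to JudasFX the ionospheric electrical conductivity model that Koyama originally wrote in IDL.

Verifying visualization and analysis of EISCAT data. Verifying visualization and analysis of SuperDARN data.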

Page 32: 20151028koyama

Conclusion

We summarized an ideal scholarly information infrastructure.

We showed the current state of achievement in the upper atmospheric research field.

We suggested the importance of free data analysis software.

We are building 100% free data visualization and analysis software called “JudasFX”.

Page 33: 20151028koyama

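RDA announcement

2016/03/01-03: The Research Data Alliance will meet in Tokyo (Hitotsubashi Hall). A pre-event takes place on 2/29.

It may overlap with the event at Kyushu University, but anyone who is free is encouraged to attend.

Keywords: open science, data-centric science, CODATA, WDS, data publication, data citation, provenance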