The fourth paradigm: a research perspective Data Science Symposium Arnold Bregt

Dsd int 2014 - data science symposium - 4th paradigm a research perspective, prof. arnold bregt, wageningen ur & amsterdam institute for advanced metropolitan solutions


Page 1

The fourth paradigm: a research perspective

Data Science Symposium

Arnold Bregt

Page 2

Outline presentation

The fourth paradigm

Your opinion

The roles of science

● Data producer role

● Data user role

● Data governance/management role

Conclusions/reflection

Page 3

A short introduction to... me (Arnold)

Geo-information Science - Wageningen University

MSc Geo-information science

Research topics of our group:

● Sensing and measuring

● Modelling and visualization

● Integrated land monitoring

● Human-space interaction

● Empowering communities

Page 4

My field: Geo-information science

Page 5

Paradigms in Science: a classification

Page 6

Your opinion

Who has made new discoveries by only analysing data?

Who thinks we collect too much data?

Who believes that the fourth paradigm is a new paradigm?

Page 7

The fourth paradigm

Data-intensive scientific discovery

(Almost) all disciplines are becoming more data intensive

It is a hype (“Big data”)

Page 8

Papers in Scopus “fourth paradigm”

Page 9

Papers in Scopus “big data”

Page 10

A lot of conferences

Page 11

Universities and IBM

Page 12

eScience in Scopus

Page 13

Marie Tharp (Oceanography)

1920-2006

Seafloor mapping (1957)

Envisioning processes from 2D observations

Page 14

Is it really new?

Data has always been used by science for discovery

What is new:

● Volume

● Type of data (more spatial/temporal resolution)

● Data by “accident” or “surprise”

Page 15

More for less

Page 16

Data by surprise

Page 17

The role of Science

Data producer role

● Past

● Present

● Future

Data user role

● ...

Data governance/management role

● ....

Page 18

Data producer role: Past

Collecting own data was a key part of research

Contextual knowledge of data

Owned by researcher (at least not claimed by university)

Page 19

Data producer role: Present

Own data collection in addition to existing data (data for validation)

Data collection in communities (consortia)

Researchers compile collections (data selection) (example)

Page 20

Data producer role: Future

More a producer of aggregated data based on existing data (meta-analysis on data)

Role of scientist as data producer will be reduced

Validation data from small experiments

Data production as a separate activity (specialist)

....

Page 21

Data user role: Past

Analyse own data

Direct knowledge of data context

(even) own software for data analysis (example)

Page 22

Data user role: Present

Strong increase of reuse of existing data (example)

More statistical relations (statistically different)

Less understanding of causal relations

Page 23

Example

Page 24

Data user role: Future

Quest for processing and visualisation algorithms

Strong increase of re-use

More “data-based” science

..

Page 25

Data governance/management role: Past

Researcher manages own data

Stored in paper archives

Collections are important

Role of libraries and museums

Page 26

Data governance/management role: Present

Increased attention and more institutions

Data as part of publications

DANS, 3TU.Datacentrum, Research Data Netherlands

Data management plan (PhDs)

Page 27

Data management plan

All PhD candidates must formulate a data management plan (DMP)

Chair groups are responsible

Critical issue: getting from plan to implementation

Page 28

Data governance/Management

The Availability of Research Data Declines Rapidly with Article Age

Vines, T.H. et al. 2014, Current Biology

We examined the availability of data from 516 studies between 2 and 22 years old

The odds of a data set being reported as extant fell by 17% per year

Broken e-mails and obsolete storage devices were the main obstacles to data sharing

Policies mandating data archiving at publication are clearly needed
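
To make the 17% figure concrete, here is a minimal sketch of what a constant yearly drop in odds would imply over time. It assumes the drop compounds multiplicatively (odds times 0.83 each year); the starting odds of 9 (a 90% chance the data are still there) are purely hypothetical and not taken from the paper, and the function names are illustrative.

```python
def odds_after(initial_odds: float, years: int, yearly_drop: float = 0.17) -> float:
    """Odds that a data set is still extant after `years`,
    assuming the odds shrink by `yearly_drop` every year."""
    return initial_odds * (1.0 - yearly_drop) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical starting point: odds of 9, i.e. a 90% chance at year 0.
    initial_odds = 9.0
    for years in (2, 10, 22):
        odds = odds_after(initial_odds, years)
        print(f"after {years:2d} years: odds {odds:.2f}, "
              f"probability {odds_to_probability(odds):.0%}")
```

Under these assumptions the chance of still obtaining the data falls well below one half within the 2 to 22 year range that the study covers.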

Page 29

Data governance/management role: Future

Selection of data to be preserved

Specialist task (in close interaction with the library) “from book to data library”

Key role for metadata

Page 30

Extent description

Target groups                     Functions
                                  Manage   Search   Exchange   Use
Personal                          +        +        -          -
Own organisation/researchers      ++       ++       ++         ++
Other organisation/researchers    -        +++      +++        +++

Page 31

Conclusions/Reflection

Data has always played a key role in science

The fourth paradigm is not new, but “scale is new”

The role of the scientist is changing from primary data collection to re-use of existing data

Which means that the “data knowledge” is decreasing

Page 32

The fourth paradigm

For scientists an evolution (not a revolution)