
Software Sustainability Institute Dealing with software: the research data issues 26 August



Software Sustainability Institute

www.software.ac.uk

Dealing with software: the research data issues
http://dx.doi.org/10.6084/m9.figshare.1150298

26 August 2014, Dealing with Data Conference
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606

Where indicated, slides licensed under

Supported by Project funding from


“Re-” is the new black


The Research Cycle

Create → Test → Interpret → Publish → Revise

Research outputs: paper, data, software

Research is a continuous cycle. When we publish, we are contributing to the body of knowledge.


Research/Reuse/Reward Cycle

Research cycle: Create → Test → Interpret → Publish → Revise
Reuse cycle: Index, Identify, Cite, Reward

Reuse is also a cycle: we build our research on the work of others.

Reward mechanisms should encourage reuse.


The current process

Start research → Write software → Use software → Produce results → Publish research paper (which mentions software and data) → Release data, release software

This process is simple, but it does not reward the production or reuse of good software and data.

It also has a long contribution cycle, since data and software are only released at the end, after the research paper.


“Re-”positories
Backup | Sharing | Archiving of software


Differing roles, different repositories

Roles: backup, sharing, archiving

Where they differ: timescales, policy, licensing, ingest, metadata, assurance


Versioning

Personal versions: v1, v2, v2a, v3, v3a
Public versions: v1, v2, v3

Why do we version?
- To indicate a change
- To allow sharing
- To confer special status

Version control systems make this easy: the concepts of a person and of an output are there, but they are not uniquely identified.
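To illustrate the split between personal and public versions, the sketch below models a history in which only some versions are shared, and each shared version receives its own sequential public number (the "special status"). It is a minimal sketch under assumptions: the `Version` class, the `public_numbers` helper and the example history are hypothetical and not taken from any particular version control system.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """A node in a hypothetical version history of one piece of research code."""
    label: str                          # personal label, e.g. "v2a"
    shared: bool = False                # True once released to others
    parent: Optional[Version] = None    # the version this one was derived from

    def lineage(self) -> list[str]:
        """Labels from the first version down to this one."""
        labels, node = [], self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]


def public_numbers(history: list[Version]) -> dict[str, str]:
    """Map each shared personal version to its own sequential public number."""
    shared = [v for v in history if v.shared]
    return {v.label: f"public v{i}" for i, v in enumerate(shared, start=1)}


v1 = Version("v1")
v2 = Version("v2", parent=v1)
v2a = Version("v2a", shared=True, parent=v2)
v3 = Version("v3", parent=v2a)
v3a = Version("v3a", shared=True, parent=v3)

history = [v1, v2, v2a, v3, v3a]
print(public_numbers(history))  # {'v2a': 'public v1', 'v3a': 'public v2'}
print(v3a.lineage())            # ['v1', 'v2', 'v2a', 'v3', 'v3a']
```

Keeping the public numbering separate from the personal labels is one design choice among several; it lets a personal branch such as `v2a` be the thing that gets cited without renaming it.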


Granularity

Algorithm → Function → Program → Library / Suite / Package

What do we define?
- Useful units of reuse


Boundary

What do we choose to identify?
- The workflow?
- The software that runs the workflow?
- The software referenced by the workflow?
- The software dependencies?

What's the minimum citable part?
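One way to see why the boundary matters: once a workflow's dependencies are written down as a graph, the set of things you might identify grows from the workflow's direct dependencies to its full transitive closure. This sketch assumes a hypothetical `DEPENDS_ON` graph with made-up component names; real dependency information would come from a package manager or a workflow description.

```python
from collections import deque

# Hypothetical example: each component and what it directly depends on.
DEPENDS_ON = {
    "my-workflow": ["workflow-engine", "analysis-script"],
    "workflow-engine": ["python"],
    "analysis-script": ["numpy", "python"],
    "numpy": ["python"],
    "python": [],
}


def closure(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return root plus everything it transitively depends on (breadth-first search)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Two candidate boundaries for "what do we choose to identify?"
direct = {"my-workflow", *DEPENDS_ON["my-workflow"]}
transitive = closure("my-workflow", DEPENDS_ON)
print(sorted(direct))
# ['analysis-script', 'my-workflow', 'workflow-engine']
print(sorted(transitive))
# ['analysis-script', 'my-workflow', 'numpy', 'python', 'workflow-engine']
```

Citing only the direct set leaves out `numpy` and `python`, even though the results depend on them; citing the full closure may name far more software than a reader finds useful.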


Authorship
• Which authors have had what impact on each version of the software?
• Who made the largest contribution to the scientific results in a paper?
• Can micro-attribution work? We can track authors, but can we track contribution?

http://beyond-impact.org/?p=175

OGSA-DAI project statistics from Ohloh

Why do we identify?
- To measure
- To restrict
- To communicate
- To include
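The question of tracking authors versus tracking contribution can be made concrete. The sketch below tallies hypothetical commit records per author and per version, weighted either by commit count or by lines changed. The records and the `attribution` helper are illustrative assumptions, and both weightings measure activity in the code, not contribution to the scientific results.

```python
from collections import Counter, defaultdict

# Hypothetical commit records: (version, author, lines changed).
COMMITS = [
    ("v1", "alice", 1200),
    ("v1", "bob", 40),
    ("v2", "alice", 15),
    ("v2", "bob", 300),
    ("v2", "carol", 5),
]


def attribution(commits, weight=lambda lines: 1):
    """Sum a per-commit weight for each author, grouped by version.

    The default weight counts commits; pass weight=lambda lines: lines to
    count lines changed instead. Both measure activity in the code only.
    """
    per_version = defaultdict(Counter)
    for version, author, lines in commits:
        per_version[version][author] += weight(lines)
    return per_version


by_commits = attribution(COMMITS)
by_lines = attribution(COMMITS, weight=lambda lines: lines)
print(dict(by_commits["v2"]))          # {'alice': 1, 'bob': 1, 'carol': 1}
print(by_lines["v2"].most_common(1))   # [('bob', 300)]
```

The two weightings already disagree about who "led" v2, and neither can say who contributed most to a paper's results.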


Code as a Research Object

• What if you could assign DOIs to code easily?

• Could we make software more reusable?
• http://mozillascience.org/code-as-a-research-object-a-new-project/
• https://guides.github.com/activities/citable-code/
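A DOI has a fixed shape: a `10.` prefix identifying the registrant, a slash, and a suffix chosen by the registrant, as in this presentation's own DOI `10.6084/m9.figshare.1150298`. The sketch below only splits and normalises such identifiers; it does not mint one, which is done through a registration service. The regular expression is a simplified assumption that accepts fewer forms than the DOI standard allows, and `split_doi` and `resolver_url` are hypothetical helper names.

```python
import re

# Simplified pattern: "10." plus a numeric registrant code, "/", then a suffix.
# Assumption: this is narrower than what the DOI standard permits.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")
RESOLVER_PREFIX = re.compile(r"^https?://(dx\.)?doi\.org/")


def split_doi(doi: str) -> tuple[str, str]:
    """Strip any resolver URL and return the DOI's (prefix, suffix)."""
    bare = RESOLVER_PREFIX.sub("", doi.strip())
    match = DOI_PATTERN.match(bare)
    if match is None:
        raise ValueError(f"not a DOI: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi: str) -> str:
    """Normalise any accepted form of the DOI to a single https resolver link."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


print(split_doi("http://dx.doi.org/10.6084/m9.figshare.1150298"))
# ('10.6084', 'm9.figshare.1150298')
print(resolver_url("10.6084/m9.figshare.1150299"))
# https://doi.org/10.6084/m9.figshare.1150299
```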


A better process?

Start research → Identify existing software → Use software (adapting/extending it, or writing new software, where needed) → Produce results → Publish research paper, which references software and data papers
Release data → Publish data paper
Release software → Publish software paper

This process needs software and data papers as proxies for rewarding reuse, but it enables a shorter contribution cycle for data and software.


Alternative Metrics


One-click challenge

• “One-click” archiving of a significant version of software in a code repository to a suitable institutional repository

• “Suitable” repository:
  - Clear access / deposit / preservation policy
  - Adherence to standards
  - Ability to easily “transfer” in / out
  - Allows use of appropriate licenses for code
  - Sustainability of hosting organisation
  - Ability to monitor, check integrity
  - Provides permanent unique identifiers

• Proposing a hackday to make this happen
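The "ability to monitor, check integrity" criterion is commonly met with fixity checks: record a checksum for every file at deposit, then recompute later and compare. A minimal sketch, assuming a deposit is simply a directory of files; `make_manifest` and `changed_files` are hypothetical names, and SHA-256 is one reasonable checksum choice rather than a stated requirement.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum at deposit time."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Manifest entries that are now missing or whose checksum differs.

    Files added after deposit are not reported by this check.
    """
    current = make_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


# Usage: deposit a tiny hypothetical release, then modify one file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "model.py").write_text("print('v1')\n")
    (root / "README").write_text("Release 1.0\n")
    manifest = make_manifest(root)
    before = changed_files(root, manifest)
    (root / "README").write_text("Edited after deposit\n")
    after = changed_files(root, manifest)

print(before)  # []
print(after)   # ['README']
```

Storing the manifest alongside the deposit lets a repository rerun the same comparison on a schedule, which is one way to provide the monitoring the criterion asks for.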


Summary

• Software is an important output of the research cycle, and should be rewarded

• Repositories play an important role in the research cycle, including for software

• But software has specific issues with regard to research data management

• Tooling is needed to lower barriers to deposit


Further information

• This presentation:
  Slides: http://dx.doi.org/10.6084/m9.figshare.1150298
  Abstract: http://dx.doi.org/10.6084/m9.figshare.1150299

• Where does it go from here: the place of software in digital repositories http://www.research.ed.ac.uk/portal/en/publications/where-does-it-go-from-here-the-place-of-software-in-digital-repositories(ab6130c6-aee6-4972-9256-8ea0eb1862c9).html

• Software Papers: improving the reusability and sustainability of scientific software http://dx.doi.org/10.6084/m9.figshare.795303

• Software Sustainability Institute http://www.software.ac.uk/
  Supported by EPSRC Grant EP/H043160/1