Harvesting Repositories: DPLA, Europeana, and Other Case Studies
ALA Conference June 25, 2016
Introductions
Erin Tripp, Bus. Dev.
Staff librarian since 2011. Erin delivers Islandora
training at events worldwide and has managed more than 40 digital repository projects.
Contact Details ● Email: [email protected] ● Twitter: @eeohalloran or @discgarden ● Hashtags: #islandora #ALAAC16
Agenda
Objectives Overview
By Show of Hands & Introductions
Why Should We Care? Repository Requirements
OAI-PMH Overview
Case Studies
Top Takeaways
Objectives for Today
Learn a thing or two about:
● OAI-PMH
● Common Harvesters
● Who to ask for help
● What questions to ask
● Confidence to continue learning or try a new tool
By Show of Hands...
Who is interested in a
● National Harvester,
● State Harvester,
● Subject Harvester, or
● Proprietary Discovery Service Harvester?
Who has already been involved in a harvesting project?
Who has experience using
● XSLTs
● OAI-PMH
● REPOX?
Why should we care? Discoverability.
● In February 2015, LITA panelists named enhancing discoverability among the Top Technology Trends (Enis, 2015)
● Making content accessible where the search originates (e.g. Google, Google Scholar, WorldCat, DPLA, Europeana) creates value for digital libraries and users
● Repositories contributing to aggregators can see site visits increase by 55 to 109 per cent (DPLA, n.d.)
Why should we care? Discoverability.
Increased exposure through
● Blogs, social media and Wikipedia,
Provide richer context and increase the visibility of your collections
Make your collections available for re-use by other services (Europeana, n.d.)
Access to valuable skills
Data modelling
Copyright and licensing
Reporting on access usage analytics (Europeana, n.d.)
Why should we care? Discoverability.
Using open source
Linking up to thousands of other collections
Interoperable (no vendor lock-in or proprietary formats)
Access to Wikimedia Commons (Europeana, n.d.)
Expanding your network
Connect with like-minded industry professionals
Identify potential partners and joint funding opportunities
Reach out to other sectors – creatives, education, tourism and more (Europeana, n.d.)
Why should we care? Discoverability.
Anecdotally, a repository harvest can:
● Act as an incentive for people to deposit content into the repository and win buy-in from stakeholders
● Clean up and normalize metadata, resulting in better raw material to support discovery
OAI-PMH Overview
OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Low-barrier mechanism for repository interoperability
OAI-PMH is a set of six requests
(aka verbs or services) that are invoked within HTTP
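The six verbs defined by the protocol are Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords and GetRecord. As a minimal sketch of how a request is formed, the Python below builds the HTTP query string for a verb. The base URL http://example.org/oai2 and the helper name build_request are placeholders for illustration, not part of any repository's actual API.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # The verb is always sent; other arguments depend on the verb.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder endpoint; substitute your repository's OAI base URL.
BASE_URL = "http://example.org/oai2"

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Pasting either printed URL into a browser (with a real base URL) should return an XML response from the repository.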
Providers
Data Providers are repositories that expose structured metadata via OAI-PMH = Repository
Service Providers then make OAI-PMH service requests to harvest that metadata = Harvester
Vocabulary
Request / Verb / Service
The action that the service provider (harvester) is requesting from the data provider (repository)
Response Size
The maximum number of records to issue per response
Vocabulary… continued
Resumption Token
When a request matches more records than the response size allows, a resumptionToken is issued so that the service provider can resume harvesting from where it left off
Identify
This request is used to retrieve information about a repository. Some of the information returned is required by OAI-PMH.
Example: YourSite/oai2?verb=Identify
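The resumptionToken behaviour described above can be sketched as a simple loop: request a page, look for a token, and ask again until the token comes back empty or missing. This is an illustrative sketch only. The two canned responses and the fake_fetch function stand in for real HTTP calls, and the token value page-2 is made up.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Canned responses standing in for a real repository (assumed data).
PAGE_1 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record/><resumptionToken>page-2</resumptionToken>"
    "</ListRecords></OAI-PMH>"
)
PAGE_2 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record/><resumptionToken/>"
    "</ListRecords></OAI-PMH>"
)


def next_token(response_xml):
    """Return the resumptionToken in a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest(fetch, metadata_prefix="oai_dc"):
    """Collect every page of a ListRecords harvest.

    `fetch` is any callable that takes a dict of request arguments and
    returns the response XML as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    pages = []
    while True:
        page = fetch(params)
        pages.append(page)
        token = next_token(page)
        if token is None:
            return pages
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


def fake_fetch(params):
    """Stand-in for an HTTP request; returns the canned pages above."""
    return PAGE_2 if params.get("resumptionToken") == "page-2" else PAGE_1


pages = harvest(fake_fetch)
print(len(pages), "pages harvested")
```

In a real harvester, fake_fetch would be replaced by a function that sends the request to the repository and returns the response body.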
Vocabulary… continued
ListMetadataFormats
This request is used to retrieve the metadata formats available from a repository.
Example: YourSite/oai2?verb=ListMetadataFormats
ListRecords
This request is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp.
Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc
Vocabulary… continued
ListSets
This request is used to retrieve the set structure of a repository, which is useful for selective harvesting.
All Collections Example: YourSite/oai2?verb=ListSets
Specific Collection Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=ir_citationCollection
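To illustrate selective harvesting, the sketch below reads the setSpec values out of a ListSets response and builds a ListRecords URL for one of them, optionally limited by datestamp with the from argument. The sample response is invented for the example: ir_citationCollection comes from the slide above, while ir_thesisCollection, the set names and the base URL are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented ListSets response for illustration only.
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>ir_citationCollection</setSpec><setName>Citations</setName></set>
    <set><setSpec>ir_thesisCollection</setSpec><setName>Theses</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(response_xml):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(response_xml)
    return [
        (
            s.findtext("oai:setSpec", namespaces=NS),
            s.findtext("oai:setName", namespaces=NS),
        )
        for s in root.iterfind(".//oai:set", NS)
    ]


def records_url(base_url, set_spec, from_date=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL limited to one set and, optionally, a start date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    if from_date:
        # "from" is the OAI-PMH argument for the lower datestamp bound.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


for spec, name in parse_sets(SAMPLE_LIST_SETS):
    print(name, "->", records_url("http://example.org/oai2", spec))
```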
Repository Requirements
Accessible to the web
Storing standards-based, XML descriptive metadata
The ability to apply additional metadata mapping if needed (either inside or external to the repository)
Access to documentation and XSLTs used for metadata mapping
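Metadata mapping in harvesting projects is usually written in XSLT, as the slides note. Python's standard library has no XSLT processor, so the sketch below uses ElementTree as a stand-in to show the idea of a crosswalk: a few MODS elements are copied into Dublin Core elements. The three mappings, the sample record and the wrapper element named record are illustrative assumptions, not a complete or official MODS-to-DC mapping.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"mods": MODS}

# Illustrative subset of a MODS-to-DC crosswalk (source path, DC element).
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:abstract", "description"),
]

# Invented MODS record for the example.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample title</title></titleInfo>
  <name><namePart>Sample, Author</namePart></name>
</mods>"""


def mods_to_dc(mods_xml):
    """Map selected MODS fields to Dublin Core elements."""
    source = ET.fromstring(mods_xml)
    target = ET.Element("record")  # hypothetical wrapper element
    for path, dc_name in CROSSWALK:
        for node in source.findall(path, NS):
            # Skip empty source elements rather than emit empty DC fields.
            if node.text and node.text.strip():
                ET.SubElement(target, f"{{{DC}}}{dc_name}").text = node.text.strip()
    return target


for child in mods_to_dc(SAMPLE_MODS):
    print(child.tag, child.text)
```

Keeping the mapping in a table such as CROSSWALK makes it easy to document and review, which is the same reason to keep XSLTs and their documentation accessible.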
Repository Requirements
Pass XML metadata to service provider from the:
1. Preservation (storage) component or
2. Discovery (index) component
Provide a method to harvest a thumbnail (TN) and link back to the repository
Accommodate customization
Repository Requirements … Continued
For example: University of South Carolina video content model is tiered for preservation, media production and streaming web access. We only want to harvest one of three possible records
Case Study Europeana
Europeana
Our material comes from all over
Europe and the scope of the
collections is really quite
astonishing. [...]
http://www.europeana.eu/
http://pro.europeana.eu/
Intermediate Aggregator
The Digibess repository stores digitized objects from 18 Economic and Social Sciences libraries in Italy
Europeana requires an intermediate aggregator: a national harvester such as Cultura Italia
Cultura Italia harvests a custom “Pico” metadata format from Digibess and is then harvested by Europeana
Harvesting Tools
Digibess pre-dated Islandora OAI module and REPOX aggregator
Used Proai servlet oaiprovider-1.2.2
The harvest led to an examination of both the general needs and the specific applications of the protocol
Digibess on Europeana
REPOX
Since the Digibess project, a new intermediate aggregator called REPOX has been released. It aims to provide [...] Europeana partners a simple solution to import, convert and expose their bibliographic data via OAI-PMH
http://repox.sysresearch.org/
Case Study Digital Public Library of America (DPLA)
DPLA
The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world.
https://dp.la/info/
Service Hub
Empire State Digital Network (ESDN) is the New York State service hub for the DPLA
Hosted and administered by the Metropolitan New York Library Council in conjunction with eight allied regional library councils working collectively in New York State as the ESLN
Liaise with partners for data aggregation, mapping and licensing
Mapping & Testing
Harvests from partners using OAI-PMH
○ Provides all partner metadata to DPLA through one OAI-PMH feed from REPOX
Undertakes data review and QA prior to exposing feed to DPLA for harvest
ESDN on DPLA
Case Study Other Discovery Services
Other Discovery Services
WorldCat, Summon, & Primo are commercial discovery services
Local discovery layers can also collocate resources for discovery
OAI-PMH modules within your repository framework can allow these services to harvest your repository
Everyone is Harvesting Everyone
Connecticut State Library aggregates data to Research It
State Library harvests University of Connecticut Archives and Special Collections, ILS and others
University of Connecticut Library harvests to Summon/Primo and will be harvested by DPLA
Creating Lots of Portals
University of Connecticut Library started harvesting in mid-2014
Notable increases in access to digital content since the harvest (one of many factors)
Access statistics available at CTDA Statistics
University of Connecticut on Research It - EBSCOhost
Harvesting Top Takeaways
Top Takeaways - Data Providers
● Server Load/ Application Load
● Permissions / Copyright
● Relationships with Service Providers
● Repository Buy-in
● Increased Discovery
● Metadata Normalization
Top Takeaways - Service Providers
● Knowledge of
○ XSLT,
○ OAI-PMH, and
○ Metadata Schemas (DC, MODS, QDC, MARC XML)
● Technical staff to set up and maintain the aggregator & write scripts to transform harvested metadata
● Relationships with Data Providers
Harvesting Discussion
Discussion
● What are your biggest challenges?
● What resources do you find helpful?
● What was your "aha" moment?
● What was most useful in this presentation?
Harvesting Demonstration
Demonstration
To follow along or try it at home, navigate to
http://sandbox.discoverygarden.ca/ OR
http://islandora.ca/downloads
Click Islandora > Islandora Utility Modules > Islandora OAI
Questions? Contact us at: [email protected]