Preservation Metadata: an introduction to PREMIS and its application in audiovisual archives

Karin Bredenberg, The National Archives of Sweden

Member of PREMIS Editorial Committee

2013-05-16


The Challenge of Digital Preservation


How to access the material 1 month, 1 year, 10 years from now?

• Information about the material

– Intellectual information (Descriptive Metadata)

• Who

• Why

• and so on

– “Physical” information (Digital Preservation Metadata)

• Which kind of file am I?

• What has happened to me during the years?

• Who can look at me?

• And so on

Metadata = data about data

Digital Preservation Metadata = metadata that is essential to ensure long-term accessibility of digital resources


What Digital Preservation Metadata to store?

• A best guess on the future

– little experience validating the longevity of digital objects

– uncertain future technical possibilities

– uncertain future legal framework

• Digital objects must be self-descriptive

• Must be able to exist independently from the systems which were used to create them

– XML (machine and human readable)


OAIS

Open Archival Information System, also known as the ISO Reference Model for an OAIS

(A simple OAIS explanation by Richard Pearce-Moses and more)


The PREMIS Data Dictionary

• Information you need to know for preserving digital objects

• Available online through the PREMIS website

• Preservation Metadata: Implementation Strategies

– Includes PREMIS Data Dictionary, context/assumptions, data model, usage examples

– XML schema to support implementation


PREMIS Web and PREMIS EC

• Web site:

– Permanent Web presence (http://www.loc.gov/standards/premis/), hosted by Library of Congress

– Central destination for PREMIS-related info, announcements, resources

– Home of the PREMIS Implementers’ Group (PIG) discussion list ([email protected])

• PREMIS Editorial Committee:

– Sets directions/priorities for PREMIS development

– Considers proposals for changes

– Coordinates revisions of Data Dictionary and XML schema

– Consists of members with different affiliations from all over the world

– Meets once a month (sometimes more)

– Hosts PREMIS events, e.g. the PREMIS Implementation Fair at iPRES


OAIS Reference Model and PREMIS

• OAIS reference model specifies the Preservation Description Information (PDI)

• PREMIS used the OAIS information model as a starting point

• PREMIS Data Dictionary consolidated and further developed the conceptual types of information objects into more than 100 structured and logically integrated semantic units.

• PREMIS Data Dictionary provided detailed descriptions and guidelines to implement these semantic units.

• PREMIS Data Dictionary does not provide semantic units for Intellectual Entities, but provides semantic units to link to other metadata sources for Intellectual Entities (this will change in version 3)

• All entities have reference (identification) information.

• No “packaging information” that links content with metadata, but PREMIS can be used with container schemas

• PREMIS deals mostly with representation, context, provenance, and fixity information, in keeping with PREMIS definition of preservation metadata.


The PREMIS data model: 5 interacting entities

• Intellectual Entity

• Object

• Event

• Agent

• Rights

[Diagram label: identifier]
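
The way the entities refer to each other through identifiers can be sketched with plain data structures. This is only an illustration: the class names, field names and sample values below are assumptions made for this example (loosely modelled on semantic unit names such as linkingEventIdentifier) and do not reproduce the PREMIS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Event:
    identifier: str
    event_type: str
    # Links are stored as identifiers of other entities, not as nested objects.
    linking_agent_identifiers: List[str] = field(default_factory=list)
    linking_object_identifiers: List[str] = field(default_factory=list)


@dataclass
class RightsStatement:
    identifier: str
    basis: str
    linking_object_identifiers: List[str] = field(default_factory=list)


@dataclass
class PreservedObject:
    identifier: str
    category: str  # "representation", "file" or "bitstream"
    linking_intellectual_entity_identifiers: List[str] = field(default_factory=list)
    linking_event_identifiers: List[str] = field(default_factory=list)
    linking_rights_statement_identifiers: List[str] = field(default_factory=list)


def events_for(obj, events):
    """Return the events whose identifiers the object links to."""
    wanted = set(obj.linking_event_identifiers)
    return [event for event in events if event.identifier in wanted]


# Hypothetical sample data: one agent, one event and one file object.
agent = Agent("agent-1", "Ingest service")
ingest = Event("event-1", "ingestion", ["agent-1"], ["object-1"])
tape = PreservedObject("object-1", "file", ["ie-1"], ["event-1"], [])
print([event.event_type for event in events_for(tape, [ingest])])
```

Keeping only identifiers on each side means every entity can be stored and exchanged separately, at the cost of a lookup step when following a link.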


For example: Object Entity semantic units

1.1 objectIdentifier

1.2 objectCategory

1.3 preservationLevel

1.4 significantProperties

1.5 objectCharacteristics

    1.5.1 compositionLevel

    1.5.2 fixity

    1.5.3 size

    1.5.4 format

    1.5.5 creatingApplication

    1.5.6 inhibitors

    1.5.7 objectCharacteristicsExtension

1.6 originalName

1.7 storage

1.8 environment

    1.8.1 environmentCharacteristic

    1.8.2 environmentPurpose

    1.8.3 environmentNote

    1.8.4 dependency

    1.8.5 software

    1.8.6 hardware

    1.8.7 environmentExtension

1.9 signatureInformation

1.10 relationship

1.11 linkingEventIdentifier

1.12 linkingIntellectualEntityIdentifier

1.13 linkingRightsStatementIdentifier


Sample Data Dictionary Entry

Semantic unit: size

Semantic components: None

Definition: The size in bytes of the file or bitstream stored in the repository.

Rationale: Size is useful for ensuring the correct number of bytes from storage have been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.

Data constraint: Integer

Object category    Representation    File              Bitstream
Applicability      Not applicable    Applicable        Applicable
Repeatability                        Not repeatable    Not repeatable
Obligation                           Optional          Optional

Examples: 2038927

Creation/Maintenance notes: Automatically obtained by the repository.

Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.
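
The rationale above (checking that the correct number of bytes has been retrieved) can be sketched in a few lines. This is a minimal example under stated assumptions: the function names and the record layout are hypothetical, and a SHA-256 digest is added next to the size only to show how size and fixity information are typically obtained together.

```python
import hashlib
import os
import tempfile


def describe_file(path):
    """Return the size in bytes and a SHA-256 digest for one stored file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large audiovisual files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "messageDigest": digest.hexdigest()}


def matches_record(path, record):
    """Check a retrieved file against previously recorded values."""
    return describe_file(path) == record


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.bin")
    with open(sample, "wb") as handle:
        handle.write(b"\x00" * 2048)
    record = describe_file(sample)          # values recorded at ingest
    intact = matches_record(sample, record)
    with open(sample, "ab") as handle:      # simulate a damaged copy
        handle.write(b"\x01")
    damaged = matches_record(sample, record)

print(record["size"], intact, damaged)
```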


Scope

• What PREMIS DD is:

– Common data model for organizing/thinking about preservation metadata

– Implementable

– Standard for exchanging information packages between repositories

– Technically neutral

– Core metadata


Scope

• What PREMIS DD is not:

– Out-of-the-box solution

– All needed metadata

– Lifecycle management of objects outside repository

– Rights management


Technology Dependence

[Figure: digital files named 0001.tiff, 0002.tiff, 0003.tiff, 0004.tiff, 0005.tiff, 0006.tiff and 000156.tiff]

• No direct access

• Not self-descriptive

• Complex formats

• Complex environments


Information packages

• Information about the owner, what the package is and more

• The files, checksum, file name, use and more

• Technical information like Digital Preservation Metadata, what has happened to the files and more

– need for detailed rendering information

» Software

» Hardware

» Other dependencies: schemas, style sheets, encodings, etc.

– need for format information

• Information about structure, how are the files related?


Standards for Information Packages

• One commonly used standard is METS (Metadata Encoding and Transmission Standard)

• PREMIS can be used together with METS

<mets>

    <metsHdr>: METS header

    <dmdSec>: descriptive metadata section

    <amdSec>: administrative metadata section

    <fileSec>: file section

    <structMap>: structural map section

    <structLink>: structural link section

    <behaviorSec>: behavior section
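
How a PREMIS object description can sit inside the administrative metadata section of a METS document may be easier to see in a small sketch. The example below assumes the METS namespace `http://www.loc.gov/METS/` and the PREMIS version 2 namespace `info:lc/xmlns/premis-v2`; the helper names, the identifier values and the choice to wrap PREMIS in `techMD` are illustrative assumptions. The output is deliberately incomplete (for example, no format information and an empty structural map) and has not been validated against either schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS, tag)


def p(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return "{%s}%s" % (PREMIS, tag)


def build_package(file_id, size_in_bytes):
    root = ET.Element(m("mets"))

    # Administrative metadata: the PREMIS object is wrapped inside xmlData.
    amd = ET.SubElement(root, m("amdSec"))
    tech = ET.SubElement(amd, m("techMD"), {"ID": "tech-" + file_id})
    wrap = ET.SubElement(tech, m("mdWrap"), {"MDTYPE": "PREMIS:OBJECT"})
    data = ET.SubElement(wrap, m("xmlData"))

    obj = ET.SubElement(data, p("object"), {"{%s}type" % XSI: "premis:file"})
    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = "local"
    ET.SubElement(ident, p("objectIdentifierValue")).text = file_id
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    ET.SubElement(chars, p("compositionLevel")).text = "0"
    ET.SubElement(chars, p("size")).text = str(size_in_bytes)

    # File section: ADMID points back to the metadata that describes the file.
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"))
    ET.SubElement(group, m("file"), {"ID": file_id, "ADMID": "tech-" + file_id})

    # Placeholder only: a real package needs a populated structural map.
    ET.SubElement(root, m("structMap"))
    return root


xml_text = ET.tostring(build_package("file-0001", 2038927), encoding="unicode")
print(xml_text)
```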


Technical metadata for audio and video

• A “new” need: objects are now created digitally and digitization has increased

• Has not developed as fast as other technical metadata schemes

• Complexities of file formats require expertise to develop and implement these

• Few standards available for metadata about audio and video

– AES (will be briefly introduced)

– audioMD and videoMD (will be briefly introduced)

– Material Exchange Format (MXF)

– Technical metadata in EBUCore, PBCore

– In the US, the Federal Agencies Digitization Guidelines Initiative (FADGI)

– MPEG-7 and MPEG-21 for video

• Programs creating audio and/or video often can export metadata.

Question: Is this exported information sufficient?

Answer: Needs to be evaluated at the archives and a decision taken!
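
One starting point for such an evaluation is to look at what can be read directly from the file. The sketch below uses Python's standard `wave` module to pull basic technical properties out of a WAV header. The dictionary keys are hypothetical names chosen for this example and are not elements of AES, audioMD or any other schema mentioned above.

```python
import os
import tempfile
import wave


def technical_metadata(path):
    """Read basic technical properties from the header of a WAV file."""
    with wave.open(path, "rb") as audio:
        frames = audio.getnframes()
        rate = audio.getframerate()
        return {
            "channels": audio.getnchannels(),
            "bits_per_sample": audio.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "silence.wav")
    # Write one second of 16-bit stereo silence so the example is self-contained.
    with wave.open(sample_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(48000)
        out.writeframes(b"\x00" * (48000 * 2 * 2))
    info = technical_metadata(sample_path)

print(info)
```

A list like this only covers what the container header records; whether that is enough for preservation is exactly the decision the archive has to take.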


AES

• Audio Engineering Society (http://www.aes.org/ )

• AES-X098B superseded by:

– AES57-2011-f (2011): AES standard for audio metadata - Audio object structures for preservation and restoration

– AES60-2011-f (2011): AES standard for audio metadata - Core audio metadata

• Two XML schemas available

• According to earlier information, 98C (video) was planned to follow once 98B had been established

• Some education-oriented presentations can be found.


audioMD and videoMD (AMD and VMD)

• Hosted by Library of Congress (http://www.loc.gov/standards/amdvmd/index.html )

• Simple schemas developed over 10 years

• Current version published in spring 2011

• Information about one use case together with METS

• Mailing list exists, but rarely used

• Of interest to archives that want schemas that are not too complex for preservation purposes


Tools

• PREMIS in METS toolbox

• The controlled vocabularies database

• Some institutions are making repository software available that implements PREMIS

– DAITSS Digital Preservation Repository Software

– Archivematica


The controlled vocabularies database

• Library of Congress is establishing databases with controlled vocabulary values for standards that it maintains

• http://id.loc.gov

• Now also specific vocabularies for PREMIS semantic units: preservationLevelRole, cryptographicHashAlgorithm, eventType

• Additional PREMIS controlled lists to be made available with the PREMIS OWL ontology
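
Using a controlled vocabulary in practice mostly means rejecting values that are not on the list. A minimal sketch follows, assuming a small hand-picked subset of eventType values taken from the suggested values in the PREMIS Data Dictionary; the set below is incomplete and the function name is hypothetical, so the published vocabulary at http://id.loc.gov should be treated as the authority.

```python
# Illustrative subset only; not the full published eventType vocabulary.
EVENT_TYPES = {
    "capture",
    "fixity check",
    "ingestion",
    "migration",
    "validation",
    "virus check",
}


def check_event_type(value, allowed=EVENT_TYPES):
    """Return the normalized value if it is in the vocabulary, else None."""
    normalized = " ".join(value.lower().split())
    return normalized if normalized in allowed else None


print(check_event_type("Fixity  Check"))
print(check_event_type("defragmentation"))
```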


PREMIS Web Ontology Language (OWL) ontology

• Initiated by the Archipel project to use PREMIS in Open Archives Initiative Object Reuse and Exchange (OAI-ORE) (description/exchange of Web resources)

• Resource Description Framework (RDF) serialization of preservation metadata as a data management function in a preservation repository

• Interoperate with other preservation Linked Data efforts such as UDFR (Unified Digital Formats Registry)

• Interoperate with PREMIS controlled vocabularies at http://id.loc.gov


PREMIS OWL ontology in a nutshell

• Purpose

– Providing the community with an RDF serialization of the PREMIS data model and dictionary

– While remaining as close as possible to the data dictionary’s clearly defined semantics

RDF modelling in 3 words:

• Everything modelled in the form of subject-verb-object

• But what subjects? what verbs? what objects? → role of vocabularies & ontologies
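
The subject-verb-object idea can be shown without any RDF library by treating each statement as a three-part tuple. The `ex:` names below are hypothetical placeholders invented for this sketch; they are not terms from the PREMIS OWL ontology, which is precisely the gap that vocabularies and ontologies fill.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "verb", "object"])

# Hypothetical statements about one file and one event.
triples = [
    Triple("ex:file-0001", "ex:hasSize", "2038927"),
    Triple("ex:file-0001", "ex:hasEvent", "ex:event-1"),
    Triple("ex:event-1", "ex:hasEventType", "ingestion"),
]


def objects_of(subject, verb, graph):
    """Return every object stated for the given subject and verb."""
    return [t.object for t in graph if t.subject == subject and t.verb == verb]


# Follow a link: from the file to its event, then to the event type.
for event in objects_of("ex:file-0001", "ex:hasEvent", triples):
    print(event, objects_of(event, "ex:hasEventType", triples))
```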


Implementation issues: Conformance

• Conformant Implementation of the PREMIS Data Dictionary: http://www.loc.gov/standards/premis/premis-conformance-oct2010.pdf

• What does "being conformant to PREMIS" mean?

• Conformant at which level?

– semantic unit: conformant implementation of the information defined in a particular semantic unit

– data dictionary: conformant implementation of all semantic units

• Conformant from what perspective?

– internal: conformant implementation at semantic units and data dictionary levels

– external (exchanging PREMIS descriptions):

import = the repository can manage PREMIS conformant information

export = the repository can provide others with PREMIS


Implementation issues: Technical

• Which semantic units to use besides the mandatory ones?

• Create own vocabularies?

• Where to store the metadata?

– In an XML document?

– In one or more databases?

• Which events to store?

• How to store agents, rights management?

• In short: a lot of decision making needs to be performed!
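
To make the database option concrete, here is a minimal sketch of events kept in a relational table, using Python's built-in `sqlite3` module. The table layout, column names and sample values are assumptions made for this example; they borrow semantic unit names but are not a recommended or conformant PREMIS mapping.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a single event table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE event ("
        "event_identifier TEXT PRIMARY KEY, "
        "event_type TEXT NOT NULL, "
        "event_date_time TEXT NOT NULL, "
        "linking_object_identifier TEXT NOT NULL, "
        "linking_agent_identifier TEXT)"
    )
    return connection


def record_event(connection, identifier, event_type, when, object_id, agent_id=None):
    connection.execute(
        "INSERT INTO event VALUES (?, ?, ?, ?, ?)",
        (identifier, event_type, when, object_id, agent_id),
    )


def history(connection, object_id):
    """Return (date, type) pairs for one object, oldest first."""
    rows = connection.execute(
        "SELECT event_date_time, event_type FROM event "
        "WHERE linking_object_identifier = ? ORDER BY event_date_time",
        (object_id,),
    )
    return rows.fetchall()


store = open_store()
record_event(store, "event-2", "fixity check", "2013-05-16T10:00:00", "file-0001", "agent-1")
record_event(store, "event-1", "ingestion", "2013-05-01T09:00:00", "file-0001", "agent-1")
print(history(store, "file-0001"))
```

A database makes questions such as "what has happened to this file?" cheap to answer, while an XML document keeps the metadata self-descriptive and independent of the system; many of the decisions listed above come down to balancing those two needs.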


Conclusion

• PREMIS is widely implemented as the basis for digital preservation metadata

• IT and the archives need to work together: different kinds of expertise are required

• The complexities of audio and video increase the need for technical and structural metadata

• Increasing use of digital preservation metadata for archiving audio and video is expected

• Examples of use of PREMIS together with audio and video metadata are needed


Thank you!

Karin Bredenberg, The National Archives of Sweden

[email protected]

Presentation made with the help of:

Angela Dappert

Sébastien Peyrard

Rebecca Guenther