PDF and Long Term Preservation May 17, 2005 Susan J. Sullivan, CRM [email protected]

Introduction

• Today’s presentation will discuss NARA’s work to address the long term preservation of electronic documents in Portable Document Format (PDF)

– Explain why long term preservation of electronic documents in PDF is an issue

– Describe the draft PDF/A ISO Standard in the context of NARA’s PDF Transfer Guidance for permanent records in PDF…including NARA’s expectations for PDF/A

– Explain the roles of both PDF/A and the PDF Transfer Guidance in Federal recordkeeping

– Provide an overview of PDF/A and its status in the ISO Process

– Quiz at the end (group participation)

Background – Wide Use of PDF

• PDF is a digital format that electronically reproduces the visual appearance of documents whether they are:

– Converted from other electronic formats, or

– Digitized from paper or microform

• Businesses, governments, libraries, archives, and other institutions and individuals around the world use PDF to:

– Collect and disseminate information over the Internet,

– Store electronic records, and/or

– Make scanned images searchable by embedding OCR’d text.

• As a result, large bodies of important information are maintained in PDF.

Background – PDF Not a Suitable Archival Format

• PDF itself is not suitable as an archival format.

– Adobe is under no obligation to continue publishing the specification for future versions

– Can include features incompatible with current archival requirements

    • Encryption

    • Embedded files

– PDF documents not necessarily self-contained

    • Can depend on system fonts and other content drawn from outside the file

– Multiple PDF development tools on the market

    • Inconsistency in the file format (all PDFs are not created equal)

• Long-term solution needed to ensure that digital PDF documents remain accessible for long periods of time

– Permanent archival records, in some cases

– Administrative Office of U.S. Courts initiated idea for PDF/A
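
To make the "incompatible features" point concrete, here is a minimal sketch of a heuristic check for the two features named on this slide. It only searches the raw file bytes for the PDF names `/Encrypt` and `/EmbeddedFiles`, both of which come from the PDF specification. The dictionary and function names are hypothetical. A byte search can miss names stored in compressed parts of a file and can report a match that is not an actual feature, so it is no substitute for a real PDF parser or a PDF/A validator.

```python
# Heuristic sketch: flag raw PDF bytes that mention features the slide
# lists as archival risks. RISK_MARKERS and find_risk_features are
# hypothetical names used only for this illustration.
RISK_MARKERS = {
    "encryption": b"/Encrypt",           # trailer key that points to the encryption dictionary
    "embedded files": b"/EmbeddedFiles",  # name tree holding file attachments
}


def find_risk_features(pdf_bytes):
    """Return the labels of all risk markers that occur in the bytes."""
    return [label for label, marker in RISK_MARKERS.items() if marker in pdf_bytes]


if __name__ == "__main__":
    sample = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
    print(find_risk_features(sample))
```

A file that passes this check may still depend on system fonts or other outside content, which is the self-containment problem listed above.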

Background – Example Business Case for Long Term Preservation of PDF

• Administrative Office of the U.S. Courts (AOUSC)

– Uses PDF as the electronic format for Electronic Case Filing System

– System accepts filings and provides access to filed PDF documents over the Internet

– Many AOUSC files must be maintained for long periods of time (e.g., 40 years)

    • Some will be transferred to the National Archives for permanent retention

– Future use of and access to the AOUSC’s PDF documents depends on maintaining the ability to reproduce their visual appearance and other properties over the long term (i.e., across multiple generations of technology)

Background - How NARA is Addressing PDF

Issued PDF Transfer Guidance

• NARA partnered with Federal agencies and issued guidance allowing transfer of permanent records in PDF to NARA (March 2003)

– Part of Electronic Records Management E-Gov Initiative

– Agency partners identified PDF as one of six priority records transfer formats

Participating in PDF/A ISO Standard Development

• NARA is participating in PDF/A development…

– To influence the process so that PDF/A compliant records can be preserved by NARA over the long term, and

– To provide information used in developing/maintaining NARA guidance for transferring permanent records in PDF

Background - Transfer Format versus File Format

Goal: To ensure that valuable electronic information in PDF is not lost

Purpose:

• Transfer Format - NARA’s PDF Transfer Guidance

– Specifies requirements for transferring permanent records in PDF to NARA

– Applies to existing and future records in PDF so that NARA can accept and process these records

• File Format - The PDF/A ISO Standard (PDF/A)

– Specifies a file format, based on PDF, that is more suitable than PDF for long term preservation

– Will allow PDF records to be maintained longer as PDF (e.g., within agencies)

Scope and Usage - NARA’s PDF Transfer Guidance

• Scope

– Applies to records scheduled as permanent

– Supplements existing Federal regulations

– Covers existing and future electronic records meeting transfer requirements, but….

    • Unique circumstances

    • NARA will work with agencies through their Appraisal Archivist to ensure that valuable electronic records are not lost

• Usage

– Agencies will use NARA’s PDF Transfer Guidance to transfer existing permanent PDF records to NARA

Scope and Usage - PDF/A ISO Standard

• Scope

– Defines a file format based on PDF, that preserves the visual appearance of electronic documents over time

    • Provides a framework for recording and embedding metadata within PDF files

    • Defines a framework for representing the logical structure and other semantic information of electronic documents within PDF/A files

• Usage

– Vendors will use the PDF/A Standard to develop applications that read, write, and otherwise process conforming PDF/A files

– Agencies will use PDF/A applications to create and process PDF/A conformant files

    • As part of their strategy for long term preservation of electronic records

    • In conjunction with PDF transfer guidance for transferring permanent records to NARA, as applicable

Scope and Usage - Summary

NARA’s PDF Transfer Guidance

• Applies to records in PDF scheduled as permanent

• Incorporates file format(s) (e.g., PDF 1.0 - 1.4)

• Incorporates quality criteria, laws and regulations, transfer documentation, NARA contact information

PDF/A ISO Standard

• Addresses one aspect of the long term preservation of electronic records in PDF (i.e., file format)

• Should be used as one component of an organization’s electronic archival environment

• Implementation depends on:

– Records management policies and procedures

– Additional requirements and conditions necessary to ensure the persistence of electronic documents over time (e.g., PDF Transfer Guidance)

– Quality assurance processes necessary to verify conformance with requirements
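
As an illustration of the file format criterion, the sketch below reads the version from a PDF header and compares it with the example range on this slide (PDF 1.0 - 1.4). The `%PDF-M.m` header comment is defined by the PDF specification. The function names are hypothetical, and searching only the first 1024 bytes is an assumption about common reader behavior. The upper bound comes from the slide's example and is not an authoritative statement of what NARA accepts. A document catalog may also declare a later version than the header, which this sketch ignores.

```python
import re

# Header comment "%PDF-M.m" as defined by the PDF specification.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")

# Bounds taken from the slide's example range, "PDF 1.0 - 1.4".
MIN_VERSION = (1, 0)
MAX_VERSION = (1, 4)


def header_version(pdf_bytes):
    """Return (major, minor) from the header, or None if no header is found."""
    # Assumption: the header appears near the start of the file.
    match = HEADER_RE.search(pdf_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def in_example_range(pdf_bytes):
    """True if the header version falls inside the slide's example range."""
    version = header_version(pdf_bytes)
    return version is not None and MIN_VERSION <= version <= MAX_VERSION


if __name__ == "__main__":
    print(header_version(b"%PDF-1.4\n%%EOF"))
    print(in_example_range(b"%PDF-1.6\n%%EOF"))
```

A version check like this covers only the file format bullet. The quality criteria and transfer documentation listed above need separate review.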

Requirements - PDF/A and NARA’s PDF Transfer Guidance

Embedded fonts

• PDF/A and NARA’s PDF Transfer Guidance both require that all referenced fonts be embedded

– For documents created before 4/1/04, NARA accepts PDFs that do not have all fonts embedded (i.e., base 14 - resident in operating system)

Encryption

• PDF/A and NARA’s PDF Transfer Guidance both prohibit encryption

– For documents created before 4/1/04, NARA accepts PDFs with encryption that does not prevent opening, viewing, printing

Special Features

• PDF/A restricts special features

– Embedded files, external links, JavaScript

– PDF/A promotes tagged PDF as a higher level of conformance

• NARA’s PDF Transfer Guidance evaluates special features on a case-by-case basis at the time of scheduling

– To evaluate recordkeeping implications and to ensure valuable records are not lost

Metadata/Documentation

• PDF/A requires that embedded metadata must be in Adobe eXtensible Metadata Platform (XMP)

• NARA’s PDF Transfer Guidance requires transfer documentation (e.g., SF-258), and would evaluate embedded metadata during the scheduling process
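
The font and encryption rules on this slide, including the exception for documents created before 4/1/04, can be sketched as a small decision function. This is one reading of the slide and not NARA's actual acceptance procedure. The class, field, and function names are hypothetical. Treating the font exception as limited to the base 14 fonts is an assumption drawn from the parenthetical "(i.e., base 14 - resident in operating system)". Special features and metadata are left out because the guidance evaluates them case by case.

```python
from dataclasses import dataclass
from datetime import date

# "4/1/04" on the slide: documents created before this date get the exceptions.
CUTOFF = date(2004, 4, 1)


@dataclass
class PdfFacts:
    """Facts about one PDF, assumed to come from some earlier inspection step."""
    created: date
    all_fonts_embedded: bool
    encrypted: bool
    unembedded_only_base14: bool = False          # meaningful only if fonts are missing
    encryption_blocks_open_view_print: bool = False  # meaningful only if encrypted


def transfer_issues(facts):
    """Return a list of problems under this sketch's reading of the slide."""
    issues = []
    legacy = facts.created < CUTOFF

    if not facts.all_fonts_embedded:
        # Assumed reading: legacy files may omit only the base 14 fonts.
        if not (legacy and facts.unembedded_only_base14):
            issues.append("not all fonts embedded")

    if facts.encrypted:
        if not legacy:
            issues.append("encryption prohibited")
        elif facts.encryption_blocks_open_view_print:
            issues.append("encryption prevents opening, viewing, or printing")

    return issues


if __name__ == "__main__":
    old_file = PdfFacts(date(2003, 6, 1), False, True, unembedded_only_base14=True)
    new_file = PdfFacts(date(2005, 1, 1), False, True)
    print(transfer_issues(old_file))
    print(transfer_issues(new_file))
```

The same facts produce different answers on either side of the cutoff, which is the practical effect of the 4/1/04 exceptions.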

Requirements - PDF/A and NARA’s PDF Transfer Guidance

Quality Requirements

• PDF/A as a file format does not address quality/creation requirements

– Includes recommended guidelines for exact replication of source material in Informative Annex B

– Agencies must implement the guidelines of Informative Annex B to comply with NARA’s PDF transfer guidance

• NARA’s PDF Transfer Guidance requires minimum scanning quality, prohibits lossy compression and substitution of bitmapped characters with OCR’d text

Requirements - PDF/A and NARA’s PDF Transfer Guidance

NARA’s Expectations for PDF/A

– PDF/A should address some existing archival issues with PDF and enable records in PDF to be maintained for longer periods of time in that format

• Standard maintained by external international organization, not just vendors

• Increased degree of format reliability/decrease in “bells & whistles”

– Agencies will need to implement PDF/A in conjunction with records management policies and procedures and any additional requirements and conditions necessary to ensure the persistence of electronic documents over time

• Examples

– NARA’s PDF Transfer Guidance

– AOUSC’s document management program

PDF/A ISO Process – International Joint Working Group

ISO Joint Working Group (JWG) - PDF/A

[Organization chart: ISO committees participating in the PDF/A JWG]

• TC/46 Information & Documentation

– TC/46 SC11 Archives/Records Mgmt

• TC/130 Graphics Technology

• TC/171* Document Imaging Applications

– TC/171 SC 2 Application Issues

– TC/171 SC 2 WG-5 PDF/A

• TC/42 Photography

• PDF/A JWG

* JWG formed under the auspices of TC/171

PDF/A ISO Process – Progress and Next Steps

• Early 2002 PDF/A development initiated

• September 2003 Approval of ISO New Work Item (NWI)

• October 2003 TC-171 Meeting - JWG prepared Committee Draft (CD)

• November 2003/February 2004 - CD ballot circulated to National Bodies (NBs)

• March 2004 - JWG reviewed NB comments on CD

• June 2004/September 2004 - Second CD ballot circulated to NBs

• October 2004 - JWG Meeting - JWG prepared Draft International Standard (DIS)

• Winter/Spring 2005 - DIS Balloted to National Bodies

– Unanimous affirmative votes - Goes to publication

– Up to 25% negative Votes - Goes to FDIS, then 1 month ballot

• Summer 2005 - TC-171 Meeting - JWG meeting to deal with DIS comments and discuss new work

• Summer 2005 - International Standard/FDIS?

• Software developers create PDF/A compliant applications

PDF/A - Approach

• PDF/A specifies:

– The subset of PDF components, from the Adobe published specification for Version 1.4 (i.e., PDF 1.4 Reference), that are either required, restricted, or prohibited, and

– How these components may be used by software to render the file

[Diagram: PDF/A shown within the PDF 1.4 Reference; PDF/A specifies required features, restricted features, and prohibited features]

PDF/A - Requirements

• Prohibit or restrict features that could complicate long term preservation, and

• Maximize the following PDF attributes:

– Device independence

    The degree to which a PDF/A file is independent of the platform on which it is interpreted and rendered

    The degree to which a PDF/A file is amenable to direct analysis with basic tools, including human readability

– Self-containment

    The degree to which a PDF/A file contains all resources necessary for its reliable and predictable interpretation and rendering

– Self-documentation

    The degree to which a PDF/A file documents itself in terms of descriptive, administrative, structural, and technical metadata

PDF/A - Table of Contents

• 1 Scope

• 2 Normative References

• 3 Terms and Definitions

• 4 Notation

• 5 Conformance Levels

• 6 Technical Requirements

– 6.1 File Structure

– 6.2 Graphics

– 6.3 Fonts

– 6.4 Transparency

– 6.5 Annotations

– 6.6 Actions

– 6.7 Metadata

– 6.8 Logical Structure

– 6.9 Interactive Forms

• Informative annexes

– Annex A - PDF/A-1 Conformance Summary

– Annex B - Best Practices for PDF/A

• Bibliography

Annexes of the Draft PDF/A Standard – Informative Annexes

• Informative Annexes will provide supplemental information including:

– PDF/A-1 Conformance Summary

    • Summary tables of PDF objects and keys required, restricted and prohibited in PDF/A

– Best Practices for PDF/A

    • Guidelines for capturing or converting electronic documents to PDF/A

        – For documents created according to specific institutional rules

        – Replicates the exact quality and content of source documents within the PDF/A file

    • Required for compliance with NARA’s PDF Transfer Guidance

PDF/A - Overview of Requirements

• Two levels of conformance

– Level A (e.g., Tagged PDF, UNICODE Mapping)

– Level B (e.g., No Tagged PDF)

• Uniform file format (header, trailer, no encryption)

• Device-independent rendering of graphics

• Embedded fonts, character encoding

• Transparency prohibited

• Annotations restricted, content should be displayed by readers

• External actions restricted, no dependence on external content

• Readers not required to act on hyperlinks, but may

• XMP metadata “Adobe XML Metadata Framework”

• Forms based on appearance, not data
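
The required / restricted / prohibited approach can be pictured as a lookup table. The sketch below records only the statuses stated in this presentation: embedded fonts required, annotations and external actions restricted, encryption and transparency prohibited. The feature labels, the `Status` enum, and the `violations` function are hypothetical and cover only a small part of what the draft standard specifies. Restricted features are not judged here because the conditions on them are in the standard itself.

```python
from enum import Enum


class Status(Enum):
    REQUIRED = "required"
    RESTRICTED = "restricted"
    PROHIBITED = "prohibited"


# Statuses transcribed from the slides; the keys are illustrative labels only.
FEATURE_STATUS = {
    "embedded_fonts": Status.REQUIRED,
    "annotations": Status.RESTRICTED,
    "external_actions": Status.RESTRICTED,
    "encryption": Status.PROHIBITED,
    "transparency": Status.PROHIBITED,
}


def violations(present_features):
    """List required features that are absent and prohibited ones that are present."""
    present = set(present_features)
    problems = []
    for feature, status in FEATURE_STATUS.items():
        if status is Status.REQUIRED and feature not in present:
            problems.append(f"{feature}: required but missing")
        elif status is Status.PROHIBITED and feature in present:
            problems.append(f"{feature}: prohibited but present")
    return problems


if __name__ == "__main__":
    print(violations({"embedded_fonts", "annotations"}))
    print(violations({"transparency"}))
```

Keeping the statuses in one table matches how Annex A of the draft is described: summary tables of what is required, restricted, and prohibited.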

Quiz - True or False?

• The draft PDF/A ISO Standard…

– Provides quality standards for converting electronic documents to PDF

    • False

– Should enable electronic documents in PDF to be maintained longer as PDF

    • True

– Is intended for use as one component of an organization's electronic archival environment for long-term retention of documents

    • True

• For permanent records in PDF, agencies need to understand that:

– Records in PDF/A are guaranteed to be readable forever

    • False

– PDF/A, by itself, does not guarantee exact replication of source material

    • True

– Agencies must implement PDF/A in conjunction with additional requirements to meet NARA standards for transferring permanent records to NARA (i.e., NARA’s PDF Transfer Guidance)

    • True

• Everyone is now excited to learn more about PDF…..

– True!

More Information is Available

• More information on NARA’s PDF Transfer Guidance on NARA’s Web Site

– http://www.archives.gov/records_management/initiatives/pdf_records.html

• More information on PDF/A on AIIM Web Site

– http://www.aiim.org/standards.asp?ID=25013

• Contact Susan Sullivan at [email protected]

Questions/Discussion