PDF and Long Term Preservation
May 17, 2005
Susan J. Sullivan, CRM
Introduction
• Today’s presentation will discuss NARA’s work to address the long term preservation of electronic documents in Portable Document Format (PDF)
– Explain why long term preservation of electronic documents in PDF is an issue
– Describe the draft PDF/A ISO Standard in the context of NARA’s PDF Transfer Guidance for permanent records in PDF, including NARA’s expectations for PDF/A
– Explain the roles of both PDF/A and the PDF Transfer Guidance in Federal recordkeeping
– Provide an overview of PDF/A and its status in the ISO Process
– Quiz at the end (group participation)
Background – Wide Use of PDF
• PDF is a digital format that electronically reproduces the visual appearance of documents whether they are:
– Converted from other electronic formats, or
– Digitized from paper or microform
• Businesses, governments, libraries, archives, and other institutions and individuals around the world use PDF to:
– Collect and disseminate information over the Internet,
– Store electronic records, and/or
– Make scanned images searchable by embedding OCR’d text.
• As a result, large bodies of important information are maintained in PDF.
Background – PDF Not a Suitable Archival Format
• PDF itself is not suitable as an archival format.
– Adobe is under no obligation to continue publishing the specification for future versions
– Can include features incompatible with current archival requirements
• Encryption
• Embedded files
– PDF documents not necessarily self-contained
• Can depend on system fonts and other content drawn from outside the file
– Multiple PDF development tools on the market
• Inconsistency in the file format (all PDFs are not created equal)
• Long-term solution needed to ensure that digital PDF documents remain accessible for long periods of time
– Permanent archival records, in some cases
– Administrative Office of U.S. Courts initiated idea for PDF/A
Background – Example Business Case for Long Term Preservation of PDF
• Administrative Office of the U.S. Courts (AOUSC)
– Uses PDF as the electronic format for Electronic Case Filing System
– System accepts filings and provides access to filed PDF documents over the Internet
– Many AOUSC files must be maintained for long periods of time (e.g., 40 years); some will be transferred to the National Archives for permanent retention
– Future use of and access to the AOUSC’s PDF documents depends on maintaining the ability to reproduce their visual appearance and other properties over the long term (i.e., across multiple generations of technology)
Background - How NARA is Addressing PDF
Issued PDF Transfer Guidance
• NARA partnered with Federal agencies and issued guidance allowing transfer of permanent records in PDF to NARA (March 2003)
– Part of Electronic Records Management E-Gov Initiative
– Agency partners identified PDF as one of six priority records transfer formats
Participating in PDF/A ISO Standard Development
• NARA is participating in PDF/A development…
– To influence the process so that PDF/A compliant records can be preserved by NARA over the long term, and
– To provide information used in developing/maintaining NARA guidance for transferring permanent records in PDF
Background - Transfer Format versus File Format
Goal: To ensure that valuable electronic information in PDF is not lost
Purpose:
• Transfer Format - NARA’s PDF Transfer Guidance
– Specifies requirements for transferring permanent records in PDF to NARA
– Applies to existing and future records in PDF so that NARA can accept and process these records
• File Format - The PDF/A ISO Standard (PDF/A)
– Specifies a file format, based on PDF, that is more suitable than PDF for long term preservation
– Will allow PDF records to be maintained longer as PDF (e.g., within agencies)
Scope and Usage - NARA’s PDF Transfer Guidance
• Scope
– Applies to records scheduled as permanent
– Supplements existing Federal regulations
– Covers existing and future electronic records meeting transfer requirements, but…
• Unique circumstances
• NARA will work with agencies through their Appraisal Archivist to ensure that valuable electronic records are not lost
• Usage
– Agencies will use NARA’s PDF Transfer Guidance to transfer existing permanent PDF records to NARA
Scope and Usage - PDF/A ISO Standard
• Scope
– Defines a file format, based on PDF, that preserves the visual appearance of electronic documents over time
• Provides a framework for recording and embedding metadata within PDF files
• Defines a framework for representing the logical structure and other semantic information of electronic documents within PDF/A files
• Usage
– Vendors will use the PDF/A Standard to develop applications that read, write, and otherwise process conforming PDF/A files
– Agencies will use PDF/A applications to create and process PDF/A conformant files
• As part of their strategy for long term preservation of electronic records
• In conjunction with PDF transfer guidance for transferring permanent records to NARA, as applicable
Scope and Usage - Summary
NARA’s PDF Transfer Guidance
• Applies to records in PDF scheduled as permanent
• Incorporates file format(s) (e.g., PDF 1.0 - 1.4)
• Incorporates quality criteria, laws and regulations, transfer documentation, NARA contact information
PDF/A ISO Standard
• Addresses one aspect of the long term preservation of electronic records in PDF (i.e., file format)
• Should be used as one component of an organization’s electronic archival environment
• Implementation depends on:
– Records management policies and procedures
– Additional requirements and conditions necessary to ensure the persistence of electronic documents over time (e.g., PDF Transfer Guidance)
– Quality assurance processes necessary to verify conformance with requirements
Requirements - PDF/A and NARA’s PDF Transfer Guidance
Embedded fonts
• PDF/A and NARA’s PDF Transfer Guidance both require that all referenced fonts be embedded
– For documents created before 4/1/04, NARA accepts PDFs that do not have all fonts embedded (i.e., base 14 - resident in operating system)
Encryption
• PDF/A and NARA’s PDF Transfer Guidance both prohibit encryption
– For documents created before 4/1/04, NARA accepts PDFs with encryption that does not prevent opening, viewing, printing
Special Features
• PDF/A restricts special features
– Embedded files, external links, JavaScript
– PDF/A promotes tagged PDF as a higher level of conformance
• NARA’s PDF Transfer Guidance evaluates special features on a case-by-case basis at the time of scheduling
– To evaluate recordkeeping implications and to ensure valuable records are not lost
Metadata/Documentation
• PDF/A requires that embedded metadata must be in Adobe eXtensible Metadata Platform (XMP)
• NARA’s PDF Transfer Guidance requires transfer documentation (e.g., SF-258), and would evaluate embedded metadata during the scheduling process
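The font and encryption rules on this slide, including the exception for documents created before 4/1/04, can be written out as a small decision function. This is only a sketch under stated assumptions: the `PdfFacts` record and its field names are hypothetical, the facts are assumed to have been gathered by some other tool, and the reading of the base 14 exception (unembedded fonts are tolerated only when they are all base 14 fonts) is an interpretation of the slide, not wording from NARA's guidance.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Cutoff taken from the slide ("documents created before 4/1/04").
CUTOFF = date(2004, 4, 1)


@dataclass
class PdfFacts:
    """Hypothetical summary of one PDF; field names are illustrative only."""
    created: date
    all_fonts_embedded: bool
    only_base14_missing: bool     # every unembedded font is a base 14 font
    encrypted: bool
    encryption_blocks_use: bool   # encryption prevents opening, viewing, or printing


def transfer_issues(pdf: PdfFacts) -> List[str]:
    """Return font and encryption problems under the rules summarized above."""
    issues = []
    legacy = pdf.created < CUTOFF  # older documents get the stated exceptions

    # Fonts: embedding is required, except base 14 fonts in legacy documents.
    if not pdf.all_fonts_embedded:
        if not (legacy and pdf.only_base14_missing):
            issues.append("fonts not embedded")

    # Encryption: prohibited, except non-blocking encryption in legacy documents.
    if pdf.encrypted:
        if not (legacy and not pdf.encryption_blocks_use):
            issues.append("encryption not permitted")

    return issues
```

For example, a 2003 document with unembedded base 14 fonts and non-blocking encryption yields no issues, while the same document created in June 2004 yields both. Special features and metadata are left out because the slide says they are evaluated case by case during scheduling.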
Requirements - PDF/A and NARA’s PDF Transfer Guidance
Quality Requirements
• PDF/A as a file format does not address quality/creation requirements
– Includes recommended guidelines for exact replication of source material in Informative Annex B
– Agencies must implement the guidelines of Informative Annex B to comply with NARA’s PDF transfer guidance
• NARA’s PDF Transfer Guidance requires minimum scanning quality, prohibits lossy compression and substitution of bitmapped characters with OCR’d text
Requirements - PDF/A and NARA’s PDF Transfer Guidance
NARA’s Expectations for PDF/A
– PDF/A should address some existing archival issues with PDF and enable records in PDF to be maintained for longer periods of time in that format
• Standard maintained by external international organization, not just vendors
• Increased degree of format reliability/decrease in “bells & whistles”
– Agencies will need to implement PDF/A in conjunction with records management policies and procedures and any additional requirements and conditions necessary to ensure the persistence of electronic documents over time
• Examples
– NARA’s PDF Transfer Guidance
– AOUSC’s document management program
PDF/A ISO Process – International Joint Working Group
ISO Joint Working Group (JWG) - PDF/A
Committees shown in the organization chart:
• TC/46 - Information & Documentation
• TC/46 SC11 - Archives/Records Mgmt
• TC/130 - Graphics Technology
• TC/42 - Photography
• TC/171* - Document Imaging Applications
• TC/171 SC 2 - Application Issues
• TC/171 SC 2 WG-5 - PDF/A
• PDF/A JWG
* JWG formed under the auspices of TC/171
PDF/A ISO Process – Progress and Next Steps
• Early 2002 - PDF/A development initiated
• September 2003 - Approval of ISO New Work Item (NWI)
• October 2003 - TC-171 Meeting - JWG prepared Committee Draft (CD)
• November 2003/February 2004 - CD ballot circulated to National Bodies (NBs)
• March 2004 - JWG reviewed NB comments on CD
• June 2004/September 2004 - Second CD ballot circulated to NBs
• October 2004 - JWG Meeting - JWG prepared Draft International Standard (DIS)
• Winter/Spring 2005 - DIS Balloted to National Bodies
– Unanimous affirmative votes - Goes to publication
– Up to 25% negative votes - Goes to FDIS, then 1 month ballot
• Summer 2005 - TC-171 Meeting - JWG meeting to deal with DIS comments and discuss new work
• Summer 2005 - International Standard/FDIS?
• Software developers create PDF/A compliant applications
PDF/A - Approach
• PDF/A specifies:
– The subset of PDF components, from the Adobe published specification for Version 1.4 (i.e., PDF 1.4 Reference), that are either required, restricted, or prohibited, and
– How these components may be used by software to render the file
[Diagram: PDF/A within the PDF 1.4 Reference; specifies required features, restricted features, and prohibited features]
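The required / restricted / prohibited split can be pictured as a lookup table. The sketch below is illustrative only: the feature names are informal labels rather than keys from the standard, the statuses are limited to those stated on the earlier requirements slides (embedded fonts required, encryption prohibited, embedded files, external links, and JavaScript restricted), and the `review` helper is a hypothetical name.

```python
from enum import Enum


class Status(Enum):
    REQUIRED = "required"
    RESTRICTED = "restricted"
    PROHIBITED = "prohibited"


# Statuses as characterized on the earlier requirements slides;
# the feature names are informal labels, not keys from the standard.
FEATURE_STATUS = {
    "embedded fonts": Status.REQUIRED,
    "encryption": Status.PROHIBITED,
    "embedded files": Status.RESTRICTED,
    "external links": Status.RESTRICTED,
    "javascript": Status.RESTRICTED,
}


def review(features_present):
    """Sort a document's features into findings under the three categories."""
    present = set(features_present)
    # Required features that the document lacks.
    missing = sorted(f for f, s in FEATURE_STATUS.items()
                     if s is Status.REQUIRED and f not in present)
    # Features that are present but prohibited outright.
    prohibited = sorted(f for f in present
                        if FEATURE_STATUS.get(f) is Status.PROHIBITED)
    # Features that are present and only allowed in limited form.
    restricted = sorted(f for f in present
                        if FEATURE_STATUS.get(f) is Status.RESTRICTED)
    return {"missing_required": missing,
            "prohibited": prohibited,
            "needs_review": restricted}
```

Calling `review(["encryption", "javascript"])` reports embedded fonts as missing, encryption as prohibited, and JavaScript as needing review. The draft standard's own conformance summary (Annex A) works at the level of PDF objects and keys, which this table does not attempt to reproduce.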
PDF/A - Requirements
• Prohibit or restrict features that could complicate long term preservation, and
• Maximize the following PDF attributes:
– Device independence
• The degree to which a PDF/A file is independent of the platform on which it is interpreted and rendered
• The degree to which a PDF/A file is amenable to direct analysis with basic tools, including human readability
– Self-containment
• The degree to which a PDF/A file contains all resources necessary for its reliable and predictable interpretation and rendering
– Self-documentation
• The degree to which a PDF/A file documents itself in terms of descriptive, administrative, structural, and technical metadata
PDF/A - Table of Contents
• 1 Scope
• 2 Normative References
• 3 Terms and Definitions
• 4 Notation
• 5 Conformance Levels
• 6 Technical Requirements
– 6.1 File Structure
– 6.2 Graphics
– 6.3 Fonts
– 6.4 Transparency
– 6.5 Annotations
– 6.6 Actions
– 6.7 Metadata
– 6.8 Logical Structure
– 6.9 Interactive Forms
• Informative annexes
– Annex A - PDF/A-1 Conformance Summary
– Annex B - Best Practices for PDF/A
• Bibliography
Annexes of the Draft PDF/A Standard – Informative Annexes
• Informative Annexes will provide supplemental information including:
– PDF/A-1 Conformance Summary
• Summary tables of PDF objects and keys required, restricted and prohibited in PDF/A
– Best Practices for PDF/A
• Guidelines for capturing or converting electronic documents to PDF/A
– For documents created according to specific institutional rules
– Replicates the exact quality and content of source documents within the PDF/A file
• Required for compliance with NARA’s PDF Transfer Guidance
PDF/A - Overview of Requirements
• Two levels of conformance
– Level A (e.g., Tagged PDF, UNICODE Mapping)
– Level B (e.g., No Tagged PDF)
• Uniform file format (header, trailer, no encryption)
• Device-independent rendering of graphics
• Embedded fonts, character encoding
• Transparency prohibited
• Annotations restricted, content should be displayed by readers
• External actions restricted, no dependence on external content
• Readers not required to act on hyperlinks, but may
• XMP metadata (Adobe eXtensible Metadata Platform)
• Forms based on appearance, not data
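A few of the items above (header, trailer, no encryption, XMP metadata, and the restricted special features) leave recognizable markers in a PDF's raw bytes. The sketch below is a rough first-pass screen and not a PDF/A validator: the function name `screen_pdf_bytes` and the result keys are hypothetical, and a plain byte search cannot check fonts, transparency, or annotations, and can miss or over-report markers depending on how the file was written.

```python
import re

# Raw-byte markers for features the slides list as prohibited or restricted.
# A rough screen only, NOT a conformance check.
MARKERS = {
    "encryption": rb"/Encrypt\b",            # trailer entry of an encrypted file
    "javascript": rb"/(JavaScript|JS)\b",    # JavaScript action keys
    "embedded files": rb"/EmbeddedFiles?\b",
}


def screen_pdf_bytes(data: bytes) -> dict:
    """Report basic file-structure facts and flagged markers for one PDF."""
    header = re.match(rb"%PDF-(\d)\.(\d)", data)
    return {
        "has_header": header is not None,
        # Version digits after "%PDF-", e.g. "1.4".
        "version": header.group(0)[5:].decode("ascii") if header else None,
        # The end-of-file marker should sit near the end of the file.
        "has_eof_marker": b"%%EOF" in data[-1024:],
        # XMP packets begin with this processing instruction.
        "has_xmp_packet": b"<?xpacket begin=" in data,
        "flags": sorted(name for name, pattern in MARKERS.items()
                        if re.search(pattern, data)),
    }
```

A typical use would be `screen_pdf_bytes(open(path, "rb").read())` over a batch of files to decide which ones deserve a closer look with a real conformance checker.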
Quiz - True or False?
• The draft PDF/A ISO Standard…
– Provides quality standards for converting electronic documents to PDF
• False
– Should enable electronic documents in PDF to be maintained longer as PDF
• True
– Is intended for use as one component of an organization's electronic archival environment for long-term retention of documents
• True
• For permanent records in PDF, agencies need to understand that:
– Records in PDF/A are guaranteed to be readable forever
• False
– PDF/A, by itself, does not guarantee exact replication of source material
• True
– Agencies must implement PDF/A in conjunction with additional requirements to meet NARA standards for transferring permanent records to NARA (i.e., NARA’s PDF Transfer Guidance)
• True
• Everyone is now excited to learn more about PDF…
– True!
More Information is Available
• More information on NARA’s PDF Transfer Guidance on NARA’s Web Site
– http://www.archives.gov/records_management/initiatives/pdf_records.html
• More information on PDF/A on AIIM Web Site
– http://www.aiim.org/standards.asp?ID=25013
• Contact Susan Sullivan at [email protected]