Introduction to PDF/A
David van Driessche
Chief Technology Officer, Four Pees
Treasurer, GWG
The Problem
Better when electronic?
• I wrote my university thesis in 1994
  - MS-DOS 3.31 & Windows for Workgroups 3.11
  - WordPerfect 5 & Word for Windows 6.0
  - Saved on a floppy disk
  - Backup on an Iomega ZIP disk
• That was only 19 years ago…
What is long enough?
• For businesses in Belgium
  - Direct tax documents: 5 years
  - Value-added tax (VAT): 7 years
  - Medical records for employees: 15 years
• For engineers
  - Lifetime of a building / construction
• For libraries
  - Forever…
The Solution!
Why is PDF the solution?
• Invented by Adobe in 1993
  - Originally as electronic product documentation and Internet format
  - But rapid adoption elsewhere from 1996
• A good format!
  - Focuses on exact visual representation
  - Printer and platform independent
  - Compact and complete
  - Random access
And standardized!
• Adopted by ISO
  - ISO 32000
• So also vendor independent!
Oooops
• Missing fonts
• Font problems
• Complex features
• Incorrect file structure
• Corrupt PDF files
• …
• Room for interpretation!
The Solution!
ISO PDF/A
• A subset of the PDF format
• Developed and maintained by ISO
• Designed to remove ambiguities from the PDF file format
  - With the aim to create documents that are archivable for 50 years (at least)
• An ISO standard that will never expire!
Vanilla or Strawberry?
• Different archives require different things…
• Different flavors of the PDF/A standard cater to that:
  - PDF/A-1b, the basic flavor
    • Guarantees visual reproduction
  - PDF/A-1a, the accessible (or advanced) flavor
    • Incorporates all requirements of the basic flavor
    • Also focuses on the meaning of the document content
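As a side note to the flavors above: a PDF/A file declares its flavor in its XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties. The following is a minimal Python sketch, assuming the XMP packet has already been extracted from the PDF as a string. The sample packet and the function name `read_pdfa_flavor` are made up for illustration. A declared flavor is only a claim; it does not prove that the file actually validates.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Hypothetical sample packet, written by hand for this sketch.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def _property(description, name):
    """Read a pdfaid property stored either as a child element or as an attribute."""
    qualified = "{%s}%s" % (PDFAID_NS, name)
    element = description.find(qualified)
    if element is not None and element.text:
        return element.text.strip()
    value = description.get(qualified)
    return value.strip() if value else None


def read_pdfa_flavor(xmp_text):
    """Return the declared flavor, e.g. 'PDF/A-1b', or None if none is declared."""
    root = ET.fromstring(xmp_text)
    for description in root.iter("{%s}Description" % RDF_NS):
        part = _property(description, "part")
        if part is not None:
            conformance = _property(description, "conformance") or ""
            return "PDF/A-%s%s" % (part, conformance.lower())
    return None


if __name__ == "__main__":
    print(read_pdfa_flavor(SAMPLE_XMP))
```

Checking this declaration is a quick way to sort files in an archive by claimed flavor, but a real validator is still needed to confirm conformance.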
PDF/A-1b
[Figure: sample page “Parijs” (Paris), labeled “Visual reproduction”]
PDF/A-1a
[Figure: sample page “Parijs” (Paris), labeled “Visual reproduction + Meaning”]
PDF/A-1a, Structure
[Figure: sample page “Parijs” (Paris), with its structure marked: Document Title, Paragraph, Paragraph, Paragraph]
PDF/A-1a, Tagging
[Figure: sample page “Parijs” (Paris), with its image tagged with the description “WWII: American soldiers watch as the Tricolor flies from the Eiffel Tower again”]
Usability / Complexity
• PDF/A-1b
  - Relatively easy to make
  - Easy to make automatically (without human intervention)
• PDF/A-1a
  - Contains much more usable content, which is great for searching archives and for extracting additional information about archived pieces
  - Very hard to create automatically unless the source already contains a lot of information
Demo
• PDF/A-1b versus PDF/A-1a
Evolution
• PDF/A-1a and PDF/A-1b will always remain valid standards
• But new versions of the PDF/A standard have been developed as well.
PDF/A-2
• Allows more modern PDF features
  - Transparency
  - Layers (optional content)
  - JPEG 2000 compression
• Adds support for embedded PDF/A files
• Adds a new flavor: “u” for Unicode
  - Sits between “a” and “b”
  - Requires only that fonts be correctly mapped to Unicode, with no structure or tagging
PDF/A-3
• Adds support for embedding arbitrary files
• Examples
  - An email archived with its attachments
  - An Excel spreadsheet archived with the original spreadsheet embedded
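The parts and flavors covered so far can be summarized in a small lookup table. The Python sketch below is only an illustration: the names `PDFA_LEVELS`, `EMBEDDING` and `level_covers` are hypothetical, the PDF/A-3 flavors (a, b, u) are filled in from the standard rather than from these slides, and the "none" entry for PDF/A-1 is inferred from the fact that embedding was added in PDF/A-2.

```python
# Flavors (conformance levels) available in each part of PDF/A.
PDFA_LEVELS = {
    1: ("a", "b"),
    2: ("a", "b", "u"),
    3: ("a", "b", "u"),  # not stated on the slides; taken from the standard
}

# What each part allows as embedded files.
EMBEDDING = {
    1: "none",  # inferred: embedding is first added in PDF/A-2
    2: "PDF/A files only",
    3: "any file type",
}

# "b" is the basic flavor, "u" sits in between, "a" includes everything.
RANK = {"b": 0, "u": 1, "a": 2}


def level_covers(part, have, need):
    """Return True if flavor `have` meets the requirements of flavor `need`
    within the same PDF/A part."""
    levels = PDFA_LEVELS[part]
    for level in (have, need):
        if level not in levels:
            raise ValueError("PDF/A-%d has no flavor %r" % (part, level))
    return RANK[have] >= RANK[need]


if __name__ == "__main__":
    for part in sorted(PDFA_LEVELS):
        flavors = ", ".join("PDF/A-%d%s" % (part, lv) for lv in PDFA_LEVELS[part])
        print("%s | embedded files: %s" % (flavors, EMBEDDING[part]))
    print(level_covers(2, "a", "u"))  # an "a" file also satisfies "u"
```

An archive that requires searchable text could, for instance, accept any file for which `level_covers(part, declared, "u")` is true, keeping in mind that PDF/A-1 has no “u” flavor.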
Thanks! Questions?