PalGov © 2011
أكاديمية الحكومة الإلكترونية الفلسطينية
The Palestinian eGovernment Academy
www.egovacademy.ps
Dr. Ismail M. Romi
Palestine Polytechnic University
Tutorial II: Data Integration and Open Information Systems
Session 2
XML DTDs
About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement 511159-TEMPUS-1-
2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine (Coordinator)
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935, mjarrar@birzeit.edu
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work non-
commercially, as long as they credit you and license their new creations
under the identical terms.
Tutorial Map
Topic (hours)
Session 1: XML Basics and Namespaces 3
Session 2: XML DTDs 3
Session 3: XML Schemas 3
Session 4: Lab-XML Schemas 3
Session 5: RDF and RDFs 3
Session 6: Lab-RDF and RDFs 3
Session 7: OWL (Web Ontology Language) 3
Session 8: Lab-OWL 3
Session 9: Lab-RDF Stores -Challenges and Solutions 3
Session 10: Lab-SPARQL 3
Session 11: Lab-Oracle Semantic Technology 3
Session 12_1: The problem of Data Integration 1.5
Session 12_2: Architectural Solutions for the Integration Issues 1.5
Session 13_1: Data Schema Integration 1
Session 13_2: GAV and LAV Integration 1
Session 13_3: Data Integration and Fusion using RDF 1
Session 14: Lab-Data Integration and Fusion using RDF 3
Session 15_1: Data Web and Linked Data 1.5
Session 15_2: RDFa 1.5
Session 16: Lab-RDFa 3
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data
models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked Data.
2a5: Demonstrate knowledge about integration and fusion of
heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML &
RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store
and query RDF stores.
D: General and Transferable Skills
2d1: Working with a team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Session ILOs:
After completing this session, students will be able to:
• Manage data represented in XML.
• Represent data using tree and graph data models.
Session 2: Document Type Definition (DTD)
Session Overview:
• Create DTDs.
• Validate an XML document against a DTD.
• Use DTDs to create XML documents from multiple files.
XML Schemas
A schema is a quality control tool.
It describes the structure of an XML document.
It ensures that a document fulfills a minimum set of
requirements.
It serves as a way to formalize an application's vocabulary so
that it can be published.
An XML schema is like a program that tells a processor how
to read the document.
A History of Schema Languages
1. Document Type Definition (DTD):
– The oldest and most widely supported schema language.
2. W3C XML Schema:
– XML Schemas are themselves XML documents.
3. RELAX NG
4. Schematron
Validation Steps
A "valid" XML document is a "well-formed" XML document that also
conforms to the rules of a Document Type Definition.
1. The processor reads the rules and declarations in the schema.
2. It builds a specific type of parser (a validating parser).
3. The validating parser takes an XML instance as input.
4. It produces a validation report.
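The difference between "well-formed" and "valid" can be sketched with a small document. The element names and sample values below are assumptions chosen for illustration. Any XML parser accepts this document as well formed, but a validating parser should report errors against the DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
]>
<!-- Well formed (tags nest and close correctly) but NOT valid:
     the DTD requires <to> before <from>, and <cc> is not declared. -->
<note>
  <from>Jani</from>
  <to>Tove</to>
  <cc>Ola</cc>
</note>
```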
Document Type Definition - DTD
A DTD defines the legal building blocks of an XML document.
It defines the document structure with a list of legal elements and
attributes.
DTDs are extensible, meaning they can be extended to meet the
needs of the current task.
A DTD can be specified within an XML document (internal) or in a
separate file (external).
Many free DTDs exist on the Internet today and can be freely
downloaded.
DTDs declare a set of allowed elements.
Document Type Definition - DTD…Cont
DTDs define a content model for each element: this
describes what elements or data can go inside an
element, in what order, in what number, and whether they
are required or optional.
DTDs declare a set of allowed attributes for each element,
with data types and default values.
DTDs provide mechanisms to manage the model,
providing links to other components.
The Document Type Declaration
Internal DTD declaration:
The DTD is declared inside the XML file.
External DTD declaration:
The DTD is declared in an external file.
Internal DTD Declaration
<!DOCTYPE root-element [element-declaration ]>
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
External DTD Declarations
You can refer to an external DTD in one of the
following two ways:
– System identifiers
– Public identifiers
External DTD Declarations using System Identifiers
<!DOCTYPE root-element SYSTEM "system identifier" [...]>
A system identifier is a file reference. It consists of:
– The keyword SYSTEM
– A URI reference pointing to the document's location.
• A URI can be a file on your local hard drive, a file on your intranet or
network, or even a file available on the Internet.
Examples:
<!DOCTYPE name SYSTEM "/user/local/dtds/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "http://wiley.com/hr/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "name.dtd">
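As a minimal sketch of the relative-path form, the two files could look like the following. The contents of name.dtd (the child elements first and last) and the sample values are assumptions for illustration, since the slides do not show the file.

```xml
<!-- File name.dtd (hypothetical contents): declarations only, no DOCTYPE wrapper -->
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>

<!-- File name.xml, stored in the same directory as name.dtd -->
<!DOCTYPE name SYSTEM "name.dtd">
<name>
  <first>John</first>
  <last>Doe</last>
</name>
```

Keeping the declarations in a separate file lets several documents share one DTD.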
External DTD Declarations using Public Identifiers
<!DOCTYPE root-element PUBLIC "public identifier" "system identifier" [...]>
Public identifiers are used to identify an entry in a catalog.
In XML, a public identifier must be followed by a system identifier,
which the processor can use if it cannot resolve the public identifier.
A commonly used format is called Formal Public Identifiers (FPIs).
The syntax for an FPI is defined in ISO 9070.
FPI Syntax:
"-//Owner//Class Description//Language//Version"
Example:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN" "name.dtd">
Recommended list of DOCTYPE declarations:
http://www.w3.org/QA/2002/04/valid-dtd-list.html
Sharing Vocabularies
It is often better to share vocabularies and use DTDs that are widely
accepted.
Sharing DTDs enables you to more easily integrate with other
companies and XML developers who use the shared vocabularies.
Many individuals and industries have developed DTDs.
Examples:
– Chemical Markup Language (CML) DTD
– XHTML, which maintains three DTDs (Transitional, Strict, and Frameset).
You can check many places when trying to find a DTD for a specific
industry:
– http://xml.coverpages.org/
– http://www.dublincore.org
Anatomy of a DTD
DTDs consist of three basic parts:
1. Element declarations
2. Attribute declarations
3. Entity declarations
These declarations are placed inside the DOCTYPE
declaration, as follows:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE root-element [
declarations
declarations
]>
Element Declarations
An ELEMENT declaration indicates to the parser that
you are about to define an element.
The declaration can appear only within the context of the
DTD.
Syntax
Element declarations consist of three basic parts:
– The ELEMENT keyword (<!ELEMENT)
– The element name
– The element content model
<!ELEMENT element-name (content model)>
Element Declarations…Cont
An element's content model defines the allowable
content within the element.
An element may contain element children, text, a
combination of children and text, or the element
may be empty.
Four kinds of content models exist:
– Element content
– Mixed content
– Empty content
– Any content
Element Content
List the allowable child elements within
parentheses.
Example:
<!ELEMENT contact (name, location, phone)>
Each element that you specify within this
element's content model must also have its own
declaration within the DTD.
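A complete document built around the contact declaration might look like this. It is only a sketch: treating the three children as text-only (#PCDATA) elements is an assumption, and the sample values are made up.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!-- The parent lists its children; every child is declared as well. -->
  <!ELEMENT contact (name, location, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<contact>
  <name>Jeff</name>
  <location>Hebron</location>
  <phone>555-0100</phone>
</contact>
```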
Element Content…Cont
The processor needs this information so that it knows how
to handle each element when it is encountered.
Names in the content model must appear exactly as they will in
the document (XML names are case-sensitive).
Ways of specifying the element children:
– Sequences
– Choices
Element Content - Sequences
In a sequence, the child elements are separated by commas and
must appear in the document in that order.
If your XML document were missing one of the elements
within the sequence, or if it contained extra
elements, the parser would raise an error.
If all of the specified elements were included within the
XML document but appeared in another order, the parser
would raise an error.
Whitespace within the content model doesn't matter.
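The sequence rules can be illustrated with three instance fragments checked against the contact content model. The fragments and their values are made up for illustration, and the children are assumed to be text-only elements.

```xml
<!-- Content model assumed: <!ELEMENT contact (name, location, phone)> -->

<!-- Valid: all three children, in the declared order -->
<contact><name>Jeff</name><location>Hebron</location><phone>555-0100</phone></contact>

<!-- Invalid: <phone> is missing -->
<contact><name>Jeff</name><location>Hebron</location></contact>

<!-- Invalid: right children, wrong order -->
<contact><location>Hebron</location><name>Jeff</name><phone>555-0100</phone></contact>
```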
Element Content - Choices
Sometimes you need to allow one element or
another, but not both.
For this you use the choice mechanism: the vertical bar (|) character.
Example:
<!ELEMENT location (address | GPS)>
This declaration allows the <location> element to
contain one <address> or one <GPS> element.
If the <location> element were empty, or if it contained
more than one of these elements, the parser would
raise an error.
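A few instance fragments show how the choice behaves. The values, including the coordinates, are made-up sample data, and both children are assumed to hold text only.

```xml
<!-- Content model assumed: <!ELEMENT location (address | GPS)> -->

<!-- Valid: exactly one of the two alternatives -->
<location><address>Hebron</address></location>
<location><GPS>31.53 35.09</GPS></location>

<!-- Invalid: both alternatives present -->
<location><address>Hebron</address><GPS>31.53 35.09</GPS></location>

<!-- Invalid: neither alternative present -->
<location/>
```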
Mixed Content
The XML Recommendation specifies that any element with
text in its content is a mixed content model element.
Within mixed content models, text can appear by itself or it
can be interspersed between elements.
The simplest mixed content model—text only:
<!ELEMENT element-name (#PCDATA)>
#PCDATA keyword, (Parsed Character DATA):
– indicates that the character data within the content model
should be parsed by the parser.
– Used for text or character data.
Mixed Content - Cont
Every time you declare elements within a mixed
content model, they must follow four rules:
– They must use the choice mechanism (the vertical bar |
character) to separate elements.
– The #PCDATA keyword must appear first in the list of
elements.
– There must be no inner content models.
– If there are child elements, the * cardinality indicator
must appear at the end of the model.
Mixed Content-Example
DTD:
<!ELEMENT description (#PCDATA | em | strong | br)*>
XML Document:
<description>Jeff is a developer and author for Beginning XML <em>4th
edition</em>.<br/>Jeff <strong>loves</strong> XML!</description>
The text may appear anywhere, and the em, strong, and br elements can
appear any number of times, in any order.
Note:
em: italic, strong: bold, br: line break
Empty Content
An empty element has no content.
<!ELEMENT element-name EMPTY>
The most commonly used empty element is:
<br/> (line break).
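A short sketch of an empty element in use. The surrounding p element and its mixed content model are assumptions added so that the document is complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE p [
  <!ELEMENT p (#PCDATA | br)*>
  <!ELEMENT br EMPTY>
]>
<!-- Both spellings of the empty element are valid: <br/> and <br></br>.
     Writing <br> </br> would be invalid, because even a space counts as content. -->
<p>First line<br/>second line<br></br>third line</p>
```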
Element with ANY content
<!ELEMENT element-name ANY>
The element can contain any combination of parsable data (text or
elements).
ANY: a keyword indicating that any elements declared
within the DTD can be used within the content of the
element, in any order and any
number of times.
Cardinality
An element's cardinality defines how many times it will appear within a content model.
Each element within a content model can have an
indicator following the element name that tells the parser
how many times it will appear.
Cardinality…Cont
Indicator   Description
(none)      The element must appear once and only once.
?           The element may appear either once or not at all.
+           The element may appear one or more times.
*           The element may appear zero or more times.
Example:
<!ELEMENT name (first+, middle?, last, Tel*)>
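The indicators can be seen at work in a complete document. This sketch assumes that Tel* belongs inside the parentheses of the content model and that all four children are text-only. The sample values are made up.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!ELEMENT name (first+, middle?, last, Tel*)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT Tel (#PCDATA)>
]>
<!-- Two <first> (allowed by +), no <middle> (allowed by ?),
     one <last> (required), two <Tel> (allowed by *). -->
<name>
  <first>John</first>
  <first>Fitzgerald</first>
  <last>Doe</last>
  <Tel>555-0100</Tel>
  <Tel>555-0101</Tel>
</name>
```

Removing <last>, or placing a <Tel> before it, would make the document invalid.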
Attribute Declarations
<!ATTLIST element-name attribute-name attribute-type "attribute-value">
DTD example:
<!ATTLIST payment type CDATA "check">
XML example:
<payment type="check">
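The effect of the default value can be sketched in a full document. The payments wrapper element and the amounts are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE payments [
  <!ELEMENT payments (payment+)>
  <!ELEMENT payment (#PCDATA)>
  <!ATTLIST payment type CDATA "check">
]>
<payments>
  <!-- type is omitted, so a parser that reads the DTD supplies type="check" -->
  <payment>100</payment>
  <!-- an explicit value overrides the default -->
  <payment type="cash">250</payment>
</payments>
```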
Attribute Types
Type              Description
CDATA             The attribute value is character data (unparsed).
ID                The attribute value uniquely identifies the containing element.
IDREF             The value is the ID of another element.
IDREFS            The value is a whitespace-separated list of other IDs.
ENTITY            The value is the name of an unparsed entity.
ENTITIES          The value is a list of unparsed entity names.
NMTOKEN           The value is a valid XML name token.
NMTOKENS          The value is a list of valid XML name tokens.
Enumerated list   The value must be one of an enumerated list of values (val1 | val2 | …).
CDATA
• Specifies that the attribute value is character
data (any text).
• Unparsed content.
DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
XML example:
<square width="100"></square>
ID, IDREF, and IDREFS
Attributes of type ID can be used to uniquely identify an
element within an XML document.
Once you have uniquely identified the element, you can
later use an IDREF to refer to that element.
Remember several rules when using ID attributes:
– The value of an ID attribute must be unique within the entire
XML document.
– Only one attribute of type ID may be declared per element.
– The attribute value declaration for an ID attribute must be
#IMPLIED or #REQUIRED.
The value of an IDREF attribute must match the value of some ID within the XML
document.
To refer to a list of elements:
– Use an IDREFS attribute, whose value is a whitespace-separated list of IDREF values,
each referring to an ID attribute defined in the document.
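A minimal sketch of the three types working together. The element and attribute names (contacts, contact, person, boss, knows) and the sample values are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact+)>
  <!ELEMENT contact (#PCDATA)>
  <!ATTLIST contact
    person ID     #REQUIRED
    boss   IDREF  #IMPLIED
    knows  IDREFS #IMPLIED>
]>
<contacts>
  <contact person="p1">Jeff</contact>
  <!-- boss must match an existing ID; knows lists several IDs -->
  <contact person="p2" boss="p1" knows="p1 p3">Danny</contact>
  <contact person="p3" boss="p1">David</contact>
</contacts>
```

A validating parser should reject the document if two contacts shared the same person value, or if boss named an ID that does not exist.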
ENTITY and ENTITIES
• Attributes can also include references to unparsed entities.
• An unparsed entity refers to an external file
that the processor cannot parse (for example, an external image).
• Instead of actually including the image inside the
document, you use special attributes to refer to the
external resource.
Enumerated Attribute Types
• Used to restrict attribute values.
• An enumerated list allows you to specify a list of allowable
values.
• Each value must be a valid XML name token.
• Example:
DTD:
<!ATTLIST phone kind (Home | Work | Cell | Fax) #IMPLIED>
XML:
<phone kind="Cell"> Valid
<phone kind="cell"> Invalid (values are case-sensitive)
Attribute Value Declarations
Within each attribute declaration you must specify how
the value will appear in the document.
The XML Recommendation allows you to specify one of the
following:
Value             Description
"default value"   The attribute has a default value, given in quotes.
#REQUIRED         The attribute value must be included in the element.
#IMPLIED          The attribute does not have to be included.
#FIXED "value"    The attribute value is fixed.
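The four kinds of value declaration can be compared side by side. The order element, its attribute names, and the values are hypothetical. Note that a default is written as a plain quoted value, without a keyword.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (#PCDATA)>
  <!ATTLIST order
    id       CDATA #REQUIRED
    note     CDATA #IMPLIED
    currency CDATA #FIXED "USD"
    status   CDATA "open">
]>
<!-- id is given (required); note is omitted (allowed);
     currency and status are omitted, so the parser supplies "USD" and "open". -->
<order id="A17">Two textbooks</order>
```

Writing currency="EUR" on the element would be a validity error, because a #FIXED attribute may only carry its declared value.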
Specifying Multiple Attributes
Declaring each attribute:
<!ATTLIST contacts version CDATA #FIXED "1.0">
<!ATTLIST contacts source CDATA #IMPLIED>
Using one declaration:
<!ATTLIST contacts version CDATA #FIXED "1.0"
source CDATA #IMPLIED>
Entities
• Placeholders in XML.
• Types:
– Built-in entities
– Character entities
– General entities
– Parameter entities
Built-in Entities
• &amp; The & character
• &lt; The < character
• &gt; The > character
• &apos; The ' character
• &quot; The " character
References to Built-in Entities
To use an entity, you must include an entity
reference within the document.
An entity reference refers to an entity that
represents a character, some text, or even an
external file.
A reference to a built-in entity takes the following
form:
&entity-name;
Example:
<CheckAvg> Avg &lt; "85" </CheckAvg>
Character Entities
• Used for characters that are difficult to type or
not found on the keyboard.
&#unicode-value;
• Example:
&#169; === the © character
• Using hexadecimal values: you must include a lowercase x
before the value, so that the
XML parser knows how it
should handle the reference.
• Example:
&#xA9; === the © character
General Entities (Internal Entities)
Variables used to define shortcuts to standard text
or special characters.
General entities must be declared within the DTD
before they can be used within the XML
document.
Declaration:
– <!ENTITY entity-name "value">
Example:
DTD:
– <!ENTITY address "Palestine, Hebron, POBox 198">
XML:
– <ppu-address> &address; </ppu-address>
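Put together in one document, the declaration and the reference might look like this. Declaring ppu-address as a text-only element is an assumption added to make the sketch complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE ppu-address [
  <!ELEMENT ppu-address (#PCDATA)>
  <!ENTITY address "Palestine, Hebron, POBox 198">
]>
<!-- After parsing, the element content is the text
     "Palestine, Hebron, POBox 198" (plus the surrounding spaces). -->
<ppu-address> &address; </ppu-address>
```

Changing the entity value in the DTD updates every place in the document that references it.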
External Entities
• An entity whose replacement text exists in another file.
• Useful for:
– Importing content that is shared by many documents.
– Importing content that is changed frequently.
– Breaking the document into multiple physical parts.
• External entities must be declared so that the
parser can find the replacement text.
External Entities…Cont
• Declaration:
– <!ENTITY entity-name SYSTEM "physical location">
• Example:
– <!ENTITY countries SYSTEM "d://countries.xml">
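A sketch of how such an entity is used once declared. The report, title, and country elements are hypothetical, and a relative file name is used here instead of the drive path so that the example is portable. The contents of countries.xml are likewise an assumption.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (title, country*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ENTITY countries SYSTEM "countries.xml">
]>
<report>
  <title>Member countries</title>
  <!-- The parser replaces the reference with the content of countries.xml,
       assumed here to hold a run of <country> elements. -->
  &countries;
</report>
```

Non-validating parsers are not required to read external entities, so the expansion may be skipped unless the parser is configured to load them.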
Unparsed Entities
• Hold content that should not be parsed
because it contains something other than
text or XML.
• Useful for:
– Importing graphics and sound files.
– Non-character data.
• Declaration:
<!ENTITY entity-name SYSTEM "physical location" NDATA file-format>
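Unparsed Entities…Cont
• Example:
DTD:
<!ENTITY pic1 SYSTEM "c://pic.gif" NDATA GIF>
<!ATTLIST picture src ENTITY #REQUIRED>
XML:
<picture src="pic1"/>
An unparsed entity cannot be referenced as &pic1; in element content. It is named in an
attribute of type ENTITY, and the GIF notation must also be declared with <!NOTATION>.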
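A complete sketch of an unparsed entity in use. Unparsed entities are referenced by name from an attribute of type ENTITY and need a declared notation. The attribute name src, the relative file name, and the notation identifier "image/gif" are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE picture [
  <!NOTATION GIF SYSTEM "image/gif">
  <!ENTITY pic1 SYSTEM "pic.gif" NDATA GIF>
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src ENTITY #REQUIRED>
]>
<!-- src holds the entity NAME (no & or ;); the application, not the
     parser, decides what to do with the GIF file. -->
<picture src="pic1"/>
```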
DTD Limitations
• Differences between DTD syntax and XML syntax.
• Poor support for XML namespaces
• Poor data typing.
• Limited content model descriptions.
Summary
• By using DTDs, you can easily validate your XML
documents against a defined vocabulary of
elements and attributes. This reduces the amount
of code needed within your application.
• An XML parser can be used to check whether the
contents of an XML document are valid according
to the declarations within a DTD.
<e-Gov> Thank you </e-Gov>