
PalGov © 2011

أكاديمية الحكومة الإلكترونية الفلسطينية
The Palestinian eGovernment Academy

www.egovacademy.ps

Dr. Ismail M. Romi

Palestine Polytechnic University

Tutorial II: Data Integration and Open Information Systems

Session 2

XML DTDs


About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps

Project Consortium:
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Birzeit University, Palestine (Coordinator)
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine

Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935 [email protected]


© Copyright Notes

Everyone is encouraged to use this material, or part of it, but should

properly cite the project (logo and website), and the author of that part.

No part of this tutorial may be reproduced or modified in any form or by

any means, without prior written permission from the project, who have

the full copyrights on the material.

Attribution-NonCommercial-ShareAlike

CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.


Tutorial Map

Topic (hours)
Session 1: XML Basics and Namespaces (3 h)
Session 2: XML DTDs (3 h)
Session 3: XML Schemas (3 h)
Session 4: Lab-XML Schemas (3 h)
Session 5: RDF and RDFs (3 h)
Session 6: Lab-RDF and RDFs (3 h)
Session 7: OWL (Ontology Web Language) (3 h)
Session 8: Lab-OWL (3 h)
Session 9: Lab-RDF Stores - Challenges and Solutions (3 h)
Session 10: Lab-SPARQL (3 h)
Session 11: Lab-Oracle Semantic Technology (3 h)
Session 12_1: The problem of Data Integration (1.5 h)
Session 12_2: Architectural Solutions for the Integration Issues (1.5 h)
Session 13_1: Data Schema Integration (1 h)
Session 13_2: GAV and LAV Integration (1 h)
Session 13_3: Data Integration and Fusion using RDF (1 h)
Session 14: Lab-Data Integration and Fusion using RDF (3 h)
Session 15_1: Data Web and Linked Data (1.5 h)
Session 15_2: RDFa (1.5 h)
Session 16: Lab-RDFa (3 h)

Intended Learning Objectives

A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked Data.
2a5: Demonstrate knowledge about integration and fusion of heterogeneous data.

B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML and RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.

C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.

D: General and Transferable Skills
2d1: Working with a team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.


Session ILOs:
After completing this session students will be able to:
• Manage data represented in XML.
• Represent data using tree and graph data models.


Session 2: Document Type Definition (DTD)

Session Overview:
• Create DTDs
• Validate an XML document against a DTD
• Use DTDs to create XML documents from multiple files


XML Schemas

A quality control tool.

Describes the structure of an XML document.

Ensures that a document fulfills a minimum set of requirements.
Serves as a way to formalize an application's vocabulary so that it becomes a publishable object.
An XML schema is like a program that tells a processor how to read the document.


A History of Schema Languages
1. Document Type Definition (DTD):
– The oldest and most widely supported schema language.
2. W3C XML Schema:
– XML Schemas are themselves XML documents.
3. RELAX NG
4. Schematron


Validation Steps
A "valid" XML document is a "well-formed" XML document that also conforms to the rules of a Document Type Definition (DTD).
1. The processor reads the rules and declarations in the schema.
2. It builds a specific type of parser (a validating parser).
3. The validating parser takes an XML instance as input.
4. It produces a validation report.


Document Type Definition - DTD

Defines the legal building blocks of an XML document.

Defines the document structure with a list of legal elements and

attributes.

DTDs are extensible, meaning they can be extended to meet the needs of the current task.
A DTD can be specified within an XML document (internal) or in a separate file (external).
Many free DTDs exist on the Internet today and can be freely downloaded.
DTDs declare a set of allowed elements.


Document Type Definition - DTD

DTDs define a content model for each element. The content model describes what elements or data can go inside an element, in what order, in what number, and whether they are required or optional.
DTDs declare a set of allowed attributes for each element, with data types and default values.
DTDs provide mechanisms to manage the model, providing links to other components.

The Document Type Declaration
Internal DTD declaration: the DTD is declared inside the XML file.
External DTD declaration: the DTD is declared in an external file.


Internal DTD Declaration

<!DOCTYPE root-element [element-declarations]>

Example:

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE note

[

<!ELEMENT note (to,from,heading,body)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

<!ELEMENT heading (#PCDATA)>

<!ELEMENT body (#PCDATA)>

]>

<note>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>


External DTD Declarations

You can refer to an external DTD in one of the

following two ways:

– System identifiers

– Public identifiers


External DTD Declarations using System Identifiers
<!DOCTYPE root-element SYSTEM "system identifier" [...]>
A system identifier is a file reference. It consists of:
– The keyword SYSTEM
– A URI reference pointing to the document's location.
• A URI can be a file on your local hard drive, a file on your intranet or network, or even a file available on the Internet.
Examples:
<!DOCTYPE name SYSTEM "/user/local/dtds/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "http://wiley.com/hr/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "name.dtd">
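As a minimal sketch of the relative-path form, the document below points at an external DTD through a system identifier. The file name note.dtd and its contents are assumptions made for illustration; they reuse the note vocabulary from the internal DTD example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: a file named note.dtd sits next to this document and contains:
       <!ELEMENT note (to,from,heading,body)>
       <!ELEMENT to (#PCDATA)>
       <!ELEMENT from (#PCDATA)>
       <!ELEMENT heading (#PCDATA)>
       <!ELEMENT body (#PCDATA)>
-->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```

A validating parser resolves the relative URI against the location of the XML document, so moving the document without the DTD breaks validation.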


External DTD Declarations using Public Identifiers
<!DOCTYPE root-element PUBLIC "public identifier" "system identifier" [...]>
Public identifiers are used to identify an entry in a catalog. In XML, the public identifier must be followed by a system identifier, which the parser uses when it cannot resolve the public identifier.
A commonly used format is called Formal Public Identifiers (FPIs).
The syntax for an FPI is defined in ISO 9070.
FPI Syntax:
"-//Owner//Class Description//Language//Version"
Example:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN" "name.dtd">
Recommended list of DOCTYPE declarations:
http://www.w3.org/QA/2002/04/valid-dtd-list.html
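For a concrete public identifier, the XHTML 1.0 Strict document type declaration is a widely deployed example. In XML the public identifier is followed by a system identifier, which a parser can fall back on when it has no catalog entry for the FPI. The page content below is only a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- public identifier first, then the system identifier (a URI) -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Placeholder title</title>
  </head>
  <body>
    <p>Placeholder paragraph.</p>
  </body>
</html>
```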


Sharing Vocabularies

It is often better to share vocabularies and use DTDs that are widely

accepted.

Sharing DTDs enables you to more easily integrate with other

companies and XML developers who use the shared vocabularies.

Many individuals and industries have developed DTDs.

Examples:

– Chemical Markup Language (CML) DTD

– XHTML, which maintains three DTDs (Transitional, Strict, and Frameset).
You can check many places when trying to find a DTD for a specific industry:
– http://xml.coverpages.org/
– http://www.dublincore.org


Anatomy of a DTD

DTDs consist of three basic parts:

1. Element declarations

2. Attribute declarations

3. Entity declarations

These declarations appear inside the DOCTYPE declaration, as follows:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE root-element [
declarations
declarations
]>


Element Declarations

An ELEMENT declaration tells the parser that you are about to define an element.
The declaration can appear only within the context of the DTD.
Syntax
Element declarations consist of three basic parts:
– The ELEMENT keyword (<!ELEMENT)

– Element name

– Element content model

<!ELEMENT element-name (content model)>


Element Declarations…Cont

An element's content model defines the allowable content within the element.

An element may contain element children, text, a

combination of children and text, or the element

may be empty.

Four kinds of content models exist:

– Element content

– Mixed content

– Empty content

– Any content


Element Content

List the allowable child elements within parentheses.
Example:
<!ELEMENT contact (name, location, phone)>
Each element that you specify within this element's content model must also have its own declaration within the DTD.
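The following sketch completes the contact declaration so that every child has its own declaration. Declaring the three children as text-only (#PCDATA) and the sample values are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact [
  <!-- contact has element content: exactly these three children -->
  <!ELEMENT contact (name, location, phone)>
  <!-- each child named above needs its own declaration -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<contact>
  <name>Example Name</name>
  <location>Hebron</location>
  <phone>000-0000</phone>
</contact>
```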


Element Content…Cont

The processor needs this information so that it knows how to handle each element when it is encountered.
Each name in the content model must appear exactly as it will in the document, including case.
Ways of specifying the element children:
– Sequences
– Choices


Element Content - Sequences

In a sequence, the child elements are separated by commas and must appear in the document in exactly the declared order.
If your XML document were missing one of the elements within the sequence, or if it contained additional elements, the parser would raise an error.
If all of the specified elements were included within the XML document but appeared in a different order, the parser would also raise an error.
Whitespace within the declaration doesn't matter.
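A small sketch of a sequence content model follows. The first, middle, and last children and their values are illustrative assumptions; the trailing comment lists instances that a validating parser would be expected to reject.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE name [
  <!-- sequence: commas fix both the set of children and their order -->
  <!ELEMENT name (first, middle, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<!-- Valid: all three children, in the declared order -->
<name>
  <first>John</first>
  <middle>Fitzgerald</middle>
  <last>Doe</last>
</name>
<!-- Invalid alternatives:
       <name><last>Doe</last><first>John</first><middle>Fitzgerald</middle></name>  (wrong order)
       <name><first>John</first><last>Doe</last></name>  (middle is missing)
-->
```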


Element Content - Choices

Sometimes you need to allow one element or another, but not both.
For this you need a choice mechanism: separate the alternatives with the vertical bar (|) character.
Example:
<!ELEMENT location (address | GPS)>
This declaration allows the <location> element to contain one <address> or one <GPS> element.
If the <location> element were empty, or if it contained more than one of these elements, the parser would raise an error.
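The choice can be sketched as a complete document. Declaring address and GPS as text-only and the placeholder value are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE location [
  <!-- choice: exactly one of the two alternatives -->
  <!ELEMENT location (address | GPS)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT GPS (#PCDATA)>
]>
<!-- Valid: one address and nothing else -->
<location>
  <address>Placeholder street address</address>
</location>
<!-- Invalid alternatives:
       <location/>  (neither child is present)
       <location><address>a</address><GPS>b</GPS></location>  (both children are present)
-->
```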


Mixed Content

The XML Recommendation specifies that any element with

text in its content is a mixed content model element.

Within mixed content models, text can appear by itself or it

can be interspersed between elements.

The simplest mixed content model—text only:

<!ELEMENT element-name (#PCDATA)>

#PCDATA keyword, (Parsed Character DATA):

– indicates that the character data within the content model

should be parsed by the parser.

– Used for text or character data.


Mixed Content - Cont

Every time you declare elements within a mixed

content model, they must follow four rules:

– They must use the choice mechanism (the vertical bar |

character) to separate elements.

– The #PCDATA keyword must appear first in the list of

elements.

– There must be no inner content models.

– If there are child elements, the * cardinality indicator

must appear at the end of the model.


Mixed Content-Example

DTD:

<!ELEMENT description (#PCDATA | em | strong | br)*>

XML Document:

<description>Jeff is a developer and author for Beginning XML <em>4th

edition</em>.<br/>Jeff <strong>loves</strong> XML!</description>

The text may appear anywhere, and the em, strong, and br elements may appear any number of times, in any order.
Note:
em: italic, strong: bold, br: line break


Empty Content

An empty element has no content.
<!ELEMENT element-name EMPTY>
The most commonly used empty element is:
<br/> (line break).
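As a short sketch, an empty br element can be declared and used inside a mixed-content parent. The description parent and its text are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE description [
  <!ELEMENT description (#PCDATA | br)*>
  <!-- br may carry attributes if declared, but never content -->
  <!ELEMENT br EMPTY>
]>
<description>First line<br/>Second line</description>
```

For an element declared EMPTY, even whitespace between a start tag and an end tag counts as content, so the self-closing form is the safest way to write it.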


Element with ANY content

<!ELEMENT element-name ANY>

An element declared with ANY can contain any combination of parsable data (text or elements).
ANY is a keyword indicating that any elements declared within the DTD can be used within the content of the element, in any order and any number of times.


Cardinality

An element's cardinality defines how many times it will appear within a content model.
Each element within a content model can have an indicator following the element name that tells the parser how many times it will appear.


Cardinality…Cont

Indicator   Description
(none)      The element must appear once and only once.
?           The element may appear either once or not at all.
+           The element may appear one or more times.
*           The element may appear zero or more times.

Example:
<!ELEMENT name (first+, middle?, last, Tel*)>
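The sketch below shows how the indicators play out in an instance, assuming every child sits inside one parenthesized content model, name (first+, middle?, last, Tel*). The text-only child declarations and the sample values are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE name [
  <!-- one or more first, optional middle, exactly one last, any number of Tel -->
  <!ELEMENT name (first+, middle?, last, Tel*)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT Tel (#PCDATA)>
]>
<name>
  <first>Mary</first>
  <first>Ann</first>
  <last>Smith</last>
  <Tel>000-0001</Tel>
  <Tel>000-0002</Tel>
</name>
```

Here first is repeated, middle is omitted, and Tel appears twice, all of which the indicators permit; omitting last or first would make the document invalid.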


Attribute Declarations

<!ATTLIST element-name attribute-name attribute-type "attribute-value">
DTD example:
<!ATTLIST payment type CDATA "check">
XML example:
<payment type="check">


Attribute Types

Type              Description
CDATA             The attribute value is character data (unparsed).
ID                The attribute value uniquely identifies the containing element.
IDREF             The value is the ID of another element.
IDREFS            The value is a list of other IDs.
ENTITY            The value is the name of an entity.
ENTITIES          The value is a list of entity names.
NMTOKEN           The value is a valid XML name token.
NMTOKENS          The value is a list of valid XML name tokens.
Enumerated list   The value must be one of the enumerated values (val1 | val2 | ...).


CDATA

• Specifies that the attribute value is character data (any text).
• Unparsed content.
DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
XML example:
<square width="100"/>


ID, IDREF, and IDREFS

Attributes of type ID can be used to uniquely identify an

element within an XML document.

Once you have uniquely identified the element, you can

later use an IDREF to refer to that element.

Remember several rules when using ID attributes:

– The value of an ID attribute must be unique within the entire

XML document.

– Only one attribute of type ID may be declared per element.

– The attribute value declaration for an ID attribute must be

#IMPLIED or #REQUIRED.

The value of an IDREF attribute must match the value of some ID within the XML document.
To refer to a list of elements:
– Use an IDREFS attribute, whose value is a whitespace-separated list of IDREF values that each refer to an ID attribute defined in the document.
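The rules above can be sketched in one document. The contacts vocabulary and the attribute names person, manager, and knows are hypothetical, chosen only to show one ID, one IDREF, and one IDREFS attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact+)>
  <!ELEMENT contact (name)>
  <!ELEMENT name (#PCDATA)>
  <!-- one ID attribute per element; ID defaults must be #REQUIRED or #IMPLIED -->
  <!ATTLIST contact
            person  ID     #REQUIRED
            manager IDREF  #IMPLIED
            knows   IDREFS #IMPLIED>
]>
<contacts>
  <contact person="p1">
    <name>First Person</name>
  </contact>
  <!-- manager must match an ID value that exists somewhere in the document -->
  <contact person="p2" manager="p1">
    <name>Second Person</name>
  </contact>
  <!-- knows holds a whitespace-separated list of ID values -->
  <contact person="p3" manager="p1" knows="p1 p2">
    <name>Third Person</name>
  </contact>
</contacts>
```

ID values must be XML names, which is why the sketch uses p1 rather than a bare number.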


ENTITY and ENTITIES

• Attributes can also include references to unparsed entities.

• An unparsed entity is an entity reference to an external file

that the processor cannot parse (external images..).

• Instead of actually including the image inside the

document, you use special attributes to refer to the

external resource.


Enumerated Attribute Types

• Used to restrict attribute values.
• An enumerated list allows you to specify a list of allowable values.
• Each value must be a valid XML name.
• Example:
DTD:
<!ATTLIST phone kind (Home | Work | Cell | Fax) #IMPLIED>
XML:
<phone kind="Cell"> is valid.
<phone kind="cell"> is invalid, because enumerated values are case-sensitive.


Attribute Value Declarations

Within each attribute declaration you must specify how the value will appear in the document.
The XML Recommendation allows you to specify one of the following:

Value             Description
"default value"   The attribute has a default value, written as a quoted string.
#REQUIRED         The attribute must be included in the element.
#IMPLIED          The attribute does not have to be included.
#FIXED "value"    The attribute value is fixed to the quoted value.
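The four kinds of value declaration can be seen side by side in the sketch below. The phone element, its id attribute, and the sample values are assumptions made for illustration; a default value is written as a plain quoted string, with no keyword in front of it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (phone*)>
  <!ELEMENT phone (#PCDATA)>
  <!-- fixed value and optional attribute -->
  <!ATTLIST contacts
            version CDATA #FIXED "1.0"
            source  CDATA #IMPLIED>
  <!-- required attribute and an enumerated attribute with a default value -->
  <!ATTLIST phone
            id   ID                         #REQUIRED
            kind (Home | Work | Cell | Fax) "Home">
]>
<contacts source="hypothetical-import">
  <!-- kind is omitted here, so a parser that reads the DTD supplies kind="Home" -->
  <phone id="ph1">000-0001</phone>
  <phone id="ph2" kind="Cell">000-0002</phone>
</contacts>
```

The version attribute may be left out of the instance; if it is written, its value has to be exactly 1.0.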


Specifying Multiple Attributes

Declaring each attribute:

<!ATTLIST contacts version CDATA #FIXED "1.0">
<!ATTLIST contacts source CDATA #IMPLIED>

Using one declaration:
<!ATTLIST contacts version CDATA #FIXED "1.0"
                   source CDATA #IMPLIED>


Entities

• Placeholders in XML

• Types:

– Built-in entities

– Character entities

– General entities

– Parameter entities


Built-in Entities

• &amp; The & character

• &lt; The < character

• &gt; The > character

• &apos; The ' character
• &quot; The " character


References to Built-in Entities

To use an entity, you must include an entity

reference within the document.

An entity reference refers to an entity that

represents a character, some text, or even an

external file.

A reference to a built-in entity takes the following

form:

&entity-name;

Example:

<CheckAvg> Avg &lt; "85" </CheckAvg>


Character Entities

• Used for characters that are difficult to type or are not found on the keyboard.
&#unicode-value;
• Example:
&#169; is the © character.
• Using hexadecimal values:
• Example:
&#x00A9; is the © character.
You must include a lowercase x before the value, so that the XML parser knows how it should handle the reference.
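A brief sketch of both forms in a document follows. The notice, decimal, and hex element names are hypothetical; character references need no DTD declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- both references below resolve to the same character, U+00A9 -->
<notice>
  <decimal>&#169; 2011 PalGov</decimal>
  <hex>&#xA9; 2011 PalGov</hex>
</notice>
```

Leading zeros are optional, so &#xA9; and &#x00A9; refer to the same character.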


General Entities (Internal Entities)
Variables used to define shortcuts to standard text or special characters.
General entities must be declared within the DTD before they can be used within the XML document.
Declaration:
– <!ENTITY entity-name "value">
Example:
– DTD: <!ENTITY address "Palestine, Hebron, POBox 198">
– XML: <ppu-address>&address;</ppu-address>
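Put together in one document, the address entity can be declared once and referenced several times. The university root and the postal-address element are hypothetical additions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE university [
  <!ELEMENT university (ppu-address, postal-address)>
  <!ELEMENT ppu-address (#PCDATA)>
  <!ELEMENT postal-address (#PCDATA)>
  <!-- internal general entity: the replacement text is given inline -->
  <!ENTITY address "Palestine, Hebron, POBox 198">
]>
<university>
  <ppu-address>&address;</ppu-address>
  <postal-address>Palestine Polytechnic University, &address;</postal-address>
</university>
```

Changing the replacement text in the declaration updates every place where &address; is referenced.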


External Entities

• Entity whose replacement text exists in another file.

• Useful for:

– Importing content that is shared by many documents.

– Importing content that is changed frequently.

– Breaking the document into multiple physical parts.

• External entities must be declared so that the parser can find the replacement text.


External Entities…Cont

• Declaration:
– <!ENTITY entity-name SYSTEM "physical location">
• Example:
– <!ENTITY countries SYSTEM "d://countries.xml">
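As a sketch of how an external parsed entity is pulled into a document, the example below references a countries entity from element content. The relative file name countries.xml, its assumed contents, and the report vocabulary are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE report [
  <!ELEMENT report (title, country*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!-- Assumption: countries.xml holds zero or more country elements,
       for example <country>Palestine</country> -->
  <!ENTITY countries SYSTEM "countries.xml">
]>
<report>
  <title>Hypothetical report</title>
  <!-- the parser replaces this reference with the contents of countries.xml -->
  &countries;
</report>
```

A validating parser has to read the external file; a non-validating parser is allowed to skip external entities, so the result can differ between tools.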


Unparsed Entities

• Hold content that should not be parsed because it contains something other than text or XML.
• Useful for:
– Importing graphics and sound files.
– Non-character data.
• Declaration:
<!ENTITY entity-name SYSTEM "physical location" NDATA notation-name>
The name after NDATA must match a declared notation (<!NOTATION ...>) that identifies the file format.


Unparsed Entities…Cont

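• Example:
DTD:
<!NOTATION GIF SYSTEM "image/gif">
<!ENTITY pic1 SYSTEM "c://pic.gif" NDATA GIF>
<!ATTLIST picture source ENTITY #REQUIRED>
XML:
<picture source="pic1"/>
An unparsed entity cannot be referenced as &pic1; in element content. It is referenced by name through an attribute of type ENTITY (the attribute name source is illustrative), and its notation (GIF here) must be declared.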


DTD Limitations

• Differences between DTD syntax and XML syntax.

• Poor support for XML namespaces

• Poor data typing.

• Limited content model descriptions.


Summary

• By using DTDs, you can easily validate your XML

documents against a defined vocabulary of

elements and attributes. This reduces the amount

of code needed within your application.

• An XML parser can be used to check whether the

contents of an XML document are valid according

to the declarations within a DTD.


References
• Hunter, D., Rafter, J., Fawcett, J., van der Vlist, E., Ayers, D., Duckett, J., Watt, A., McKinnon, L. (2007), "Beginning XML", 4th Ed., Wiley Publishing Inc.: Indiana, USA.
• Ray, E. (2003), "Learning XML", 2nd Ed., O'Reilly Media Inc.: USA.
• Amiano, M., D'Cruz, C., Ethier, K., Thomas, M. (2006), "XML: Problem - Design - Solution", Wiley Publishing Inc.: Indiana, USA.

• http://www.w3.org

• http://www.w3schools.com

• http://www.xml.com

• http://www.xml.org


<e-Gov> Thank you </e-Gov>