
PalGov © 2011

The Palestinian eGovernment Academy

www.egovacademy.ps

Tutorial II: Data Integration and Open Information Systems

Module 13.3

Data Integration and Fusion using RDF

Dr. Mustafa Jarrar

University of Birzeit

[email protected]

www.jarrar.info


About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps

Project Consortium:

University of Trento, Italy

University of Namur, Belgium

Vrije Universiteit Brussel, Belgium

TrueTrust, UK

Birzeit University, Palestine (Coordinator)

Palestine Polytechnic University, Palestine

Palestine Technical University, Palestine

Université de Savoie, France

Ministry of Local Government, Palestine

Ministry of Telecom and IT, Palestine

Ministry of Interior, Palestine

Coordinator:

Dr. Mustafa Jarrar

Birzeit University, P.O. Box 14, Birzeit, Palestine

Telfax: +972 2 2982935, [email protected]


© Copyright Notes

Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website) and the author of that part.

No part of this tutorial may be reproduced or modified in any form or by any means without prior written permission from the project, which holds the full copyright on the material.

Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA)

This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.


Tutorial Map

Topic (hours)

Session 1: XML Basics and Namespaces (3)
Session 2: XML DTDs (3)
Session 3: XML Schemas (3)
Session 4: Lab-XML Schemas (3)
Session 5: RDF and RDFS (3)
Session 6: Lab-RDF and RDFS (3)
Session 7: OWL (Web Ontology Language) (3)
Session 8: Lab-OWL (3)
Session 9: Lab-RDF Stores: Challenges and Solutions (3)
Session 10: Lab-SPARQL (3)
Session 11: Lab-Oracle Semantic Technology (3)
Session 12_1: The Problem of Data Integration (1.5)
Session 12_2: Architectural Solutions for the Integration Issues (1.5)
Session 13_1: Data Schema Integration (1)
Session 13_2: GAV and LAV Integration (1)
Session 13_3: Data Integration and Fusion using RDF (1)
Session 14: Lab-Data Integration and Fusion using RDF (3)
Session 15_1: Data Web and Linked Data (1.5)
Session 15_2: RDFa (1.5)
Session 16: Lab-RDFa (3)

Intended Learning Objectives

A: Knowledge and Understanding

2a1: Describe tree and graph data models.

2a2: Understand the notation of XML, RDF, RDFS, and OWL.

2a3: Demonstrate knowledge about querying techniques for data models, such as SPARQL and XPath.

2a4: Explain the concepts of identity management and Linked Data.

2a5: Demonstrate knowledge about the integration and fusion of heterogeneous data.

B: Intellectual Skills

2b1: Represent data using tree and graph data models (XML and RDF).

2b2: Describe data semantics using RDFS and OWL.

2b3: Manage and query data represented in RDF, XML, and OWL.

2b4: Integrate and fuse heterogeneous data.

C: Professional and Practical Skills

2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF data.

D: General and Transferable Skills

2d1: Work in a team.

2d2: Present and defend ideas.

2d3: Use creativity and innovation in problem solving.

2d4: Develop communication skills and logical reasoning abilities.


Module ILOs

After completing this module, students will be able to:

- Explain the concepts of identity management and linked data.

- Integrate and fuse heterogeneous data.

- Represent data using the graph data model (RDF).

- Manage and query data represented in RDF.


Example from the Government Domain

• Consider this simplified example from the government domain: three governmental agencies that each record information about companies.

• In this example, we integrate the three databases by transforming each one into RDF and then concatenating the resulting RDF (subject, predicate, object) tables into one table. After that, we investigate the concatenated data and link the different resources that describe the same entity.

• Data integration is thus achieved simply by concatenating RDF graphs and linking resources across them. It is also realized at query time, when queries are built and executed over the concatenated dataset.

[Figure: Companies DB in Ministry of Justice; Companies DB in Ministry of Economy; Companies DB in Chamber of Commerce]


Ministry of Justice

• The Ministry of Justice records some information about companies, in addition to the advocates who represent them.

[Figure: Company and Advocate tables]


Ministry of Justice: To RDF

[Figure: Company and Advocate tables]

To RDF …
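The slide shows this transformation only as a figure. Below is a minimal sketch of one common way to do it, assuming each table has a single key column: the key value becomes the subject URI, every other column becomes a predicate, and each cell becomes the object. The column names (ID, Name, Country, Phone) and the sample row are hypothetical; the leading-colon URI convention follows the one these slides use.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def row_to_triples(row: Dict[str, Optional[str]], key: str) -> List[Triple]:
    """Turn one relational row into RDF-style triples.

    The key column supplies the subject URI (":" + value); every other
    non-NULL column yields one triple whose predicate is ":" + column name
    and whose object is the cell value, kept as a plain literal.
    """
    subject = ":" + str(row[key])
    triples: List[Triple] = []
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is the subject itself; NULL cells produce no triple
        triples.append((subject, ":" + column, value))
    return triples


def table_to_triples(rows: List[Dict[str, Optional[str]]], key: str) -> List[Triple]:
    """Apply row_to_triples to every row, giving one S-P-O table."""
    return [t for row in rows for t in row_to_triples(row, key)]


# Hypothetical Company row; the real column names are not given in the slides.
company_rows = [{"ID": "C1", "Name": "Example Co.", "Country": "Palestine", "Phone": None}]
for triple in table_to_triples(company_rows, key="ID"):
    print(triple)
```

The Advocate table would be converted the same way with its own key column.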


Chamber of Commerce

• The Chamber of Commerce records information about companies and about their owners.

[Figure: Company, Owner, and Company_Owner tables]


Chamber of Commerce: To RDF

To RDF …
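The Company_Owner table appears to be a link table relating companies to their owners. A minimal sketch, assuming it holds just two key columns (hypothetically named CompanyID and OwnerID) and using a hypothetical predicate :hasOwner: each row becomes one triple that connects two URIs, rather than a subject with literal values.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def link_table_to_triples(
    rows: List[Dict[str, str]], from_col: str, to_col: str, predicate: str
) -> List[Triple]:
    """Convert a many-to-many link table into triples.

    Each row pairs two keys, so it becomes one triple whose subject and
    object are both URIs; the link table itself needs no URI of its own.
    """
    return [(":" + row[from_col], predicate, ":" + row[to_col]) for row in rows]


# Hypothetical Company_Owner rows: company C1 has two owners.
company_owner = [
    {"CompanyID": "C1", "OwnerID": "O7"},
    {"CompanyID": "C1", "OwnerID": "O8"},
]
print(link_table_to_triples(company_owner, "CompanyID", "OwnerID", ":hasOwner"))
```

The Company and Owner tables themselves would still be converted row by row, with their attributes as literals.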


Ministry of Economy

• The Ministry of Economy records information about companies, their owners, and their advocates.

[Figure: Company, Owner, and Lawyer tables]


Ministry of Economy: To RDF

To RDF …
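Where one table refers to another, the referenced value should become a URI rather than a literal, so that the triple links two resources. For example, the Ministry of Economy's Company table might store a lawyer key; this is an assumption, since the actual columns are not shown. A minimal sketch, with hypothetical column and predicate names (LawyerID, hasLawyer):

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def row_to_triples(
    row: Dict[str, Optional[str]], key: str, foreign_keys: Dict[str, str]
) -> List[Triple]:
    """Convert one row, turning foreign-key cells into URI objects.

    foreign_keys maps a column name to the predicate used for the link,
    e.g. {"LawyerID": "hasLawyer"}; all other columns become literals.
    """
    subject = ":" + str(row[key])
    triples: List[Triple] = []
    for column, value in row.items():
        if column == key or value is None:
            continue
        if column in foreign_keys:
            # Reference to another row: the object is that row's URI.
            triples.append((subject, ":" + foreign_keys[column], ":" + value))
        else:
            # Ordinary attribute: the object is a literal value.
            triples.append((subject, ":" + column, value))
    return triples


# Hypothetical Ministry of Economy Company row referring to lawyer L9.
row = {"ID": "E5", "Name": "Example Co.", "LawyerID": "L9"}
print(row_to_triples(row, key="ID", foreign_keys={"LawyerID": "hasLawyer"}))
```

Because the object ":L9" is a URI, the lawyer's own triples (from the Lawyer table) attach to the same node in the graph.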


Integration of RDF Data

As simple as …

[Figure: three subject-predicate-object (S P O) tables, one per source, concatenated into a single S P O table]
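Because every source now has the same three-column shape, integration needs no schema mapping: the triple tables are simply appended. A minimal sketch, assuming triples are plain Python tuples; it also drops exact duplicates, since an RDF graph is a set of triples. It assumes there are no blank nodes, because merging graphs that share blank-node labels would require renaming them apart first. The sample triples are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def concatenate(*graphs: Iterable[Triple]) -> List[Triple]:
    """Integrate several S-P-O tables into one by appending them.

    Duplicate triples are kept only once, because an RDF graph is a set;
    the order of first appearance is preserved for readability.
    """
    seen = set()
    merged: List[Triple] = []
    for graph in graphs:
        for triple in graph:
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


# Hypothetical fragments from two of the sources.
justice = [(":A1", ":Name", "Example Co.")]
commerce = [(":B2", ":Name", "Example Co."), (":B2", ":Country", "Palestine")]
print(concatenate(justice, commerce))
```

Note that :A1 and :B2 remain separate nodes after concatenation; recognizing that they denote the same company is the linking step.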


In our example


Linking resources

• How are the same entities described in different datasets linked?

• By linking their global identifiers, that is, their URIs**.

• Let's have a look:

:YH852 owl:sameAs :8327848

:YH852 owl:sameAs :4354JU

- Links the company called "Palestine Antiques" in the three databases.

- This is called entity resolution/disambiguation.

:H782YU owl:sameAs :L85652r

- Links the lawyer called "Tony Deik" recorded in the Ministry of Justice and the Ministry of National Economy.

- This is called entity resolution/disambiguation.

** Note that in our example we prefix URIs with a colon. For example, :JK452, :H782YU, :Country, and :Name are all URIs; ":H782YU" might actually be something like: http://www.palgov.ps//H782YU
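Since owl:sameAs is symmetric and transitive, the two links for "Palestine Antiques" place all three of its URIs in one equivalence class. A minimal sketch of computing those classes with a union-find structure, using the owl:sameAs triples from this slide; picking the lexicographically smallest URI as each class's representative is an arbitrary choice made here so the output is deterministic.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
SAME_AS = "owl:sameAs"


def same_as_classes(triples: List[Triple]) -> Dict[str, str]:
    """Map every URI that appears in an owl:sameAs triple to a canonical
    representative of its equivalence class (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p != SAME_AS:
            continue  # ordinary facts do not affect identity
        root_s, root_o = find(s), find(o)
        if root_s != root_o:
            # Keep the smaller URI as root, so each root is its class minimum.
            if root_o < root_s:
                root_s, root_o = root_o, root_s
            parent[root_o] = root_s
    return {x: find(x) for x in list(parent)}


links = [
    (":YH852", SAME_AS, ":8327848"),
    (":YH852", SAME_AS, ":4354JU"),
    (":H782YU", SAME_AS, ":L85652r"),
]
for uri, rep in sorted(same_as_classes(links).items()):
    print(uri, "->", rep)
```

With these links, the three company URIs share one representative and the two lawyer URIs share another.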


Data Integration and Fusion

• Concatenating RDF graphs and linking entities in different datasets forms an integrated view in which applications see all datasets as one integrated database.

Source: Christian Bizer
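One possible way to give applications this single-database view is to rewrite every URI to a canonical representative of its owl:sameAs class and then query the rewritten triples; this is an assumption, since the slides do not prescribe a technique. A minimal sketch, where the canonical mapping is supplied by hand and the triples are hypothetical:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def fuse(triples: List[Triple], canonical: Dict[str, str]) -> List[Triple]:
    """Rewrite subject and object URIs to their canonical representative and
    drop the owl:sameAs links themselves, so that facts about one real-world
    entity gathered from different sources hang off a single URI."""
    fused: List[Triple] = []
    seen = set()
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue  # the link has been absorbed into the rewriting
        t = (canonical.get(s, s), p, canonical.get(o, o))
        if t not in seen:
            seen.add(t)
            fused.append(t)
    return fused


def describe(triples: List[Triple], subject: str) -> List[Tuple[str, str]]:
    """A simple query over the integrated view: all (predicate, object) pairs of one entity."""
    return [(p, o) for s, p, o in triples if s == subject]


# Hypothetical integrated data: :A1 and :B2 are two source URIs for one company.
triples = [
    (":A1", ":Name", "Example Co."),
    (":B2", ":Country", "Palestine"),
    (":A1", "owl:sameAs", ":B2"),
]
canonical = {":B2": ":A1"}
print(describe(fuse(triples, canonical), ":A1"))
```

An alternative design keeps the original URIs and follows owl:sameAs links at query time instead; rewriting up front makes queries simpler but loses track of which source used which identifier.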


References

• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center, Menlo Park, USA, 2009.