PalGov © 2011
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Module 13.3
Data Integration and Fusion using RDF
Dr. Mustafa Jarrar
University of Birzeit
www.jarrar.info
About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES.
The project website: www.egovacademy.ps
Project Consortium:
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Birzeit University, Palestine (Coordinator)
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935, [email protected]
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website) and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means without prior written permission from the project, which holds
the full copyright on the material.
Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA)
This license lets others remix, tweak, and build upon your work non-commercially,
as long as they credit you and license their new creations under the identical terms.
Tutorial Map
Topic (duration)
Session 1: XML Basics and Namespaces (3 h)
Session 2: XML DTDs (3 h)
Session 3: XML Schemas (3 h)
Session 4: Lab-XML Schemas (3 h)
Session 5: RDF and RDFS (3 h)
Session 6: Lab-RDF and RDFS (3 h)
Session 7: OWL, the Web Ontology Language (3 h)
Session 8: Lab-OWL (3 h)
Session 9: Lab-RDF Stores: Challenges and Solutions (3 h)
Session 10: Lab-SPARQL (3 h)
Session 11: Lab-Oracle Semantic Technology (3 h)
Session 12_1: The Problem of Data Integration (1.5 h)
Session 12_2: Architectural Solutions for the Integration Issues (1.5 h)
Session 13_1: Data Schema Integration (1 h)
Session 13_2: GAV and LAV Integration (1 h)
Session 13_3: Data Integration and Fusion using RDF (1 h)
Session 14: Lab-Data Integration and Fusion using RDF (3 h)
Session 15_1: Data Web and Linked Data (1.5 h)
Session 15_2: RDFa (1.5 h)
Session 16: Lab-RDFa (3 h)
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data
models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and linked data.
2a5: Demonstrate knowledge about the integration and fusion of
heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML and RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, and OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Use Oracle Semantic Technology and/or Virtuoso to store
and query RDF data.
D: General and Transferable Skills
2d1: Work in a team.
2d2: Present and defend ideas.
2d3: Use creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Module ILOs
After completing this module students will be able to:
- Explain the concepts of identity management and linked data.
- Integrate and fuse heterogeneous data.
- Represent data using the graph data model (RDF).
- Manage and query data represented in RDF.
Example from the Government Domain
• Consider this simplified example from the government domain: three
governmental agencies that each record information about companies.
• In this example, we integrate the three databases by transforming
each one into RDF and then concatenating the resulting RDF triple tables
into one table. After that, we examine the concatenated data and link
the resources that describe the same entities.
• Data integration is thus achieved by concatenating RDF graphs and
linking resources across them. It also happens at query time, when
queries are built and executed over the concatenated dataset as a whole.
[Figure: the companies databases of the Ministry of Justice, the Ministry of Economy, and the Chamber of Commerce]
Ministry of Justice
• The Ministry of Justice records information about companies and about
the advocates who represent them.
[Figure: the Company and Advocate tables]
Ministry of Justice: To RDF
[Figure: the Company and Advocate tables converted to RDF]
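The conversion on this slide can be sketched in Python, loosely in the spirit of (but much simpler than) the W3C Direct Mapping of relational data to RDF. This is a minimal sketch under stated assumptions: the real tables were shown as images, so the column names (`CompanyID`, `AdvocateID`, `Advocate`), the key values, the pairing of this company with this advocate, and the one-triple-per-column mapping are all hypothetical; only the names "Palestine Antiques" and "Tony Deik" come from this tutorial. Each row becomes a subject URI built from its primary key, each other column becomes a predicate, and a foreign-key column yields a URI object instead of a literal, so the two tables stay connected.

```python
# Minimal sketch: relational rows -> RDF-style triples.
# Hypothetical: column names, key values, and the column-to-predicate mapping.
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(rows: Iterable[Dict[str, str]], key: str,
                    foreign_keys: frozenset = frozenset()) -> List[Triple]:
    """Turn each row into triples about the resource named by its key."""
    triples: List[Triple] = []
    for row in rows:
        subject = ":" + row[key]  # the primary key becomes the subject URI
        for column, value in row.items():
            if column == key:
                continue
            # Foreign keys point at another resource (URI); other values are literals.
            obj = ":" + value if column in foreign_keys else f'"{value}"'
            triples.append((subject, ":" + column, obj))
    return triples


companies = [{"CompanyID": "C1", "Name": "Palestine Antiques", "Advocate": "A1"}]
advocates = [{"AdvocateID": "A1", "Name": "Tony Deik"}]

justice_triples = (rows_to_triples(companies, "CompanyID", frozenset({"Advocate"}))
                   + rows_to_triples(advocates, "AdvocateID"))
for t in justice_triples:
    print(*t)
```

Running it prints three triples: the company's name, the link from the company to its advocate (a URI, not a string), and the advocate's name.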
Chamber of Commerce
• The Chamber of Commerce records information about companies and
about the companies' owners.
[Figure: the Company, Owner, and Company_Owner tables]
Chamber of Commerce: To RDF
[Figure: the Chamber of Commerce tables converted to RDF]
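The Chamber of Commerce schema includes a link table, Company_Owner, relating companies to owners many-to-many. In a graph no intermediate resource is needed: a subject may have several objects for the same predicate, so each link row can become a single triple. A minimal sketch, assuming hypothetical column names (`CompanyID`, `OwnerID`), hypothetical keys, and a hypothetical `:hasOwner` predicate:

```python
# Minimal sketch: a many-to-many link table becomes direct triples.
# Hypothetical: column names, key values, and the :hasOwner predicate.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def link_table_to_triples(rows: List[Dict[str, str]], subject_col: str,
                          object_col: str, predicate: str) -> List[Triple]:
    """Each link row relates two existing resources with a single triple."""
    return [(":" + r[subject_col], predicate, ":" + r[object_col]) for r in rows]


company_owner = [
    {"CompanyID": "K9", "OwnerID": "O1"},
    {"CompanyID": "K9", "OwnerID": "O2"},  # one company, two owners
]
owner_triples = link_table_to_triples(company_owner, "CompanyID", "OwnerID", ":hasOwner")
print(owner_triples)
```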
Ministry of Economy
• The Ministry of Economy records information about companies, their
owners, and their advocates.
[Figure: the Company, Owner, and Lawyer tables]
Ministry of Economy: To RDF
[Figure: the Ministry of Economy tables converted to RDF]
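With a third database converted, a practical point arises before concatenation: local keys from different databases may collide, so each source should contribute globally unique URIs. The tutorial writes URIs with a leading colon for brevity; the sketch below assumes instead that each agency mints full URIs under its own base. The base URIs (under the reserved `example.org` domain), the `mint_uri` helper, and the `<base><table>/<key>` layout are hypothetical.

```python
# Minimal sketch: mint globally unique URIs per source before concatenation.
# Hypothetical: the base URIs, the mint_uri helper, and the "<base><table>/<key>" layout.
BASES = {
    "justice": "http://justice.example.org/",
    "economy": "http://economy.example.org/",
    "chamber": "http://chamber.example.org/",
}


def mint_uri(source: str, table: str, key: str) -> str:
    """Combine the source's base URI, the table name, and the row key."""
    return f"{BASES[source]}{table}/{key}"


# Two databases that both use the local key "C1" still yield distinct URIs.
justice_c1 = mint_uri("justice", "Company", "C1")
economy_c1 = mint_uri("economy", "Company", "C1")
print(justice_c1)  # http://justice.example.org/Company/C1
print(economy_c1)  # http://economy.example.org/Company/C1
```

Distinct URIs keep unrelated records apart after concatenation; records that do describe the same company are then connected explicitly with owl:sameAs.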
Integration of RDF Data
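As simple as concatenating the (S, P, O) triple tables of the sources into one table.
[Figure: three subject-predicate-object tables appended into one]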
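Because an RDF graph is a set of (subject, predicate, object) triples, concatenating the sources amounts to a set union, and no upfront schema mapping is required to combine them. A minimal sketch with plain Python tuples; all identifiers and predicates are hypothetical, and blank nodes are ignored (a full RDF merge would first have to rename them apart):

```python
# Minimal sketch: integrate RDF data by concatenating triple sets.
# Hypothetical: all identifiers and predicates. Blank nodes are not handled.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

justice: Set[Triple] = {(":J1", ":Name", '"Palestine Antiques"'),
                        (":J1", ":Advocate", ":A1")}
chamber: Set[Triple] = {(":K9", ":Name", '"Palestine Antiques"'),
                        (":K9", ":hasOwner", ":O1")}
economy: Set[Triple] = {(":E4", ":Name", '"Palestine Antiques"')}


def concatenate(*graphs: Set[Triple]) -> Set[Triple]:
    """An RDF graph is a set of triples, so concatenation is a set union."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


integrated = concatenate(justice, chamber, economy)
print(len(integrated))  # 5
```

The company still appears under three different subjects (:J1, :K9, :E4). Concatenation alone does not tell an application that they denote the same company, which is why the resources must also be linked.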
In our example
Linking resources
• How do we link descriptions of the same entity that appear in different datasets?
• By linking their global identifiers, that is, their URIs**.
• Let's have a look:
:YH852 owl:sameAs :8327848
:YH852 owl:sameAs :4354JU
- Links the company called “Palestine Antiques” across the three databases.
- This is called entity resolution (disambiguation).
:H782YU owl:sameAs :L85652r
- Links the lawyer called “Tony Deik” recorded in both the Ministry of Justice
and the Ministry of National Economy.
- This is also entity resolution (disambiguation).
** Note that in our example we use a leading colon to mark URIs. For example, :JK452, :H782YU,
:Country, and :Name are all URIs; “:H782YU” might actually be something like http://www.palgov.ps//H782YU
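Because owl:sameAs is symmetric and transitive, the links on this slide partition URIs into groups that denote the same entity. One way to fuse the data, sketched here as an assumption rather than as the tutorial's prescribed method, is to compute those groups with union-find and rewrite every URI to a single representative. The owl:sameAs triples are the ones from this slide; the other triples and the `:hasOwner` predicate are hypothetical, and choosing the lexicographically smallest URI as representative is an arbitrary choice.

```python
# Minimal sketch: resolve owl:sameAs links by rewriting each URI to one representative.
# The owl:sameAs triples come from the example; the other triples are hypothetical.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def canonical_map(graph: Set[Triple]) -> Dict[str, str]:
    """Union-find over owl:sameAs; the smallest URI in each group is its representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in graph:
        if p == SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                low, high = sorted((a, b))
                parent[high] = low  # root stays the group minimum, whatever the set order
    return {x: find(x) for x in list(parent)}


def fuse(graph: Set[Triple]) -> Set[Triple]:
    """Replace linked URIs by their representative and drop the sameAs triples."""
    canon = canonical_map(graph)

    def rep(term: str) -> str:
        return canon.get(term, term)

    return {(rep(s), p, rep(o)) for s, p, o in graph if p != SAME_AS}


graph = {
    (":YH852", ":Name", '"Palestine Antiques"'),
    (":8327848", ":Name", '"Palestine Antiques"'),
    (":4354JU", ":hasOwner", ":O1"),
    (":H782YU", ":Name", '"Tony Deik"'),
    (":L85652r", ":Name", '"Tony Deik"'),
    (":YH852", SAME_AS, ":8327848"),
    (":YH852", SAME_AS, ":4354JU"),
    (":H782YU", SAME_AS, ":L85652r"),
}
for triple in sorted(fuse(graph)):
    print(*triple)
```

After fusion, the duplicated name triples collapse into one per entity. Rewriting discards the original identifiers; the alternative is to keep the owl:sameAs triples and follow them at query time.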
Data Integration and Fusion
• Concatenating RDF graphs
and linking entities in different
datasets forms an integrated
view where applications see
all datasets as one integrated
database.
Source: Christian Bizer
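The integrated view can also be realized at query time instead of by rewriting the data: a query about one company follows the owl:sameAs links (symmetric and transitive) to collect what every source says about it. This is a minimal sketch, not a SPARQL engine; the `aliases` and `describe` helpers, the `:hasOwner` and `:Advocate` predicates, and the non-sameAs triples are hypothetical.

```python
# Minimal sketch: query the integrated view by following owl:sameAs at query time.
# The owl:sameAs triples come from the example; the other triples are hypothetical.
from collections import deque
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def aliases(graph: Set[Triple], uri: str) -> Set[str]:
    """URIs reachable from `uri` through owl:sameAs, in either direction."""
    seen, queue = {uri}, deque([uri])
    while queue:
        current = queue.popleft()
        for s, p, o in graph:
            if p == SAME_AS and current in (s, o):
                other = o if s == current else s
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


def describe(graph: Set[Triple], uri: str) -> Set[Tuple[str, str]]:
    """All (predicate, object) pairs stated about `uri` or any of its aliases."""
    names = aliases(graph, uri)
    return {(p, o) for s, p, o in graph if s in names and p != SAME_AS}


graph = {
    (":YH852", ":Name", '"Palestine Antiques"'),
    (":8327848", ":hasOwner", ":O1"),
    (":4354JU", ":Advocate", ":A1"),
    (":YH852", SAME_AS, ":8327848"),
    (":YH852", SAME_AS, ":4354JU"),
}
print(sorted(describe(graph, ":4354JU")))
```

A linear scan per step is fine for a toy graph; an RDF store would use indexes. In SPARQL 1.1, a similar traversal can be written with the property path `(owl:sameAs|^owl:sameAs)*`.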
References
• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.