PalGov © 2011
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session 11
Oracle Semantic Technologies
Dr. Mustafa Jarrar
University of Birzeit
www.jarrar.info
About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement
511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine (Coordinator)
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O.Box 14- Birzeit, Palestine
Telfax:+972 2 2982935 [email protected]
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work non-
commercially, as long as they credit you and license their new creations
under the identical terms.
Tutorial Map
Topic (hours)
Session 1: XML Basics and Namespaces (3)
Session 2: XML DTDs (3)
Session 3: XML Schemas (3)
Session 4: Lab-XML Schemas (3)
Session 5: RDF and RDFs (3)
Session 6: Lab-RDF and RDFs (3)
Session 7: OWL (Ontology Web Language) (3)
Session 8: Lab-OWL (3)
Session 9: Lab-RDF Stores -Challenges and Solutions (3)
Session 10: Lab-SPARQL (3)
Session 11: Lab-Oracle Semantic Technology (3)
Session 12_1: The problem of Data Integration (1.5)
Session 12_2: Architectural Solutions for the Integration Issues (1.5)
Session 13_1: Data Schema Integration (1)
Session 13_2: GAV and LAV Integration (1)
Session 13_3: Data Integration and Fusion using RDF (1)
Session 14: Lab-Data Integration and Fusion using RDF (3)
Session 15_1: Data Web and Linked Data (1.5)
Session 15_2: RDFa (1.5)
Session 16: Lab-RDFa (3)
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data
models as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked data.
2a5: Demonstrate knowledge about integration and fusion of
heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML &
RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store
and query RDF stores.
D: General and Transferable Skills
2d1: Working with team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Module ILOs
After completing this module, students will be able to:
- Use Oracle Semantic Technologies to store and query RDF stores.
Introduction
• Oracle Semantic Technologies enables you to:
– store RDF data and ontologies,
– query RDF data,
– perform ontology-assisted query of relational data,
– use supplied or user-defined “inferencing”.
[Figure: Oracle Semantic Technologies capabilities. Store: RDF/OWL data and ontologies are kept in the database alongside relational data, loaded by bulk load or by incremental load and DML. Query: query RDF/OWL data and ontologies, and ontology-assisted query of relational data. Inference: OWL subset, RDF/S, and user-defined rules. Source: Oracle.com]
Querying RDF data using Oracle
• Oracle introduced an SQL-based scheme to query RDF data.
• The scheme is built around an SQL table function called
"SEM_MATCH", which is part of Oracle's Semantic Technologies.
• SEM_MATCH takes a SPARQL-like graph pattern as an argument
and returns a table of results that can be further queried
using SQL.
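As a minimal sketch of how the table returned by SEM_MATCH can be further queried with ordinary SQL, the query below counts movies per director. The model name movies_model is the one used in the examples later in this session; the namespace http://www.example.org/movies/ and the :directedBy property are assumptions made only for illustration.

```sql
-- Sketch only: the namespace and property name are hypothetical.
-- SEM_MATCH returns one column per variable in the graph pattern,
-- so the result can be grouped and ordered like any other table.
SELECT director, COUNT(*) AS movie_count
  FROM TABLE(SEM_MATCH(
         '{ ?movie :directedBy ?director }',
         SEM_MODELS('movies_model'),
         NULL,                                   -- no rulebases
         SEM_ALIASES(SEM_ALIAS('', 'http://www.example.org/movies/')),
         NULL))                                  -- no extra filter
 GROUP BY director
 ORDER BY movie_count DESC;
```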
How RDF Data is stored in Oracle.
• The physical organization of RDF data is a bit different
from its logical organization as a single <S,P,O> table.
RDF triples are stored after normalization in two tables:
– IdTriples(ModelID, subjectID, propertyID, objectID)
(triples in the identifier format)
– URIMap(UriID, UriValue)
(uri to identifier mapping)
• At its core, a SEM_MATCH query is translated into a self-join
query on the IdTriples table.
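The translation into a self-join can be sketched with plain SQL. The two tables below follow the conceptual schema named on this slide, not Oracle's actual internal tables, and the URIs in the WHERE clause are hypothetical. Each triple pattern becomes one alias of IdTriples, and a shared variable becomes a join condition.

```sql
-- Conceptual tables named after the slide; Oracle's real internal tables differ.
-- For simplicity, URIMap is assumed to hold literal values as well as URIs.
CREATE TABLE URIMap (
  UriID    NUMBER         PRIMARY KEY,
  UriValue VARCHAR2(4000) NOT NULL
);

CREATE TABLE IdTriples (
  ModelID    NUMBER NOT NULL,
  subjectID  NUMBER NOT NULL REFERENCES URIMap (UriID),
  propertyID NUMBER NOT NULL REFERENCES URIMap (UriID),
  objectID   NUMBER NOT NULL REFERENCES URIMap (UriID)
);

-- Graph pattern: { :M1 :directedBy ?director . ?director :name ?directorName }
-- Two triple patterns give two aliases of IdTriples (t1, t2), joined on the
-- shared variable ?director: the object of t1 is the subject of t2.
SELECT nm.UriValue AS directorName
  FROM IdTriples t1
  JOIN IdTriples t2 ON t2.ModelID = t1.ModelID
                   AND t2.subjectID = t1.objectID
  JOIN URIMap s  ON s.UriID  = t1.subjectID
  JOIN URIMap p1 ON p1.UriID = t1.propertyID
  JOIN URIMap p2 ON p2.UriID = t2.propertyID
  JOIN URIMap nm ON nm.UriID = t2.objectID
 WHERE s.UriValue  = 'http://www.example.org/movies/M1'
   AND p1.UriValue = 'http://www.example.org/movies/directedBy'
   AND p2.UriValue = 'http://www.example.org/movies/name';
```

A pattern with n triple patterns needs n aliases of IdTriples, which is why index support on that table matters for performance.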
Query Optimization
• Optimization of SEM_MATCH queries on RDF data:
– Relies on Oracle's RDBMS optimizer to choose an efficient
execution plan for the query.
– Uses a set of B-tree indexes and materialized views
(e.g. the subject-property matrix described previously).
Architectural Overview
Core Entities in Oracle Database Semantic Store
Semantic Network: Dictionary and data tables for storage and
management of asserted and inferred RDF triples. OWL and
RDFS rulebases are preloaded.
Model: A model holds an RDF graph (a set of <S,P,O> triples).
Rulebase: A rulebase is a set of rules used for inferencing.
Entailment: An entailment stores triples derived via inferencing.
Application Table: Contains a column of type sdo_rdf_triple_s,
associated with an RDF model, to allow DML on and access to RDF
triples, and to store ancillary values.
Source: Oracle.com
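One way to see these entities in a running database is through the dictionary views that Oracle exposes under the MDSYS schema. The queries below are a minimal sketch. The view and column names are given as recalled from the 11g Release 2 documentation and should be checked against the installed release.

```sql
-- Models and the application table/column each one is bound to.
SELECT owner, model_id, model_name, table_name, column_name
  FROM mdsys.sem_model$;

-- Rulebases known to the semantic network (includes the preloaded ones).
SELECT owner, rulebase_name, status
  FROM mdsys.sem_rulebase_info;

-- Entailments (precomputed inferred triples) and their validity status.
SELECT owner, index_name, status
  FROM mdsys.sem_rules_index_info;
```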
Core Functionality: Load / Query / Inference
• Load
– Bulk load
– Incremental load
• Query and DML
– SPARQL
(from Java/endpoint/Oracle)
• Inference
– Native support for OWL 2 RL,
SNOMED (OWL 2 EL subset),
OWLprime, OWLSIF, RDFS++.
– Named Graph Local Inference
– User-defined rules
Source: Oracle.com
Architectural Overview: Interfaces
Note that there are three interfaces for Oracle
Semantic Technologies:
• SQL-based (SQL and PL/SQL)
• Java-based:
– Jena (using the Jena adapter from Oracle).
– Sesame (using the Sesame adapter from Oracle).
• SPARQL Endpoints:
– Joseki
– OpenRDF Workbench
[Figure: Architectural overview. Source: Oracle.com]
Installation and Configuration of Oracle Database
Semantic Technologies
Installation and Configuration (1)
• Load the PL/SQL packages and jar file
– cd $ORACLE_HOME/md/admin
– As sysdba
– SQL> @catsem
• Create a tablespace for semantic network
create bigfile tablespace semts
datafile '?/dbs/semts01.dat' size 512M reuse
autoextend on next 512M maxsize unlimited
extent management local
segment space management auto;
Installation and Configuration (2)
• Create a temporary tablespace
create bigfile temporary tablespace semtmpts
tempfile '?/dbs/semtmpts.dat'
size 512M reuse
autoextend on next 512M maxsize unlimited
EXTENT MANAGEMENT LOCAL;
ALTER DATABASE DEFAULT TEMPORARY TABLESPACE semtmpts;
• Create an undo tablespace
CREATE bigfile UNDO TABLESPACE semundots
DATAFILE '?/dbs/semundots.dat' SIZE 512M REUSE
AUTOEXTEND ON next 512M maxsize unlimited
EXTENT MANAGEMENT LOCAL ;
ALTER SYSTEM SET UNDO_TABLESPACE=semundots;
Installation and Configuration (3)
• Create a semantic network to enable semantic data
management:
– As sysdba
– SQL> exec sem_apis.create_sem_network('semts');
• Create Semantic Model
– As scott (or other)
– SQL> create table test_tpl(id number, triple sdo_rdf_triple_s);
– SQL> exec sem_apis.create_sem_model('test','test_tpl','triple');
Loading RDF Triples
Loading Semantic Data: APIs
• Incremental DMLs (small number of changes; recommended for a
very small number of triples)
• SQL: Insert
• SQL: Delete
• Java API (Jena): GraphOracleSem.add, delete
• Java API (Sesame): OracleSailConnection.addStatement,
removeStatements
• Bulk Loader (large number of changes; recommended for a
very large number of triples)
• PL/SQL: sem_apis.bulk_load_from_staging_table(…)
• Java API (Jena): OracleBulkUpdateHandler.addInBulk(…),
prepareBulk
• Java API (Sesame): OracleBulkUpdateHandler.addInBulk,
prepareBulk ...
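The SQL route for incremental changes can be sketched as follows, using the test model and the test_tpl application table created during installation. The triples and the http://www.example.org/movies/ namespace are hypothetical values chosen for illustration.

```sql
-- Insert two triples through the application table; the SDO_RDF_TRIPLE_S
-- constructor takes the model name, subject, property, and object.
INSERT INTO test_tpl VALUES (1,
  SDO_RDF_TRIPLE_S('test',
    '<http://www.example.org/movies/M1>',
    '<http://www.example.org/movies/directedBy>',
    '<http://www.example.org/movies/D1>'));

INSERT INTO test_tpl VALUES (2,
  SDO_RDF_TRIPLE_S('test',
    '<http://www.example.org/movies/D1>',
    '<http://www.example.org/movies/country>',
    '<http://www.example.org/movies/C1>'));

-- Remove a triple again by deleting its row from the application table.
DELETE FROM test_tpl WHERE id = 2;

COMMIT;
```

Row-by-row DML like this is convenient for small corrections; for large files the staging-table bulk loader is the better fit.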
PL/SQL Bulk Loader
• STEP 1: Load data into Staging Table using SQL*Loader:
(a) Create a staging table:
CREATE TABLE stage_table
(
RDF$STC_sub varchar2(4000) not null,
RDF$STC_pred varchar2(4000) not null,
RDF$STC_obj varchar2(4000) not null,
RDF$STC_sub_ext varchar2(64),
RDF$STC_pred_ext varchar2(64),
RDF$STC_obj_ext varchar2(64),
RDF$STC_canon_ext varchar2(64)
) COMPRESS Tablespace TS_Name;
(b) Load into Staging Table (from cmd):
sqlldr userid=testuser/testuser
control=bulkload.ctl data=dblp.nt
direct=true skip=0 load=6000000
discardmax=0 bad=d0.bad discard=d0.rej
log=d0.log errors=100000000
– control: the control file, where we specify the name of the
staging table in the DB.
– data: the input data (the path to the input data file).
– load: the maximum number of rows to bulk-load. Omit this
parameter to remove the limit.
PL/SQL Bulk Loader
• STEP 2: Create a semantic model and run bulk load from
staging table API:
– Create SEM Model (if not created already):
CREATE TABLE myrdf_tpl (id number, triple
SDO_RDF_TRIPLE_S) COMPRESS nologging tablespace
semts;
exec sem_apis.create_sem_model('myrdf','myrdf_tpl',
'triple');
– Bulk Load:
grant select on stage_table to mdsys;
grant insert on myrdf_tpl to mdsys;
exec sem_apis.bulk_load_from_staging_table('myrdf',
'scott', 'stage_table', flags=>'PARALLEL_CREATE_INDEX
PARALLEL=4');
After Data is loaded
• Check number of triples in the model and application table
– select count(1) from mdsys.rdfm_<ModelName>;
– select count(1) from <AppTable>;
• Analyze the semantic model if there is enough change to
the model
– exec sem_apis.analyze_model('<ModelName>');
• Analyze the semantic network if there is enough change to
the whole network
– exec sem_perf.gather_stats(true, 4); -- just on value$ table
– exec sem_perf.gather_stats(false, 4); -- whole network
• Start Querying
Querying RDF Data using SEM_MATCH
[Figure: SPARQL query architecture. Source: Oracle.com]
[Figure: SEM_MATCH: adding SPARQL to SQL. Source: Oracle.com]
SEM_MATCH Table Function Arguments
SEM_MATCH(
query VARCHAR2,
models SEM_MODELS,
rulebases SEM_RULEBASES,
aliases SEM_ALIASES,
filter VARCHAR2,
index_status VARCHAR2,
options VARCHAR2 ) RETURN ANYDATASET;
- The query attribute is required.
- The other attributes are optional (that is, each can be a null
value).
- The query attribute is a string literal (or concatenation of
string literals) with one or more triple patterns, usually
containing variables.
EXAMPLE:
'SELECT ?directorName
WHERE{
:M1 :directedBy ?director .
?director :name ?directorName
}'
- The models attribute identifies the model or models to use.
- Its data type is SEM_MODELS, which has the following
definition: TABLE OF VARCHAR2(25).
- If you are querying a virtual model, specify only the name of
the virtual model and no other models.
- Example, naming a single model: SEM_MODELS('model_name')
- The rulebases attribute identifies one or more rulebases
whose rules are to be applied to the query.
- If you are querying a virtual model, this attribute must be
null.
- A rulebase is an object that can contain rules, and a rule is
an object that can be applied to draw inferences from
semantic data.
- EXAMPLE: The following creates a rulebase named family_rb and then inserts a
rule named grandparent_rule into it. The rule says that if a person is the
parent of a child who is in turn the parent of another child, then that person
is a grandparent of the second child.
EXECUTE SEM_APIS.CREATE_RULEBASE('family_rb');
INSERT INTO mdsys.semr_family_rb VALUES(
  'grandparent_rule',
  '(?x :parentOf ?y) (?y :parentOf ?z)',
  NULL,
  '(?x :grandParentOf ?z)',
  SEM_ALIASES(SEM_ALIAS('','http://www.example.org/family/')));
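A rulebase only affects query results once an entailment has been computed for it. The sketch below assumes a model named family already exists and contains :parentOf triples (the model name is an assumption, not given on the slide). The entailment name family_rb_idx is likewise hypothetical.

```sql
-- Precompute the triples that family_rb derives from the family model.
BEGIN
  SEM_APIS.CREATE_ENTAILMENT(
    'family_rb_idx',                 -- hypothetical entailment name
    SEM_MODELS('family'),            -- assumed model name
    SEM_RULEBASES('family_rb'));
END;
/

-- Query the inferred :grandParentOf triples by naming the same
-- model and rulebase combination in SEM_MATCH.
SELECT x, z
  FROM TABLE(SEM_MATCH(
         '{ ?x :grandParentOf ?z }',
         SEM_MODELS('family'),
         SEM_RULEBASES('family_rb'),
         SEM_ALIASES(SEM_ALIAS('', 'http://www.example.org/family/')),
         NULL));
```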
- The aliases attribute identifies one or more namespaces, in addition to the
default namespaces, to be used for expansion of qualified names in the
query pattern.
- The following default namespaces are used:
('orardf', 'http://xmlns.oracle.com/rdf/')
('orageo', 'http://xmlns.oracle.com/rdf/geo/')
('owl', 'http://www.w3.org/2002/07/owl#')
('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#')
('rdfs', 'http://www.w3.org/2000/01/rdf-schema#')
('xsd', 'http://www.w3.org/2001/XMLSchema#')
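Names written with an empty prefix, such as :D3 or :name, are not covered by the default namespaces, so the query needs an alias that maps the empty prefix to a namespace. The sketch below assumes the movie data was loaded under http://www.example.org/movies/ (a hypothetical namespace; substitute the one used in your RDF file).

```sql
-- Map the empty prefix to the (assumed) namespace of the movie data so that
-- :D3 expands to <http://www.example.org/movies/D3>.
SELECT directorName
  FROM TABLE(SEM_MATCH(
         '{ :D3 :name ?directorName }',
         SEM_MODELS('movies_model'),
         NULL,
         SEM_ALIASES(SEM_ALIAS('', 'http://www.example.org/movies/')),
         NULL));
```

Additional prefixes can be declared by listing more SEM_ALIAS entries inside SEM_ALIASES.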
- The filter attribute identifies any additional selection criteria.
- If this attribute is not null, it should be a string in the form of a
WHERE clause without the WHERE keyword.
- For example: '(h >= 6)' limits the result to cases where the height
of the grandfather's grandchild is 6 or greater.
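A minimal sketch of the filter attribute in use is shown below. It assumes a model named family, a rulebase family_rb with an existing entailment, and :height and :Male terms in the data. All of these are assumptions for illustration. The filter refers to the variable h by its column name, without the leading question mark.

```sql
-- Grandfathers and their grandchildren, restricted by the filter attribute
-- to grandchildren whose height is 6 or greater.
SELECT x, y, h
  FROM TABLE(SEM_MATCH(
         '{ ?x :grandParentOf ?y . ?x rdf:type :Male . ?y :height ?h }',
         SEM_MODELS('family'),
         SEM_RULEBASES('family_rb'),
         SEM_ALIASES(SEM_ALIAS('', 'http://www.example.org/family/')),
         '(h >= 6)'));
```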
- The index_status attribute lets you query semantic data even
when the relevant entailment does not have a valid status.
- If this attribute is null, the query returns an error if the entailment
does not have a valid status. If this attribute is not null, it must be
the string INCOMPLETE or INVALID.
- An Entailment is an object containing precomputed triples that
can be inferred from applying a specified set of rulebases to a
specified set of models.
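As a hedged sketch of index_status, the query below would still run if the entailment for the named model and rulebase had become incomplete, for example after new triples were added to the model. The model name family and rulebase family_rb are assumptions for illustration.

```sql
-- Sixth argument: accept an entailment whose status is INCOMPLETE
-- instead of raising an error. Results may then miss some inferred triples.
SELECT x, z
  FROM TABLE(SEM_MATCH(
         '{ ?x :grandParentOf ?z }',
         SEM_MODELS('family'),
         SEM_RULEBASES('family_rb'),
         SEM_ALIASES(SEM_ALIAS('', 'http://www.example.org/family/')),
         NULL,            -- filter
         'INCOMPLETE'));  -- index_status
```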
- The options attribute identifies options that can affect
the results of queries.
- Options are expressed as keyword-value pairs.
- For more details about using the SEM_MATCH function and its
arguments, consult the Oracle Semantic Technologies Developer's Guide:
http://www.oracle.com/technetwork/database/options/semantic-tech/documentation-087054.html
SEM_MATCH documentation:
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e25609/sdo_rdf_concepts.htm#CHDJACII
Examples
Exercises:
(1) What is the name of director D3?
(2) What is the name of the director of the movie M1?
(3) List all the movies that have directors from the USA, along with their directors.
(4) List all the names of the directors from Lebanon who have won prizes and the
prizes they have won.
[Figure: RDF data graph for the movie examples. Nodes: movies M1-M4 (titles Capitalism, Sicko, Brave Heart, Caramel; years 1995 and 2007), directors D1-D3 (Michael Moore, Mel Gibson, Nadine Labaki), prizes P1 Emmy Awards, P2 Oscars, P3 Stockholm Festival (years 2009, 1996, 2007), and countries C1 USA, C2 Lebanon, C3 Sweden with capitals Washington DC, Beirut, and Stockholm. Edge labels include directedBy, actedIn, name, and location.]
Examples
• Q1: What is the name of director D3?
Select directorName
From
Table(
SEM_MATCH(
'SELECT ?directorName
WHERE {:D3 :name ?directorName}',
SEM_MODELS('movies_model'), null, null, null,
null, null)
);
Examples
• Q2: What is the name of the director of the movie M1?
Select directorName
From
Table(
SEM_MATCH(
'SELECT ?directorName
WHERE{
:M1 :directedBy ?director .
?director :name ?directorName}',
SEM_MODELS('movies_model'), null, null, null,
null, null)
);
Examples
• Q3: List all the movies that have directors from the USA,
along with their directors.
Select movie, director
From
Table(
SEM_MATCH(
'Select ?movie ?director
Where {?movie :directedBy ?director.
?director :country ?country.
?country :name "USA"}' ,
SEM_MODELS('movies_model'), null, null, null,
null, null)
);
Examples
• Q4: List all the names of the directors from Lebanon who
have won prizes and the prizes they have won.
Select directorName, prize
From
Table(
SEM_MATCH(
'Select ?directorName ?prize
Where { ?director :name ?directorName.
?director :country ?c.
?c :name "Lebanon".
?director :hasWonPrizeIn ?prize}',
SEM_MODELS('movies_model'), null, null, null,
null, null)
);
Practical Session
Tutorial II: Data Integration and Open Information Systems
Data Integration and Open Information Systems (Tutorial II)
The Palestinian e-Government Academy
January, 2012
Practical Session
PART1: Given the previous movies example and the accompanying four
queries, do the following:
(1) Write the data graph using any suitable RDF syntax (N3, or Turtle). Note:
Avoid using XML syntax as it might need additional effort to bulk-load.
(2) Bulk Load your RDF file into Oracle.
(3) Write the four queries accompanying the example using the SEM_MATCH
function and execute them over the bulk-loaded file.
PART2: Given the RDF graph of Practical Session I and II (also included in the
next slide), do the following:
(1) Write the data graph using any suitable RDF syntax (N3, or Turtle) and bulk-
load it into Oracle.
(2) Write the following queries using the SEM_MATCH function and execute them
over the bulk-loaded file:
• List all the authors born in a country which has the name Palestine.
• List the names of all authors, with the name of their affiliation, who are born in a
country whose capital's population is 14M. Note that the author must have an
affiliation.
• List the names of all books whose authors are born in Lebanon along with the name
of the author.
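[Figure: RDF data graph for Part 2. Nodes: books BK1-BK4, authors AU1-AU4, countries CN1-CN3, capitals CA1-CA3, and affiliation CU. Labels in the graph: Viswanathan, Said, Wamadat, The Prophet, Naima, Gibran, Colombia University, Palestine, India, Lebanon, Jerusalem, New Delhi, Beirut, and the populations 7.6K, 2.0M, 14.0M. Edge labels include Author, Name, and Capital.]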
This data graph is about books. It describes four books (BK1-BK4).
Information recorded about a book includes its author and the author's
affiliation and country of birth, including the country's capital and the
population of that capital.
Practical Session - Instructions
• Each student should work alone.
• In part 2 of this practical session, the student is strongly recommended
to write two additional queries, execute them on the data graph, and
hand them in along with the required queries.
• In part 2 of this practical session, the student is encouraged to compare
the results of the queries with those from Practical Session I and II.
• Each student should expect to present and discuss his/her queries in
class and compare them with the work of other students.
• The final delivery should include, for every part of the practical session:
(i) a link to the RDF file, (ii) a snapshot of the table where the file was
bulk-loaded, (iii) a snapshot of every query and its results. These must
all be delivered in a report in PDF format.
References
• http://www.oracle.com
• Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query
Optimization for the Data Web - Two Disk-Based algorithms: Trace
Equivalence and Bisimilarity.
Thank you!