Creating Knowledge out of Interlinked Data
Introduction
• FP7 project (2010-2014)
• 15 partners (technology researchers, companies and service providers) from 11 European countries, plus 1 associated partner from Korea
• Coordinated by the AKSW research group at the University of Leipzig
The emerging Web of Data: achievements and challenges
Achievements
1. Extension of the Web with a data commons (currently amounting to 25 billion facts)
2. Vibrant, global RTD community
3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
4. Emerging governmental adoption in sight
5. Establishing Linked Data as a deployment path for the Semantic Web
Challenges
1. Coherence: relatively few, expensively maintained links
2. Quality: partly low-quality data and inconsistencies
3. Performance: still substantial penalties compared to relational databases
4. Data consumption: large-scale processing, schema mapping and data fusion still in their infancy
5. Usability: missing direct end-user tools and network effect
• Web - a global, distributed platform for data, information and knowledge integration
• exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
[Figure: snapshots dated July 2007, April 2008, September 2008 and July 2009]
Why Linked Open Data?
Problem: Try to search for these things on the current Web:
• Apartments near German-Russian bilingual childcare in Leipzig.
• ERP service providers with offices in Vienna and London.
• Researchers working on multimedia topics in Eastern Europe.
Information is available on the Web, but opaque to current Web search.
[Diagram labels: berlin.de ("Has everything about childcare in Berlin."); Immobilienscout.de ("Knows all about real estate offers in Germany"); DB, Web server, Search engine, HTML, RDF]
Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate such structured information from different sources.
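A minimal sketch of the solution idea, in Python, with plain tuples standing in for RDF triples. Every URI, predicate name and data value below is hypothetical, "near" is simplified to "in the same district", and a real deployment would use an RDF store and SPARQL rather than set comprehensions.

```python
# Two independently published datasets, modelled as (subject, predicate, object)
# triples. All URIs, predicates and values are made up for illustration.
childcare_site = [
    ("http://example.org/childcare/1", "type", "Childcare"),
    ("http://example.org/childcare/1", "languages", "German-Russian"),
    ("http://example.org/childcare/1", "district", "Leipzig-Mitte"),
    ("http://example.org/childcare/2", "type", "Childcare"),
    ("http://example.org/childcare/2", "languages", "German"),
    ("http://example.org/childcare/2", "district", "Leipzig-Nord"),
]
real_estate_site = [
    ("http://example.org/flat/7", "type", "Apartment"),
    ("http://example.org/flat/7", "district", "Leipzig-Mitte"),
    ("http://example.org/flat/8", "type", "Apartment"),
    ("http://example.org/flat/8", "district", "Leipzig-Nord"),
]


def subjects(triples, predicate, obj):
    """Return the set of subjects that carry the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """Return the set of objects for the given subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def apartments_near_bilingual_childcare(triples):
    """Find apartments in a district that also has German-Russian childcare."""
    bilingual = subjects(triples, "type", "Childcare") & subjects(
        triples, "languages", "German-Russian"
    )
    districts = set()
    for childcare in bilingual:
        districts |= objects(triples, childcare, "district")
    apartments = subjects(triples, "type", "Apartment")
    return sorted(
        a for a in apartments if objects(triples, a, "district") & districts
    )


# Because both sources share one data model, integration is a plain union.
merged = childcare_site + real_estate_site
result = apartments_near_bilingual_childcare(merged)
```

Neither source alone can answer the question; the answer only appears once the two triple sets are combined.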
Objectives of LOD2
Linked Data Lifecycle
Challenges
LOD2 in a Nutshell
Research focus
• Very large RDF data management
• Knowledge Enrichment & Interlinking
• Fusion & Information Quality
• Adaptive, semantic user interfaces
Use Cases
• Media & Publishing
• Enterprise Data Webs
• Open Gov Data
Main Result
• Integrated LOD2-Stack for Linked Data lifecycle management
Partners
• Uni Leipzig, CWI, DERI Galway, FU Berlin, Semantic Web Company, OpenLink, Tenforce, Exalead, Wolters Kluwer, OKFN
LOD2 STACK
LOD2 stack as Debian package repository
The LOD2 stack repository is a Debian package repository: http://stack.lod2.eu/deb/distributions/dists/
We have chosen a new reference OS: Ubuntu 12.04 LTS
o This version is supported for the next 5 years.
Changes in the repository management system for
o enabling quality control (development -> test -> stable)
o enabling architecture-dependent distribution support (e.g. Virtuoso RDF store)
o Public access to documentation
• http://wiki.lod2.eu
LOD2 stack contribution process
LOD2 stack components
Linked Data publishing capabilities currently offered
Covers most of the LOD publishing cycle
o Combination of
• locally installed software,
• online available software, and
• online available data sources as well as data packages
• about page in the LOD demonstrator (http://demo.lod2.eu/lod2demo)
LOD2 STACK – Extraction: Virtuoso Sponger, D2RQ
Virtuoso Sponger
An RDFizer introduced in Virtuoso 5.0
Provides built-in RDF middleware for transforming non-RDF data into RDF "on the fly".
You can use non-RDF data sources as Semantic Web data sources.
Inputs: Wide variety of non-RDF Web data sources, e.g.:
o (X)HTML Web pages (including hosted microformats)
o Web services (Google, Del.icio.us, Flickr etc.)
o Binary files (MS Office, PDF, OpenDocument etc.)
Output: RDF structured data
Inputs: Supported Data Sources
RDF (inc. N3, Turtle)
o SIOC, SKOS, FOAF, AtomOWL, Annotea …
(X)HTML pages
o HTML header metadata: Dublin Core
o Microformats: eRDF, RDFa, hCard, hCalendar, XFN, xFolk …
Syndication formats
o RSS 2.0, Atom, OPML, OCS, XBEL
GRDDL
Web service APIs: Google Base, Flickr, Del.icio.us, Ning …
Files:
o Binary files: MS Office, OpenOffice, images, audio, video …
o Data exchange formats: iCalendar, vCard
3rd party metadata extractors: Aperture, Spotlight, SIMILE RDFizers, or add your own!
Output: Structured Data
In the context of the Semantic Data Web:
“Data organized into semantic chunks or entities, with similar entities grouped together into relations or classes”
Michael Bergman (http://www.mkbergman.com)
Article: “More Structure, More Terminology and (hopefully) More Clarity”
Sponger Benefits
Majority of the world's data resides in non-RDF form at the current time
Sponger provides a “Swiss army knife” for RDF structured data generation from non-RDF sources
Extracting data from non-RDF Web sources and converting it to RDF
o helps “bootstrap” the Semantic Web
o helps drive the transition of the traditional Document-Web into the emerging Semantic Data-Web
o exposes the data in a canonical form for querying and inference
Sponger Inputs & Outputs
Sponger Architecture
Sponger is composed of Sponger Cartridges
Default cartridge collection is bundled as a Virtuoso VAD
Cartridge = Metadata Extractor + Ontology Mapper
Metadata extracted from non-RDF resources is mapped to a suitable ontology by the Ontology Mapper to produce Structured Data
Sponger is highly customizable
Custom cartridges can be developed
o Using any language (e.g. Virtuoso PL, C/C++, Java) supported by the Virtuoso Server Extensions API
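The "Cartridge = Metadata Extractor + Ontology Mapper" composition can be illustrated with a toy Python model. This is a sketch of the concept only, not the Virtuoso cartridge API: the `Cartridge` class, the `key: value` input format and the key-to-property table are all assumptions made for the example; only the FOAF property URIs are real.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Cartridge:
    """Toy model of 'Cartridge = Metadata Extractor + Ontology Mapper'."""

    extractor: Callable[[str], Dict[str, str]]
    mapper: Callable[[str, Dict[str, str]], List[Triple]]

    def sponge(self, uri: str, raw: str) -> List[Triple]:
        # Step 1: pull metadata out of the non-RDF payload.
        # Step 2: map that metadata onto ontology terms as triples.
        return self.mapper(uri, self.extractor(raw))


def extract_key_values(raw: str) -> Dict[str, str]:
    """Metadata extractor: parse 'key: value' lines from a plain-text document."""
    metadata = {}
    for line in raw.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip().lower()] = value.strip()
    return metadata


FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical mapping table from extracted keys to FOAF properties.
KEY_TO_PROPERTY = {"name": FOAF + "name", "nick": FOAF + "nick"}


def map_to_foaf(uri: str, metadata: Dict[str, str]) -> List[Triple]:
    """Ontology mapper: keep only the keys that have a known FOAF property."""
    return [
        (uri, KEY_TO_PROPERTY[key], value)
        for key, value in metadata.items()
        if key in KEY_TO_PROPERTY
    ]


cartridge = Cartridge(extractor=extract_key_values, mapper=map_to_foaf)
triples = cartridge.sponge(
    "http://example.org/alice", "Name: Alice\nNick: al\nShoe size: 38"
)
```

Keeping the extractor and the mapper as separate callables mirrors the split described on the slide: a new input format needs only a new extractor, and a different target vocabulary needs only a new mapper.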
D2RQ Platform
System for accessing relational databases as virtual RDF graphs
Offers RDF-based access to the content of relational databases without having to replicate it into an RDF store
Features:
• query a non-RDF database using SPARQL
• access the content of the database as Linked Data over the Web
• create custom dumps of the database in RDF
• access information using the Apache Jena API
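As a rough illustration of the first feature, the sketch below builds a SPARQL Protocol GET request in Python using only the standard library. The endpoint URL is an assumption (a D2R Server running locally on what is believed to be its default port), and the query presumes a mapping that exposes `foaf:Person` resources; adjust both to the actual setup.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a D2R Server running locally; change this to the real endpoint.
ENDPOINT = "http://localhost:2020/sparql"

# Assumption: the mapping exposes rows as foaf:Person resources.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE { ?person a foaf:Person } LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)

# To run the query against a live endpoint:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       results = json.load(response)
```

The client only speaks SPARQL over HTTP; it does not need to know that the data behind the endpoint lives in a relational database.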
D2RQ Platform: Components
The D2RQ Platform consists of:
D2RQ Mapping Language, a declarative mapping language for describing the relation between an ontology and a relational data model.
D2RQ Engine, which uses the mappings to rewrite Jena API calls into SQL queries against the database and passes query results up to the higher layers of the framework.
D2R Server, an HTTP server that provides a Linked Data view, an HTML view for debugging and a SPARQL Protocol endpoint over the database.
Mapping Examples
# Database connection
map:MyDatabase a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password" .

# Class map: one foaf:Person per non-deleted row of the User table
map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0";
    d2rq:class foaf:Person .

# Property bridge: links each person to the photos they made, via a join
map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:join "User.ID = Photo.UserID";
    d2rq:refersToClassMap map:Photos .
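To make the effect of these mappings concrete, here is a small Python sketch that imitates what the class map and property bridge describe, using in-memory rows. It is not how D2RQ works internally (D2RQ rewrites queries to SQL rather than iterating over rows). The table contents, the `example.org` base URIs and the URI pattern for `map:Photos` are assumptions, since the slides elide the real base URI and do not show the `map:Photos` definition.

```python
import re

# Hypothetical rows standing in for the User and Photo tables.
users = [
    {"User.ID": 1, "User.deleted": 0},
    {"User.ID": 2, "User.deleted": 1},
]
photos = [
    {"Photo.ID": 10, "Photo.UserID": 1},
    {"Photo.ID": 11, "Photo.UserID": 2},
]

# Placeholder patterns: the real base URI is elided in the slides, and the
# map:Photos pattern is assumed.
PEOPLE_PATTERN = "http://example.org/people/@@User.ID@@"
PHOTOS_PATTERN = "http://example.org/photos/@@Photo.ID@@"


def expand(pattern, row):
    """Replace each @@Table.Column@@ placeholder with the row's value."""
    return re.sub(r"@@(.+?)@@", lambda m: str(row[m.group(1)]), pattern)


def generate_triples(users, photos):
    """Imitate map:People (class map) and map:photo (property bridge)."""
    triples = []
    for user in users:
        if user["User.deleted"] != 0:  # d2rq:condition "User.deleted=0"
            continue
        person = expand(PEOPLE_PATTERN, user)  # d2rq:uriPattern
        triples.append((person, "rdf:type", "foaf:Person"))  # d2rq:class
        for photo in photos:
            if user["User.ID"] == photo["Photo.UserID"]:  # d2rq:join
                triples.append(
                    (person, "foaf:made", expand(PHOTOS_PATTERN, photo))
                )
    return triples


triples = generate_triples(users, photos)
```

The deleted user (ID 2) produces no triples at all, so photo 11 is never linked: the condition on the class map also limits what the property bridge can emit.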
LOD2 STACK - OntoWiki
OntoWiki
OntoWiki enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.
o Knowledge Bases (aka graphs, Linked Data optional)
o Generic list and resource views
o Versioning
o Commenting on arbitrary resources
o User management + access control
o Inline editing
o Navigation hierarchies (e.g. class hierarchies)
OntoWiki Screenshots
LOD2 STACK - Interlinking: LIMES, SILK
LIMES
Declarative Link Discovery Framework
Tuned towards efficiency and extensibility
Set-theoretical grammar for specifying links
Time-efficient mappers for single data types
Machine learning for detecting link specs
LIMES: Architecture
LIMES Link Specifications
1. Metadata
2. Source and Target
3. Similarity Measure
4. Acceptance Conditions
5. Review Conditions
6. Execution Mode
7. Output Format
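The roles of the source and target, the similarity measure, and the acceptance and review conditions can be sketched in a few lines of Python. This is an illustration of the general idea, not LIMES itself: the standard-library `difflib` ratio is swapped in as the similarity measure, the thresholds are arbitrary, the data is invented, and the brute-force comparison of all pairs ignores the time-efficient mappers that LIMES provides.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Stand-in similarity measure: case-insensitive difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_links(source, target, measure=string_similarity,
                   accept=0.95, review=0.7):
    """Compare every source label with every target label.

    Pairs scoring >= accept are accepted as links; pairs scoring in
    [review, accept) are set aside for manual review; the rest are dropped.
    """
    accepted, to_review = [], []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = measure(s_label, t_label)
            if score >= accept:
                accepted.append((s_uri, t_uri))
            elif score >= review:
                to_review.append((s_uri, t_uri))
    return accepted, to_review


# Hypothetical source and target datasets: URI -> label.
source = {
    "http://example.org/a/1": "Leipzig",
    "http://example.org/a/2": "Vienna",
}
target = {
    "http://example.org/b/1": "leipzig",
    "http://example.org/b/2": "Leipzg",
    "http://example.org/b/3": "London",
}

accepted, to_review = classify_links(source, target)
```

With these thresholds the exact match (ignoring case) is accepted, the misspelled "Leipzg" lands in the review list, and everything else is discarded.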
Silk : Link Discovery Framework
Tool for discovering links between data items within different Linked Data sources.
The Silk Link Specification Language (Silk-LSL) allows users to express complex linkage rules
Can be used to generate owl:sameAs links as well as other relationships
Scalability and high performance through efficient data handling
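For illustration, the output side of link discovery can be sketched as follows: given pairs of matched URIs, emit one N-Triples statement per pair. This is not Silk code; the function name and the example URIs are hypothetical, and only the `owl:sameAs` property URI is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriples(links, predicate=OWL_SAME_AS):
    """Serialize (source URI, target URI) pairs as N-Triples lines."""
    return "\n".join(f"<{s}> <{predicate}> <{t}> ." for s, t in links)


# Hypothetical matched pairs produced by some linkage rule.
links = [("http://example.org/a/1", "http://example.org/b/1")]
output = to_ntriples(links)
```

Passing a different `predicate` covers the "other relationships" case mentioned above.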
Silk Versions
Silk Single Machine
o Generate links on a single machine
o Local or remote data sets
Silk MapReduce
o Generate RDF links using a cluster of multiple machines
o Based on Hadoop (usable with Amazon Elastic MapReduce)
Silk Server
o Provides an HTTP API for matching instances from an incoming stream of RDF data
o Can be used as an identity resolution component within applications that consume Linked Data from the Web
SILK : Linking Workflow
SILK : Linkage Rule Components
LODRefine
LOD-enabled OpenRefine
Google Refine ==> OpenRefine
LODGrefine ==> LODRefine
o Supporting DBpedia (and Freebase)
o Supporting crowdsourcing
o Exporting RDF
o Extracting named entities
OpenRefine
The Extensions
Extend functionalities of OpenRefine
o RDF Refine extension
• Reconciliation and interlinking
• Exporting RDF
o DBpedia extension
• Extending reconciled data with columns from DBpedia
• Extracting Named Entities using Zemanta API
o NER extension
• Extracts named entities from unstructured text
o Crowdsourcing extension
Developed by
o Zemanta: DBpedia extension, Crowdsourcing
o DERI: RDF Refine
o Free Your Metadata Group: Named Entity Extraction extension
References
LOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar Series: Zemanta / Open refine
LOD2 Webinar Series: LIMES
LOD2 Webinar Series: SILK
LOD2 Webinar Series: OntoWiki
Virtuoso Sponger - RDFizer Middleware for creating RDF from non RDF Data Sources
LOD2 Webinar Series: D2R and Sparqlify
LOD2 HomePage, http://stack.lod2.eu/blog/
LOD2 Prototype, http://demo.lod2.eu/lod2demo