Page 1: LOD2 Introduction jordse@gmail.com 서울대학교 BIKE lab

LOD2 Introduction

jordse@gmail.com, Seoul National University BIKE lab

Creating Knowledge out of Interlinked Data

Introduction

FP7 project (2010-2014)
15 partners (technology researchers, companies and service providers) from 11 European countries, plus 1 associated partner from Korea
Coordinated by the AKSW research group at the University of Leipzig

The emerging Web of Data: achievements and challenges

• Web - a global, distributed platform for data, information and knowledge integration
• exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF

Achievements
1. Extension of the Web with a data commons (currently amounting to 25 billion facts)
2. A vibrant, global RTD community
3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
4. Emerging governmental adoption in sight
5. Establishing Linked Data as a deployment path for the Semantic Web

Challenges
1. Coherence: relatively few, expensively maintained links
2. Quality: partly low-quality data and inconsistencies
3. Performance: still substantial penalties compared to relational databases
4. Data consumption: large-scale processing, schema mapping and data fusion still in their infancy
5. Usability: missing direct end-user tools and network effect

[Figure: four snapshots dated July 2007, April 2008, September 2008 and July 2009]

Why Linked Open Data?

Problem: Try to search for these things on the current Web:
• Apartments near German-Russian bilingual childcare in Leipzig.
• ERP service providers with offices in Vienna and London.
• Researchers working on multimedia topics in Eastern Europe.
Information is available on the Web, but opaque to current Web search.

[Figure: diagram with the labels DB, Web server, HTML, RDF and Search engine, shown for two example sites]
• berlin.de: has everything about childcare in Berlin.
• Immobilienscout.de: knows all about real estate offers in Germany.

Solution: complement text on Web pages with structured linked open data, and intelligently combine/integrate such structured information from different sources.
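The idea of combining structured data from two sources can be sketched in a few lines of Python. This is only an illustration: the triples, the `ex:` identifiers and the choice to approximate "near" by "same district" are all assumptions made for the example, not data published by the sites named on the slide.

```python
# Triples are (subject, predicate, object). Every identifier and value
# below is hypothetical illustration data, not real site content.
CHILDCARE = [  # what a childcare site might expose as structured data
    ("ex:kita1", "ex:languages", "German-Russian"),
    ("ex:kita1", "ex:district", "Mitte"),
    ("ex:kita2", "ex:languages", "German-English"),
    ("ex:kita2", "ex:district", "Pankow"),
]
REAL_ESTATE = [  # what a real estate site might expose
    ("ex:flat1", "ex:district", "Mitte"),
    ("ex:flat2", "ex:district", "Pankow"),
    ("ex:flat3", "ex:district", "Mitte"),
]


def subjects_with(triples, predicate, obj):
    """Subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects_of(triples, subject, predicate):
    """Objects attached to a subject through a predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def flats_near_childcare(languages):
    """Join both sources on the shared ex:district value (assumed notion of 'near')."""
    districts = set()
    for kita in subjects_with(CHILDCARE, "ex:languages", languages):
        districts |= objects_of(CHILDCARE, kita, "ex:district")
    return sorted(s for s, p, o in REAL_ESTATE
                  if p == "ex:district" and o in districts)


print(flats_near_childcare("German-Russian"))  # ['ex:flat1', 'ex:flat3']
```

The join only works because both sources use the same identifier for the district, which is the role shared URIs play in Linked Data.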

Objectives of LOD2

Linked Data Lifecycle Challenges

LOD2 in a Nutshell

Research focus
• Very large RDF data management
• Knowledge Enrichment & Interlinking
• Fusion & Information Quality
• Adaptive, semantic user interfaces

Use Cases
• Media & Publishing
• Enterprise Data Webs
• Open Gov Data

Main Result
• Integrated LOD2-Stack for Linked Data lifecycle management

Partners
Uni Leipzig, CWI, DERI Galway, FU Berlin, Semantic Web Company, OpenLink, Tenforce, Exalead, Wolters Kluwer, OKFN

LOD2 STACK

LOD2 stack as Debian package repository

The LOD2 stack repository is a Debian package repository: http://stack.lod2.eu/deb/distributions/dists/
We have chosen a new reference OS: Ubuntu 12.04 LTS
  o This version is supported for the next 5 years.
Changes in the repository management system for
  o enabling quality control (development -> test -> stable)
  o enabling architecture-dependent distribution support (e.g. Virtuoso RDF store)
  o Public access to documentation
    • http://wiki.lod2.eu

LOD2 stack contribution process

LOD2 stack components

Linked Data publishing capabilities currently offered

Covers most of the LOD publishing cycle
  o Combination of
    • locally installed software,
    • online available software, and
    • online available data sources as well as data packages
  o About page in the LOD demonstrator (http://demo.lod2.eu/lod2demo)

LOD2 STACK - Extraction: Virtuoso Sponger, D2RQ

Virtuoso Sponger

An RDFizer introduced in Virtuoso 5.0
Provides built-in RDF middleware for transforming non-RDF data into RDF "on the fly".
You can use non-RDF data sources as Semantic Web data sources.
Inputs: wide variety of non-RDF Web data sources, e.g.:
  o (X)HTML Web pages (including hosted microformats)
  o Web services (Google, Del.icio.us, Flickr etc.)
  o Binary files (MS Office, PDF, OpenDocument etc.)
Output: RDF structured data

Inputs: Supported Data Sources

RDF (inc. N3, Turtle)
  o SIOC, SKOS, FOAF, AtomOWL, Annotea …
(X)HTML pages
  o HTML header metadata: Dublin Core
  o Microformats: eRDF, RDFa, hCard, hCalendar, XFN, xFolk …
Syndication formats
  o RSS 2.0, Atom, OPML, OCS, XBEL
GRDDL
Web service APIs: Google Base, Flickr, Del.icio.us, Ning …
Files:
  o Binary files: MS Office, OpenOffice, images, audio, video …
  o Data exchange formats: iCalendar, vCard
3rd party metadata extractors: Aperture, Spotlight, SIMILE RDFizers
Or add your own!

Output: Structured Data

In the context of the Semantic Data Web:
"Data organized into semantic chunks or entities, with similar entities grouped together into relations or classes"
Michael Bergman (http://www.mkbergman.com)
Article: "More Structure, More Terminology and (hopefully) More Clarity"

Sponger Benefits

The majority of the world's data currently resides in non-RDF form
Sponger provides a "Swiss army knife" for RDF structured data generation from non-RDF sources
Extracting data from non-RDF Web sources and converting it to RDF
  o helps "bootstrap" the Semantic Web
  o helps drive the transition of the traditional Document-Web into the emerging Semantic Data-Web
  o exposes the data in a canonical form for querying and inference

Sponger Inputs & Outputs

Sponger Architecture

Sponger is composed of Sponger Cartridges
Default cartridge collection is bundled as a Virtuoso VAD
Cartridge = Metadata Extractor + Ontology Mapper
Metadata extracted from non-RDF resources is mapped to a suitable ontology by the Ontology Mapper to produce structured data
Sponger is highly customizable
Custom cartridges can be developed
  o Using any language (e.g. Virtuoso PL, C/C++, Java) supported by the Virtuoso Server Extensions API
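The "Cartridge = Metadata Extractor + Ontology Mapper" split can be illustrated with a small standalone Python sketch. It does not use the Virtuoso cartridge API; the class and function names, the sample page and the mapping from `<meta>` names to Dublin Core properties are assumptions chosen for the example.

```python
from html.parser import HTMLParser

DC = "http://purl.org/dc/elements/1.1/"
# Assumed mapping from HTML <meta name="..."> keys to Dublin Core properties.
ONTOLOGY_MAP = {"dc.title": DC + "title", "dc.creator": DC + "creator"}


class MetaExtractor(HTMLParser):
    """Metadata extractor: collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name is not None and content is not None:
            self.metadata[name.lower()] = content


def map_to_ontology(subject, metadata):
    """Ontology mapper: keep only known keys and emit (s, p, o) triples."""
    return [(subject, ONTOLOGY_MAP[key], value)
            for key, value in sorted(metadata.items())
            if key in ONTOLOGY_MAP]


def cartridge(url, html):
    """Hypothetical cartridge: run the extractor, then the mapper."""
    extractor = MetaExtractor()
    extractor.feed(html)
    return map_to_ontology(url, extractor.metadata)


# Hypothetical input page.
PAGE = (
    '<html><head>'
    '<meta name="DC.title" content="LOD2 Introduction">'
    '<meta name="DC.creator" content="BIKE lab">'
    '<meta name="viewport" content="width=device-width">'
    '</head></html>'
)

triples = cartridge("http://example.org/page", PAGE)
for triple in triples:
    print(triple)
```

Keeping extraction and mapping separate means a new source format only needs a new extractor, while the target ontology can change without touching the parsing code.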

D2RQ Platform

System for accessing relational databases as virtual RDF graphs

Offers RDF-based access to the content of relational databases without having to replicate it into an RDF store

Features:
• query a non-RDF database using SPARQL
• access the content of the database as Linked Data over the Web
• create custom dumps of the database in RDF
• access information using the Apache Jena API
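As a rough illustration of the first feature, a SPARQL query can be sent to a SPARQL Protocol endpoint as an HTTP GET request with a `query` parameter. The sketch below only builds the request URL and sends nothing. The endpoint address is an assumption (a locally running server), and the query assumes the database has been mapped to `foaf:Person`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed address of a locally running endpoint; adjust to your setup.
ENDPOINT = "http://localhost:2020/sparql"

QUERY = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "SELECT ?person WHERE { ?person a foaf:Person } LIMIT 10"
)


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

To actually run the query, the URL could be passed to `urllib.request.urlopen` once an endpoint is available at that address.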

D2RQ Platform: Components

The D2RQ Platform consists of:
D2RQ Mapping Language, a declarative mapping language for describing the relation between an ontology and a relational data model.
D2RQ Engine, which uses the mappings to rewrite Jena API calls into SQL queries against the database and passes query results up to the higher layers of the framework.
D2R Server, an HTTP server that provides a Linked Data view, an HTML view for debugging and a SPARQL Protocol endpoint over the database.

Mapping Examples

map:MyDatabase a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password" .

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0";
    d2rq:class foaf:Person .

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:join "User.ID = Photo.UserID";
    d2rq:refersToClassMap map:Photos .
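To make the effect of such a mapping concrete, the following Python sketch mimics what the ClassMap and PropertyBridge above describe, using in-memory rows instead of a database. It is not how D2RQ works internally (D2RQ rewrites queries into SQL). The table contents, the `example.org` URI patterns and the pattern for `map:Photos`, which the slide does not define, are assumptions.

```python
import re

# Hypothetical rows standing in for the User and Photo tables.
USERS = [{"ID": 1, "deleted": 0}, {"ID": 2, "deleted": 1}, {"ID": 3, "deleted": 0}]
PHOTOS = [{"ID": 10, "UserID": 1}, {"ID": 11, "UserID": 3}, {"ID": 12, "UserID": 2}]

# Hypothetical URI patterns; the slide elides the real host.
PEOPLE_PATTERN = "http://example.org/people/@@User.ID@@"
PHOTO_PATTERN = "http://example.org/photos/@@Photo.ID@@"


def expand(pattern, rows):
    """Fill @@Table.Column@@ placeholders from rows, a {table name: row} dict."""
    return re.sub(r"@@(\w+)\.(\w+)@@",
                  lambda m: str(rows[m.group(1)][m.group(2)]), pattern)


def virtual_triples():
    """Produce the triples the mapping describes for the sample rows."""
    triples = []
    for user in USERS:
        if user["deleted"] != 0:  # mirrors d2rq:condition "User.deleted=0"
            continue
        person = expand(PEOPLE_PATTERN, {"User": user})
        triples.append((person, "rdf:type", "foaf:Person"))  # d2rq:class
        for photo in PHOTOS:
            if photo["UserID"] == user["ID"]:  # mirrors d2rq:join
                triples.append((person, "foaf:made",
                                expand(PHOTO_PATTERN, {"Photo": photo})))
    return triples


for triple in virtual_triples():
    print(triple)
```

User 2 is skipped by the condition, so neither that user nor photo 12 appears in the output.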

LOD2 STACK - OntoWiki

OntoWiki

OntoWiki enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.
  o Knowledge bases (a.k.a. graphs, Linked Data optional)
  o Generic list and resource views
  o Versioning
  o Commenting on arbitrary resources
  o User management + access control
  o Inline editing
  o Navigation hierarchies (e.g. class hierarchies)

OntoWiki Screenshots

LOD2 STACK - Interlinking: LIMES, SILK

LIMES

Declarative Link Discovery Framework
Tuned towards efficiency and extensibility
Set-theoretical grammar for specifying links
Time-efficient mappers for single data types
Machine learning for detecting link specs

LIMES: Architecture

LIMES Link Specifications

1. Metadata
2. Source and Target
3. Similarity Measure
4. Acceptance Conditions
5. Review Conditions
6. Execution Mode
7. Output Format
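How a similarity measure interacts with acceptance and review conditions can be sketched as follows, assuming both conditions are plain similarity thresholds. The sketch does not use the LIMES configuration format or its time-efficient mappers; it compares every pair by brute force. The token-based Jaccard measure, the threshold values and the sample labels are assumptions for illustration.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two labels (stand-in measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


# Hypothetical source and target resources as (identifier, label) pairs.
SOURCE = [("src:1", "Leipzig University"),
          ("src:2", "Free University of Berlin")]
TARGET = [("tgt:a", "University of Leipzig"),
          ("tgt:b", "Free University Berlin"),
          ("tgt:c", "Vienna")]

ACCEPT_THRESHOLD = 0.7  # assumed: at or above this, the link is accepted
REVIEW_THRESHOLD = 0.5  # assumed: between the two, the link goes to manual review


def discover_links(source, target):
    """Compare every source/target pair and sort pairs into accepted / review."""
    accepted, review = [], []
    for s_id, s_label in source:
        for t_id, t_label in target:
            score = jaccard(s_label, t_label)
            if score >= ACCEPT_THRESHOLD:
                accepted.append((s_id, t_id, score))
            elif score >= REVIEW_THRESHOLD:
                review.append((s_id, t_id, score))
    return accepted, review


accepted, review = discover_links(SOURCE, TARGET)
print("accepted:", accepted)
print("review:", review)
```

With these sample labels, one pair clears the acceptance threshold and one falls into the review band; all other pairs are discarded.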

Silk: Link Discovery Framework

Tool for discovering links between data items within different Linked Data sources.

The Silk Link Specification Language (Silk-LSL) allows complex linkage rules to be expressed

Can be used to generate owl:sameAs links as well as other relationships

Scalability and high performance through efficient data handling
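The output of a link discovery run is a set of RDF links. A minimal sketch of serializing such links as N-Triples, with owl:sameAs as the default predicate and any other relationship as an option, could look like this. The resource URIs are hypothetical and the function is not part of Silk.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def to_ntriples(links, predicate=OWL_SAME_AS):
    """Serialize (source URI, target URI) pairs as N-Triples lines."""
    return "\n".join(f"<{s}> <{predicate}> <{t}> ." for s, t in links)


# Hypothetical discovered links between two data sets.
LINKS = [("http://example.org/a/Leipzig", "http://example.org/b/Leipzig")]

print(to_ntriples(LINKS))
print(to_ntriples(LINKS, predicate=RDFS_SEE_ALSO))
```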

Silk Versions

Silk Single Machine
  o Generate links on a single machine
  o Local or remote data sets
Silk MapReduce
  o Generate RDF links using a cluster of multiple machines
  o Based on Hadoop (usable with Amazon Elastic MapReduce)
Silk Server
  o Provides an HTTP API for matching instances from an incoming stream of RDF data
  o Can be used as an identity resolution component within applications that consume Linked Data from the Web

SILK: Linking Workflow

SILK: Linkage Rule Components

LOD2 STACK - Interlinking: LIMES, SILK

LODRefine

LOD-enabled OpenRefine
Google Refine ==> OpenRefine
LODGrefine ==> LODRefine
  o Supporting DBpedia (and Freebase)
  o Supporting crowdsourcing
  o Exporting RDF
  o Extracting named entities

OpenRefine

The Extensions

Extend functionalities of OpenRefine
  o RDF Refine extension
    • Reconciliation and interlinking
    • Exporting RDF
  o DBpedia extension
    • Extending reconciled data with columns from DBpedia
    • Extracting named entities using Zemanta API
  o NER extension
    • Extracts named entities from unstructured text
  o Crowdsourcing extension
Developed by
  o Zemanta: DBpedia extension, Crowdsourcing
  o DERI: RDF Refine
  o Free Your Metadata Group: Named Entity Extraction extension
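Reconciliation, in its simplest form, means matching cell values from a table column to resource URIs. The sketch below uses a small hard-coded label index in place of a reconciliation service, so it is not the RDF Refine or OpenRefine API. The index contents and the normalization rule (trim and lowercase) are assumptions.

```python
# Hypothetical label index; in practice, candidates would come from a
# reconciliation service rather than a hard-coded dictionary.
LABEL_INDEX = {
    "leipzig": "http://dbpedia.org/resource/Leipzig",
    "vienna": "http://dbpedia.org/resource/Vienna",
}


def reconcile(cells):
    """Pair each cell value with a matching URI, or None when nothing matches."""
    return [(cell, LABEL_INDEX.get(cell.strip().lower())) for cell in cells]


for cell, uri in reconcile([" Leipzig", "VIENNA", "Atlantis"]):
    print(repr(cell), "->", uri)
```

Unmatched cells stay in the result with `None`, so they can be reviewed by hand instead of being dropped silently.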

References

LOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar Series: Zemanta / Open refine
LOD2 Webinar Series: LIMES
LOD2 Webinar Series: SILK
LOD2 Webinar Series: OntoWiki
Virtuoso Sponger - RDFizer Middleware for creating RDF from non RDF Data Sources
LOD2 Webinar Series: D2R and Sparqlify
LOD2 HomePage, http://stack.lod2.eu/blog/
LOD2 Prototype, http://demo.lod2.eu/lod2demo