LOD InterLinking
Lee Kyung-wook (이경욱), Team Leader, Advanced Technology Research Team
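CONTENTS
01 Before InterLinking: opening public data at the national level, what Linked Open Data is, the basic principles of Linked Data, the RDF graph model, cases abroad, the state of the LOD Cloud
02 What is InterLinking?: an InterLinking example
03 Why InterLinking is needed: avoiding duplicate data construction (with a use case), discovering latent knowledge and expanding knowledge
04 Automating InterLinking: interlinking methods, interlinking systems, interlinking goals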
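Before InterLinking - Opening public data at the national level
Opening national databases: ensuring transparency and enabling creative use of data
Government:
Openness: actively open public information and provide it as Linked Open Data so that anyone can use it freely
Convergence: build a converged linkage system that integrates source data on a Linked Open Data (LOD) basis
Reuse: set up an information-provision environment by offering a platform for opening, linking and using Linked Open Data
Creation: create new content, with the private sector combining opened public information with knowledge from other fields to develop new services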
Before InterLinking - What is Linked Open Data? (HTML)
[Figure: resources connected to one another by hyperlinks]
Document-centric Web (Web of Documents): HTML (hyperlinks)
Human readable

Before InterLinking - What is Linked Open Data? (RDF)
Data-centric Web (Web of Data, of things): RDF (links between data items that carry explicit meaning)
[Figure: RDF graph around 예종 (Yejong), 숙종 (Sukjong) and 문종 (Munjong), connected by nikh:hasFather and nikh:hasGrandFather, with literal values attached by nikh:realName (왕옹), nikh:birthDate (1054.07) and nikh:tombPlace (경릉, 景陵)]
Machine readable

Before InterLinking - What is Linked Open Data?
The four basic principles of Linked Data (Tim Berners-Lee)
1) Use URIs as names for things
2) Use HTTP URIs so that people can look up those names.
3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4) Include links to other URIs, so that they can discover more things.
Each RDF statement is a triple: subject, predicate, object.
Example: 예종의 아버지는 숙종이다 ("Yejong's father is Sukjong"): subject 예종, predicate hasFather, object 숙종.
<!-- nikh: namespace URI assumed for illustration -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:nikh="http://www.history.go.kr/ontology/">
<rdf:Description rdf:about="http://www.history.go.kr/ontology/인명_예종">
  <nikh:realName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">왕우</nikh:realName>
  <nikh:birthDate rdf:datatype="http://www.w3.org/2001/XMLSchema#string">10790100</nikh:birthDate>
  <nikh:deathDate rdf:datatype="http://www.w3.org/2001/XMLSchema#string">11220400</nikh:deathDate>
  <nikh:tombPlace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">유릉(裕陵)</nikh:tombPlace>
  <nikh:hasFather rdf:datatype="http://www.w3.org/2001/XMLSchema#string">숙종(肅宗)</nikh:hasFather>
  <nikh:hasGrandFather rdf:datatype="http://www.w3.org/2001/XMLSchema#string">문종(文宗)</nikh:hasGrandFather>
</rdf:Description>
</rdf:RDF>
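To make the subject-predicate-object structure concrete, here is a minimal Python sketch that writes the 예종 description as triples and serializes them as N-Triples. The predicate namespace (`http://www.history.go.kr/ontology/`) is an assumption, since the slide does not declare the `nikh:` prefix; the literal values are copied from the example.

```python
# Minimal sketch: the 예종 statements as (subject, predicate, object) triples.
# ASSUMPTION: the nikh: namespace URI below is illustrative, not confirmed.
NIKH = "http://www.history.go.kr/ontology/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

subject = NIKH + "인명_예종"
triples = [
    (subject, NIKH + "realName", "왕우"),
    (subject, NIKH + "birthDate", "10790100"),
    (subject, NIKH + "deathDate", "11220400"),
    (subject, NIKH + "tombPlace", "유릉(裕陵)"),
    (subject, NIKH + "hasFather", "숙종(肅宗)"),
    (subject, NIKH + "hasGrandFather", "문종(文宗)"),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, string literal) triples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes and double quotes inside the literal.
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}"^^<{XSD_STRING}> .')
    return "\n".join(lines)

print(to_ntriples(triples))
```

Note that `hasFather` is kept as a string literal, as in the slide; linking it to a resource URI for 숙종 instead would let software follow the relation.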
Before InterLinking - What is Linked Open Data? RDF graph model
Before InterLinking - What is Linked Open Data? Cases abroad
[Figure: timeline, 2008 to 2011]
Before InterLinking - What is Linked Open Data? State of the LOD Cloud
Version 0.3, 09/19/2011
What is InterLinking?
<http://dbpedia.org/resource/Amsterdam>
owl:sameAs <http://rdf.freebase.com/ns/...> ;
owl:sameAs <http://sws.geonames.org/2759793> ;
...
<http://sws.geonames.org/2759793>
owl:sameAs <http://dbpedia.org/resource/Amsterdam> ;
wgs84_pos:lat "52.3666667" ;
wgs84_pos:long "4.8833333" ;
geo:inCountry <http://www.geonames.org/countries/#NL> ;
...
InterLinking example: the Amsterdam resource in the DBpedia dataset and resource 2759793 (Amsterdam) in the GeoNames dataset are identified as the same instance with owl:sameAs.
<http://dbpedia.org/resource/Amsterdam>
owl:sameAs <http://sws.geonames.org/2759793> .
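Because owl:sameAs is symmetric and transitive, a consumer can treat linked URIs as one entity and pool their descriptions. The sketch below assumes a simple union-find grouping and merges property values by set union; the DBpedia description is left empty because the slide elides it.

```python
# Sketch: group URIs connected by owl:sameAs and merge their descriptions.
# The union-find grouping and set-union merge are illustrative choices.
from collections import defaultdict

same_as = [
    ("http://dbpedia.org/resource/Amsterdam", "http://sws.geonames.org/2759793"),
]
descriptions = {
    "http://sws.geonames.org/2759793": {
        "wgs84_pos:lat": {"52.3666667"},
        "wgs84_pos:long": {"4.8833333"},
        "geo:inCountry": {"http://www.geonames.org/countries/#NL"},
    },
    "http://dbpedia.org/resource/Amsterdam": {},  # elided on the slide
}

parent = {}

def find(x):
    """Return the representative URI of x's sameAs group."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for a, b in same_as:
    union(a, b)

def merged_description(uri):
    """Union of the properties of every URI in the same sameAs group as uri."""
    root = find(uri)
    merged = defaultdict(set)
    for other, props in descriptions.items():
        if find(other) == root:
            for p, values in props.items():
                merged[p] |= values
    return dict(merged)

print(merged_description("http://dbpedia.org/resource/Amsterdam"))
```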
Why InterLinking is needed in building LOD: avoiding duplicate data construction
Relational databases: primary keys
Books (primary key: ID; Author is a foreign key to Authors.ID)
ID    Title              Author  Year
1289  The Da Vinci Code  456     2003
Authors (primary key: ID)
ID   Name       Year
456  Dan Brown  1964
Relational databases and applications
Database (the Books and Authors records above) → SQL → Application → User interface
SQL:
SELECT title, year FROM books;
SELECT authors.name, authors.year FROM books, authors WHERE books.author = authors.id;
User interface:
Title: The Da Vinci Code
Author: Dan Brown, 1964
Year: 2003
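A runnable sketch of the relational version, using Python's built-in sqlite3 module with an in-memory database. The author link is a numeric foreign key (456) that only means something inside this one database. Table and column names follow the slide; the single JOIN query is an illustrative combination of the two SELECTs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author INTEGER REFERENCES authors(id),  -- foreign key to authors.id
    year INTEGER
);
INSERT INTO authors VALUES (456, 'Dan Brown', 1964);
INSERT INTO books VALUES (1289, 'The Da Vinci Code', 456, 2003);
""")

# Join the two tables through the local foreign key.
row = conn.execute("""
    SELECT books.title, authors.name, authors.year, books.year
    FROM books JOIN authors ON books.author = authors.id
""").fetchone()

title, name, born, published = row
ui_text = f"Title: {title}\nAuthor: {name}, {born}\nYear: {published}"
print(ui_text)
```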
Linked data: URIs as primary keys (OpenLibrary and VIAF)
OpenLibrary books record (primary key: URI; Author holds a VIAF URI and acts as the foreign key)
URI                                    Title              Author                          Year
http://openlibrary.org/works/OL76837W  The Da Vinci Code  http://viaf.org/viaf/102403515  2003
VIAF authors record (primary key: URI)
URI                             Name       Year
http://viaf.org/viaf/102403515  Dan Brown  1964
Triple repository: URIs (primary keys)
Linked data and applications
Database (the OpenLibrary and VIAF records above) → SPARQL → Application → User interface
SPARQL:
SELECT ?title ?year …
SELECT ?name ?year WHERE …
User interface:
Title: The Da Vinci Code
Author: Dan Brown, 1964
Year: 2003
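In the linked-data version, the book's author value is the VIAF URI itself, so the "join" is simply following a URI that another dataset owns; nothing about Dan Brown has to be re-entered. This pure-Python sketch stands in for a triple repository and a SPARQL engine; the short predicate names (`title`, `author`, `name`, `year`) are hypothetical placeholders for full predicate URIs.

```python
# Sketch: the same records as triples keyed by URIs, joined by following the URI.
# HYPOTHETICAL short predicate names; a real store would use full predicate URIs.
BOOK = "http://openlibrary.org/works/OL76837W"
AUTHOR = "http://viaf.org/viaf/102403515"

store = {
    (BOOK, "title", "The Da Vinci Code"),
    (BOOK, "author", AUTHOR),  # the link: the object is another dataset's URI
    (BOOK, "year", "2003"),
    (AUTHOR, "name", "Dan Brown"),
    (AUTHOR, "year", "1964"),
}

def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)

def describe_book(book_uri):
    # Conceptually like a SPARQL basic graph pattern:
    # book -> title, year, author; author -> name, year.
    (title,) = objects(book_uri, "title")
    (year,) = objects(book_uri, "year")
    (author,) = objects(book_uri, "author")
    (name,) = objects(author, "name")
    (born,) = objects(author, "year")
    return f"Title: {title}\nAuthor: {name}, {born}\nYear: {year}"

print(describe_book(BOOK))
```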
Avoiding duplicate data construction, a use case: the BBC Music site
[Figure: BBC Music artist page, with Artist Profile and Artist Biography sections]
Why InterLinking is needed: discovering latent knowledge and expanding knowledge
<!-- nikh: namespace URI assumed for illustration -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:nikh="http://www.history.go.kr/ontology/">
<rdf:Description rdf:about="http://www.history.go.kr/ontology/사건_거란_만주족_전쟁">
  <nikh:isCausedBy rdf:datatype="http://www.w3.org/2001/XMLSchema#string">매(海東靑)</nikh:isCausedBy>
  <nikh:hasStartAge rdf:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:hasStartAge>
  <nikh:beginDate rdf:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:beginDate>
  <nikh:hasEventPlace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:hasEventPlace>
  <rdf:type rdf:resource="http://www.history.go.kr/ontology/event"/>
</rdf:Description>
</rdf:RDF>
Knowledge expansion: falconry (hunting with the 매/海東靑 falcon) led to a war between the Khitan and the Manchus.
<!-- nikh: namespace URI assumed for illustration -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:nikh="http://www.history.go.kr/ontology/">
<rdf:Description rdf:about="http://www.biology.go.kr/ontology/조류">
  <nikh:hasName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">매(海東靑)</nikh:hasName>
  <nikh:isCategory rdf:datatype="http://www.w3.org/2001/XMLSchema#string">척삭동물</nikh:isCategory>
  <nikh:isSpecies rdf:datatype="http://www.w3.org/2001/XMLSchema#string">매과</nikh:isSpecies>
  <nikh:isLivedIn rdf:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:isLivedIn>
  <rdf:type rdf:resource="http://www.biology.go.kr/ontology/event"/>
</rdf:Description>
</rdf:RDF>
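[Figure: owl:sameAs links connecting datasets from different domains: history (역사), pharmaceutical patents (의약특허), biology and medicinal herbs (생물약초)]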
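The slide shows the owl:sameAs link but not how it is found. As a hedged sketch, assume the two resources are matched because their name literals are identical (a deliberate simplification) and that the keys below are shortened, hypothetical identifiers rather than the full URIs above. Following the link then lets the history record borrow facts from the biology record.

```python
# Sketch: find a cross-domain link through a shared literal, then expand knowledge.
# HYPOTHETICAL short keys stand in for the full resource URIs.
history = {
    "event:거란_만주족_전쟁": {"isCausedBy": "매(海東靑)"},
}
biology = {
    "bio:조류": {"hasName": "매(海東靑)", "isCategory": "척삭동물", "isSpecies": "매과"},
}

def link_by_shared_value(src, src_pred, dst, dst_pred):
    """Pair resources whose values for the two predicates are identical strings."""
    return [
        (s, d)
        for s, s_props in src.items()
        for d, d_props in dst.items()
        if src_pred in s_props and s_props[src_pred] == d_props.get(dst_pred)
    ]

def expand(links):
    """Attach the linked biology facts to each history event (knowledge expansion)."""
    return {event: dict(biology[bird]) for event, bird in links}

links = link_by_shared_value(history, "isCausedBy", biology, "hasName")
print(links)
print(expand(links))
```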
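Automating InterLinking
01 An enormous LOD Cloud
02 Inefficient LOD linking
03 Efficient link recommendation: a system is needed that automatically extracts meaningful instances from a source dataset, finds the most similar instances in a target dataset, and recommends them.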
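Automating InterLinking - Interlinking methods
Each publisher stores and publishes data with its own schema structure.
Schema dependent (ontology matching, graph matching): requires knowledge of what the RDF predicates mean. E.g., one must know that the source dataset's predicate #PreLable is the same as the target dataset's predicate #Name.
Schema independent (instance matching, data matching): requires no human knowledge of the schema.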
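A minimal sketch of the schema-dependent approach: someone has to write the predicate mapping by hand before any values can be compared, and a new mapping is needed for every pair of publisher schemas. The records and the `PreLable` to `Name` mapping are hypothetical.

```python
# Sketch of schema-dependent matching with a hand-written predicate mapping.
# HYPOTHETICAL records and mapping, for illustration only.
PREDICATE_MAP = {"PreLable": "Name"}  # written by a person, per pair of schemas

source = {"src:1": {"PreLable": "Amsterdam"}}
target = {"tgt:A": {"Name": "Amsterdam"}, "tgt:B": {"Name": "Rotterdam"}}

def schema_dependent_links(source, target, mapping):
    """Link instances whose values agree on every manually mapped predicate."""
    links = []
    for s_uri, s_props in source.items():
        for t_uri, t_props in target.items():
            if all(
                p_src in s_props and s_props[p_src] == t_props.get(p_tgt)
                for p_src, p_tgt in mapping.items()
            ):
                links.append((s_uri, "owl:sameAs", t_uri))
    return links

print(schema_dependent_links(source, target, PREDICATE_MAP))
```

The mapping is the weak point: with many publishers and schemas it must be rewritten for each pair, which is what schema-independent methods try to avoid.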
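Automating InterLinking - Interlinking systems: SERIMI
System comparison (System / Comparison key / Distinguishing feature and algorithm / Procedure / Example)
System: SERIMI
Comparison key: Predicate
Distinguishing feature and algorithm: string matching (RWSA)
Procedure:
1) Select a class in the source dataset
2) Select the instances of that class
3) Select the predicates of those instances
4) Keep only the high-entropy predicates
5) Build a property list
6) Repeat the same steps on the target dataset
7) Search for identical or similar predicates
8) Inspect the values of the matched properties and decide whether to interlink
9) If so, link with owl:sameAs
Example: steps 4, 5, 6, 7 and 9 are illustrated on the next slide.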
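Steps 4 and 5 can be sketched as follows, assuming "entropy" means the Shannon entropy of a predicate's values across the selected instances: predicates whose values vary a lot separate instances well, while repetitive ones do not. The 1.0-bit cutoff and the toy instances are hypothetical, and SERIMI's own scoring may differ.

```python
import math
from collections import Counter

def value_entropy(values):
    """Shannon entropy (bits) of the value distribution of one predicate."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def high_entropy_predicates(instances, cutoff=1.0):
    """Property list: predicates whose value entropy is at least `cutoff` bits."""
    by_predicate = {}
    for props in instances:
        for p, v in props.items():
            by_predicate.setdefault(p, []).append(v)
    return sorted(p for p, vs in by_predicate.items() if value_entropy(vs) >= cutoff)

# HYPOTHETICAL toy instances of one source class.
kings = [
    {"realName": "name-1", "birthDate": "date-1", "tombPlace": "tomb-X"},
    {"realName": "name-2", "birthDate": "date-2", "tombPlace": "tomb-X"},
    {"realName": "name-3", "birthDate": "date-3", "tombPlace": "tomb-X"},
    {"realName": "name-4", "birthDate": "date-4", "tombPlace": "tomb-Y"},
]
print(high_entropy_predicates(kings))
```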
Automating InterLinking - SERIMI example (schema independent)
Source instance: the RDF description of 인명_예종 (Yejong) given in the Linked Data section above
Step 1 (select high-entropy predicates): realName (high entropy), birthDate (high entropy), deathDate (high entropy); tombPlace (low entropy)
Step 2 (build the property list): realName, birthDate, deathDate
Step 3 (repeat on the target dataset): name, bDate, dDate
Step 4 (find identical or similar predicates): realName = name, birthDate = bDate, deathDate = dDate
Step 5 (InterLinking): <http://source.dataset.org/resource/왕우> owl:sameAs <http://target.dataset.org/왕우> .
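The slide names RWSA as SERIMI's string-matching algorithm; the sketch below does not implement RWSA. It swaps in Python's `difflib.SequenceMatcher` ratio on lower-cased predicate names to pair predicates (procedure step 7), then links two instances only if all aligned values agree (steps 8 and 9). The target record and both thresholds (`min_sim`, `min_agree`) are hypothetical; the source values come from the 예종 example.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Stand-in string similarity (NOT RWSA): difflib ratio on lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def align_predicates(src_preds, tgt_preds, min_sim=0.6):
    """Pair each source predicate with its most similar target predicate."""
    alignment = {}
    for p in src_preds:
        best = max(tgt_preds, key=lambda q: name_similarity(p, q))
        if name_similarity(p, best) >= min_sim:
            alignment[p] = best
    return alignment

def decide_same_as(src, tgt, alignment, min_agree=1.0):
    """Link only if the share of aligned predicates with equal values reaches min_agree."""
    if not alignment:
        return False
    agree = sum(1 for p, q in alignment.items() if p in src and q in tgt and src[p] == tgt[q])
    return agree / len(alignment) >= min_agree

# Source values from the 예종 example (high-entropy predicates only).
src = {"realName": "왕우", "birthDate": "10790100", "deathDate": "11220400"}
# HYPOTHETICAL target record using a different schema.
tgt = {"name": "왕우", "bDate": "10790100", "dDate": "11220400"}

alignment = align_predicates(list(src), list(tgt))
if decide_same_as(src, tgt, alignment):
    print("<http://source.dataset.org/resource/왕우> owl:sameAs <http://target.dataset.org/왕우> .")
```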
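System comparison (System / Comparison key / Distinguishing feature / Algorithms / Procedure / Example)
System: SLINT
Comparison key: Predicate
Distinguishing feature: blocking step
Algorithms: coverage, discriminability, Dice coefficient, TF-IDF, inverted indexing (weighted co-occurrence)
Procedure:
1) Select important predicates by coverage and discriminability
2) Combine the selected predicates from the source and target datasets by data type to form predicate alignments
3) Evaluate the confidence of each predicate alignment with the Dice coefficient; similar predicates carry similar information, e.g., title <-> titleKor
4) Extract object values from the source and target datasets and build an inverted index
5) Score URI and String values with TF-IDF
6) Score Decimal, Integer and Date values as 0/1 (match or no match)
7) Link with owl:sameAs when the score exceeds a suitable threshold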
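A sketch of the SLINT scoring quantities, assuming commonly used definitions: coverage is the fraction of instances that have the predicate, discriminability is the number of distinct values divided by the number of values, and alignment confidence is the Dice coefficient between the two predicates' normalized value sets. These definitions, the lower-casing of strings, and the toy records are assumptions and may differ in detail from SLINT's own.

```python
def coverage(instances, p):
    """Fraction of instances that have a value for predicate p."""
    return sum(1 for props in instances if p in props) / len(instances)

def discriminability(instances, p):
    """Distinct values of p divided by the total number of p values."""
    values = [props[p] for props in instances if p in props]
    return len(set(values)) / len(values) if values else 0.0

def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

def normalize(value, kind):
    """Normalization rules from the example (lower-casing is an added assumption)."""
    if kind == "string":
        return set(value.lower().split())
    if kind == "uri":
        return {part for part in value.split("/") if part}
    if kind == "decimal":
        return {f"{float(value):.2f}"}
    return {value}  # integer and date values stay unchanged

def alignment_confidence(src_instances, p, tgt_instances, q, kind):
    """Dice coefficient between the normalized value sets of predicates p and q."""
    src_vals = set().union(*(normalize(r[p], kind) for r in src_instances if p in r))
    tgt_vals = set().union(*(normalize(r[q], kind) for r in tgt_instances if q in r))
    return dice(src_vals, tgt_vals)

# HYPOTHETICAL toy data for a title <-> titleKor alignment.
src_books = [{"title": "Alpha Beta"}, {"title": "Gamma"}]
tgt_books = [{"titleKor": "alpha beta"}, {"titleKor": "delta"}]
print(alignment_confidence(src_books, "title", tgt_books, "titleKor", "string"))
```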
Automating InterLinking - Interlinking systems: SLINT example (schema independent)
Source instance: the RDF description of 인명_예종 (Yejong) given in the Linked Data section above
Step 1 (select important predicates): realName (coverage), birthDate (coverage), deathDate (coverage), tombPlace (discriminability)
Step 2 (create predicate alignments): string: S:realName, D:name; date: S:birthDate, D:bDate; integer: S:age, D:age
Step 3 (evaluate alignment confidence): e.g., title <-> titleKor. Values are normalized first: String is split into tokens; URI is split on '/'; Decimal keeps 2 decimal digits; Integer and Date are unchanged.
Step 4 (extract object values and build an inverted index): e.g., 숙종, 이순신, 강감찬, …; URI and String values are weighted by TF-IDF, Decimal, Integer and Date values by 0/1.
Step 5 (InterLinking): <http://source.dataset.org/resource/왕우> owl:sameAs <http://target.dataset.org/왕우> .
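Steps 4 and 5 can be sketched as blocking plus scoring: an inverted index over target tokens proposes candidates, string values are compared by TF-IDF cosine similarity, exact-type values (Decimal, Integer, Date) by a 0/1 match, and the averaged score is checked against a threshold. The smoothed IDF formula, the 0.5 threshold and the toy book records are assumptions for illustration, not SLINT's published settings.

```python
import math
from collections import Counter, defaultdict

def tokens(text):
    return text.lower().split()

def idf_table(texts):
    """Smoothed IDF over all string values from both datasets (an assumption)."""
    n = len(texts)
    df = Counter(t for text in texts for t in set(tokens(text)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(text, idf):
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens(text)).items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def interlink(source, target, string_pairs, exact_pairs, threshold=0.5):
    """string_pairs / exact_pairs: aligned (source predicate, target predicate) pairs."""
    texts = [r[p] for r in source.values() for p, _ in string_pairs if p in r]
    texts += [r[q] for r in target.values() for _, q in string_pairs if q in r]
    idf = idf_table(texts)

    # Inverted index over target tokens: token -> target URIs (blocking).
    index = defaultdict(set)
    for uri, rec in target.items():
        for _, q in string_pairs:
            for t in tokens(rec.get(q, "")):
                index[t].add(uri)

    links = []
    for s_uri, s_rec in source.items():
        candidates = set()
        for p, _ in string_pairs:
            for t in tokens(s_rec.get(p, "")):
                candidates |= index.get(t, set())
        for t_uri in sorted(candidates):
            t_rec = target[t_uri]
            scores = [cosine(tfidf(s_rec.get(p, ""), idf), tfidf(t_rec.get(q, ""), idf))
                      for p, q in string_pairs]
            scores += [1.0 if p in s_rec and s_rec.get(p) == t_rec.get(q) else 0.0
                       for p, q in exact_pairs]
            if scores and sum(scores) / len(scores) >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links

# HYPOTHETICAL toy records.
source = {"src:book1": {"title": "The Da Vinci Code", "year": "2003"}}
target = {
    "tgt:A": {"title": "Da Vinci Code", "year": "2003"},
    "tgt:B": {"title": "The Lost Symbol", "year": "2009"},
}
print(interlink(source, target, [("title", "title")], [("year", "year")]))
```

Here tgt:B still becomes a candidate through the shared token "the", but its low TF-IDF similarity and mismatched year keep it below the threshold.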
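Automating InterLinking - Interlinking systems: SILK, AgreeMaker
System comparison (System / Comparison key / Distinguishing feature / Algorithm / Procedure / Example)
SILK: comparison key Predicate
AgreeMaker: comparison key Predicate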
Schema Independent
Automating InterLinking - Interlinking systems: SILK management tool
Goals of InterLinking
THANK YOU