Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
KnowledgeGraphCurationwith DeepLearningandSemanticReasoning
JiaoyanChen(陈矫彦)DepartmentofComputerScience,UniversityofOxford
[email protected]://www.cs.ox.ac.uk/isg/people/jiaoyan.chen/
08/11/2019 1
KnowledgeGraph(KG)
• The KnowledgeGraph isa knowledgebase usedby Google anditsservicestoenhanceits searchengine'sresultswithinformationgatheredfromavarietyofsources.
• Proposedaround2012
-------- FromWikipedia
08/11/2019 2
Semantic Web• The SemanticWeb isanextensionofthe WorldWideWeb throughstandardsbythe WorldWideWebConsortium (W3C)
• URI:UniformResourceIdentification• http://dbpedia.org/resource/University_of_Oxford
• RDF (ResourceDescription Framework)• Triple:<Subject, Property, Object>
• RDF Schema• e.g., class hierarchy, property domain and range
• Web Ontology Language (OWL)• SPARQL (RDF query language)• SemanticReasoning
08/11/2019 3
KG: RDF triples + RDF Schema orOWLOntology
IanHorrocks
UniversityofOxford Oxford
worksFor
locatedIn
ProfessorisA Person
subClassOf
UKlocatedIn
KGApplications
• Searchengines
• Chatrobotse.g.,AlexaandSiri
• Dataanalyticsandmachinelearning e.g., explanation
• Intelligentfinance
• Knowledgeengineering
• ...
08/11/2019 4
From DataCurationtoKnowledgeGraphCuration
• Datacuration (策展):Aseriesofdatamanagementandcleaningtaskse.g.,organization,integration, annotation,publication,maintenance,reuse,preservation,etc.
• FromdatacurationtoKG Curation:• Knowledge boosting
• E.g.,extractRDFtriplesfromtext,tabulardata• Refinement
• Canonicalization,error detection and correction• Qualitymeasurement• Etc.
08/11/2019 5
CaseStudy
StockTicker Name GICSSector
AMZN Amazon …
GOOG Alphabet …
Basic_Info
Stock URL CEOName Country
AMZN www.amazon.com JeffBezos USA
Background_Info
Ticker Date Income($) Liabilities
AMZN 06/01/2018 177.86 billion ...
Finance_Result
(Soft)ForeignKeyDiscovery
• W3CRDBtoRDF(Ontology)Mapping• table“Basic_Info”intoontologyclass,• foreignkeyintoobjectproperty• column“Income”intodataproperty,• column“CEO_Name”intodataproperty,• Etc.
Basic_Info
Background_Info
has_background
has_name
has_ceo_name
Finance_Result
has_finance_result
xsd:inthas_income
xsd:string
xsd:string
08/11/2019 7
However, suchstraightforwardmatchingbrings limitedsemantics.e.g.,Whatdoes“JeffBezos”mean?
CaseStudy
StockTicker Name GICSSector
AMZN Amazon …
GOOG Alphabet …
Basic_Info
Stock URL CEOName Country
AMZN www.amazon.com JeffBezos USA
Background_Info
Ticker Date Income($) Liabilities
AMZN 06/01/2018 177.86 billion ...
Finance_Result
(Soft)ForeignKeyDiscovery
• W3CRDBtoRDF(Ontology)Mapping (e.g.,BootOX)• table“Basic_Info”intoontologyclass,• foreignkeyintoobjectproperty• column Incomeintodataproperty,• column“CEO_Name”intodataproperty,• Etc.
• Semanticmatching:• Column"ColumnName”byclassCompany,column“CEO_Name”by
classPerson• Columns“Stock”and“CEO_Name”bypropertyhas_ceo,• Cell“JeffBezos” byentityJeff_Preston_Bezos• Etc.
Basic_Info
Background_Info
has_background
Companyhas_company
Personhas_ceo
Finance_Result
has_finance_result
xsd:inthas_income
ExternalKGs
works_for(LinkPrediction)
08/11/2019 8
TasksofTable Matching
• Task #1:EntityColumnà Class• e.g.,Column“CEOName”toClassCEOandClassPerson• equaltoTableà Classmatching iftheentitycolumnisa primarykey
• Task #2:Cellà Entity• e.g.,“Jeff Bezos”toEntityJeff_Preston_Bezos• equalto Rowà Class matching if the column is a primary key
• Task #3:ColumnPairà Property• e.g.,“Stock”and“CEOName”à propertyhas_ceo• includesdatapropertyandobjectproperty• two columns can come from two tables• columnordermatters
08/11/2019 9
RelatedWork(1)-- ProbabilisticGraphicalModel(PGM)
• Steps• Modelstasksof#1,#2and#3withaPGM (e.g., Markov Random Fields):variablefortablecomponent;valueforKGcomponent
• E.g., celltoentitybystringsimilarity,columnheadertoclass• Searchesforanassignment ofvaluestothevariablestomaximizethejointprobability
• Examples• [Limayeetal.2010,Mulwad etal.2013,Bhagavatula etal.2015,Takeoka etal.2019]
• Limitations:• Mostlyrelyonmetadata• Lowscalabilityandefficiency
• Limaye,G.,Sarawagi,S.,&Chakrabarti,S.(2010).Annotatingandsearchingwebtablesusingentities,typesandrelationships.ProceedingsoftheVLDBEndowment,3(1–2),1338–1347.• Mulwad,V.,Finin,T.,&Joshi,A.(2013).SemanticMessagePassingforGeneratingLinkedDatafromTables.InISWC.• Bhagavatula,C.,Noraset,T.,&Downey,D.(2015).TabEL:EntityLinkinginWebTables.ISWC (Vol.9366).• Venetis,P.,Halevy,A.,Madhavan,J.,Paşca,M.,Shen,W.,Wu,F.,…Wu,C.(2011).Recoveringsemanticsoftablesontheweb.ProceedingsoftheVLDBEndowment,4(9),528–538.• Chu,X.,Morcos,J.,Ilyas,I.F.,Ouzzani,M.,Papotti,P.,Tang,N.,&Ye,Y.(2015).KATARA: ADataClearning SystemPoweredbyKnowledgeBaseandCrowdsourcing.InSIGMOD (Vol.8,
pp.1247–1261).NewYork,NewYork,USA:ACMPress.• Takeoka,K.,Oyamada,M.,Nakadai,S.,&Okadome,T.(2019).Meimei:AnEfficientProbabilisticApproachforSemanticallyAnnotatingTables.InAAAI.
1008/11/2019
RelatedWork(2)-- Iterative Approach
• Abootstrappingapproachwithtwoiterations(TableMiner+ [Zhang 2017]):• Getsomeinitialmatchingsusingpartialdatafromthetable e.g.,columntypebycolumnheader
• Usestheinitialmatchingsasconstraintstointerprettherestmatchingsofthetable
• Multipleiterationswithanoverallscore(T2KMatch[Ritze 2015])
• Limitations:• Performancedropswithoutmetadata• Hardtocapturethecontextualsemantics
• Zhang,Z.(2017).EffectiveandefficientSemanticTableInterpretationusingTableMiner+.SemanticWeb,8(6),921–957.• Ritze,D.,Lehmberg,O.,&Bizer,C.(2015).MatchingHTMLTablestoDBpedia.InProceedingsofthe5thInternationalConferenceonWebIntelligence,MiningandSemantics-WIMS
’15 (pp.1–6).NewYork,NewYork,USA:ACMPress.• Syed,Z.,Finin,T.,Mulwad,V.,&Joshi,A.(2010).ExploitingaWebofSemanticDataforInterpretingTables.ProceedingsoftheWebSci10:ExtendingtheFrontiersofSocietyOn-Line,
April26-27th,2010,Raleigh,NC:US,(April),26–27.• Mulwad,V.,Finin,T.,Syed,Z.,&Joshi,A.(2010).Usinglinkeddatatointerprettables.ProceedingsofthetheFirstInternationalWorkshoponConsumingLinkedData,665.
1108/11/2019
RelatedWork(3)--DeepLearningfortheSemantics
• Disambiguate by learning representation of the context of the KG and the table• [Efthymiou 2017]: semantic embeddings of KG entities• [Luo 2018]: learn table features of cells
• Limitations• Thecontexthasnotbeenfullyexplored(e.g.,semanticsofsurroundingrelations)• Focus more on cell to entity matching, but lesson columntype matching and relationmatching
• Need labeledsamplesets(laborcost,generalization, transferability)
• Efthymiou,V.,Hassanzadeh,O.,Rodriguez-Muro,M.,&Christophides,V.(2017).Matchingwebtableswithknowledgebaseentities:Fromentitylookupstoentityembeddings.InISWC(Vol.10587LNCS,pp.260–277).
• Luo,X.,Luo,K.,Chen,X.,&Zhu,K.Q.(2018).Cross-LingualEntityLinkingforWebTables.InAAAI.• Nishida,K.,Sadamitsu,K.,Higashinaka,R.,&Matsuo,Y.(2017).UnderstandingtheSemanticStructuresofTableswithaHybridDeepNeuralNetworkArchitecture.AAAI,168–174.
Retrievedfromwww.aaai.org• Fetahu,etal.(2019),TableNet:AnApproachforDeterminingFine-grainedRelationsforWikipediaTables.WWW
1208/11/2019
RelatedWork(4)-- From cell to entity and knowledge gap
• A straight forward solution: match cells to entities, and then determine thetype by majority voting
• e.g., [Zwicklbauer 2013]
• Knowledge gap: When(apartof)cellshavenoentitycorrespondences• [Pham 2016]:compare a target column with a set of labeled columns by machine learning
• It’s hard togetenough labeled columns• [Quercini 2013]:Predict the type with web page queried by a search engine
• It introduces additional noise and irrelevant content, both of which changesovertime
• Zwicklbauer,S.,Einsiedler,C.,Granitzer,M.,&Seifert,C.(2013).Towardsdisambiguatingwebtables.InISWC(Poster) (Vol.1035,pp.205–208).• Efthymiou,V.,Hassanzadeh,O.,Rodriguez-Muro,M.,&Christophides,V.(2017).Matchingwebtableswithknowledgebaseentities:Fromentitylookupstoentityembeddings.InISWC
(Vol.10587LNCS,pp.260–277).• Luo,X.,Luo,K.,Chen,X.,&Zhu,K.Q.(2018).Cross-LingualEntityLinkingforWebTables.InAAAI.• Pham,M.,Alse,S.,Knoblock,C.A.,&Szekely,P.(2016).SemanticLabeling:ADomain-IndependentApproach.InISWC (pp.446–462).• Quercini,G.,Reynaud-Delaître,C.,&Reynaud,C.(2013).EntityDiscoveryandAnnotationinTables.EDBT/ICDT.
1308/11/2019
Problem: ColumnTypePrediction• Problem
• Annotatingacolumnwhosecellsaretextphrases(i.e.,entitymentions)withclassesofaKG,withoutassuminganymetadata(e.g.,columnheaders)
• Example• Annotate the followingcolumn “Name” by classes Company and ITCompany
08/11/2019 15
Name Employee # CEO Income($)Google 85,050 Sundar Pichar …
Amazon 613,300 Jeff Bezos …
Apple 132,000 Arthur D. Levinson ...
BlackBerry 4,044 John S. Chen ...
Orange 155, 202 Stéphane Richard ... A Web TableExample
Difficulty in Column TypePrediction
• Multipleandhierarchicaloutput classes• Identifyfine-grainedbut completely correct classes
• Intheaboveexample:Company (√),ITCompany (√√),USCompany(⨉ )
• Disambiguation• “BlackBerry”to“BlackBerry Limited (Company)”or“BlackBerry (MobilePhone)” or “Blackberry (Fruit)”
• ColumncellsmayhavefeworevenemptyKBentitycorrespondences(KnowledgeGap)
08/11/2019 16
ColNetinaNutshell
• Doesnot assumetheexistenceofmetadata
• ColumnClassifier• ConvolutionalNeuralNetworks(CNNs)toexplorecontextualsemantics i.e.,inter-cellcorrelations
• Knowledge-basedlearning• ReliesonKGlookupandSPARQLquery(reasoning)forautomaticsampling• Transferlearningtobridgetheknowledgegap
08/11/2019 17
ColNetFramework
08/11/2019 18
HierarchicalClassesKG
Entities&Facts
Entities&CandidateClasses
ColumnClassifiers
Samples
TabularData
ClassAnnotations
EstimatedMatching
Predict&Type
Step#1:EntityLookup
08/11/2019 19
MatchedEntities
CandidateClassesLookupColumns
Entities&Triples Schema KnowledgeGraph
Amazon
Apple
BlackBerry
Orange
Example
“Google”,“Amazon.com”,“Amazonrainforest”“AppleInc.”,“Apple(fruit)”,“OSX”“BlackBerryLimited”,“Blackberry(fruit)”“Orange(fruit)”,“OrangeCounty”,“OrangeS.A.”,etc.
Company,ITCompany,USCompany, Fruit,Software,Forest,AdministrativeRegion,Etc.
Step#2:Sampling&Training
08/11/2019 20
Entities&Triples Schema KnowledgeGraph
CandidateClasses
SamplingMatchedEntities
ParticularEntities
GeneralEntities
NegativeEntities
SyntheticColumns CNNs
Particularentities:“Amazon.com”,“Google”,“AppleInc.”,“BlackBerryLimited”,etc.
CandidateClass
Example:USCompany
Generalentities:“TwitterInc.”,“Facebook,Inc.”,“USAirways”,“FordMotorCompany”,etc.
Negativeentities:“OSX”,“OrangeS.A.”,“Orange(fruit)”,“Apple(fruit)”,“Roll-RoyceLimited”,etc.
”Google”
“Amazon.com”
“AppleInc.”
“BlackBerryLimited”
SyntheticColumn
AbinaryclassifierforUS
Company
Step#2:Sampling&Training(somedetails)• One-vs-rest
• Unfixedcandidateclasses,multipleand hierarchical outputclasses
• Syntheticcolumn(sizefixed):learninter-cellcorrelationsbyCNNs• E.g.,Givenacolumn[“Google”,“BlackBerry”,“Apple”],predictingcellbycellgivesafinalscorefrom0.33to0.66toCompany,whileconsideringtheircorrelationsincreasesthescoretoaround1.0
• Transferlearningandautomaticsampling• Pre-trainingwithgeneralentities;fine-tuningwithparticularentities
• Negativesampling• Entitiesofdisjointclasses• Givehigherprioritytoclassesthatappeartogetherinacolumn
08/11/2019 21
Step#3:Prediction&Typing(Example1)
08/11/2019 23
Amazon
Apple
BlackBerry
Orange
Amazon
Apple
BlackBerry
Amazon
Apple
BlackBerry
OrangeSyntheticColumns
Company:0.9ITCompany:0.78USCompany:0.38Fruit:0.08Software: 0.03Forest:0.02AdministrativeRegion:0.01Etc.
Predictedscore:𝑝"
Predict&Average
KGLookup&Vote
Company:1.0ITCompany:1.0USCompany:0.8Fruit:0.6Forest:0.2Etc.
Votedscore:𝑣"
Ensemble
Company:1.0ITCompany:1.0USCompany:0.38Fruit:0.32Software: 0.03Forest:0.02AdministrativeRegion:0.01Etc.
Threshold:0.5
top-2matchingoverDBpedia
𝜎% = 0.9𝜎* = 0.1
AtargetcolumnofITcompanies
Lowknowledgegap
Step#3:Prediction&Typing(Example2withbigknowledgegap)
08/11/2019 24
Oxford SemanticTechnology
DeepReason.ai
Oxstem
Oxbotica
Tripadvisor
Company:0.65ITCompany:0.45University:0.21ResearchInstitute:0.51Etc.
Predictedscore:𝑝"
Predict&Average
KGLookup&Vote
Company:0.2ITCompany:0.2University:0ResearchInstitute:0Etc.
Votedscore:𝑣"
Ensemble
Company: 0.65ITCompany: 0.45University:0ResearchInstitute:0Etc.
Threshold:0.5
top-2matchingoverDBPedia 𝜎% = 0.9
𝜎* = 0.1
Atargetcolumnwith largeknowledgegap
Highknowledgegap
Evaluation #1: WebTable
• DBpedia as the KG, DBPedia lookup service, SPARQL endpoint• Word2vec, trained byWikipedia article dump
• Limaye (tables fromWikipedia pages),T2Dv2 (tables from the Web)• Both have twosetsofground truth classes-- “Best” and “Okay”,resultingtotwoevaluationmodels– “Strict”and“Tolerant”.
08/11/2019 25
Name Columns Avg. Cells Different “Best” (“Tolerant”) Classes
Limaye 428 23 21 (24)
T2Dv2 411 124 56 (35)
OverallResultsonLimaye
• Predictionimpact• ColNetensembleandColNet>Lookup-Vote• Improverecalldramatically
• Ensembleimpact• ColNetensemble >ColNet• Improveprecisiondramatically
• Comparisonwiththestate-of-the-art• ColNetensembleandColNet>T2KMatch• ColNetensembleandColNethascompetitiveprecisionasEfthymiou17-Vote,butmuchhigherrecallandF1score
08/11/2019 26
Precision, recall,F1score
• Baselines• Lookup-Vote:DBPedia entitylookup+Voting• T2KMatch(iterativeapproach)• Efthymiou17-Vote:entitymatchingbyEfthymiou
etal.2017 (entityembedding) +Voting
(PK = Primary Key)
OverallResultsonT2Dv2• Predictionimpact√• Ensembleimpact√• Comparisonwiththestate-of-the-art√
• Knowledgegapimpact• Limayehasshortercolumnsinaverage,whichleadstolargerknowledgegap
• TheabsoluteprecisionandrecallishigheronT2Dv2thanonLimaye
• ImprovementsofColNetensembleandColNetaremoresignificantonLimayethanonT2Dv2,especiallyw.r.t.Lookup-Vote
08/11/2019 27
Precision, recall,F1score (PK = Primary Key)
ResultsonLimaye
ResultsonT2Dv2
Evaluation #2:CSV Files(AIDA)
• ThreeAIDAdatasets:• TundraTraits,Broadband,HomeElectricitySurvey• 20 different CSV files• 81 entity columns
08/11/2019 28
ColumnClassifierNutshell
• Semantics from surrounding rowsandcolumns
• Hybrid Neural Network (HNN): tablefeaturelearningthatcapturesthecontextualsemantics– cell phrases and cell correlations
• AKGlookupandsemanticqueryalgorithm for column features
08/11/2019 32
Col 1.A Col 1.B
Apple Arthur D. Levinson
BlackBerry John S. Chen
Orange Stéphane Richard
Col 2.A Col 2.B
Apple Vitamin C 7%
BlackBerry Vitamin C 35%
Orange Vitamin C 88%
Impact of the surrounding column: Col 1.A belongs to Company while Col 2.A belongs to Fruit
HNNInsights and Architecture• BidirectionalRecurrentNeuralNetworks(BiRNNs)
• Cellphraseembeddingwithwordcorrelations
• Attention Layer• Higherweighton“Limited”thanon“BlackBerry” w.r.t.Company;
• Convolution Layer• Encodethecorrelationbetweencellswithinacolumnorarow;• e.g.,Conv(“Apple”,“GoogleInc.”)>“Apple”
08/11/2019 33
Table CellEmbedding
ColumnFeatures
RowFeatures
CellFeatures
Convolution layeroverthecolumnandrow
Fullyconnected layeroverthemaincell
Features
Maxpooling+Concatenation+
Dropout
FullyConnected Layer+Softmax Layer
Scores
BiRNNs+AttentionLayer
RemainingChallenges
• Unfixedrelativecolumn positionmakesitdifferentfromtraditionalfeaturelearningfortext and images
• Thepotentialrelations(properties)betweencolumnsarequitepredictiveandtransferable
• E.g.,“has_CEO”vs.“has_Vitamin”
• However,theneuralnetworkfails tocapturesuchsemanticrelations
08/11/2019 34
Col 1.A Col 1.B
Apple Arthur D. Levinson
BlackBerry John S. Chen
Orange Stéphane Richard
Col 2.A Col 2.B
Apple Vitamin C 7%
BlackBerry Vitamin C 35%
Orange Vitamin C 88%
PropertyVector(P2Vec)
• P2Vecisamulti-hotvector:one slotrepresentsthelikelihoodofoneproperty
• Example:assumethathas_CEO correspondstothefirstslot,has_Vitamin correspondstothesecondlastslot
08/11/2019 35
Col 1.A Col 1.B
Apple Arthur D. Levinson
BlackBerry John S. Chen
Orange Stéphane Richard
Col 2.A Col 2.B
Apple Vitamin C 7%
BlackBerry Vitamin C 35%
Orange Vitamin C 88%vs
[1,0,0,…,0,0]vs[0,0,0,…,1,0]
PropertyVector(P2Vec)
08/11/2019 36
BlackBerry John S.Chen
…
… … …
Entitiese.g.,e
Propertiese.g.,p
Matchinge.g.,e’vs”JohnS.Chen” P2Vec
KBLookup Query Triples
e.g.,<e,p,e’>
ABriefProcedureforP2VecCalculation
Ensemble:HNN+P2Vec
I. Train a classifier • with the input of HNN features (intermediate output) and P2Vec • using e.g., Multiple Layer Perception (MLP) and Logistic Regression (LR)
II. Averagethepredictedscoreby• a classifier trained with the input of P2Vec using e.g., MLP and LR• the HNN
08/11/2019 37
Experiment
• Data• KG: DBpedia• Table sets: Limaye (8 classes, 114 columns), Efthymiou (31 classes,
620 columns), T2Dv2 (37 classes, 411 columns)
• Settings• Transfer:T2D-Tr(70%ofT2Dv2)fortraining;Limaye,EfthymiouandT2D-Te(30%ofT2Dv2)fortesting
• Independent:foreachtableset,70%fortraining;30%fortesting
• Evaluation• PerformanceofHNN,P2Vec,ensembleapproaches• Overallaccuracyincomparisonwithbaselines
08/11/2019 38
Overallresults(accuracy)• HNN+P2Vec>CNNs usedinColNet,inbothindependentandtransfersettings• HNN+P2Vec>>lexicalmatchingbasedbaselines(Lookup-VoteandT2KMatch),intheindependentsetting
• Transfersetting:• HNN+P2Vec>Lookup-Vote,T2KMatchonLimaye• Lookup-Vote>HNN+P2Vec>T2KMatchonEfthymiou• Agoodcompromiseisusing10%oflocaldataplustransferreddatafromtraining
08/11/2019 39
Transfer
Transfer
Independent
Independent
Compromise
Formore
• Papers• Chen,Jiaoyan,etal."ColNet:Embedding thesemanticsofwebtablesforcolumntypeprediction."AAAI,2019.
• Chen,Jiaoyan,etal."LearningSemanticAnnotations forTabularData." IJCAI, 2019.
• Moreprojectinformation:• https://github.com/alan-turing-institute/SemAIDA/• https://www.turing.ac.uk/research/research-projects/artificial-intelligence-data-analytics-aida
• SemanticWebChallengeonTabularDatatoKnowledgeGraphMatching• ISWC2019• Withlargesyntheticandrealdataforallthreetasks• http://www.cs.ox.ac.uk/isg/challenges/sem-tab/
08/11/2019 40
Problem
• Tripleassertionswithliteralobjects(entitymentions)• e.g.,<Yangtze_River,passesArea,“threegorgesdistrict”>;• Suchliteralobjectsfailtocapturethesemanticsliketypeandlinkage;• Quite common:
• inKGs fromwikilike DBpediaandzhishi.me• inKGsfromtableslikeLinkedGeoData• whenmultiple KGs arealigned
• Tripleassertionswitherroneousobjects• e.g., <Sergio_Aguero,playsFor,Manchester_United>(Aguero actuallyplaysforManchesterCity)
• sucherrorsareoftencausedbysemanticorlexicalconfusions
08/11/2019 42
Solution
• Step#1:Detection• Downstreamapplications• Logicrulesandconstraints(e.g.,SPARQLquerytemplate[Kontokostas etal.WWW2014])• Machinelearningmethodse.g.,semanticembedding,outlierdetectionandsupervisedlearning
• Step#2:Correction• FindanentityfromtheKBforsubstitution,
• “threegorgesdistrict” toThree_Gorges_Reservoir_Region• <Sergio_Aguero,playsFor,Manchester_United>to<Sergio_Aguero,playsFor,Manchester_City>
• Step#3:Canonicalization• CreateanewentityforsubstitutionifnoexistingentitycanbefoundinStep#2
08/11/2019 43
Typeoftheentitysubstituteis useful.ChenJiaoyan,etal.“CanonicalizingKnowledgeBaseLiterals”,ISWC2019
Nutshell for Typing
• Aknowledgebasedlearningframeworkforliteralobjectclassification
• BiRNNs +attention forexploringthecontextualsemantics– literalandtriple
• Therole of constraints• Highqualitynegativesampling
• Getnegativesamples(entities) fromthesiblingclassesofatargetclass• Ensurethattheentitycannotbeinferred
• Hierarchicalprediction(utilizingdisjointness)• Place: 0.9, Professor: 0.8 à adoptPlace;• Place: 0.9, Town: 0.8à adoptbothPlaceandTown
08/11/2019 44
Framework
08/11/2019 45
<Subject,Property,Literal>’s-,passesArea,“Three Gorges District”,-,passesArea,“London,England”
…
Entities𝐸-
Entities𝐸.
Classes𝐶-
Classes𝐶.
CandidateClasses𝐶-.
LiteralMatching
PropertyObjects
Query
Foreachclass𝑐 in𝐶-.
Sampling
Sample:(triple< 𝑠, 𝑝, 𝑙 >,label)Particular&Generalsamples
Samplerefinement
NeuralNetwork(Classifier)
Triple< 𝑠, 𝑝, 𝑙 > toSequence[𝑤𝑜𝑟𝑑%,…,𝑤𝑜𝑟𝑑= ]
BidirectionalRNNs+AttentionLayer
Training
Foreachliteral,predicttypes,lookupentities
LiteralTyping&Canonicalization
“ThreeGorgesDistrict”–Locationand BodyOfWater (Types)
“London,England”–London (Entity)
Query Merge
AttentiveRecurrentNeuralNetwork
08/11/2019 46
“Yangtze” “River”
Subject𝑠
“passes” “Area”
Predicate𝑝
“Three” “Gorges” “District”
Literal𝑙
WordVectors
BidirectionalRNNs
AttentionLayer
Word AttentionsFCLayer+
LogisticRegression
𝑓(𝑠,𝑝, 𝑙)
Query
𝑥C, 𝑡 ∈ [1, 𝑇]
ℎC, 𝑡 ∈ [1, 𝑇]
ℎC, 𝑡 ∈ [𝑇, 1]
𝑢I
𝛼C,𝑡 ∈ [1,𝑇]
𝑣
Discussion and Outlook
• Semantic table annotation and KGboosting• Column type matching• Propertymatching• Robust feature learning
• KGcorrection• Detection• Correctionand canonicalization• HeterogeneousKG (alignmentof multiple sources)
• KG &Machine learning• Machine learning explanation
• Chen,Jiaoyan,etal."Knowledge-basedtransferlearningexplanation." KR2018.• Integrationof learning and reasoning (neural-symbolic)
08/11/2019 48
• Main contributors:• IanHorrocks(UniversityofOxford)• ErnestoJimenez-Ruiz(City,UniversityofLondon&UniversityofOslo)• CharlesSutton(TheUniversityofEdinburgh&Google)• Maximilian Pflueger (Ph.D student,University ofOxford)
• PartnersandSponsor:• SIRUS,UniversityofOslo• AlanTuringInstitute,London• SamsungResearch,London(new)• Siemens,München (new)
08/11/2019 49