Propagation of Policies in Rich Data Flows

Enrico Daga†, Mathieu d’Aquin†, Aldo Gangemi‡, Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris 13 (France) and ISTC-CNR (Italy)

The 8th International Conference on Knowledge Capture (K-CAP 2015), October 10th, 2015, Palisades, NY (USA)
http://www.k-cap2015.org/

Feedback welcome: @enridaga #kmiou


Motivation

• Governing the life cycle of data on the web is a challenging issue for organisations and users.

• Assessing what policies on input data propagate to the output of a process is a crucial problem.


Policies


Foundations

Constraints and permissions set by the data owner regarding the reuse of the data, i.e. mostly licences. We can describe licences/policies with RDF and ODRL.

RDF License Database: http://datahub.io/dataset/rdflicense

Describes ~140 licences using RDF and the Open Digital Rights Language (ODRL). Reports 113 policies.

lic:cc-by-nc4.0 a odrl:Policy ;
    rdfs:label "CC-BY-NC" ;
    odrl:permission [
        odrl:action cc:Distribution , ldr:extraction , ldr:reutilization ,
                    cc:DerivativeWorks , cc:Reproduction ;
        odrl:duty [ odrl:action cc:Attribution , cc:Notice ]
    ] ;
    odrl:prohibition [ odrl:action cc:CommercialUse ] .
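Policy Propagation Rules are stated over atomic policies such as duty cc:Attribution rather than over whole licences. The sketch below, a minimal illustration and not the authors' tooling, flattens the CC-BY-NC description into (deontic type, action) pairs. It assumes a plain Python dictionary in place of an RDF graph; the `CC_BY_NC` layout and the `atomic_policies` helper are hypothetical.

```python
# Hypothetical in-memory stand-in for the ODRL description of lic:cc-by-nc4.0;
# a real pipeline would load the Turtle with an RDF library instead.
CC_BY_NC: dict[str, list[str]] = {
    "permission": [
        "cc:Distribution", "ldr:extraction", "ldr:reutilization",
        "cc:DerivativeWorks", "cc:Reproduction",
    ],
    "duty": ["cc:Attribution", "cc:Notice"],
    "prohibition": ["cc:CommercialUse"],
}


def atomic_policies(licence: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Flatten a licence into (deontic type, action) pairs, e.g. ("duty", "cc:Attribution")."""
    return {(kind, action) for kind, actions in licence.items() for action in actions}


policies = atomic_policies(CC_BY_NC)
for kind, action in sorted(policies):
    print(kind, action)
```

Flattening drops the nesting of the duties under the permission; that simplification is an assumption of this sketch, not something the RDF License Database prescribes.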

Objective

We need to understand how these policies propagate in data flows!


Data flows


Foundations

What are the semantic relations between input and output?

Data flows can be thought of as graphs of data objects. Datanode is an ontology for the data-centric description of applications. It is built as a hierarchy of relations, with more than 100 relations so far.

http://purl.org/datanode/ns/
http://purl.org/datanode/docs/
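Seen as a graph, asking which semantic relations hold between an input and an output amounts to finding a path between them. A minimal sketch using breadth-first search, assuming the flow is given as RDF-like (subject, relation, object) triples; the file names and the `relations_between` helper are hypothetical, and only the relation names are ones used elsewhere in these slides.

```python
from collections import deque

# Hypothetical data flow as (subject, relation, object) triples,
# e.g. "selection.csv dn:isSelectionOf copy.csv".
FLOW = [
    ("copy.csv", "dn:isCopyOf", "input.csv"),
    ("selection.csv", "dn:isSelectionOf", "copy.csv"),
    ("report.csv", "dn:isPartOf", "archive.zip"),
]


def relations_between(flow, start, goal):
    """Breadth-first search for the chain of relations linking start to goal."""
    edges = {}
    for s, rel, o in flow:
        edges.setdefault(s, []).append((rel, o))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # the two data objects are not connected in this flow


print(relations_between(FLOW, "selection.csv", "input.csv"))
```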

Policy Propagation Rule (PPR)


Relying on Datanode for an enumeration of the possible relations, and on the RDF License Database for the possible policies, we can set up rules like the following.

A Policy Propagation Rule is a Horn clause. For example, the rule stating that the attribution duty propagates through the dn:isCopyOf relation can be simplified as:

propagates(dn:isCopyOf, duty cc:Attribution)

Foundations
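The general form of the Horn clause is not reproduced here, so the sketch below assumes the natural reading of a rule written propagates(R, P), with the relation first as in the later slides: if an object o has policy P and an object s is related to o by R (s being derived from o), then s has P too. It forward-chains a small, hypothetical rule base over a toy flow until no new policy can be derived.

```python
# Hypothetical rule base: each pair (R, P) stands for propagates(R, P).
RULES: set[tuple[str, tuple[str, str]]] = {
    ("dn:isCopyOf", ("duty", "cc:Attribution")),
    ("dn:isCopyOf", ("prohibition", "cc:CommercialUse")),
    ("dn:isSelectionOf", ("duty", "cc:Attribution")),
}

# Toy flow as (subject, relation, object) triples. Assumption: the subject is
# derived from the object, so policies travel from object to subject.
FLOW = [
    ("copy.csv", "dn:isCopyOf", "input.csv"),
    ("selection.csv", "dn:isSelectionOf", "copy.csv"),
]


def propagate(flow, rules, initial):
    """Forward-chain has(o, P) and R(s, o) and propagates(R, P) => has(s, P) to a fixpoint."""
    has = {obj: set(pols) for obj, pols in initial.items()}
    changed = True
    while changed:
        changed = False
        for s, rel, o in flow:
            # Copy to a list so a self-loop (s == o) cannot mutate the set mid-iteration.
            for policy in list(has.get(o, ())):
                if (rel, policy) in rules and policy not in has.setdefault(s, set()):
                    has[s].add(policy)
                    changed = True
    return has


result = propagate(FLOW, RULES, {
    "input.csv": {("duty", "cc:Attribution"), ("prohibition", "cc:CommercialUse")},
})
print(result["selection.csv"])
```

The prohibition of commercial use reaches the copy but not the selection only because this toy rule base contains no propagates(dn:isSelectionOf, prohibition cc:CommercialUse) rule.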

Problem

A description of policies and data flows implies a huge number of Policy Propagation Rules to be specified and computed (number of possible policies times number of possible relations between data objects).

How can this KB be abstracted to make the management of, and reasoning over, Policy Propagation Rules easier?


Contributions

(1) A methodology to obtain an abstraction that significantly reduces the number of rules (using an ontology).
(2) An evaluation of how effective this methodology is when using the Datanode ontology.
(3) A demonstration of how this ontology can evolve to better represent the behaviour of Policy Propagation Rules.


(A)AAAA Methodology


1. Acquire the rules
2. Analyse the rules: find clusters using Formal Concept Analysis
3. Abstract the rules: match the clusters against the ontology hierarchy
4. Assess the compression, and diagnose errors or possible refinements in the rules or the ontology
5. Adjust the rules or the ontology, then repeat from 2.

Preparing the Rule Base

• This phase required manual supervision of all associations between policies and relations in order to establish the initial set of propagation rules.

• Using Contento, it was possible to prepare the rule base manually with reasonable effort.

• The initial knowledge base was then composed of 3363 Policy Propagation Rules.


Acquisition

Formal Concept Analysis (FCA)

• Input is a Formal Context (a binary matrix of objects × attributes).
• The basic unit is a Closed Concept:
   – (O, A) => (Extension, Intension)
   – Closure operator ’ … (O, A) is a concept when O’ = A and A’ = O

• Concepts are classified hierarchically in a concept lattice:
   – Top: all objects, no attributes; bottom: all attributes, no objects.
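To illustrate the two derivation operators and the definition of a concept, the sketch below builds a toy formal context in which, as in the rule base, objects are relations and attributes are the policies they propagate. The context is hypothetical, and the brute-force enumeration (closing every subset of objects) is chosen for clarity; it is exponential in the number of objects and is not meant as the algorithm used by Contento or by the authors.

```python
from itertools import combinations

# Hypothetical formal context: relation -> set of policies it propagates.
CONTEXT: dict[str, set[str]] = {
    "dn:hasPart": {"cc:Attribution", "cc:Notice", "cc:Copyleft"},
    "dn:hasSection": {"cc:Attribution", "cc:Notice", "cc:Copyleft"},
    "dn:isVocabularyOf": {"cc:Attribution", "cc:Notice"},
    "dn:isSelectionOf": {"cc:Attribution"},
}
ATTRIBUTES: set[str] = set().union(*CONTEXT.values())


def intent(objects) -> set[str]:
    """O': the attributes shared by every object in O (all attributes if O is empty)."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def extent(attributes: set[str]) -> set[str]:
    """A': the objects that have every attribute in A."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def concepts() -> set[tuple[frozenset, frozenset]]:
    """All pairs (O, A) with O' = A and A' = O, found by closing every object subset."""
    found = set()
    for size in range(len(CONTEXT) + 1):
        for subset in combinations(CONTEXT, size):
            a = intent(subset)
            found.add((frozenset(extent(a)), frozenset(a)))
    return found


for objs, attrs in sorted(concepts(), key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

In this toy context the top concept is (all four relations, {cc:Attribution}) because every relation shares that policy; if no policy were shared, its intent would be empty.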


Analysis

Applying FCA to the Rule Base

80 concepts, i.e. clusters of rules: relations that have a common behaviour, in that they propagate the same policies. But why do they do it?


Analysis

Detect matches with the ontology


Abstraction

Search for matches between concepts and branches taken from the ontology hierarchy. When a match is found, subtract the corresponding rules from the KB.

Example/1


Abstraction

dn:hasPart and dn:isVocabularyOf are valid abstractions

dn:isPartOf is not, because dn:isSelectionOf apparently does not propagate (all) the policies…
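The matching step can be sketched as checking whether a whole branch (a relation plus all its descendants) lies inside a concept's extent; if it does, the descendants' rules for the concept's policies can be dropped, assuming a rule on a relation is inherited by its sub-relations. The hierarchy fragment, the rules and the `branch`/`abstract` helpers are hypothetical, loosely modelled on the dn:hasPart / dn:isPartOf example.

```python
# Hypothetical fragment of the Datanode hierarchy: child relation -> parent.
PARENT: dict[str, str] = {
    "dn:hasSection": "dn:hasPart",
    "dn:hasSelection": "dn:hasPart",
    "dn:isSelectionOf": "dn:isPartOf",
}
POLICIES = {"duty cc:Attribution", "duty cc:Notice"}


def branch(top: str) -> set[str]:
    """The relation `top` plus all of its descendants."""
    members = {top}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in members and child not in members:
                members.add(child)
                changed = True
    return members


def abstract(rules, concept_extent, concept_intent, top):
    """Drop the descendants' rules if the whole branch lies in the concept extent."""
    members = branch(top)
    if not members <= concept_extent:
        return rules, set()  # not a valid abstraction: nothing is removed
    removed = {(r, p) for (r, p) in rules
               if r in members and r != top and p in concept_intent}
    return rules - removed, removed


# Toy rule base: dn:isSelectionOf propagates nothing, as in the example above.
rules = {(r, p) for r in ("dn:hasPart", "dn:hasSection", "dn:hasSelection", "dn:isPartOf")
         for p in POLICIES}
extent = {"dn:hasPart", "dn:hasSection", "dn:hasSelection", "dn:isPartOf"}

kept, removed = abstract(rules, extent, POLICIES, "dn:hasPart")
print(len(removed), len(kept))  # dn:hasPart is a valid abstraction
_, removed_part_of = abstract(rules, extent, POLICIES, "dn:isPartOf")
print(len(removed_part_of))     # dn:isPartOf is not
```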

Example/2


Abstraction

dn:hasPart (49 rules), abstracted to 7 rules!

propagates(dn:hasPart, duty cc:Attribution)  propagates(dn:hasPart, duty cc:Copyleft)  propagates(dn:hasPart, duty cc:Notice)  propagates(dn:hasPart, duty cc:SourceCode)  propagates(dn:hasPart, duty odrl:attachPolicy)  propagates(dn:hasPart, duty odrl:attachSource)  propagates(dn:hasPart, duty odrl:attribute)

propagates(dn:hasSection, duty cc:Attribution)  propagates(dn:hasSection, duty cc:Copyleft)  propagates(dn:hasSection, duty cc:Notice)  propagates(dn:hasSection, duty cc:SourceCode)  propagates(dn:hasSection, duty odrl:attachPolicy)  propagates(dn:hasSection, duty odrl:attachSource)  propagates(dn:hasSection, duty odrl:attribute)
propagates(dn:hasSelection, duty cc:Attribution)  propagates(dn:hasSelection, duty cc:Copyleft)  propagates(dn:hasSelection, duty cc:Notice)  propagates(dn:hasSelection, duty cc:SourceCode)  propagates(dn:hasSelection, duty odrl:attachPolicy)  propagates(dn:hasSelection, duty odrl:attachSource)  propagates(dn:hasSelection, duty odrl:attribute)
propagates(dn:hasSample, duty cc:Attribution)  propagates(dn:hasSample, duty cc:Copyleft)  propagates(dn:hasSample, duty cc:Notice)  propagates(dn:hasSample, duty cc:SourceCode)  propagates(dn:hasSample, duty odrl:attachPolicy)  propagates(dn:hasSample, duty odrl:attachSource)  propagates(dn:hasSample, duty odrl:attribute)
propagates(dn:hasPortion, duty cc:Attribution)  propagates(dn:hasPortion, duty cc:Copyleft)  propagates(dn:hasPortion, duty cc:Notice)  propagates(dn:hasPortion, duty cc:SourceCode)  propagates(dn:hasPortion, duty odrl:attachPolicy)  propagates(dn:hasPortion, duty odrl:attachSource)  propagates(dn:hasPortion, duty odrl:attribute)
propagates(dn:hasIdentifiers, duty cc:Attribution)  propagates(dn:hasIdentifiers, duty cc:Copyleft)  propagates(dn:hasIdentifiers, duty cc:Notice)  propagates(dn:hasIdentifiers, duty cc:SourceCode)  propagates(dn:hasIdentifiers, duty odrl:attachPolicy)  propagates(dn:hasIdentifiers, duty odrl:attachSource)  propagates(dn:hasIdentifiers, duty odrl:attribute)
propagates(dn:hasExample, duty cc:Attribution)  propagates(dn:hasExample, duty cc:Copyleft)  propagates(dn:hasExample, duty cc:Notice)  propagates(dn:hasExample, duty cc:SourceCode)  propagates(dn:hasExample, duty odrl:attachPolicy)  propagates(dn:hasExample, duty odrl:attachSource)  propagates(dn:hasExample, duty odrl:attribute)

dn:isVocabularyOf (42 rules), abstracted to 7 rules!

propagates(dn:isVocabularyOf, duty cc:Attribution)  propagates(dn:isVocabularyOf, duty cc:Copyleft)  propagates(dn:isVocabularyOf, duty cc:Notice)  propagates(dn:isVocabularyOf, duty cc:SourceCode)  propagates(dn:isVocabularyOf, duty odrl:attachPolicy)  propagates(dn:isVocabularyOf, duty odrl:attachSource)  propagates(dn:isVocabularyOf, duty odrl:attribute)

propagates(dn:attributesOf, duty cc:Attribution)  propagates(dn:attributesOf, duty cc:Copyleft)  propagates(dn:attributesOf, duty cc:Notice)  propagates(dn:attributesOf, duty cc:SourceCode)  propagates(dn:attributesOf, duty odrl:attachPolicy)  propagates(dn:attributesOf, duty odrl:attachSource)  propagates(dn:attributesOf, duty odrl:attribute)
propagates(dn:datatypesOf, duty cc:Attribution)  propagates(dn:datatypesOf, duty cc:Copyleft)  propagates(dn:datatypesOf, duty cc:Notice)  propagates(dn:datatypesOf, duty cc:SourceCode)  propagates(dn:datatypesOf, duty odrl:attachPolicy)  propagates(dn:datatypesOf, duty odrl:attachSource)  propagates(dn:datatypesOf, duty odrl:attribute)
propagates(dn:descriptorsOf, duty cc:Attribution)  propagates(dn:descriptorsOf, duty cc:Copyleft)  propagates(dn:descriptorsOf, duty cc:Notice)  propagates(dn:descriptorsOf, duty cc:SourceCode)  propagates(dn:descriptorsOf, duty odrl:attachPolicy)  propagates(dn:descriptorsOf, duty odrl:attachSource)  propagates(dn:descriptorsOf, duty odrl:attribute)
propagates(dn:typesOf, duty cc:Attribution)  propagates(dn:typesOf, duty cc:Copyleft)  propagates(dn:typesOf, duty cc:Notice)  propagates(dn:typesOf, duty cc:SourceCode)  propagates(dn:typesOf, duty odrl:attachPolicy)  propagates(dn:typesOf, duty odrl:attachSource)  propagates(dn:typesOf, duty odrl:attribute)
propagates(dn:relationsOf, duty cc:Attribution)  propagates(dn:relationsOf, duty cc:Copyleft)  propagates(dn:relationsOf, duty cc:Notice)  propagates(dn:relationsOf, duty cc:SourceCode)  propagates(dn:relationsOf, duty odrl:attachPolicy)  propagates(dn:relationsOf, duty odrl:attachSource)  propagates(dn:relationsOf, duty odrl:attribute)
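Counting the rules listed on this slide shows where the figures come from: every relation in each branch propagates the same seven duties, so after abstraction only the seven rules on the top relation remain.

```latex
\begin{aligned}
\texttt{dn:hasPart}:&\quad 7 \text{ relations} \times 7 \text{ duties} = 49 \text{ rules} \;\rightarrow\; 7 \text{ kept},\; 42 \text{ abstracted}\\
\texttt{dn:isVocabularyOf}:&\quad 6 \text{ relations} \times 7 \text{ duties} = 42 \text{ rules} \;\rightarrow\; 7 \text{ kept},\; 35 \text{ abstracted}
\end{aligned}
```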

Compression Factor


Assessment

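We calculate the compression factor (CF) as the number of abstracted rules divided by the total number of rules:

$$ CF = \frac{|R_{\mathrm{abs}}|}{|R|} $$

where $R$ is the rule base and $R_{\mathrm{abs}}$ the set of rules removed by abstraction.

By applying the original Datanode ontology, 1925 rules out of 3363 could be removed, for a compression factor of 0.572.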
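A quick check of the arithmetic, using the figures reported for the original ontology and, on the Evaluation slide, for the adjusted one; `compression_factor` is only an illustrative name.

```python
def compression_factor(abstracted: int, total: int) -> float:
    """CF = abstracted rules / all rules in the knowledge base."""
    if total <= 0:
        raise ValueError("the rule base must contain at least one rule")
    return abstracted / total


original = compression_factor(1925, 3363)  # original Datanode ontology
adjusted = compression_factor(2817, 3865)  # after the Adjustment phase
print(f"original CF = {original:.3f}, rules left = {3363 - 1925}")
print(f"adjusted CF = {adjusted:.3f}, rules left = {3865 - 2817}")
```

This prints 0.572 and 0.729, matching the reported values, with 1438 and 1048 rules left respectively.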

Considerations


Assessment

1. The matrix that was manually supervised is large, and errors may have been made at that stage of the process.

2. The Datanode ontology was not designed to represent a common behaviour of relations in terms of policy propagation. The ontology can be refined to cover this use case better (and possibly reduce the number of rules even further).

Observing the measures


Assessment

concepts × relations = ~9040 (partial) matches: hard to explore! Inspecting a partial match with high precision and low recall highlights a problem that might be easy to fix, as the number of relations and policies to compare will be low.
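The slides do not spell out the two measures, so this sketch rests on an assumption: the precision (Pre) of a branch against a concept is the fraction of the branch's relations that fall in the concept's extent, and recall is the fraction of the extent covered by the branch. The scenario (the dn:hasPart branch with dn:hasSelection missing from a concept that also contains the dn:isVocabularyOf branch) is hypothetical.

```python
# Assumed definitions, not taken verbatim from the slides:
#   Pre = |branch ∩ extent| / |branch|,  Rec = |branch ∩ extent| / |extent|
def branch_match(branch: set[str], extent: set[str]) -> tuple[float, float]:
    overlap = len(branch & extent)
    return overlap / len(branch), overlap / len(extent)


HAS_PART = {"dn:hasPart", "dn:hasSection", "dn:hasSelection", "dn:hasSample",
            "dn:hasPortion", "dn:hasIdentifiers", "dn:hasExample"}
IS_VOCABULARY_OF = {"dn:isVocabularyOf", "dn:attributesOf", "dn:datatypesOf",
                    "dn:descriptorsOf", "dn:typesOf", "dn:relationsOf"}

# Hypothetical concept: everything in both branches except dn:hasSelection.
extent = (HAS_PART - {"dn:hasSelection"}) | IS_VOCABULARY_OF

pre, rec = branch_match(HAS_PART, extent)
print(f"Pre={pre:.2f} Rec={rec:.2f} missing={sorted(HAS_PART - extent)}")
```

A high Pre with a single missing relation, as here, is the kind of partial match that the Fill operation is meant to address.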

Operations


Adjustment

We try to make adjustments in order to improve the compression factor. We defined a set of operations, targeted to (a) fix errors in the initial rule base and (b) refine the ontology.

• Fill: makes a branch fall fully within the cluster of a concept c, attempting to push its precision (Pre) up to 1.

• Group: some relations that share a concept but belong to different branches are abstracted by a new relation.

• Merge: two distinct branches are abstracted by a new relation.

• Wedge: change the top relation of a branch to make it fully match the concept.

(and we run our process again from the Analysis phase to the Assessment)

Example: Fill


Adjustment

… dn:isSelectionOf should indeed propagate all the policies listed in this concept. Fill adds the following rules:

propagates(dn:isSelectionOf, duty cc:Attribution)
propagates(dn:isSelectionOf, duty cc:Copyleft)
propagates(dn:isSelectionOf, duty cc:Notice)
propagates(dn:isSelectionOf, duty cc:SourceCode)
propagates(dn:isSelectionOf, duty odrl:attachPolicy)
propagates(dn:isSelectionOf, duty odrl:attachSource)
propagates(dn:isSelectionOf, duty odrl:attribute)
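Fill can be sketched as a set difference: every (relation, policy) pair that the concept requires for the branch, minus the rules already present. The sketch assumes dn:isSelectionOf sits under dn:isPartOf (as the Example/1 slide suggests) and starts from a toy rule base; the `fill` helper is hypothetical. It yields the seven rules listed above.

```python
DUTIES = ["duty cc:Attribution", "duty cc:Copyleft", "duty cc:Notice",
          "duty cc:SourceCode", "duty odrl:attachPolicy", "duty odrl:attachSource",
          "duty odrl:attribute"]


def fill(rules, branch_members, concept_intent):
    """Hypothetical Fill: add the rules needed for every relation in the branch
    to propagate every policy in the concept's intent."""
    required = {(r, p) for r in branch_members for p in concept_intent}
    missing = required - rules
    return rules | missing, missing


# Toy starting point: dn:isPartOf propagates the seven duties, dn:isSelectionOf none.
rules = {("dn:isPartOf", p) for p in DUTIES}
rules, added = fill(rules, {"dn:isPartOf", "dn:isSelectionOf"}, DUTIES)
for relation, policy in sorted(added):
    print(f"propagates({relation}, {policy})")
```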

Example: Wedge


Adjustment

… dn:sameCapabilityAs does *not* propagate all the policies listed in this concept, so we Wedge dn:sameIdentityAs.

Evaluation


As a result, we obtained 3865 rules in total, 78 concepts, 2817 rules abstracted and 1048 rules remaining, for a compression factor of 0.729.

Thanks to this methodology we have been able to fix many errors in the initial data, and to refine Datanode by clarifying the semantics of many properties and adding new ones.

Conclusions


• We presented a method to abstract Policy Propagation Rules by applying an ontology, Datanode, a hierarchical organisation of the possible relations between data objects.

• Datanode allowed us to reduce the number of rules with a compression factor of 0.572.

• Applying the ontology and the method, we were able to find and correct errors in the rules.

• Moreover, we were able to analyse the ontology with respect to this task and enhance it, obtaining not only a further reduction of rules (compression factor of 0.729) but also a better ontology.

Future work

• Apply the rule base to specific use cases (perform a practical evaluation), and publish it for reuse.

• Extend the Assessment phase of the methodology to also include a consistency check between the hierarchy of the FCA lattice and the ontology.

• Define additional operations to support the Adjustment phase.

• Study the evolution of the rule base via continuous integration of new policies/actions/rules.

• Apply the methodology to other use cases. Should we integrate the measures and operations of the methodology into the Contento tool?


Thank you

Enrico Daga

Twitter: @enridaga

Bottom-Up Ontology Construction with CONTENTO

http://bit.ly/contento-tool

Contento supports the user in the generation and curation of concept lattices to produce Semantic Web Ontologies from data. In the demo, we show how to use CONTENTO with Linked Data.

[Figure: CONTENTO workflow, with the stages SPARQL, Formal Context, Concept Lattice, Modeling (Naming & Pruning), Export as OWL Ontology]

(advertisement)

@ISWC 2015 Poster and Demo session, next week…

Operations: Fill


Annex

Observation: the branch is close to being fully in the concept (high Pre).
Diagnosis: the branch must be fully in the concept…
Effect: the Fill operation makes a branch fall fully within the cluster of a concept c, attempting to push Pre up to 1.

Operations: Merge


Annex

Observation: two distinct branches match the same concept.
Diagnosis: the respective top relations can be abstracted by a new common relation.
Effect: a new relation is added to the ontology.

Operations: Group


Annex

Observation: a set of relations are all together in the extent of a concept, but belong to different branches.
Diagnosis: they are actually related, and a common parent is missing.
Effect: a new relation is added to the ontology.

Operations: Wedge


Annex

Observation: the top relation is missing, and it does not propagate (all) the policies of the concept.
Diagnosis: the actual top relation is not the right abstraction.
Effect: a new relation is added to the ontology.