26
Date: 29/11/2012 Work at ISI, Current Status, Next Steps Daniel Garijo Verdejo, Oscar Corcho, Yolanda Gil Ontology Engineering Group. Laboratorio de Inteligencia Artificial Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid

Status update OEG - Nov 2012

  • Upload
    dgarijo

  • View
    295

  • Download
    2

Embed Size (px)

DESCRIPTION

Summary of the work done in my summer internship plus the current status of my thesis (2012)

Citation preview

Page 1: Status update OEG - Nov 2012

Date: 29/11/2012

Work at ISI,Current Status,

Next Steps

Daniel Garijo Verdejo,Oscar Corcho,

Yolanda Gil

Ontology Engineering Group. Laboratorio de Inteligencia ArtificialDepartamento de Inteligencia Artificial

Facultad de InformáticaUniversidad Politécnica de Madrid

Page 2: Status update OEG - Nov 2012

What am I working on?

•Creation of abstractions in scientific workflows

• Workflow Traces and template representation• Provenance representation• Plan representation

• Abstraction catalog

• Find ways to link the definitions to the provenance traces automatically

•Understandability and reuse of scientific workflows

2

Page 3: Status update OEG - Nov 2012

Outline

Index

1. Motivation2. Overview3. Workflow systems used4. Summary of work done in my previous visit to ISI

• OPMW and provenance publishing5. Summary of work done before second visit to ISI

• Workflow motif catalog6. Summary of work done in my second visit to ISI

• OPMW-PROV and P-PLAN• Automatic macro abstraction detection

7. Next Steps8. Future work

3

Page 4: Status update OEG - Nov 2012

Motivation

4

•As a designer: Discovery

• Workflows with similar functionality fragments/methods

• Design based in previous templates.

•As user/reuser: Understandability

• Search workflows by functionality

• Commonalities between execution runs

• Component categorization

Page 5: Status update OEG - Nov 2012

Overview

Abstraction definitions and categorization

Provenance representation

Plan representation

Algorithms for finding the different abstractions automatically

Experiment publication

5

Vocabularies

RDF Stores

Data mining tools, graph analysis, etc.

Descriptions/PSMS/Ontologies

Page 6: Status update OEG - Nov 2012

6

Taverna and Wings

IEEE eScience 2012. Chicago, USA

http://www.taverna.org.uk/

http://www.wings-workflows.org/

Page 7: Status update OEG - Nov 2012

Summary: Previous Work at ISI

Abstractions definitions and categorization

Provenance representation

Plan representation

Experiment Publication

OPMW

Virtuoso,Pubby, Wings (+Plugin)

7

Algorithms for finding the different abstractions automatically

Page 8: Status update OEG - Nov 2012

High level architecture

Interactive Browsing

(Pubby frontend)

Programatic access(external apps)

Wings workflow generation

OPMconversion Publication Share Reuse

Core

Portal

WINGS on local laptopWorkflow Template

WorkflowInstance

OPMexport

Core

Portal

WINGS on shared hostWorkflow Template

WorkflowInstance

OPMexport

Core

Portal

WINGS on web serverWorkflow Template

WorkflowInstance

OPMexport

LinkedData

Publication

Users

Other workflow environments

8

Page 9: Status update OEG - Nov 2012

OPMW: Process view

9

Page 10: Status update OEG - Nov 2012

OPMW: Attribution view

10

Page 11: Status update OEG - Nov 2012

Work previous to second visit to ISI

Abstractions definitions and categorization

Provenance representation

Plan representation

Algorithms for automatic matching

Experiment Publication

OPMW

Virtuoso,Pubby, Wings (+Plugin)

Motif Detection

11

Page 12: Status update OEG - Nov 2012

12

Overview

• Empirical analysis on 177 workflow templates from Taverna and Wings

• Catalog of recurring patterns: scientific workflow motifs.

• Data Oriented Motifs

• Workflow Oriented Motifs

•Understandability and reuse

IEEE eScience 2012. Chicago, USA

Catalog

http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg

Page 13: Status update OEG - Nov 2012

13

Approach

•Reverse-engineer the set of current practices in workflowdevelopment through an analysis of empirical evidence

•Identify workflow abstractions that would facilitateunderstandability and therefore effective re-use

IEEE eScience 2012. Chicago, USA

Page 14: Status update OEG - Nov 2012

14

Workflow Motifs

•Workflow motif: Domain independent conceptual abstraction on the workflow steps.1. Data-oriented motifs: What kind of manipulations does the workflow have?

• E.g.: • Data retrieval • Data preparation• etc.

2. Workflow-oriented motifs: How does the workflow perform its operations?

•E.g.:• Stateful steps• Stateless steps• Human interactions• etc.

IEEE eScience 2012. Chicago, USA

WHAT?

HOW?

Page 15: Status update OEG - Nov 2012

15

Motif CatalogData-Oriented Motifs

Data Retrieval

Data Preparation

Format Transformation

Input Augmentation and Output Splitting

Data Organisation

Data Analysis

Data Curation/Cleaning

Data Moving

Data Visualisation

IEEE eScience 2012. Chicago, USA

Workflow-Oriented Motifs

Intra-Workflow Motifs

Stateful (Asynchronous) Invocations

Stateless (Synchronous) Invocations

Internal Macros

Human Interactions

Inter-Workflow Motifs

Atomic Workflows

Composite Workflows

Workflow Overloading

Page 16: Status update OEG - Nov 2012

Summary: Work done at ISI

Abstractions definitions and categorization

Provenance representation

Plan representation

Algorithms for automatic matching

Experiment Publication

OPMW + PROV+ P-PLAN

Virtuoso,Pubby, Wings (+Plugin)

Macro abstraction detection

Motif Detection

SUBDUE exploration and integration in RDF

16

Page 17: Status update OEG - Nov 2012

PROV Compatibility

•OPMW fits naturally into PROV• Same usage-generation

structure• Extension for the scientific

workflow with PROV

•Binary relationships (no n-ary patterns used).• Simplicity

•Publication of PROV as well as OPMW. • Queries can be answered in

both languages.• Flexibility.

•http://www.opmw.org/node/8

17

Page 18: Status update OEG - Nov 2012

P-PLAN

•Plans are not provenance•P-PLAN: Simple plan model for binding traces to template representations•Aligned with OPMW and PROV•Documentation in progress

18

Page 19: Status update OEG - Nov 2012

Summary: Work done at ISI

Abstractions definitions and categorization

Provenance representation

Plan representation

Algorithms for automatic matching

Experiment Publication

OPMW + PROV+ P-PLAN

Virtuoso,Pubby, Wings (+Plugin)

Macro abstraction detection

Motif Detection

SUBDUE exploration and integration in RDF

19

Page 20: Status update OEG - Nov 2012

Macro abstraction detection

Problem statement:

Given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it?

Useful for:•Systems like Taverna and Wings: (Many templates, little annotation to relate them)• Finding relationships between workflows and sub-workflows.

• Most used fragments, most executed, etc.

• Systems like GenePattern and Galaxy: (Many runs, nearly no templates published)• Proposing new templates with the popular fragments.

20

Page 21: Status update OEG - Nov 2012

Macro abstraction detection

•Work in Progress (implementation and evaluation)• WINGS traces

•Similar to Sub-graph Isomorphism•Kind of “Graph Clustering”

•Early results• Tool for finding common sub-graphs

• Sequential graphs• Efficient• Scalable.

• Integration with RDF (by me)

•TO DO:• Finish implementation: inference.• Evaluation!!

21

Page 22: Status update OEG - Nov 2012

Next Steps

22

Page 23: Status update OEG - Nov 2012

Next Steps

•Thesis:• Finish up implementation.• How to evaluate results?

•Publications:• Workshop:

• Provenance Corpus (with Taverna Team). To have something citable• Conference:

• KCAP: Macro detection implementation and evaluation.• Journal

• Decay analysis publication in journal (January)• OPMW - PROV -P-PLAN publication in journal (December)• Motif extension publication in journal (Invited by special issue)

(Now)

23

Page 24: Status update OEG - Nov 2012

Future work

•Thesis:

• Other methods for detecting workflow abstraction automatically• Metadata and file analysis (diff, etc.): Filter, merge, etc.• Provenance reconstruction.

•Project:

• RO model specifications• Testcases • Workflow abstraction with Isoco

24

Page 25: Status update OEG - Nov 2012

Thanks !

25

:collaboratesWith

:collaboratesWith:collaboratesWith

:collaboratesWith

:supervises :supervises

:oscarCorcho :yolandGil

:khalidBelhajjame

:varunRatnakar

:caroleGoble

:pinarAlper

:danielGarijo

Page 26: Status update OEG - Nov 2012

Date: 03/10/2011

Daniel Garijo Verdejo

Ontology Engineering Group. Laboratorio de Inteligencia ArtificialDepartamento de Inteligencia Artificial

Facultad de InformáticaUniversidad Politécnica de Madrid

Work at ISI, relation with wf4Ever,

future steps