Crowd Sourcing Web Service Annotations


Presentation at the AAAI 2012 Spring Symposium: Intelligent Web Services Meet Social Computing, Palo Alto, CA, United States. Paper: Crowd Sourcing Web Service Annotations. Authors: James Scicluna, Christoph Blank and Nathalie Steinmetz.


James Scicluna¹, Christoph Blank¹, Nathalie Steinmetz¹ and Elena Simperl²

¹seekda GmbH, ²Karlsruhe Institute of Technology


Outline

Introduction to seekda Web Service search engine

Web API crawling & identification

Amazon Mechanical Turk crowdsourcing

Web Service Annotation wizard


seekda Web Service Search Engine

[Two screenshots of the existing seekda search engine; the second highlights that its index covers WSDL-described services only]

Why crawl for Web APIs?

Significant growth of Web APIs:
> 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs) [end of 2009: ca. 1,500 Web APIs]
> 6,500 mashups on ProgrammableWeb (combining Web APIs from one or more sources)

SOAP services are only a small part of the overall available public services


Web API Crawling

Problem: Web APIs are described by regular HTML pages
There is no standardized structure that helps with their identification


Web API Identification

Solution: crawl for Web APIs

Approach 1: manual feature identification
Takes into account the HTML structure (e.g., title, mark-up), syntactical properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)
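These hand-crafted features fit in a few lines of code. The sketch below is a hedged illustration, not seekda's actual crawler: the helper names (LinkCounter, extract_features) and the crude heuristics are our own assumptions.

```python
# Illustrative sketch of Approach 1's hand-crafted features
# (hypothetical helper names; not seekda's implementation).
import re
from html.parser import HTMLParser

CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")  # e.g. getUserInfo

class LinkCounter(HTMLParser):
    """Counts internal vs. external <a href> links on a page."""
    def __init__(self, host):
        super().__init__()
        self.host, self.internal, self.external = host, 0, 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("http") and self.host not in href:
            self.external += 1
        else:
            self.internal += 1

def extract_features(html_text, host):
    """Return the three feature groups named on the slide."""
    counter = LinkCounter(host)
    counter.feed(html_text)
    return {
        "camel_cased_words": len(CAMEL_CASE.findall(html_text)),
        "external_internal_ratio": counter.external / max(counter.internal, 1),
        "api_in_title": "api" in html_text.lower().split("</title>")[0],
    }
```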

Approach 2: automatic classification
Text classification via supervised learning (Support Vector Machine model)
Training set: APIs listed on ProgrammableWeb

But: human confirmation was still needed to be sure
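Approach 2 can be reproduced with scikit-learn in a few lines. A minimal sketch, assuming TF-IDF features; the two toy training pages stand in for the ProgrammableWeb-derived training set the slide mentions.

```python
# Minimal sketch of Approach 2: supervised text classification with an SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

pages = [
    "REST API documentation: GET /users returns JSON, requires an apiKey",
    "Welcome to our web shop! Free shipping on all orders this week",
]
labels = [1, 0]  # 1 = Web API page, 0 = ordinary web page

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(pages, labels)
print(model.predict(["endpoint parameters are documented below"]))
```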


New Search Engine Prototype

[Screenshot of the new search engine prototype]

Prototype – User Contributions

Web API – yes/no: confirmation from a human is needed!

Other annotations that help improve the search for Web Services:
Categories
Tags
Natural-language descriptions
Cost: free or paid service
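Concretely, each contribution could be captured in a record like the following. This is a hypothetical sketch of the fields listed above, not seekda's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ServiceAnnotation:
    """Hypothetical record for the user-contributed fields listed above."""
    is_web_api: bool                      # the human yes/no confirmation
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    description: str = ""                 # natural-language description
    is_free: bool = True                  # cost: free or paid service
```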


Problem – User Contributions

Problem: users/developers don't contribute enough
Hard to motivate them to provide annotations
Community recognition or peer respect is not enough

Solution: crowdsource the annotations – pay people to provide them
Use Amazon Mechanical Turk
Bootstrap annotations quickly and cheaply
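As an illustration of how such a task reaches workers, the annotation wizard can be published as a Mechanical Turk HIT. A hedged sketch using boto3, which post-dates the 2012 experiments; the wizard URL and most parameter values are placeholders, with the $0.10 reward taken from iteration 1 below.

```python
# Sketch: publishing the annotation wizard as an MTurk HIT with boto3
# (modern AWS SDK; the 2012 experiments predate it). URL and most
# parameter values are placeholders.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion embeds an external page (here: the annotation wizard)
# in an iframe shown to the worker.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/annotation-wizard?service=42</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Annotate a Web API page",
    Description="Decide whether the page describes a Web API; add a category, tags and a description",
    Keywords="web api, annotation, tagging",
    Reward="0.10",                    # iteration 1 paid $0.10 per task
    MaxAssignments=1,
    AssignmentDurationInSeconds=600,  # 10 min; avg. completion was ~2:20
    LifetimeInSeconds=86400,
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```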


Service Annotation Wizard (1/4 – 4/4)

[Four screenshots walking through the steps of the service annotation wizard]

Amazon Mechanical Turk – Iteration 1

Annotation wizard tasks:
Web API – yes/no
Assign a category
Assign tags
Provide a natural-language description
Determine whether the page is documentation, pricing, or a listing
Rate the service

Number of submissions: 70
Reward per task: $0.10
Restrictions: none

Amazon Mechanical Turk – Iteration 1

Results:
21 APIs correctly identified as APIs
28 web documents (non-APIs) correctly identified as non-APIs
49/70 correctly identified (70% accuracy)
Average task completion time: 2:20 min

But only:
4 well-done & complete annotations
8 acceptable annotations (incomplete)


Amazon Mechanical Turk – Iterations 2 & 3

Annotation wizard changes:
Removed page-type identification & service rating
For a task to be accepted:
At least one category must be assigned
At least 2 tags must be provided
A meaningful description must be provided
(a minimal validation sketch follows the table below)

                        Iteration 2    Iteration 3
Number of submissions   100            150
Reward per task         $0.20          $0.20
Restrictions            yes            yes
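The acceptance rules above are simple enough to check automatically before paying a worker. A minimal sketch; the five-word threshold for a "meaningful" description is our assumption, since the slide does not define one.

```python
def accept_submission(categories, tags, description, min_words=5):
    """Iteration 2/3 acceptance rules from the slide. The min_words
    threshold for a 'meaningful' description is an assumption."""
    return (len(categories) >= 1                     # at least one category
            and len(tags) >= 2                       # at least two tags
            and len(description.split()) >= min_words)

# e.g. accept_submission(["Mapping"], ["geo", "rest"],
#                        "Geocoding API for postal addresses")  -> True
```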


Amazon Mechanical Turk – Iterations 2 & 3

Results for iterations 2 & 3:
Ca. 80% of documents correctly identified
Very satisfying annotations
Average completion time: 2:36 min


Amazon Mechanical Turk – Survey

48 survey submissions
Female: 18, male: 30
Most common countries of origin: India (27) and USA (9)
Most common age groups:
15-22 (12)
23-30 (18)
31-50 (16)

Most respondents worked in some IT profession; they provided the best-quality annotations


Amazon Mechanical Turk

Recommendations for further improvement:
Improve the task description, especially 'what is a Web API'
Better examples (e.g., hinting at what makes a false page false)
Allow assignment of multiple categories

Conclusion:
Very positive results – a good way to get quality annotations
Results will help provide a better search experience to users
Results can be used as a positive set for automatic classification


Questions?
