
Crowd Sourcing Web Service Annotations


Presentation at the AAAI 2012 Spring Symposium: Intelligent Web Services Meet Social Computing, Palo Alto, CA, United States. Paper: Crowd Sourcing Web Service Annotations. Authors: James Scicluna, Christoph Blank, Nathalie Steinmetz and Elena Simperl.



James Scicluna¹, Christoph Blank¹, Nathalie Steinmetz¹ and Elena Simperl²

¹seekda GmbH, ²Karlsruhe Institute of Technology


Outline

Introduction to seekda Web Service search engine

Web API crawling & identification

Amazon Mechanical Turk crowdsourcing

Web Service Annotation wizard

seekda Web Service Search Engine

[Screenshots of the seekda Web Service search engine; the existing engine indexes WSDL-described services only]


Why crawl for Web APIs?

Significant growth of Web APIs:
> 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs) [end of 2009: ca. 1,500 Web APIs]
> 6,500 mashups on ProgrammableWeb (combining Web APIs from one or more sources)
SOAP services are only a small part of the overall available public services


Web API Crawling

Problem: Web APIs are described by regular HTML pages
There is no standardized structure that helps with their identification


Web API Identification

Solution: crawl for Web APIs

Approach 1: Manual Feature Identification
Takes into account HTML structure (e.g., title, mark-up), syntactical properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)

Approach 2: Automatic Classification
Text classification via supervised learning (Support Vector Machine model)
Training set: APIs from ProgrammableWeb

But: human confirmation was still needed to be sure (a sketch of both approaches follows)
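The following is a minimal sketch of the two approaches, assuming Python with BeautifulSoup and scikit-learn; all function names, feature choices and thresholds are illustrative assumptions, not seekda's actual implementation:

```python
# Illustrative sketch of the two identification approaches; names and
# thresholds are assumptions, not seekda's actual pipeline.
import re
from urllib.parse import urlparse

from bs4 import BeautifulSoup                      # pip install beautifulsoup4
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC                  # pip install scikit-learn

CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")  # e.g. getUserInfo

def extract_features(html: str, page_url: str) -> dict:
    """Approach 1: hand-crafted features from HTML structure,
    syntactical properties of the text, and link properties."""
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(" ")
    title = (soup.title.string or "") if soup.title else ""
    links = [a.get("href", "") for a in soup.find_all("a")]
    host = urlparse(page_url).netloc
    external = sum(1 for href in links
                   if href.startswith("http") and urlparse(href).netloc != host)
    internal = max(len(links) - external, 1)       # avoid division by zero
    return {
        "api_in_title": int("api" in title.lower()),
        "camel_case_ratio": len(CAMEL_CASE.findall(text)) / max(len(text.split()), 1),
        "external_link_ratio": external / internal,
    }

def train_classifier(pages: list[str], labels: list[int]):
    """Approach 2: supervised text classification with an SVM,
    trained on pages labelled via ProgrammableWeb."""
    vectorizer = TfidfVectorizer(max_features=5000)
    X = vectorizer.fit_transform(pages)
    clf = LinearSVC()
    clf.fit(X, labels)                             # 1 = Web API page, 0 = not
    return vectorizer, clf
```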


New Search Engine Prototype

[Screenshot of the new search engine prototype]


Prototype – User Contributions

Web API – yes/no: confirmation from a human needed!

Other annotations that help improve the search for Web Services (modelled in the sketch below):
Categories
Tags
Natural-language descriptions
Cost: free or paid service
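As a minimal sketch, the user-contributed annotations above could be modelled as a record like this; the field names are illustrative assumptions, not seekda's actual schema:

```python
# Illustrative annotation record; field names are assumptions, not seekda's schema.
from dataclasses import dataclass, field

@dataclass
class ServiceAnnotation:
    page_url: str                        # the candidate Web API page
    is_web_api: bool                     # the human yes/no confirmation
    categories: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    description: str = ""                # natural-language description
    is_free: bool | None = None          # cost: free (True) or paid (False)
```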


Problem – User Contribution

Problem: users/developers don't contribute enough
Hard to motivate them to provide annotations
Community recognition or peer respect is not enough

Solution: crowdsource the annotations and pay people to provide them
Use Amazon Mechanical Turk (a publishing sketch follows)
Bootstrap annotations quickly and cheaply
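As a minimal sketch of publishing one annotation task as a Mechanical Turk HIT, assuming today's boto3 client (the 2012 setup predates it) and a hypothetical wizard URL:

```python
# Illustrative sketch: publish one annotation task as an MTurk HIT.
# The wizard URL is a placeholder assumption; rewards match iterations 2/3.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion embeds an external page (here: the annotation wizard) in the HIT.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.org/annotation-wizard?page=42</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Annotate a Web API page",
    Description="Decide whether the page describes a Web API; add category, tags and a description.",
    Keywords="web api, annotation, classification",
    Reward="0.20",                        # reward per task from iterations 2/3
    MaxAssignments=1,                     # one worker per page
    LifetimeInSeconds=7 * 24 * 3600,      # HIT visible for one week
    AssignmentDurationInSeconds=15 * 60,  # worker has 15 minutes per task
    Question=question_xml,
)
print("Created HIT:", hit["HIT"]["HITId"])
```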


Service Annotation Wizard (1/4)–(4/4)

[Screenshots of the four steps of the Service Annotation wizard]


Amazon Mechanical Turk – Iteration 1

Annotation wizard tasks:
Web API – yes/no
Assign a category
Assign tags
Provide a natural-language description
Determine whether the page is documentation, pricing or a listing
Rate the service

Number of submissions: 70
Reward per task: $0.10
Restrictions: none


Amazon Mechanical Turk – Iteration 1

Results:
21 APIs correctly identified as APIs
28 Web documents (non-APIs) correctly identified as non-APIs
49/70 correctly identified (70% accuracy)
Average task completion time: 2:20 min

But only:
4 well-done & complete annotations
8 acceptable annotations (incomplete)


Amazon Mechanical Turk – Iterations 2 & 3

Annotation wizard: removed page-type identification & service rating

For a task to be accepted (see the check sketched below):
At least one category must be assigned
At least 2 tags must be provided
A meaningful description must be provided
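A minimal sketch of such an acceptance check, assuming the annotation record shape from the sketch above; the word-count threshold for a "meaningful" description is an assumption, as the real review involved manual inspection:

```python
# Illustrative acceptance check for iterations 2 and 3; the length
# heuristic for "meaningful" is an assumption.
def is_acceptable(annotation: dict) -> bool:
    """Return True if the submitted annotation meets the minimum requirements."""
    has_category = len(annotation.get("categories", [])) >= 1
    has_tags = len(annotation.get("tags", [])) >= 2
    description = annotation.get("description", "").strip()
    has_description = len(description.split()) >= 5   # crude proxy for "meaningful"
    return has_category and has_tags and has_description

# Example: this submission would be rejected (only one tag).
print(is_acceptable({"categories": ["weather"],
                     "tags": ["forecast"],
                     "description": "Returns a 7-day weather forecast as JSON."}))
```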

                        Iteration 2    Iteration 3
Number of submissions   100            150
Reward per task         $0.20          $0.20
Restrictions            yes            yes


Amazon Mechanical Turk – Iterations 2 & 3

Results (iterations 2 & 3):
Ca. 80% of documents correctly identified
Very satisfying annotations
Average completion time: 2:36 min


Amazon Mechanical Turk – Survey

48 survey submissions (18 female, 30 male)
Most common countries of origin: India (27) and USA (9)
Age groups:
15-22 (12)
23-30 (18)
31-50 (16)
Most respondents worked in an IT profession; these provided the best-quality annotations


Amazon Mechanical Turk

Recommendations for further improvement:
Improve the task description, especially 'what is a Web API'
Better examples (e.g., hinting at what makes a non-API page a negative example)
Allow assignment of multiple categories

Conclusion:
Very positive results: a good way to get quality annotations
Results will help provide a better search experience to users
Results can be used as a positive set for automatic classification


Questions?
