Presentation at the AAAI 2012 Spring Symposium: Intelligent Web Services Meet Social Computing, Palo Alto, CA, United States. Paper: Crowd Sourcing Web Service Annotations. Authors: James Scicluna, Christoph Blank and Nathalie Steinmetz.
© Copyright 2012 SEEKDA GmbH – www.seekda.com
Crowd Sourcing Web Service Annotations
James Scicluna (1), Christoph Blank (1), Nathalie Steinmetz (1) and Elena Simperl (2)
(1) seekda GmbH, (2) Karlsruhe Institute of Technology
Outline
Introduction to seekda Web Service search engine
Web API crawling & identification
Amazon Mechanical Turk crowdsourcing
Web Service Annotation wizard
seekda Web Service Search Engine
WSDL ONLY
Why crawl for Web APIs?
Significant growth of Web APIs
  More than 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs) [end of 2009: ca. 1,500 Web APIs]
  More than 6,500 mashups on ProgrammableWeb (combining Web APIs from one or more sources)
SOAP services are only a small part of the overall available public services
Web API Crawling
Problem: Web APIs are described by regular HTML pages
No standardized structure that helps with the identification
Web API Identification
Solution: Crawl for Web APIs
Approach 1: Manual feature identification
  Takes into account HTML structure (e.g., title, mark-up), syntactical properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)
Approach 2: Automatic classification
  Text classification with supervised learning (Support Vector Machine model)
  Training set: APIs from ProgrammableWeb
But: human confirmation is still needed to be sure
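The manual features named in Approach 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the seekda crawler: the class `PageFeatureParser`, the function `extract_features`, the feature names, and the camel-case pattern are all hypothetical, and only the three feature families (title/mark-up, camel-cased words, external/internal link ratio) come from the slide.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageFeatureParser(HTMLParser):
    """Collects the title, text, and link targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text_parts.append(data)


# Hypothetical pattern: identifiers such as getUserInfo or searchByTag.
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")


def extract_features(html, page_url):
    """Return a small, illustrative feature dict for one page."""
    parser = PageFeatureParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    external = 0
    internal = 0
    for href in parser.links:
        link_host = urlparse(href).netloc
        # Relative links carry no host and are counted as internal.
        if link_host and link_host != host:
            external += 1
        else:
            internal += 1
    text = " ".join(parser.text_parts)
    return {
        "title_mentions_api": "api" in parser.title.lower(),
        "camel_case_words": len(CAMEL_CASE.findall(text)),
        # External / internal ratio, guarded against division by zero.
        "external_internal_ratio": external / internal if internal else float(external),
    }
```

A feature dict like this could feed either hand-written rules or a trained classifier. In both cases the slide's caveat still applies: a human has to confirm the result.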
New Search Engine Prototype
Prototype – User Contributions
Web API – yes/no: confirmation from human needed!
Other annotations that help improve the search for Web Services:
  Categories
  Tags
  Natural language descriptions
  Cost: free or paid service
Problem - User Contribution
Problem: Users/developers don’t contribute enough
  Hard to motivate them to provide annotations
  Community recognition or peer respect is not enough
Solution: Crowdsource the annotations, i.e., pay people to provide them
  Use Amazon Mechanical Turk
  Bootstrap annotations quickly and cheaply
Service Annotation Wizard (1/4)
Service Annotation Wizard (2/4)
Service Annotation Wizard (3/4)
Service Annotation Wizard (4/4)
Amazon Mechanical Turk – Iteration 1
Annotation Wizard tasks:
  Web API yes/no
  Assign a category
  Assign tags
  Provide a natural language description
  Determine whether the page is documentation, pricing or listing
  Rate the service

Number of submissions    70
Reward per task          $0.10
Restrictions             none
Amazon Mechanical Turk – Iteration 1
Results
  21 APIs correctly identified as APIs
  28 Web documents (non-APIs) correctly identified as non-APIs
  49/70 correctly identified (70% accuracy)
  Average task completion time: 2:20 min
But only:
  4 well-done, complete annotations
  8 acceptable (incomplete) annotations
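The arithmetic behind the reported accuracy can be checked with a short Python snippet. The counts are taken from the slide. The variable names are illustrative, and the "usable share" assumes that the 4 complete and 8 acceptable annotations are counted out of all 70 submissions, which the slide does not state explicitly.

```python
# Counts reported for Iteration 1.
total_submissions = 70
apis_correct = 21        # APIs correctly identified as APIs
non_apis_correct = 28    # non-API documents correctly identified as non-APIs

correct = apis_correct + non_apis_correct
accuracy = correct / total_submissions

# Assumption: annotation quality counts refer to all 70 submissions.
complete_annotations = 4
acceptable_annotations = 8
usable_share = (complete_annotations + acceptable_annotations) / total_submissions

print(f"correctly identified: {correct}/{total_submissions} = {accuracy:.0%}")
print(f"complete or acceptable annotations: {usable_share:.1%}")
```

Under that assumption, the yes/no identification was right 70% of the time, but only about one submission in six carried annotations that were at least acceptable.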
Amazon Mechanical Turk – Iterations 2 & 3
Annotation Wizard
  Removed page type identification and service rating
  For a task to be accepted:
    At least one category must be assigned
    At least 2 tags must be provided
    A meaningful description must be provided
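The acceptance rules above could be checked automatically before a human reviews a submission. The following Python sketch is hypothetical throughout: the `Annotation` record layout, the function names, and the word-count threshold are not from the slides. In particular, "meaningful" is only approximated here by a minimum word count, and a real review would still need human judgment.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One worker submission (hypothetical record layout)."""
    is_web_api: bool
    categories: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    description: str = ""


# Assumption: "meaningful" is approximated by a minimum word count.
MIN_DESCRIPTION_WORDS = 5


def rejection_reasons(annotation):
    """Return the acceptance rules that the submission violates."""
    reasons = []
    if len(annotation.categories) < 1:
        reasons.append("no category assigned")
    if len(annotation.tags) < 2:
        reasons.append("fewer than 2 tags")
    if len(annotation.description.split()) < MIN_DESCRIPTION_WORDS:
        reasons.append("description too short")
    return reasons


def is_accepted(annotation):
    """A submission is accepted only if it violates no rule."""
    return not rejection_reasons(annotation)
```

Returning the list of violated rules, rather than a bare yes/no, makes it possible to tell a worker why a task was rejected.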
                         Iteration 2    Iteration 3
Number of submissions    100            150
Reward per task          $0.20          $0.20
Restrictions             yes            yes
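For a rough sense of scale, the worker rewards for the three iterations can be totalled from the reported submission counts and per-task rewards. This is a back-of-the-envelope sketch. It assumes every submission was paid and ignores any platform fees, neither of which the slides specify, so the result is an upper bound on rewards rather than the actual spend.

```python
# (submissions, reward per task in US cents) as reported per iteration.
iterations = {
    "Iteration 1": (70, 10),
    "Iteration 2": (100, 20),
    "Iteration 3": (150, 20),
}


def reward_total_cents(submissions, reward_cents):
    """Upper bound on worker rewards, assuming every submission is paid."""
    return submissions * reward_cents


# Integer cents avoid floating-point rounding in the sums.
totals = {name: reward_total_cents(*params) for name, params in iterations.items()}
grand_total_cents = sum(totals.values())

for name, cents in totals.items():
    print(f"{name}: ${cents / 100:.2f}")
print(f"All iterations: ${grand_total_cents / 100:.2f}")
```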
Amazon Mechanical Turk – Iterations 2 & 3
Results for Iterations 2 & 3:
  Ca. 80% of documents correctly identified
  Very satisfying annotations
  Average completion time: 2:36 min
Amazon Mechanical Turk – Survey
48 survey submissions
  Female 18, male 30
  Most popular origins: India (27) and USA (9)
  Popular age groups: 15-22 (12), 23-30 (18), 31-50 (16)
Most respondents worked in some IT profession
  These workers provided the best-quality annotations
Amazon Mechanical Turk
Recommendations for further improvement:
  Improve the task description, especially ‘what is a Web API’
  Provide better examples (e.g., hinting at what makes a false page false)
  Allow assignment of multiple categories
Conclusion:
  Very positive results: crowdsourcing is a good way to get quality annotations
  Results will help provide a better search experience to users
  Results can be used as a positive set for automatic classification
Questions?