Quantitative Comparisons of Search Engine Results
Mike Thelwall
School of Computing and Information Technology, University of Wolverhampton (Wolverhampton, UK)
Journal of the American Society for Information Science and Technology 2008
2
Abstract
• Search engines
– To find information or web sites
• Webometrics
– Finding and measuring web-based phenomena
• Comparing the application programming interfaces (APIs)
– Google, Yahoo!, Live Search
• Webometric applications
– Hit count, number of URLs, number of domains, number of web sites, number of top-level domains
3
Search Engine and Web Crawlers
• Three key operations:
– Crawling: identifying, downloading and storing pages in the database
– Results matching: a search engine identifies the pages in its database that match a user query.
4
Search Engine and Web Crawlers
– Results ranking
• A search engine arranges the matching URLs to maximize the probability that a relevant result appears on the first or second page.
• Search terms
• Occurrence frequency
• Number of clicks
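The matching and ranking operations can be illustrated with a toy example. This is a minimal sketch, assuming that a page matches when it contains every query term and that ranking uses only the occurrence frequency of those terms; the function names and the sample pages are hypothetical, and real engines combine many more signals (such as clicks).

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def match_and_rank(pages, query):
    """Return URLs containing every query term, most term occurrences first."""
    terms = tokenize(query)
    scored = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        # Results matching: keep only pages that contain all query terms.
        if all(counts[t] > 0 for t in terms):
            # Results ranking: score by total occurrence frequency of the terms.
            scored.append((sum(counts[t] for t in terms), url))
    # Highest score first; ties are broken by URL for a stable order.
    return [url for _, url in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical stored pages (the "database" filled by crawling).
pages = {
    "http://a.example/": "web search engines crawl the web",
    "http://b.example/": "a web search ranks results",
    "http://c.example/": "cooking recipes",
}
ranked = match_and_rank(pages, "web search")
print(ranked)
```

The first page ranks above the second because "web" occurs twice in it; the third page does not match at all.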
5
Research Objectives
• Are there specific anomalies that make the hit count estimates (HCEs) of Google, Live Search or Yahoo! unreliable for particular values?
• How consistent are Google, Live Search and Yahoo! in the number of URLs returned for a search, and which of them typically returns the most URLs?
• How consistent are the search engines in terms of the spread of results (sites, domains and top-level domains) and which search engine gives the widest spread of results for a search?
6
Data
• 1,587 words
– Blogs
– Word frequency
– http://cybermetrics.wlv.ac.uk/paperdata/
• Three search engines
– Google, Yahoo! and Live Search
– 1000 pages
• Five webometrics
– Hit count, number of URLs, number of domains, number of web sites, number of top-level domains
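Four of the five measures can be derived from the list of URLs an engine returns (the hit count is reported by the engine itself). The following is a rough sketch, not the paper's procedure: it assumes a "domain" is the full host name, a "site" is the last two labels of the host name, and a TLD is the final label. The function name and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def webometrics(urls):
    """Count distinct URLs, domains, sites and TLDs in a list of result URLs."""
    unique_urls = set(urls)
    domains = set()
    for url in unique_urls:
        host = urlsplit(url).hostname  # lowercased host name, or None
        if host:
            domains.add(host)
    # Assumption: a "site" is the last two labels of the host name.
    # Names such as wlv.ac.uk would need three labels, so a real study
    # would use a proper suffix list instead of this shortcut.
    sites = {".".join(d.split(".")[-2:]) for d in domains}
    tlds = {d.rsplit(".", 1)[-1] for d in domains}
    return {
        "urls": len(unique_urls),
        "domains": len(domains),
        "sites": len(sites),
        "tlds": len(tlds),
    }


# Hypothetical result list containing one duplicate URL.
results = [
    "http://www.example.com/a",
    "http://www.example.com/b",
    "http://blog.example.com/",
    "http://example.org/",
    "http://www.example.com/a",
]
print(webometrics(results))
```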
7
Results - 1
• Hit count estimates
Figure 2a,b,c. Hit count estimates of the three search engines compared (logarithmic scales, excluding data with zero values; r=0.80, 0.96, 0.83).
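A comparison of this kind can be sketched as a correlation between the log-transformed hit counts of two engines, with zero values excluded as in the figure caption. The slides do not say which coefficient r denotes, so Pearson's r is an assumption here; the function name and the sample counts are hypothetical.

```python
import math


def log_pearson(xs, ys):
    """Pearson's r between log10(xs) and log10(ys), skipping pairs with a zero."""
    pairs = [
        (math.log10(x), math.log10(y))
        for x, y in zip(xs, ys)
        if x > 0 and y > 0
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two non-zero pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    # Raises ZeroDivisionError if either series is constant.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hit counts for four query words; the last pair is dropped
# because one engine reported zero hits.
engine_a = [10, 100, 1000, 0]
engine_b = [100, 1000, 10000, 50]
r = log_pearson(engine_a, engine_b)
print(round(r, 2))
```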
8
Results - 2
• Number of URLs returned
Figure 3a,b,c. URLs returned by the three search engines compared (r=0.71, 0.68, 0.84)
9
Results - 3
• Number of domains returned
Figure 4a,b,c. Domains returned by the three search engines compared (r=0.65, 0.69, 0.83).
10
Results - 4
• Number of sites returned
Figure 5a,b,c. Sites returned by the three search engines compared (r=0.66, 0.69, 0.81)
11
Results - 5
• Number of TLDs returned
Figure 6a,b,c. TLDs returned by the three search engines compared (r=0.74, 0.77, 0.84)
12
Results - 6
• Comparison within results
13
Conclusion
• Google seems to be the most consistent in terms of the relationship between its HCEs and number of URLs returned.
• Yahoo! is recommended if the objective is to get results from the widest variety of web sites, domains or TLDs.
14
Evaluating Search Engine Effects on Web-based Relatedness Measurement
15
Snippets
• Six manifest records
– Snippets
– Hit count
– Number of URLs
– Number of domains
– Number of web sites
– Number of top-level domains
16
Dataset
• WordSimilarity-353 Test Collection (TC-353)
– TC353 Full (353 pairs)
– TC353 Testing (153 pairs)
• Three well-known search engines
– Yahoo!
– Google
– Live Search
• Five domains
– General web search (web09)
– .com
– .edu
– .net
– .org
17
The Model
• A web-based relatedness measure WebMetric(X, Y) measures the association of two objects X and Y:
WebMetric(X, Y) = F(d(X, Y))
– where F is a transfer function and d is a dependency score.
• The dependency score d reflects the mutual dependency of X and Y on the web.
18
The Model
• Given a search engine G and two objects X and Y
– we employ two double-checking functions, $f_G(Y@X)$ and $f_G(X@Y)$, to estimate the dependence between X and Y
• $d(X, Y) = \frac{f_G(Y@X)}{f_G(X)} \times \frac{f_G(X@Y)}{f_G(Y)}$
$\mathrm{WebMetric}(X, Y) = e^{-b\, e^{-a\, d(X, Y)}}$
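The dependency score can be sketched from snippet text. The slides do not define the counting functions, so this example assumes that fG(Y@X) is the number of occurrences of Y in the snippets returned for query X, that fG(X) is the number of occurrences of X in those same snippets, and that d is the product of the two ratios fG(Y@X)/fG(X) and fG(X@Y)/fG(Y). The function names and the sample snippets are hypothetical.

```python
def count_occurrences(term, snippets):
    """Total case-insensitive substring occurrences of `term` in the snippets."""
    term = term.lower()
    return sum(s.lower().count(term) for s in snippets)


def dependency_score(x, y, snippets_x, snippets_y):
    """d(X, Y) = f(Y@X)/f(X) * f(X@Y)/f(Y); 0.0 if any count is zero."""
    f_x = count_occurrences(x, snippets_x)       # f_G(X): X in X's snippets
    f_y = count_occurrences(y, snippets_y)       # f_G(Y): Y in Y's snippets
    f_y_at_x = count_occurrences(y, snippets_x)  # f_G(Y@X): Y in X's snippets
    f_x_at_y = count_occurrences(x, snippets_y)  # f_G(X@Y): X in Y's snippets
    if 0 in (f_x, f_y, f_y_at_x, f_x_at_y):
        return 0.0
    return (f_y_at_x / f_x) * (f_x_at_y / f_y)


# Hypothetical snippets returned for the queries "tiger" and "cat".
tiger_snippets = ["The tiger is the largest cat.", "Tiger habitats are shrinking."]
cat_snippets = ["A cat is a small mammal.", "The tiger is a big cat."]
d = dependency_score("tiger", "cat", tiger_snippets, cat_snippets)
print(d)
```

Here "cat" appears once among two mentions of "tiger", and "tiger" appears once among two mentions of "cat", so each ratio is 1/2.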
19
Figure 8. Behaviors of the Gompertz Curve and a Mapping Example
$y(d) = e^{-b\, e^{-a\, d}}$
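The mapping from a dependency score to a relatedness value can be tabulated directly. This sketch assumes the transfer function is the Gompertz curve y(d) = exp(-b * exp(-a * d)) with positive a and b; the parameter values below are arbitrary illustrations, not values taken from the slides.

```python
import math


def gompertz(d, a, b):
    """Map a dependency score d to a value in (0, 1) via exp(-b * exp(-a * d))."""
    return math.exp(-b * math.exp(-a * d))


# Hypothetical parameters: a controls how steeply the curve rises,
# b controls how far to the right the rise starts.
a, b = 10.0, 5.0
scores = [0.0, 0.1, 0.25, 0.5, 1.0]
mapped = [gompertz(d, a, b) for d in scores]
for d, y in zip(scores, mapped):
    print(f"d = {d:.2f} -> y = {y:.4f}")
```

With positive a and b the curve is S-shaped: small dependency scores are pushed toward 0 and large ones saturate near 1.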
20
Experiments
$\mathrm{WebMetric}(X, Y) = e^{-b\, e^{-a\, d(X, Y)}}$
21