Summary of WWW Characterizations
James E. Pitkow
Xerox Palo Alto Research Center
WWW Journal 99
Presenter: 노양우
System Software Research Lab.
Contents
Introduction: clients, proxies and gateways, servers
Traces and analysis (distributions)
Summary by year: 1994, 1995, 1996, 1997, 1998
Conclusion
Introduction
Growth of Web usage: representative characterizations --> more enjoyable Web surfing
Data sets collected at various points (clients, proxies and gateways, servers)
Several invariants
Clients: informative but rare --> require browser implementation changes or sufficient APIs
Proxies and gateways: greater availability, but less focus on characterization --> caching algorithms
Servers: traffic analysis
1994
A Caching Relay for WWW: DEC proxy, 4,000/day from 100 users; document popularity --> Zipf; cache: hit (1/3), miss (1/3), invalidation (1/3)
Mosaic Will Kill My Network: Intel intranet proxies --> image traffic
A Simple Yet Robust Caching Algorithm: Georgia Tech server; recency > frequency; LRU: server-side cache hit rate (80%)
Invalidation in a Large-Scale Object Cache: Harvest cache, Xmosaic; HTML modified more frequently (75 days) than images (107 days)
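To make the Zipf-popularity and recency-over-frequency findings concrete, here is a minimal sketch that replays a synthetic Zipf-like request stream through an LRU cache and reports the hit rate. The catalogue size, exponent `s`, cache capacities, and the helper names `zipf_requests` and `lru_hit_rate` are illustrative assumptions; the numbers it prints say nothing about the traces summarized on these slides.

```python
import random
from collections import OrderedDict


def zipf_requests(n_docs, n_requests, s=1.0, seed=0):
    """Draw document ids 0..n_docs-1; the doc of rank r has weight 1/(r+1)**s (Zipf-like)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) ** s for rank in range(n_docs)]
    return rng.choices(range(n_docs), weights=weights, k=n_requests)


def lru_hit_rate(requests, capacity):
    """Replay a request stream through an LRU cache holding `capacity` documents."""
    cache = OrderedDict()
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)  # mark as most recently used
        else:
            cache[doc] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used document
    return hits / len(requests) if requests else 0.0


if __name__ == "__main__":
    # Hypothetical workload: 10,000 documents, 100,000 requests.
    stream = zipf_requests(n_docs=10_000, n_requests=100_000)
    for cap in (100, 1_000):
        print(cap, round(lru_hit_rate(stream, cap), 3))
```

Because LRU is a stack algorithm, a larger cache never lowers the hit rate on the same stream, which makes it a convenient baseline when comparing recency- and frequency-based policies.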
1995
Characteristics of WWW Client-Based Traces: Xmosaic, 600 users, 6 months; transmission times, document size, document size vs. number of requests (Pareto); compared with Unix file systems: more small and more large files
Application Level Document Caching
Explaining WWW Traffic Self-Similarity: self-similar over time scales of 1 to 100 seconds; busiest periods: self-similar, idle periods: not self-similar
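A common way to check a claim like "self-similar from 1 to 100 seconds" is the aggregated-variance method: average a per-second count series over non-overlapping blocks of size m. If Var(X^(m)) decays like m^(2H-2), then a Hurst parameter H > 0.5 indicates self-similar (long-range dependent) traffic, while H close to 0.5 matches short-range dependent traffic such as Poisson arrivals. The sketch below is a plain least-squares version of that method; the block sizes and the name `aggregated_variance_hurst` are assumptions, not the procedure of the cited study.

```python
import math
import random
import statistics


def aggregated_variance_hurst(series, block_sizes):
    """Estimate the Hurst parameter H from the slope of log Var(X^(m)) vs. log m.

    For each block size m, the series is cut into non-overlapping blocks and each
    block is replaced by its mean; for a self-similar process the variance of
    these block means scales as m**(2H - 2), so H = 1 + slope / 2.
    """
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # need at least two block means to compute a variance
        means = [statistics.fmean(series[i * m:(i + 1) * m]) for i in range(n_blocks)]
        xs.append(math.log(m))
        ys.append(math.log(statistics.variance(means)))
    if len(xs) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of the log-log points.
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return 1 + slope / 2


if __name__ == "__main__":
    rng = random.Random(1)
    # Independent per-second counts (no long-range dependence): H should be near 0.5.
    counts = [rng.gauss(100, 10) for _ in range(100_000)]
    print(round(aggregated_variance_hurst(counts, [1, 10, 100, 1000]), 2))
```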
Caching Proxies: Limitations and Potentials (requests per server: Zipf; CGI: 0.5%)
Network Behaviour of a Busy Web Server: DEC, congressional election server
images --> major share of traffic; inter-arrival times --> not Poisson
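A quick screening check for the "not Poisson" finding: the inter-arrival times of a Poisson process are exponential, so their coefficient of variation (standard deviation divided by mean) is 1, and bursty arrivals push it well above 1. The hypothetical helper `interarrival_cv` below only computes that statistic; it is a rough first test, not the analysis performed in the cited study.

```python
import random
import statistics


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between sorted arrival timestamps."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic Poisson arrivals at 5 requests/second: exponential gaps, CV close to 1.
    t, poisson = 0.0, []
    for _ in range(50_000):
        t += rng.expovariate(5.0)
        poisson.append(t)
    print(round(interarrival_cv(poisson), 2))
```

A CV near 1 does not prove the arrivals are Poisson, but a CV far above 1 is a strong hint that they are not.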
1996
WWW Cache Consistency: Microsoft, BU, Harvard; popularity varies inversely with frequency of change; images: 65%, CGI: 9%; HTML: 50 days, GIF: 85 days
Web Server Workload Characterization: University of Waterloo, Calgary, NASA, NCSA; 10% of documents --> 90% of requests; 10% of domains --> 75% of usage
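Concentration figures such as "10% of documents --> 90% of requests" come from ranking items by request count and summing the share held by the top fraction; the 1997 server-level figures are computed the same way. A minimal sketch; the function name `top_share` and the toy counts are hypothetical.

```python
def top_share(counts, fraction):
    """Share of all requests held by the most popular `fraction` of items.

    `counts` maps each item (document, domain, server, ...) to its request
    count. The top group size is rounded to the nearest whole item, with at
    least one item always included.
    """
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical request counts for ten documents (1,000 requests in total).
    counts = {f"/doc{i}.html": c for i, c in enumerate([900, 30, 20, 15, 10, 10, 5, 5, 3, 2])}
    print(top_share(counts, 0.10))  # share of requests going to the top 10% of documents
```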
Evaluating History Mechanisms: Xmosaic, 6 weeks; new URLs: 42%, revisited URLs: 58%
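The new-versus-revisited split can be read as a recurrence rate over a browsing history. A minimal sketch, assuming a visit counts as a revisit whenever its URL already appeared earlier in the same user's trace; the function name `recurrence_rate` and the sample trace are hypothetical.

```python
def recurrence_rate(urls):
    """Fraction of visits whose URL was already seen earlier in the trace."""
    seen = set()
    revisits = 0
    for url in urls:
        if url in seen:
            revisits += 1
        else:
            seen.add(url)
    return revisits / len(urls) if urls else 0.0


if __name__ == "__main__":
    trace = ["/", "/news", "/", "/sports", "/news", "/"]
    rate = recurrence_rate(trace)
    print(f"revisited: {rate:.0%}, new: {1 - rate:.0%}")
```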
1997
Strong Regularities in Web Surfing: clicks per site --> inverse Gaussian; average 8.32 clicks, typical case: 1 click
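An inverse Gaussian law explains how an average of 8.32 clicks can coexist with a typical case of 1 click: the distribution is strongly right-skewed, so its mode sits far below its mean. With mean μ and shape λ the density is f(L) = sqrt(λ / (2π L³)) · exp(−λ (L − μ)² / (2 μ² L)), and its mode is μ (sqrt(1 + a²) − a) with a = 3μ / (2λ). The sketch evaluates both, treating clicks as continuous; the shape value λ = 3.0 is an illustrative assumption picked so the mode lands near 1, not a fitted parameter from the study.

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Inverse Gaussian density with mean `mu` and shape `lam`, for x > 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu ** 2 * x)
    )


def inverse_gaussian_mode(mu, lam):
    """Location of the density's peak, i.e. the 'typical' number of clicks."""
    a = 3 * mu / (2 * lam)
    return mu * (math.sqrt(1 + a * a) - a)


if __name__ == "__main__":
    mu = 8.32   # mean clicks per site, from the slide
    lam = 3.0   # hypothetical shape parameter, for illustration only
    print("mode:", round(inverse_gaussian_mode(mu, lam), 2))
    for clicks in (1, 2, 5, 10, 20):
        print(clicks, round(inverse_gaussian_pdf(clicks, mu, lam), 4))
```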
Shared User Behaviour: DEC, Korean National Proxy, Virginia Tech, AOL; median file size: 2 KB, mean file size: 27 KB; 25% of servers: 80-95% of requests; 90% of bytes: 25% of servers
Characterizing WWW Queries: CGI: 4% (KNP), 9% (AOL), 12% (VT); 99% of queries: simple
Web Facts and Fantasy: educational (Harvard, Rice), business (BUS, ISP, FSS, AE), information (GOV, PROF); characterization of sites (site size, diversity of users, user access patterns)
Renovational growth (business)
1998
Size growth (educational sites); visit growth by the same users (information sites); attraction (adult entertainment: search engines)
CGI: few requests, little traffic (counters, logins, search engines); peak activity: network bottleneck
Generating Representative Web Workloads: SURGE
file size: body (lognormal), tail (Pareto); popularity: Zipf; request size: Pareto; reading times: Pareto; …
realistic benchmark: HTTP-NG
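SURGE describes file sizes with a lognormal body and a Pareto tail. The sketch below approximates that shape with a simple two-component mixture, which is a simplification: it does not truncate the lognormal at the tail cutoff the way a spliced model would. The body weight, lognormal parameters, Pareto shape, cutoff `k`, and the name `sample_file_size` are illustrative assumptions (not necessarily SURGE's published values).

```python
import random
import statistics


def sample_file_size(rng, body_weight=0.93, mu=9.357, sigma=1.318, alpha=1.1, k=133_000):
    """Draw one file size in bytes from a lognormal-body / Pareto-tail mixture.

    With probability `body_weight` the size comes from a lognormal with
    parameters (mu, sigma) on the log scale; otherwise it comes from a Pareto
    with shape `alpha` and minimum `k` bytes. All defaults are illustrative.
    """
    if rng.random() < body_weight:
        return rng.lognormvariate(mu, sigma)
    return k * rng.paretovariate(alpha)  # paretovariate returns values >= 1


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [sample_file_size(rng) for _ in range(100_000)]
    # The heavy tail pulls the mean well above the median, the same qualitative
    # gap as the 1997 median and mean file sizes.
    print("median:", round(statistics.median(sizes)), "mean:", round(statistics.fmean(sizes)))
```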
Conclusion
Dynamic Web --> several invariants: file popularity, file size, number of requests per user, site popularity, life span, request type, …
Future research: relation between file popularity and recurrence rate; users’ navigation paths