Summary of WWW Characterizations

James E. Pitkow

Xerox Palo Alto Research Center

WWW Journal 99

Presenter: 노양우

System Software Research Lab.

Contents

Introduction
    Client, Proxies and Gateways, Server
    Traces and Analysis (distribution)

Summary
    1994, 1995, 1996, 1997, 1998

Conclusion

Introduction

Growth of Web usage
    representative characterization --> enjoyable Web surfing
    various data sets at various points (clients, proxies and gateways, servers)
    several invariants

Clients
    informative but rare --> browser implementation, sufficient APIs

Proxies and Gateways
    greater availability
    less concentration on characteristics --> caching algorithms

Servers
    traffic analysis

1994

A Caching Relay for WWW
    DEC proxy, 4000/day from 100 users
    document popularity --> Zipf
    cache: hit (1/3), miss (1/3), invalidation (1/3)
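To make the Zipf popularity finding concrete, the following is a minimal sketch that assumes request share is proportional to 1/rank^exponent. The function names, the exponent of 1.0, and the document count are illustrative assumptions, not values taken from the DEC trace.

```python
def zipf_shares(n_docs, exponent=1.0):
    """Request share of each document, most popular first (assumed Zipf form)."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(n_docs, top_fraction, exponent=1.0):
    """Share of all requests going to the most popular `top_fraction` of documents."""
    shares = zipf_shares(n_docs, exponent)
    k = max(1, round(n_docs * top_fraction))
    return sum(shares[:k])


# Hypothetical example: 1000 documents, share of requests to the top 10%.
print(top_share(1000, 0.10))
```

Under this assumed form a small set of documents attracts a large share of requests, which is why even a modest cache can serve a substantial fraction of them.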

Mosaic Will Kill My Network
    Intel intranet proxies --> image traffic

A Simple Yet Robust Caching Algorithm
    Georgia Tech server: recency > frequency
    LRU: server-side cache hit rate (80%)
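The recency-based policy mentioned above can be illustrated with a plain LRU replay. This is a generic sketch, not the algorithm from the cited paper; `lru_hit_rate`, the request list, and the capacities are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay document IDs through an LRU cache of `capacity` entries
    and return the fraction of requests that were hits."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for doc in requests:
        total += 1
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # mark as most recently used
        else:
            cache[doc] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Hypothetical request stream.
trace = ["a", "b", "a", "c", "b", "a"]
print(lru_hit_rate(trace, capacity=3))
```

Replaying a real server log through a function like this for several capacities is one way to see how the hit rate depends on cache size.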

Invalidation in Large Scale Object Cache
    Harvest Cache, XMosaic
    HTML: frequently modified (75 days), Image (107 days)

1995

Characteristics of WWW Client-Based Traces
    XMosaic, 600 users, 6 months
    transmission times, doc. size, doc. size versus # of requests (Pareto)
    Unix file systems: more small and large files exist

Application Level Document Caching

Explaining WWW Traffic Self-Similarity
    1 second -- 100 seconds: self-similar
    busiest periods: self-similar; idle periods: not self-similar
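One common way to check for self-similarity across time scales is the aggregated-variance method, sketched below under the assumption that the input is a series of per-interval traffic counts. For a self-similar series the variance of the block averages scales as m^(2H-2), so the Hurst parameter H follows from the log-log slope. The function names and block sizes are illustrative; the cited study may have used other estimators.

```python
import math
import random


def aggregate(series, m):
    """Average `series` over non-overlapping blocks of length m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the least-squares slope of log Var(X^(m)) against log m."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(aggregate(series, m))) for m in block_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 1.0 + slope / 2.0   # slope = 2H - 2


# Synthetic uncorrelated noise as a baseline: H should come out near 0.5.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
print(hurst_aggregated_variance(noise))
```

Estimates well above 0.5 on a real trace would point to long-range dependence; values near 0.5 are what an uncorrelated series gives.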

Caching Proxies: Limitations and Potentials
    # of requests per server (Zipf), CGI (0.5%)

Network Behaviour of a Busy Web Server
    DEC, Congressional Election Server
    images --> major traffic; inter-arrival time --> not Poisson
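A rough first check on the Poisson assumption is the coefficient of variation (CV) of inter-arrival times: a Poisson process has exponentially distributed gaps, whose CV is 1. The sketch below is only a heuristic screen, not a formal test, and `interarrival_cv` and the sample timestamps are hypothetical.

```python
import math


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between sorted arrival times."""
    times = sorted(timestamps)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean = sum(gaps) / len(gaps)
    if mean == 0:
        raise ValueError("all timestamps are identical")
    var = sum((g - mean) ** 2 for g in gaps) / (len(gaps) - 1)
    return math.sqrt(var) / mean


# Hypothetical bursty arrivals (seconds): a cluster followed by a long gap.
print(interarrival_cv([0.0, 0.1, 0.2, 0.3, 10.0]))
```

A CV well above 1 suggests burstier arrivals than a Poisson model would produce.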

1996

WWW Cache Consistency
    Microsoft, BU, Harvard
    popularity: inversely related to frequency of change
    image: 65%, CGI: 9%
    HTML: 50 days, GIF: 85 days

Web Server Workload Characterization
    University of Waterloo, Calgary, NASA, NCSA
    10% of documents --> 90% of requests
    10% of domains --> 75% of usage
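A concentration figure such as "10% of documents receive 90% of requests" can be computed from an access log as sketched below. The function name and the toy log are hypothetical, and the log is assumed to be already parsed into a list of requested URLs.

```python
from collections import Counter


def request_share_of_top(requested_urls, top_fraction=0.10):
    """Fraction of all requests that go to the most requested
    `top_fraction` of distinct documents."""
    counts = Counter(requested_urls)
    if not counts:
        raise ValueError("empty log")
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Toy log: 10 distinct documents, one of which receives 90 of 100 requests.
log = ["/index.html"] * 90 + ["/page%d.html" % i for i in range(1, 10)] + ["/page1.html"]
print(request_share_of_top(log))
```

The same function applied to client domains instead of URLs gives the per-domain concentration.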

Evaluating History Mechanisms
    XMosaic, 6 weeks
    new URLs: 42%, revisited URLs: 58%

1997

Strong Regularities in Web Surfing
    clicks per site --> inverse Gaussian
    average clicks: 8.32; typical case: 1 click
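For reference, the standard two-parameter inverse Gaussian density is written out below. The symbols are assumptions of this note rather than the slide's notation: L is the number of clicks per site, μ the mean, and λ a shape parameter.

```latex
P(L) = \sqrt{\frac{\lambda}{2\pi L^{3}}}\,
       \exp\!\left[-\frac{\lambda\,(L-\mu)^{2}}{2\mu^{2}L}\right],
\qquad L > 0
```

The mean of this distribution is μ and its variance is μ³/λ. Because the density is strongly right-skewed, its mode lies below its mean, which is consistent with an average of 8.32 clicks alongside a typical case of a single click.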

Shared User Behaviour
    DEC, Korean National Proxy, Virginia Tech, AOL
    median file size: 2 KB, mean file size: 27 KB
    25% of servers: 80-95% of requests; 25% of servers: 90% of bytes

Characterizing WWW Queries
    CGI: 4% (KNP), 9% (AOL), 12% (VT)
    99% of queries: simple

Web Facts and Fantasy
    Educational (Harvard, Rice), Business (BUS, ISP, FSS, AE), Info (GOV, PROF)
    Characterization of sites (size of the site, diversity of users, user access patterns)
    Renovational growth (Business)

1998

Size growth (Educational sites)
Visit growth by the same user (Information sites)
Attraction (Adult Entertainment: search engines)

CGI: low requests, low traffic: counter, login, search engine
Peak activity: network bottleneck

Generating Representative Web Workloads
    SURGE
    file size: body (lognormal), tail (Pareto); popularity: Zipf; request size: Pareto; reading times: Pareto; …
    realistic benchmark: HTTP-NG
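The hybrid file-size model listed above (lognormal body, Pareto tail) can be sampled as in the sketch below. Every parameter value here (`cutoff`, `tail_prob`, `mu`, `sigma`, `alpha`) is an illustrative assumption and not a setting taken from SURGE itself.

```python
import random


def sample_file_size(rng, cutoff=100_000, tail_prob=0.07,
                     mu=9.0, sigma=1.3, alpha=1.1):
    """Draw one file size in bytes: lognormal body below `cutoff`,
    Pareto tail at or above it. All defaults are assumed values."""
    if rng.random() < tail_prob:
        # paretovariate returns values >= 1, so the result is >= cutoff.
        return cutoff * rng.paretovariate(alpha)
    while True:
        # Rejection step keeps the lognormal body strictly below the cutoff.
        size = rng.lognormvariate(mu, sigma)
        if size < cutoff:
            return size


rng = random.Random(42)
sizes = [sample_file_size(rng) for _ in range(10000)]
print(sum(s >= 100_000 for s in sizes) / len(sizes))  # fraction of tail draws
```

A workload generator would combine draws like these with a Zipf popularity model and Pareto-distributed reading times, so that several of the measured invariants hold at once.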

Conclusion

Dynamic Web --> Several Invariants
    file popularity, file size, # of requests per user, site popularity, life span, request type, …

Future Research
    relation between file popularity and recurrence rate
    users' navigation paths
