Summary of WWW Characterizations James E. Pitkow Xerox Palo Alto Research Center WWW Journal 99 Presenter: 노양우

Summary of WWW Characterizations

James E. Pitkow

Xerox Palo Alto Research Center

WWW Journal 99

Presenter: 노양우

System Software Research Lab.

Contents

Introduction
Clients, Proxies and Gateways, Servers
Traces and Analysis (distribution)

Summary: 1994, 1995, 1996, 1997, 1998

Conclusion

Introduction

Growth of Web usage
  representative characterization --> enjoyable Web surfing
  various data sets at various points (clients, proxies and gateways, servers)
  several invariants

Clients
  informative but rare --> browser implementation, sufficient APIs

Proxies and Gateways
  greater availability
  less concentration on characteristics --> caching algorithms

Servers
  traffic analysis

1994

A Caching Relay for WWW
  DEC proxy, 4000/day from 100 users
  document popularity --> Zipf
  cache: hit (1/3), miss (1/3), invalidation (1/3)

Mosaic Will Kill Me
  Intel intranet proxies --> image traffic

A Simple Yet Robust Caching Algorithm
  Georgia Tech server: recency > frequency
  LRU: server-side cache hit rate (80%)

Invalidation in Large Scale Object Cache
  Harvest Cache, Xmosaic
  HTML: frequently modified (75 days), Image (107 days)
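
The 1994 findings combine two ideas: document popularity follows a Zipf law, and recency-based (LRU) replacement gives high hit rates. The sketch below replays a synthetic Zipf request stream through an LRU cache. It is only an illustration under stated assumptions: the catalog size, request count, cache capacity, and Zipf exponent are hypothetical and do not come from any of the traces cited on the slide.

```python
import random
from collections import OrderedDict


def zipf_weights(n_docs, exponent=1.0):
    """Unnormalized Zipf weights: the r-th most popular document gets 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_docs + 1)]


def lru_hit_rate(requests, capacity):
    """Replay a request sequence through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # mark as most recently used
        else:
            cache[doc] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests) if requests else 0.0


def simulate(n_docs=1000, n_requests=20000, capacity=100, seed=0):
    """Hypothetical workload: Zipf-distributed requests over n_docs documents."""
    rng = random.Random(seed)
    docs = list(range(n_docs))
    requests = rng.choices(docs, weights=zipf_weights(n_docs), k=n_requests)
    return lru_hit_rate(requests, capacity)


if __name__ == "__main__":
    for capacity in (10, 100, 500):
        print(capacity, round(simulate(capacity=capacity), 3))
```

Because a few documents receive most of the requests under Zipf popularity, a cache much smaller than the catalog can still serve a large share of requests.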

1995

Characteristics of WWW Client-Based Traces
  Xmosaic, 600 users, 6 months
  transmission times, doc. size, doc. size versus # of requests (Pareto)
  Unix file systems: more small and large files exist

Application Level Document Caching

Explaining WWW Traffic Self-Similarity
  1 second to 100 seconds: self-similar
  busiest periods: self-similar; idle periods: not self-similar

Caching Proxies: Limitations and Potentials
  # of requests per server (Zipf), CGI (0.5%)

Network Behaviour of a Busy Web Server
  DEC, Congressional Election Server
  images --> major traffic; inter-arrival time --> not Poisson

1996

WWW Cache Consistency
  Microsoft, BU, Harvard
  popularity: inversely related to frequency of change
  image: 65%, CGI: 9%
  HTML: 50 days, GIF: 85 days

Web Server Workload Characterization
  University of Waterloo, Calgary, NASA, NCSA
  10% of documents --> 90% of requests
  10% of domains --> 75% of usage

Evaluating History Mechanism
  Xmosaic, 6 weeks
  new URLs: 42%, revisited URLs: 58%
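
A concentration figure such as "10% of documents --> 90% of requests" can be measured directly from a request log. The following is a minimal sketch of that calculation; the function name `top_share` and the toy log are hypothetical and not taken from the cited study.

```python
from collections import Counter


def top_share(requests, fraction=0.10):
    """Share of all requests that go to the most popular `fraction` of distinct documents."""
    counts = Counter(requests)
    if not counts:
        return 0.0
    # Always consider at least one document, even for very small catalogs.
    n_top = max(1, round(len(counts) * fraction))
    top = counts.most_common(n_top)
    return sum(count for _, count in top) / len(requests)


if __name__ == "__main__":
    # Toy log (assumption): one very popular page plus nine rarely requested pages.
    log = ["/index.html"] * 91 + [f"/page{i}.html" for i in range(9)]
    print(top_share(log))
```

The same function applies to client domains instead of documents by passing the requesting domain of each log entry.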

1997

Strong Regularities in Web Surfing
  clicks per site --> inverse Gaussian
  average clicks: 8.32; typical case: 1 click
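
The gap between the average (8.32 clicks) and the typical case (1 click) is what a heavily right-skewed inverse Gaussian produces: its mode always lies below its mean. The sketch below evaluates the standard inverse Gaussian density and its mode. The mean 8.32 is the figure from the slide; the shape parameter `lam` is a hypothetical value chosen only for illustration, and the continuous density is an approximation of a discrete click count.

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Density of the inverse Gaussian distribution with mean mu and shape lam."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu ** 2 * x)
    )


def inverse_gaussian_mode(mu, lam):
    """Location of the density peak: mu * (sqrt(1 + r^2) - r) with r = 3*mu / (2*lam)."""
    r = 3 * mu / (2 * lam)
    return mu * (math.sqrt(1 + r ** 2) - r)


if __name__ == "__main__":
    mu = 8.32   # mean clicks per site, from the slide
    lam = 3.0   # shape parameter: an assumption, not a fitted value
    mode = inverse_gaussian_mode(mu, lam)
    print("mean:", mu, "mode:", round(mode, 2))
```

A small shape parameter relative to the mean pushes the peak toward short sessions while a long tail of deep sessions keeps the mean high.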

Shared User Behaviour
  DEC, Korean National Proxy, Virginia Tech, AOL
  median file size: 2 KB, mean file size: 27 KB
  25% of servers: 80-95% of requests; 25% of servers: 90% of bytes

Characterizing WWW Queries
  CGI: 4% (KNP), 9% (AOL), 12% (VT)
  99% of queries: simple

Web Facts and Fantasy
  Educational (Harvard, Rice), Business (BUS, ISP, FSS, AE), Info (GOV, PROF)
  characterization of sites (size of the site, diversity of users, user access patterns)
  renovational growth (Business)

1998

Web Facts and Fantasy (continued)
  size growth (Educational sites)
  visit growth by the same user (Information sites)
  attraction (Adult Entertainment: search engine)
  CGI: low requests, low traffic: counter, login, search engine
  peak activity: network bottleneck

Generating Representative Web Workloads
  SURGE
  file size: body (lognormal), tail (Pareto); popularity: Zipf; request size: Pareto; reading times: Pareto, ...
  realistic benchmark: HTTP-NG
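
The file-size model listed for SURGE (lognormal body, Pareto tail) can be sketched as a two-part sampler. This is not SURGE's implementation. Every parameter below (tail probability, lognormal `mu` and `sigma`, tail cutoff `tail_min`, Pareto `alpha`) is an illustrative assumption rather than a value given on the slide.

```python
import random


def sample_file_size(rng, tail_prob=0.07, mu=9.0, sigma=1.3,
                     tail_min=100_000, alpha=1.1):
    """Draw one file size in bytes from a lognormal body with a Pareto tail.

    All default parameters are hypothetical and chosen only for illustration.
    """
    if rng.random() < tail_prob:
        # paretovariate returns values >= 1, so tail sizes start at tail_min.
        return tail_min * rng.paretovariate(alpha)
    # Body: lognormal, resampled until it falls below the tail cutoff.
    while True:
        size = rng.lognormvariate(mu, sigma)
        if size < tail_min:
            return size


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = sorted(sample_file_size(rng) for _ in range(10_000))
    median = sizes[len(sizes) // 2]
    mean = sum(sizes) / len(sizes)
    print("median:", round(median), "mean:", round(mean))
```

With a heavy tail the mean ends up far above the median, which matches the median-versus-mean gap in file sizes reported in the proxy traces.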

Conclusion

Dynamic Web --> several invariants
  file popularity, file size, # of requests per user, site popularity, life span, request type, ...

Future research
  relation between file popularity and reoccurrence rate
  users' navigation paths