
Discovering Web Access Patterns and Trends by Applying OLAP

and Data Mining Technology on Web logs

Data Engineering Lab

성 유 진


Abstract

Web server log file analysis
• server performance improvement
• system performance improvement
• customer targeting in electronic commerce

Problems and difficulties
• processing large raw log data is not easy
• data reduction
  – in size and time


• current web log miners
  – slow, inflexible, difficult to maintain
  – only frequency counts, which is not enough

WebLogMiner
• Virtual University / data mining WebLogMiner
• OLAP and data mining techniques
• multi-dimensional data cube
• scalability, interactivity, variety, flexibility


Design of a Web Log Miner

Web server log file information
• domain name of the request
• user name
• date and time of the request
• method of the request (GET, POST)
• name of the file requested
• result of the request (success, failure, error, etc.)
• size of the data sent back
• URL of the referring page
• identification of the client agent

Example
210.114.3.64 - - [01/Jul/1998:17:34:05 0900] "GET /~yjsung/sign.html HTTP/1.1" 200 740
210.114.3.64 - - [01/Jul/1998:17:38:44 -0900] "POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352

• GET: used when requesting data from the server
• POST: used when the browser sends a filled-in form to the server
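The log fields listed on this slide can be extracted with a regular expression. The following is a minimal sketch, assuming entries follow the Common Log Format with normal spacing; the names `LOG_PATTERN` and `parse_log_line` are illustrative and not part of WebLogMiner, and the sample line is a re-spaced copy of the second example entry.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [timestamp] "method path protocol" status bytes
# Assumption: fields are separated by single spaces, unlike the flattened slide text.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Parse only the date and time part; the zone offset stays in entry["time"].
    entry["timestamp"] = datetime.strptime(
        entry["time"].split()[0], "%d/%b/%Y:%H:%M:%S"
    )
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Re-spaced version of the second example entry (illustrative input).
sample = ('210.114.3.64 - - [01/Jul/1998:17:38:44 -0900] '
          '"POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352')
entry = parse_log_line(sample)
print(entry["method"], entry["path"], entry["status"], entry["size"])
```

Lines that do not match the pattern return `None`, so malformed entries can be counted or set aside during cleaning rather than silently dropped.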


• Cache information
  – frequent backtracking and reloads: deficient design
  – client-site log

• Access count
  – not always a measure of interestingness
  – e.g. pages that must be passed through to access a particular document

• Time and date
  – evaluate user interest by time spent

• Domain name
  – the sequence of requests can predict the next request and so improve traffic
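The idea that a sequence of requests can predict the next request can be illustrated with simple successor counts. This is a sketch under stated assumptions, not the method used by WebLogMiner: the session lists, page names, and the helper names `build_transitions` and `predict_next` are all hypothetical.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, page):
    """Return the most frequent successor of `page`, or None if it was never seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical request sequences, one list per visitor session.
sessions = [
    ["/index.html", "/courses.html", "/sign.html"],
    ["/index.html", "/courses.html", "/help.html"],
    ["/index.html", "/courses.html", "/sign.html"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "/courses.html"))
```

A server could use such a prediction to prefetch or cache the likely next page, which is one way the traffic improvement mentioned above might be realized.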


WebLogMiner 4 Stages

1. Filtering the data and creating a relational DB

2. Data cube construction

3. OLAP is used

4. Data mining techniques are used


1. DATABASE CONSTRUCTION FROM SERVER LOG FILES

Data Cleansing and Transformation
• page graphics (and sound and video) can be filtered out, but are preserved
• two types of transformation
  – without knowledge about the site (transforming time into day, month, year, etc. is possible without server information)
  – with knowledge about the site: associating a server request with its intended action requires the site structure

Relational database
• cleaned data and new implicit data are added
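The site-independent transformation and the loading step can be sketched as follows. This is only an illustration: the table schema, the column names, the `logo.gif` entry, and the helpers `time_fields` and `is_page_graphic` are assumptions made for the example, and an in-memory SQLite database stands in for whatever relational system was actually used.

```python
import sqlite3
from datetime import datetime

def time_fields(raw_time):
    """Split a log timestamp such as '01/Jul/1998:17:34:05' into calendar fields."""
    parsed = datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S")
    return parsed.year, parsed.month, parsed.day, parsed.hour

def is_page_graphic(path):
    """Flag image requests so they are kept but distinguishable from page hits."""
    return path.lower().endswith((".gif", ".jpg", ".jpeg", ".png"))

# Hypothetical schema; the column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (host TEXT, path TEXT, status INTEGER, size INTEGER,"
    " year INTEGER, month INTEGER, day INTEGER, hour INTEGER, is_graphic INTEGER)"
)

# The first entry mirrors the slide's example; the second is invented.
raw_entries = [
    ("210.114.3.64", "01/Jul/1998:17:34:05", "/~yjsung/sign.html", 200, 740),
    ("210.114.3.64", "01/Jul/1998:17:34:06", "/~yjsung/logo.gif", 200, 1200),
]
for host, raw_time, path, status, size in raw_entries:
    year, month, day, hour = time_fields(raw_time)
    conn.execute(
        "INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (host, path, status, size, year, month, day, hour,
         int(is_page_graphic(path))),
    )

rows = conn.execute(
    "SELECT path, hour, is_graphic FROM requests ORDER BY path"
).fetchall()
print(rows)
```

The derived columns (`year`, `month`, `day`, `hour`, `is_graphic`) are examples of the new implicit data added alongside the cleaned fields.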


2. MULTI-DIMENSIONAL WEB LOG DATA CUBE CONSTRUCTION AND MANIPULATION

Data Cube
• the group-by operator in SQL is used to compute aggregates on a set of attributes
  – sum of sales by P, C: for each product, give a breakdown of how much of it was sold to each customer
• CUBE is the n-dimensional generalization of group-by
• gives remarkable flexibility to manipulate and view the data
• allows OLAP operations such as drill-down, roll-up, slice and dice
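The relationship between group-by and CUBE can be made concrete with a small example. The sketch below computes every group-by over a set of dimensions in one pass per subset; the sales rows and the function name `compute_cube` are hypothetical, and the string "ALL" is used here as a placeholder for a dimension that has been aggregated away.

```python
from collections import defaultdict
from itertools import combinations

def compute_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (all group-bys at once).

    A dimension that is aggregated away appears as 'ALL' in the result key.
    """
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)

# Hypothetical sales facts for the product/customer example.
sales = [
    {"product": "P1", "customer": "C1", "amount": 10},
    {"product": "P1", "customer": "C2", "amount": 5},
    {"product": "P2", "customer": "C1", "amount": 7},
]
cube = compute_cube(sales, ["product", "customer"], "amount")
print(cube[("P1", "C1")])    # one cell of the group-by on (product, customer)
print(cube[("P1", "ALL")])   # roll-up over customers for product P1
print(cube[("ALL", "ALL")])  # grand total
```

Reading `("P1", "ALL")` instead of `("P1", "C1")` is a roll-up along the customer dimension; going the other way is a drill-down, and fixing one coordinate (for example all keys with product "P1") is a slice.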


• Attributes
  - URL
  - domain name
  - size of resource
  - time
  . . .



3. DATA MINING ON WEB LOG DATA CUBE AND WEB LOG DATABASE

Data characterization
• find rules that summarize a user-defined data set
  ☞ the traffic on a web server for a given type of media at a particular time of day

Class comparison
• discover discriminant rules
  ☞ compare requests from two different web browsers

Association
• discover patterns in which accesses to different resources consistently occur together

Prediction
☞ access to a new resource on a given day can be predicted based on accesses to similar old resources on similar days
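As an illustration of the association task, the sketch below computes support and confidence for pairs of resources accessed in the same session. It is a simplified pairwise calculation under assumed inputs, not the algorithm used by WebLogMiner: the sessions, page names, the `min_support` threshold, and the function name `pair_rules` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules 'a accessed => b accessed'."""
    total = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for resources in sessions:
        unique = sorted(set(resources))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / total
        if support >= min_support:
            # Confidence is the share of sessions with the antecedent
            # that also contain the consequent.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical sessions, each a set of resources accessed together.
sessions = [
    {"/notes.html", "/quiz.html"},
    {"/notes.html", "/quiz.html", "/forum.html"},
    {"/notes.html"},
    {"/forum.html"},
]
rules = pair_rules(sessions)
print(rules[("/quiz.html", "/notes.html")])
```

In this toy data every session that accesses the quiz also accesses the notes, so the rule from quiz to notes has higher confidence than the reverse rule even though both share the same support.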


Classification
• can be used to develop a better understanding of each class in the web log database, and perhaps to restructure a web site or customize answers to requests based on classes of requests

Time-series analysis
• analyze data along time sequences to discover time-related interesting patterns
  ☞ disclose patterns and trends that can improve the services of the web server

Focus will be on time-series analysis because web log records are highly time-related
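A very simple form of time-related analysis is to count requests per hour of the day and smooth the resulting series. The sketch below assumes the timestamps have already been parsed; the sample data and the function names `requests_per_hour` and `moving_average` are illustrative and do not come from WebLogMiner.

```python
from collections import Counter
from datetime import datetime

def requests_per_hour(timestamps):
    """Count requests in each hour-of-day bucket (0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def moving_average(values, window=3):
    """Smooth a series with a simple trailing moving average."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# Hypothetical request times for a single day.
timestamps = [
    datetime(1998, 7, 1, 9, 5), datetime(1998, 7, 1, 9, 40),
    datetime(1998, 7, 1, 10, 15), datetime(1998, 7, 1, 17, 34),
    datetime(1998, 7, 1, 17, 38), datetime(1998, 7, 1, 17, 59),
]
hourly = requests_per_hour(timestamps)
print(hourly[9], hourly[10], hourly[17])
print(moving_average(hourly)[10])
```

The same bucketing could be done by day, week, or month to look for longer trends, which corresponds to rolling up along the time dimension of the cube.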


Experiments with the Web Log Miner

Virtual-U
• six major components
• goal: understand usage and user behavior patterns

Data cleaning and transformation
• all entries were mapped one-to-one into a relational database
• the fields site and user action were added
• problems
  – extraneous information => identify those entries and eliminate them
  – multiple server requests caused by the same user action
  – the same server request caused by multiple user actions
  – local activities are not recorded


Multi-dimensional data cube construction and manipulation
• summarization (group-bys on different dimensions)
• request / domain / event / session / bandwidth / error / referring organization / browser summary

Examples

Fig2) OLAP analysis of Web log


Fig3) Typical event sequence and user behavior pattern analysis

Fig4) Web traffic analysis of Web log


Fig6) Event trees of months one to four


Discussion and Conclusion

WebLogMiner
• OLAP and data mining techniques
• multi-dimensional data cube
• major strengths
  – scalability, interactivity, variety, flexibility

Problems with current log files
• web servers should collect more information
• a new structure is needed ==> would simplify pre-processing