Data Warehouses (Data Warehousing)

DESCRIPTION

Data Warehouses (Data Warehousing). Databases course. Data warehouse. A data warehouse is a very large database that stores historical information. Example: storing the information on all product purchases in all branches of a supermarket chain. Example: storing the information on all phone calls made on a particular company's phones. OLTP and OLAP queries. - PowerPoint PPT Presentation

Text of Data Warehouses (Data Warehousing)

  • Data Warehouses (Data Warehousing)

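  • Data warehouse: a very large database that stores historical information. Example: all product purchases in all branches of a supermarket chain. Example: all phone calls made on a particular company's phones.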

  • OLTP and OLAP queries: Online Transaction Processing (OLTP), Online Analytical Processing (OLAP)

  • Metadata repository

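  • Warehouse structure: one fact table and several dimension tables. Example (Sales is the fact table, the other three are dimension tables):

    Sales(pid, timeid, locid, amount)
    Products(pid, pname, category, price)
    Locations(locid, city, state, country)
    Times(timeid, date, week, month, holiday_flag)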
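As a rough illustration of how the fact table and the dimension tables work together, the sketch below builds the four relations in an in-memory SQLite database and runs one analytical (OLAP-style) aggregate over them. The column types, the sample rows, the helper name `total_by_category_and_city`, and the reading of the third Locations column as `state` are all assumptions made for the example; the slides give only the relation and column names.

```python
import sqlite3

# In-memory database. Column types are assumptions; the slides list names only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (pid INTEGER PRIMARY KEY, pname TEXT, category TEXT, price REAL);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE Times     (timeid INTEGER PRIMARY KEY, date TEXT, week INTEGER,
                        month INTEGER, holiday_flag INTEGER);
-- Fact table: one row per sale, pointing at the three dimension tables.
CREATE TABLE Sales     (pid INTEGER REFERENCES Products(pid),
                        timeid INTEGER REFERENCES Times(timeid),
                        locid INTEGER REFERENCES Locations(locid),
                        amount REAL);
""")

# Hypothetical sample rows, invented for this sketch.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)",
                 [(1, "pen", "stationery", 2.0), (2, "soap", "hygiene", 3.0)])
conn.executemany("INSERT INTO Locations VALUES (?, ?, ?, ?)",
                 [(10, "Haifa", "-", "Israel"), (20, "Eilat", "-", "Israel")])
conn.executemany("INSERT INTO Times VALUES (?, ?, ?, ?, ?)",
                 [(100, "2024-01-01", 1, 1, 0)])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                 [(1, 100, 10, 5.0), (1, 100, 20, 7.0),
                  (2, 100, 10, 4.0), (1, 100, 10, 1.0)])


def total_by_category_and_city(connection):
    """Sum the sales amount per (product category, city) pair."""
    return connection.execute("""
        SELECT P.category, L.city, SUM(S.amount)
        FROM Sales S
        JOIN Products  P ON S.pid = P.pid
        JOIN Locations L ON S.locid = L.locid
        GROUP BY P.category, L.city
        ORDER BY P.category, L.city
    """).fetchall()


for row in total_by_category_and_city(conn):
    print(row)
```

The query touches many fact rows and groups them by dimension attributes, which is the typical shape of an analytical query, in contrast to an OLTP query that reads or updates a few rows.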

  • BCNF

  • Data Mining

  • Data Mining vs. Machine Learning

  • Association Rules. Example: {pen} => {ink}

  • transid  item
    113      pen
    113      diary
    114      pen
    114      ink
    114      soap
    114      tissues

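  • Meaning of a rule L => R: a transaction that contains the items of L tends to contain the items of R as well.
    Support of L => R: the fraction of all transactions that contain every item of L and of R.
    Confidence of L => R: the fraction of the transactions containing L that also contain R.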
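The two measures can be made concrete with a small sketch. The transaction data is partly an assumption: only the rows for transactions 113 and 114 survive in the table above, so the contents of transactions 111 and 112 are hypothetical, chosen to be consistent with the frequent itemsets listed later in the deck. The function names `support` and `confidence` are likewise illustrative.

```python
# Transactions keyed by transid. Rows 111 and 112 are ASSUMED for illustration;
# only 113 and 114 appear in the surviving table.
TRANSACTIONS = {
    111: {"pen", "ink", "diary", "soap"},      # assumed
    112: {"pen", "ink", "diary"},              # assumed
    113: {"pen", "diary"},
    114: {"pen", "ink", "soap", "tissues"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(left, right, transactions):
    """Among transactions containing `left`, the fraction also containing `right`."""
    left_count = sum(1 for items in transactions.values() if set(left) <= items)
    if left_count == 0:
        return 0.0
    both = set(left) | set(right)
    both_count = sum(1 for items in transactions.values() if both <= items)
    return both_count / left_count


for left, right in [({"pen"}, {"ink"}), ({"tissues"}, {"ink"}), ({"pen"}, {"soap"})]:
    print(sorted(left), "=>", sorted(right),
          "support:", support(left | right, TRANSACTIONS),
          "confidence:", confidence(left, right, TRANSACTIONS))
```

On this assumed data a rule such as {tissues} => {ink} has high confidence but low support, which is why both thresholds are needed.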

  • Example rules: {pen} => {ink}, {tissues} => {ink}, {pen} => {soap}

  • Rules L => R and R => L

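  • Finding association rules: given a minimum support s and a minimum confidence c, find all rules with support at least s and confidence at least c.
    Step 1: find all frequent itemsets (support >= s).
    Step 2: for each frequent itemset F, split F into R and L and keep the rule L => R if its confidence is >= c.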

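  • The A Priori Property: every subset of a frequent itemset is itself frequent. Consequently, the frequent itemsets with n items are enough to decide which itemsets with n+1 items need to be checked.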

  • Freq = {}
    scan all transactions once and add to Freq the items that have support > s
    k = 1
    repeat
        foreach Ik in Freq with k items
            generate all itemsets Ik+1 with k+1 items, such that Ik is contained in Ik+1
        scan all transactions once and add to Freq the k+1-itemsets that have support > s
        k++
    until no new frequent itemsets are found

  • Example with minimum support = 0.7.
    Level 1: frequent itemsets: {pen}, {ink}, {diary}
    Level 2: candidates: {pen, ink}, {pen, diary}, {pen, soap}, {pen, tissues}, {ink, diary}, {ink, soap}, {ink, tissues}, {diary, soap}, {diary, tissues}
             with support >= 0.7: {pen, ink}, {pen, diary}

  • Level 3: candidates: {pen, ink, diary}, {pen, ink, soap}, {pen, ink, tissues}, {pen, diary, soap}, {pen, diary, tissues}
             with support >= 0.7: none
    Frequent itemsets found: {pen}, {ink}, {diary}, {pen, ink}, {pen, diary}

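  • Improvement: there is no need to generate a candidate such as {pen, tissues}, because {tissues} is not frequent. Only candidates whose subsets are all frequent have to be checked.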
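The level-wise search together with the pruning improvement can be sketched as follows. This is one possible rendering, not the deck's own code: candidates are built by joining frequent k-itemsets and are dropped when any k-subset is not frequent, and the threshold is applied as `>=` to match the worked example. The sample transactions are partly assumed (the contents of the first two are hypothetical), and the name `frequent_itemsets` is illustrative.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset whose support is >= min_support (level-wise search)."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    frequent = set(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (A Priori property): every k-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # One pass over the transactions to keep the frequent candidates.
        level = {c for c in candidates if supp(c) >= min_support}
        frequent |= level
        k += 1
    return frequent


# Sample data: the first two transactions are ASSUMED for illustration.
SAMPLE = [
    {"pen", "ink", "diary", "soap"},       # assumed
    {"pen", "ink", "diary"},               # assumed
    {"pen", "diary"},
    {"pen", "ink", "soap", "tissues"},
]

for itemset in sorted(frequent_itemsets(SAMPLE, 0.7), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```

With pruning, {pen, tissues} is never counted because {tissues} fails at level 1, and {pen, ink, diary} is dropped without a scan because {ink, diary} is not frequent on this data.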

  • foreach frequent itemset I
        foreach partition of I into two sets L, R
            generate a candidate rule L => R
    foreach transaction T in the database
        foreach candidate rule L => R
            if L in T then
                lnum(L => R)++
                if R in T then rnum(L => R)++
    return all rules L => R with rnum(L => R)/lnum(L => R) > c
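A compact Python rendering of the rule-generation step is sketched below. It keeps the `lnum` and `rnum` counters of the pseudocode but, for brevity, recounts them per candidate rule instead of sharing a single scan of the database, so it is a simplification rather than a faithful translation. The sample transactions (the first two of which are assumed) and the name `rules_from_itemsets` are illustrative.

```python
from itertools import combinations


def rules_from_itemsets(frequent, transactions, min_conf):
    """Return (L, R, confidence) for every split of a frequent itemset into
    non-empty L and R whose confidence rnum/lnum exceeds min_conf."""
    rules = []
    for itemset in frequent:
        items = sorted(itemset)
        for size in range(1, len(items)):          # L must leave at least one item for R
            for left in combinations(items, size):
                L = frozenset(left)
                R = frozenset(itemset) - L
                lnum = sum(1 for t in transactions if L <= t)
                rnum = sum(1 for t in transactions if L <= t and R <= t)
                if lnum and rnum / lnum > min_conf:
                    rules.append((L, R, rnum / lnum))
    return rules


# Sample data: the first two transactions are ASSUMED for illustration.
SAMPLE = [
    {"pen", "ink", "diary", "soap"},       # assumed
    {"pen", "ink", "diary"},               # assumed
    {"pen", "diary"},
    {"pen", "ink", "soap", "tissues"},
]
FREQUENT = [frozenset(["pen", "ink"]), frozenset(["pen", "diary"])]

for L, R, conf in rules_from_itemsets(FREQUENT, SAMPLE, 0.8):
    print(sorted(L), "=>", sorted(R), conf)
```

Single-item itemsets produce no rules, since neither side of a rule may be empty.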

  • {milk} => {bread}?

  • Sequential Patterns. Example of a sequence of itemsets: {pen, ink, soap}, {pen, ink, diary, soap}

  • {pen}, {ink, diary}, {pen, soap} is a subsequence of {pen, ink}, {shirt}, {milk, ink, diary}, {soap, pen, diary}

    {pen}, {ink, diary}, {pen, soap} is not a subsequence of {pen, ink}, {shirt}, {soap, pen, diary}, {milk, ink, diary}

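  • Support of a sequence S: the fraction of the sequences in the database of which S is a subsequence. A sequence is written s1, s2, ..., sn, where each si is an itemset.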
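The subsequence test behind sequential patterns can be sketched as below, assuming the usual definition: each itemset of the pattern must be contained in a distinct itemset of the longer sequence, and the order must be preserved. The function names `is_subsequence` and `sequence_support` are illustrative, not taken from the slides.

```python
def is_subsequence(pattern, sequence):
    """True if the itemsets of `pattern` can be matched, in order, to distinct
    itemsets of `sequence` that contain them (greedy earliest match)."""
    pos = 0
    for itemset in pattern:
        # Advance to the first remaining itemset that contains the pattern itemset.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def sequence_support(pattern, database):
    """Fraction of the sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq)) / len(database)


PATTERN = [{"pen"}, {"ink", "diary"}, {"pen", "soap"}]
SEQ_A = [{"pen", "ink"}, {"shirt"}, {"milk", "ink", "diary"}, {"soap", "pen", "diary"}]
SEQ_B = [{"pen", "ink"}, {"shirt"}, {"soap", "pen", "diary"}, {"milk", "ink", "diary"}]

print(is_subsequence(PATTERN, SEQ_A))
print(is_subsequence(PATTERN, SEQ_B))
print(sequence_support(PATTERN, [SEQ_A, SEQ_B]))
```

The two sequences hold the same itemsets in a different order, and only the first one contains the pattern, which shows that order matters for sequential patterns.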

  • Classification Rules. Example relation: InsuranceInfo(custid: integer, age: integer, cartype: string, highrisk: boolean)

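  • Classification Rules. General form: (l1 < X1 < h1) and ... and (lk < Xk < hk) => Y = c, over the attributes X1, ..., Xk and Y

  • Y is the dependent attribute and X1, ..., Xk are the predictor attributes; li and hi are constant bounds.
  • Example: (18 < age < 25) and (cartype in {Sports, Truck}) => highrisk = true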

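  • Support of a condition C: the fraction of tuples that satisfy C.
    Support of a rule C1 => C2: the support of the condition C1 and C2.
    Confidence of a rule C1 => C2: the fraction of the tuples satisfying C1 that also satisfy C2.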
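To show how these measures apply to a classification rule, the sketch below evaluates the rule (18 < age < 25) and (cartype in {Sports, Truck}) => highrisk = true over a handful of InsuranceInfo tuples. The tuples are hypothetical, invented purely for the example, and the helper names are assumptions as well.

```python
from dataclasses import dataclass


@dataclass
class InsuranceInfo:
    custid: int
    age: int
    cartype: str
    highrisk: bool


# Hypothetical tuples, made up for illustration only.
ROWS = [
    InsuranceInfo(1, 20, "Sports", True),
    InsuranceInfo(2, 23, "Truck", True),
    InsuranceInfo(3, 22, "Sedan", False),
    InsuranceInfo(4, 40, "Sports", False),
    InsuranceInfo(5, 24, "Sports", False),
]


def c1(row):
    """Left-hand condition: (18 < age < 25) and (cartype in {Sports, Truck})."""
    return 18 < row.age < 25 and row.cartype in {"Sports", "Truck"}


def c2(row):
    """Right-hand condition: highrisk = true."""
    return row.highrisk


def count(cond, rows):
    return sum(1 for row in rows if cond(row))


def rule_support(left, right, rows):
    """Fraction of tuples that satisfy both conditions."""
    return count(lambda row: left(row) and right(row), rows) / len(rows)


def rule_confidence(left, right, rows):
    """Among tuples satisfying `left`, the fraction that also satisfy `right`."""
    left_count = count(left, rows)
    if left_count == 0:
        return 0.0
    return count(lambda row: left(row) and right(row), rows) / left_count


print("support:", rule_support(c1, c2, ROWS))
print("confidence:", rule_confidence(c1, c2, ROWS))
```

The structure mirrors the association-rule case: conditions over tuples play the role that itemsets play over transactions.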

  • Decision tree (figure): root node Age, split at 25; inner node Car Type with branches Sedan and Sports, Truck; leaves labeled yes / no.
