DataMining Shahjad


  • 8/18/2019 DataMining Shahjad



    Overview

    Introduction

    Explanation of Data Mining Techniques

    Advantages

    Applications

    Privacy


    Data Mining: What is Data Mining?

    "The process of semi-automatically analyzing large databases to find useful patterns."

    "Attempts to discover rules and patterns from data."

    Areas of Use
    Internet: discover the needs of customers
    Economics: predict stock prices
    Science: predict environmental change
    Medicine: match patients with similar problems to find a cure


    Example of Data Mining
    A credit card company wants to discover information about its clients from databases. It wants to find:

    Clients who respond to promotions in "junk mail"

    Clients that are likely to switch to a competitor

    Clients that are likely not to pay

    Services that clients use, so that it can promote services affiliated with the credit card company

    Anything else that may help the company provide or promote services that help its clients and, ultimately, make more money.


    Data Mining & Data Warehousing
    A data warehouse "is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site."

    Collect data, then store it in a single repository.
    This makes query development easier, because only a single repository has to be queried.
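A minimal sketch of the warehousing idea, using Python's built-in sqlite3 module as a stand-in for a real warehouse. The two sources (a CRM list and a billing feed), their table and column names, and the data are hypothetical; the point is only that once both sources sit under one schema at one site, a single query can span them.

```python
import sqlite3

# Hypothetical source data from two separate systems (names are illustrative).
crm_rows = [(1, "Alice", "North"), (2, "Bob", "South")]
billing_rows = [(1, 120.0), (1, 80.0), (2, 45.5)]


def build_warehouse(crm, billing):
    """Load both sources into one in-memory database under a unified schema."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    con.execute("CREATE TABLE payments (customer_id INTEGER REFERENCES customers(id), amount REAL)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", crm)
    con.executemany("INSERT INTO payments VALUES (?, ?)", billing)
    con.commit()
    return con


con = build_warehouse(crm_rows, billing_rows)
# A single query against the single repository now spans both original sources.
totals = con.execute(
    "SELECT c.name, SUM(p.amount) FROM customers c "
    "JOIN payments p ON p.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.id"
).fetchall()
print(totals)  # [('Alice', 200.0), ('Bob', 45.5)]
```

A production warehouse would add an extract-transform-load step to reconcile differing source formats; here the sources are assumed to share a customer ID already.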


    Data Mining Techniques
    Classification

    Clustering

    Regression

    Association Rules


    Classification
    Classification: Given a set of items that have several classes, and given past instances (training instances) with their associated class, classification is the process of predicting the class of a new item.

    In other words, it classifies the new item by identifying which class it belongs to.

    Example: A bank wants to classify its home loan customers into groups according to their response to bank advertisements. The bank might use the classifications "Responds Rarely", "Responds Sometimes", and "Responds Frequently".

    The bank will then attempt to find rules about the customers that respond Frequently and Sometimes.

    The rules could be used to predict the needs of potential customers.
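One simple way to "find rules" from labelled training instances is the OneR (one-rule) method, sketched below as an illustration; the slide does not name a specific rule-learning algorithm. Each attribute value predicts its most common class, and the attribute whose rules make the fewest training errors wins. The customer attributes (`age`, `income`) and all records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training instances: customer attributes plus the known response class.
training = [
    ({"age": "young", "income": "high"}, "Frequently"),
    ({"age": "young", "income": "low"}, "Sometimes"),
    ({"age": "middle", "income": "high"}, "Frequently"),
    ({"age": "middle", "income": "low"}, "Rarely"),
    ({"age": "senior", "income": "high"}, "Sometimes"),
    ({"age": "senior", "income": "low"}, "Rarely"),
]


def one_r(instances):
    """Return (attribute, {value: predicted_class}) with the fewest training errors."""
    best = None
    for attr in instances[0][0]:
        counts = defaultdict(Counter)
        for features, label in instances:
            counts[features[attr]][label] += 1
        # Each attribute value predicts its most common class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: training instances whose class differs from the rule's prediction.
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best[0], best[1]


attribute, rules = one_r(training)
new_customer = {"age": "middle", "income": "high"}
print(attribute, rules.get(new_customer[attribute], "unknown"))  # income Frequently
```

On this toy data the `income` rules make 2 errors versus 3 for `age`, so the learned rule is "high income responds Frequently, low income responds Rarely".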


    Technique for Classification: Decision-Tree Classifiers

    [Figure: a decision tree. The root node splits on Job (Carpenter, Engineer, Doctor); each branch then splits on Income at its own threshold, ending in Bad or Good leaves.]

    Predicting the credit risk of a person from their job and income
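The tree in the figure can be sketched as a fixed two-level classifier: split on job, then on an income threshold chosen per job. The income thresholds in the original figure are not recoverable, so this sketch learns them from hypothetical training records by minimising weighted Gini impurity (a common split criterion, used here as an assumption). It is a simplified illustration, not a general decision-tree learner such as ID3 or CART, which would also choose which attribute to split on.

```python
from collections import Counter

# Hypothetical training records: (job, income in thousands, credit risk).
records = [
    ("Carpenter", 20, "Bad"), ("Carpenter", 35, "Good"), ("Carpenter", 45, "Good"),
    ("Engineer", 30, "Bad"), ("Engineer", 60, "Good"), ("Engineer", 80, "Good"),
    ("Doctor", 40, "Bad"), ("Doctor", 90, "Good"), ("Doctor", 120, "Good"),
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def best_threshold(rows):
    """Income split that minimises the weighted Gini impurity of the two sides."""
    incomes = sorted({income for _, income, _ in rows})
    best_t, best_score = None, float("inf")
    # Candidate thresholds are midpoints between consecutive distinct incomes.
    for lo, hi in zip(incomes, incomes[1:]):
        t = (lo + hi) / 2
        left = [risk for _, inc, risk in rows if inc <= t]
        right = [risk for _, inc, risk in rows if inc > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def build_tree(rows):
    """Fixed two-level tree: split on job first, then on an income threshold."""
    tree = {}
    for job in sorted({job for job, _, _ in rows}):
        subset = [r for r in rows if r[0] == job]
        t = best_threshold(subset)
        if t is None:  # only one distinct income: this branch becomes a single leaf
            label = majority([risk for _, _, risk in subset])
            tree[job] = (float("inf"), label, label)
            continue
        below = majority([risk for _, inc, risk in subset if inc <= t])
        above = majority([risk for _, inc, risk in subset if inc > t])
        tree[job] = (t, below, above)
    return tree


def predict(tree, job, income):
    """Follow the job branch, then compare income with that branch's threshold."""
    threshold, below, above = tree[job]
    return below if income <= threshold else above


tree = build_tree(records)
print(predict(tree, "Engineer", 50))  # Good
print(predict(tree, "Doctor", 50))    # Bad
```

The same income (50) yields different predictions for an engineer and a doctor because each job branch learned its own threshold (45.0 and 65.0 on this toy data), mirroring the per-job Income nodes in the figure.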


    Clustering
    "Clustering algorithms find groups of items that are similar. … It divides a data set so that records with similar content are in the same group, and groups are as different as possible from each other." (H)

    Example: An insurance company could use clustering to group clients by their age, location, and types of insurance purchased.

    The categories are not specified in advance; this is referred to as unsupervised learning.
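To make "grouping without predefined categories" concrete, here is a minimal k-means sketch in plain Python. k-means is one common clustering algorithm, chosen here as an illustration; the slide does not name one. The clients are hypothetical and described by two numeric stand-ins for the slide's attributes (age and number of policies held); no class labels are given to the algorithm.

```python
import math

# Hypothetical clients: (age, number of insurance policies held).
clients = [(23, 1), (25, 2), (27, 1), (61, 4), (64, 5), (67, 4)]


def kmeans(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of each coordinate; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(clients, k=2)
print(clusters)  # [[(23, 1), (25, 2), (27, 1)], [(61, 4), (64, 5), (67, 4)]]
```

With real data the features would usually be rescaled first, since Euclidean distance (`math.dist`, Python 3.8+) lets a wide-ranging attribute such as age dominate; categorical attributes such as location would also need encoding.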


    Clustering: Group Data into Clusters
    Similar data is grouped in the same cluster.
    Dissimilar data is grouped in different clusters.

    How is this achieved?
    K-Nearest Neighbor: A classification method that classifies a point by calculating the distances between the point and the points in the training data set. It then assigns the point to the class that is most common among its k nearest neighbors (where k is an integer). (H)

    Hierarchical: Group data into t-trees
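The k-nearest-neighbor rule described above can be sketched in a few lines of Python. The training points, their two numeric features, the class labels "A" and "B", and the default k = 3 are illustrative assumptions; Euclidean distance is one common choice of distance, not the only one.

```python
import math
from collections import Counter

# Hypothetical labelled training points: ((feature1, feature2), class).
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 5.5), "B"),
]


def knn_classify(point, data, k=3):
    """Assign `point` the class most common among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((2.0, 2.0), training))  # A
print(knn_classify((6.0, 7.0), training))  # B
```

An odd k avoids tied votes when there are two classes; with more classes, ties are broken here by whichever tied label appears first among the nearest neighbours.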


    Advantages of Data Mining
    Provides new knowledge from existing data:
    public databases, government sources, company databases

    Old data can be used to develop new knowledge.

    New knowledge can be used to improve services or products.

    Improvements lead to bigger profits and more efficient service.


    Uses of Data Mining
    Sales/Marketing: diversify the target market; identify clients' needs to increase response rates

    Risk Assessment: identify customers that pose a high credit risk

    Fraud Detection: identify people misusing the system, e.g. people who have two Social Security Numbers

    Customer Care: identify customers likely to change providers; identify customer needs
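The fraud-detection example (people with two Social Security Numbers) amounts to grouping records by person and counting distinct SSNs. This sketch assumes some other hypothetical identifier already links each record to a person (for instance, one derived from name and date of birth); the records and SSN placeholders are invented for illustration.

```python
from collections import defaultdict

# Hypothetical application records: (person identifier, SSN supplied).
applications = [
    ("P001", "ssn-A"),
    ("P002", "ssn-B"),
    ("P001", "ssn-C"),
    ("P003", "ssn-D"),
    ("P002", "ssn-B"),
]


def people_with_multiple_ssns(records):
    """Return person IDs that appear with more than one distinct SSN."""
    ssns = defaultdict(set)
    for person, ssn in records:
        ssns[person].add(ssn)  # a set ignores repeated use of the same SSN
    return sorted(p for p, s in ssns.items() if len(s) > 1)


print(people_with_multiple_ssns(applications))  # ['P001']
```

In practice the hard part is the linking identifier itself: matching records that belong to the same person without a shared key is a record-linkage problem in its own right.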


    Privacy Concerns
    Effective data mining requires large sources of data.
    To achieve a wide spectrum of data, multiple data sources are linked.

    Linking sources can be problematic for privacy, for example if the following histories of a customer were linked:
    Shopping History
    Credit History
    Bank History
    Employment History
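The privacy risk comes from the join itself: once separately held sources share a customer identifier, merging them is trivial and yields a profile that no single source held. This sketch uses hypothetical in-memory dictionaries keyed by an invented ID `C42`; the helper `link_profiles` is illustrative, not a real library function.

```python
# Hypothetical, separately held data sources keyed by the same customer ID.
shopping = {"C42": ["groceries", "electronics"]}
credit = {"C42": {"score": 610, "missed_payments": 2}}
bank = {"C42": {"balance": 350.0}}
employment = {"C42": {"employer": "Acme Ltd", "status": "part-time"}}


def link_profiles(customer_id, **sources):
    """Merge every source's record for one customer into a single combined profile."""
    return {name: src.get(customer_id) for name, src in sources.items()}


profile = link_profiles(
    "C42", shopping=shopping, credit=credit, bank=bank, employment=employment
)
print(profile)
```

Each dictionary alone reveals one aspect of the customer; the linked `profile` combines spending, debt, savings, and job status in one place, which is exactly the concern raised above.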


    Thank you