BigData Mining Class


  • 7/25/2019 BigData Mining Class

    1/46

    Big Data Mining

    COSC 526 Class 1

    Arvind Ramanathan

    Computational Science & Engineering Division
    Oak Ridge National Laboratory, Oak Ridge

    Ph: 865-56-266 E-mail: ramanathana@ornl.gov

    Acknowledgement: Content borrowed from William Cohen's (CMU) class 10-605 and Stanford Mining Massive Datasets

    Class Logistics

    Where: Min Kao Engineering 06

    When: 5:05 PM to 6:20 PM Tu/Th

    Office Hours: ?:00 PM to 5:00 PM Tu

    Where: Min Kao 619

    Who: Arvind Ramanathan

    Research interests: Computational Biology, Health Informatics, Data Analytics with heterogeneous compute architectures

    Email: ramanathana@ornl.gov

    Teaching Assistant

    Who: Yang Song

    Research Interests:

    Email: ysong18@utk.edu

    Office hours: TBD

    Where: TBD

    What I know about the class:

    CE: Computer Eng.; CN: Computer Networks; CS: Computer Science; CT: Communication Theory; MA: Mathematics; BA: Business Admin.; NU: Nuclear Eng.; ES: Energy Sciences; PH: Physics; IE: Industrial Eng.; PS: Power Systems; ME: Mechanical Eng.

    DO: Doctoral; MS: Masters; UG: Undergraduate


    Introductions

    Tell us a bit about yourselves


    Objectives

    Design and develop algorithms to analyze large amounts of data

    Evaluate HPC and distributed computing paradigms for analyzing large datasets

    Develop end-to-end solutions that can "select, manipulate, analyze and view" large-scale datasets

    Collaborate with domain experts on inter-disciplinary areas such as business analytics, social sciences, biomedical and health


    Class Website and Course Materials

    http://web.utk.edu/~ramana01/index.html

    I will usually make the lecture notes available prior to the class (not a promise)

    Use the website: all the materials are available there

    Piazza for class discussion

    https://piazza.com/utk/spring2015/cosc526/home

    Blackboard for grades

    Overview of Class Schedule

    Topic | Date | Classes | Assignments
    MapReduce / Hadoop and logistics of handling large data sets (Python: memmap) | 1/8/2015 | 2 | H


    Overview of Class Schedule (2)

    Topic | Date | No. classes | Assignments
    Graph Mining | 3/31/2015 | 2 |
    Digital Pathology & Molecular Biophysics (Guest Lecture: Dr. Chakra Chennubhotla) | ?/?/2015 | 1 |
    Advanced Programming Models for data mining | ?/9/2015 | 2 | H? 3 in
    Project Presentations | 4/16/2015 | 2 |
    Poster Presentations | 4/23/2015 | 1 |

    Except for orange and blue highlights, things are not set in stone. Topics can change based on class participation.

    Dates I am not available:
    Feb 10-12: Biophysical Society meetings
    Mar 18-2?: Arvind in India


    Requirements

    Components | % Grade | Total
    Homework | 5 | 3
    Project | 50 | 1
    In-class quizzes/participation | 5 | 2

    3 late days in total for the whole semester

    Assignments take time; start early

    Project: significant implementation effort

    Poster session at end of semester

    Peer evaluation and judges from UTK and ORNL


    Project Deliverables and Deadlines

    Deliverable | Due date | % Grade
    Initial selection of topics (1, 2) | Jan 2?, 2015 | 10
    Project Description and Approach | Feb 20, 2015 | 20
    Initial Report | Mar 20, 2015 | 10
    Project Demonstration | Apr 16-19, 2015 | 10
    Final Project Report (10 pages) | Apr 21, 2015 | 25
    Poster (12-1? slides) | Apr 23, 2015 | 2?

    (1) Projects can come with their own data (e.g., from your project) or can be provided

    (2) Datasets need to be open. Please don't use datasets that have proprietary limitations

    All reports will be in NIPS format: http://nips.cc/Conferences/2013/PaperInformation/StyleFiles

    Pre-requisites

    Basic Database course: SQL queries, data retrieval, etc.

    Algorithms: Dynamic programming, data structures

    Statistics: Moments, Distributions, Regression

    Programming Languages: Java, Python, or any object-oriented language


    Course Books

    http://www.mmds.org/

    Readings and Papers will be available as part of the course website

    Useful references:

    Machine Learning, Tom Mitchell

    Building Machine Learning Systems with Python, Richert and Coelho

    http://scikit-learn.org/stable/

    http://guidetodatamining.com/

    http://www.cs.cornell.edu/home/kleinber/networks-book/

    Data Mining


    Data Explosion is Fueling Innovation in Science, Engineering and Business

    Estimated world's data in 2010: ~1.2 zettabytes (10^21 bytes)

    Total data: ~35 zettabytes in 2020

    Data needs to be: Stored, Managed, Analyzed (this class)


    Data Mining as a Discipline

    Interdisciplinary with diverse "interactions"

    Databases: managing large datasets

    Machine Learning / Statistics: data and models

    Theory: Algorithms, in particular Randomized methods

    Our class focuses on:

    Scalability: what to do when we have "big data"?

    Algorithms: how to do what we do with big data?

    Architecture: what infrastructure is suitable?

    [Figure: Databases, Machine Learning & Statistics, Theory, Data Mining]


    How are datasets represented?

    Structured Data

    Data organized in terms of records with fields corresponding to specific entries

    Examples: Databases (relational), XML and other structured layouts
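    To make the idea of records and fields concrete, here is a minimal sketch that stores one record in a relational table and then in an XML element. The student record, the table name and the element names are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"id": 1, "name": "Ada", "department": "COSC"}

# Relational layout: one row per record, one column per field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.execute("INSERT INTO students VALUES (:id, :name, :department)", record)
row = conn.execute(
    "SELECT name, department FROM students WHERE id = ?", (1,)
).fetchone()

# XML layout: one element per record, one child element per field.
student = ET.Element("student", id=str(record["id"]))
ET.SubElement(student, "name").text = record["name"]
ET.SubElement(student, "department").text = record["department"]
xml_text = ET.tostring(student, encoding="unicode")

print(row)
print(xml_text)
```

    In both layouts every field has a name and a fixed place, which is what lets a query or a parser pick out a specific entry without scanning free text.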


    How do we go about doing this?

    10^9 people

    1,000 days

    Each person stays 1% of the time in a hotel (10 days in 1,000 days)

    Each hotel holds about 100 people

    10^5 hotels

    If everyone behaves randomly (i.e., no collusion), will data mining find any suspicious behavior?
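    The question can be answered with a back-of-the-envelope count. The sketch below assumes that "suspicious" means two people who were at the same hotel on two different days; that criterion is not stated on this slide and is taken from the matching example in the Mining of Massive Datasets book listed under Course Books. Variable names are illustrative.

```python
from math import comb

people = 10**9
days = 1_000
hotels = 10**5
p_in_hotel = 0.01  # each person is in some hotel on 1% of days

# Chance that two given people pick the same hotel on one given day:
# both are in a hotel (0.01 * 0.01) and it is the same one (1 / hotels).
p_same_hotel_one_day = p_in_hotel * p_in_hotel / hotels

# Assumed criterion: the same pair shares a hotel on two different days.
p_same_hotel_two_days = p_same_hotel_one_day ** 2

pairs_of_people = comb(people, 2)
pairs_of_days = comb(days, 2)

expected_suspicious_pairs = (
    pairs_of_people * pairs_of_days * p_same_hotel_two_days
)
print(round(expected_suspicious_pairs))
```

    Under these assumptions, pure chance produces roughly a quarter of a million "suspicious" pairs, so any real signal has to be separated from a very large number of coincidences.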
