7/25/2019 BigData Mining Class
Big Data Mining
COSC 526 Class 1
Arvind Ramanathan
Computational Science & Engineering Division, Oak Ridge National Laboratory, Oak Ridge
Ph: 865-…
E-mail: ramanathana@ornl.gov
Acknowledgement: Content borrowed from William Cohen's (CMU) class 10-605 and Stanford Mining Massive Datasets
Class Logistics
Where: Min Kao Engineering ?06
When: 5:05 PM to 6:20 PM Tu/Th
Office Hours: 4:00 PM to 5:00 PM Tu
Where: Min Kao 619
Who: Arvind Ramanathan
Research interests: Computational Biology; Health Informatics; Data Analytics with heterogeneous compute architectures
Email: ramanathana@ornl.gov
Teaching Assistant
Who: Yang Song
Research Interests:
Email: ysong18@utk.edu
Office hours: TBD
Where: TBD
What I know about the class:
CE: Computer Eng.; CN: Computer Networks; CS: Computer Science; CT: Communication Theory; MA: Mathematics; BA: Business Admin.; NU: Nuclear Eng.; ES: Energy Sciences; PH: Physics; IE: Industrial Eng.; PS: Power Systems; ME: Mechanical Eng.
DO: Doctoral; MS: Masters; UG: Undergraduate
Introductions
Tell us a bit about yourselves
Objectives
Design and develop algorithms to analyze large amounts of data
Evaluate HPC and distributed computing paradigms for analyzing large datasets
Develop end-to-end solutions that can "select, manipulate, analyze and view" large-scale datasets
Collaborate with domain experts on interdisciplinary areas such as business analytics, social sciences, biomedical and health
Class Website and Course Materials
http://web.utk.edu/~ramana01/index.html
I will usually make the lecture notes available prior to the class (not a promise)
Use the website: all the materials are available there
Piazza for class discussion
https://piazza.com/utk/spring2015/cosc526/home
Blackboard for grades
Overview of Class Schedule
Topic | Date | Classes | Assignments
Map Reduce / Hadoop and logistics of handling large data sets (Python: memmap) | 1/8/2015 | 2 | H…
Overview of Class Schedule (2)
Topic | Date | No. classes | Assignments
Graph Mining | 3/31/2015 | 2 |
Digital Pathology & Molecular Biophysics (Guest Lecture: Dr. Chakra Chennubhotla) | 4/7/2015 | 1 |
Advanced Programming Models for data mining | 4/9/2015 | 2 | HW 3 in
Project Presentations | 4/16/2015 | 2 |
Poster Presentations | 4/23/2015 | 1 |
Except for orange and blue highlights, things are not set in stone. Topics can change based on class participation.
Dates I am not available: Feb 10-12: Biophysical Society meetings; Mar 18-2?: Arvind in India
Requirements
Components | % Grade | Total
Homework | 45 | 3
Project | 50 | 1
In-class quizzes/participation | 5 | 2
3 late-days in total for the whole semester
Assignments take time; start early
Project:
Significant implementation effort
Poster session at end of semester
Peer-evaluation and judges from UTK and ORNL
Project Deliverables and Deadlines
Deliverable | Due-date | % Grade
Initial selection of topics (notes 1, 2) | Jan 2?, 2015 | 10
Project Description and Approach | Feb 20, 2015 | 20
Initial Report | Mar 20, 2015 | 10
Project Demonstration | Apr 16-19, 2015 | 10
Final Project Report (10 pages) | Apr 21, 2015 | 25
Poster (12-1? slides) | Apr 23, 2015 | 25
1. Projects can come with their own data (e.g., from your project) or can be provided
2. Datasets need to be open. Please don't use datasets that have proprietary limitations
All reports will be in NIPS format: http://nips.cc/Conferences/2013/PaperInformation/StyleFiles
Pre-requisites
Basic Database course: SQL queries, data retrieval, etc.
Algorithms: Dynamic programming, data structures
Statistics: Moments, Distributions, Regression
Programming Languages: Java, Python, or any object-oriented language
Course Books
http://www.mmds.org/
Readings and Papers will be available as part of the course website
Useful references:
Machine Learning, Tom Mitchell
Building Machine Learning Systems with Python, Richert, Pedro-Coelho
http://scikit-learn.org/stable/
http://guidetodatamining.com/
http://www.cs.cornell.edu/home/kleinber/networks-book/
Data Mining
Data Explosion is Fueling Innovation in Science, Engineering and Business
Estimated world's data in 2010 ~ 1.2 zettabytes (1 zettabyte = 10^21 bytes)
Total data ~ 35 zettabytes in 2020
Data needs to be: Stored, Managed, Analyzed (this class)
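To put the two estimates in perspective, here is a minimal sketch of the growth rate they imply. It assumes constant compound growth between 2010 and 2020, which the slide does not claim; the variable names are illustrative only.

```python
import math

size_2010_zb = 1.2   # estimated world data in 2010 (from the slide)
size_2020_zb = 35.0  # projected total data in 2020 (from the slide)
years = 2020 - 2010

# Assumption: the same multiplicative growth factor applies every year.
annual_factor = (size_2020_zb / size_2010_zb) ** (1 / years)

# Years needed for the data volume to double at that rate.
doubling_years = math.log(2) / math.log(annual_factor)

print(f"annual growth factor: {annual_factor:.2f}")
print(f"doubling time in years: {doubling_years:.1f}")
```

Under that assumption the data volume grows by roughly 40% a year, so it doubles about every two years.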
Data Mining as a Discipline
Interdisciplinary with diverse "interactions"
Databases: managing large datasets
Machine Learning & Statistics: data and models
Theory: Algorithms, in particular Randomized methods
Our class focuses on:
Scalability: what to do when we have "big data"?
Algorithms: how to do what we do with big data?
Architecture: what infrastructure is suitable?
[Figure: diagram with the labels Databases, Machine Learning & Statistics, Theory, and Data Mining]
How are datasets represented?
Structured Data
Data organized in terms of records with fields corresponding to specific entries
Examples: Databases (relational), XML and other structured layouts
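As a small illustration of "records with fields", the sketch below stores the same three records in a relational table and in an XML tree, then runs the same lookup against both. The student records, table name, and element names are hypothetical and chosen only for this example; it uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical records: (name, department, level) for three students.
records = [("Ada", "CS", "DO"), ("Ben", "ME", "MS"), ("Cy", "CS", "UG")]

# Relational layout: each record is a row, each field a named column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, dept TEXT, level TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", records)
cs_names = [row[0] for row in conn.execute(
    "SELECT name FROM students WHERE dept = ? ORDER BY name", ("CS",))]
conn.close()

# XML layout: each record is an element, each field a child element.
root = ET.Element("students")
for name, dept, level in records:
    student = ET.SubElement(root, "student")
    ET.SubElement(student, "name").text = name
    ET.SubElement(student, "dept").text = dept
    ET.SubElement(student, "level").text = level
xml_cs_names = sorted(s.findtext("name") for s in root.iter("student")
                      if s.findtext("dept") == "CS")

print(cs_names, xml_cs_names)
```

Both layouts expose the same fields, so the same question ("which students are in CS?") can be answered from either one.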
Data
How do we go about doing this?
10^9 people
1,000 days
Each person stays 1% of the time in a hotel (10 days in 1,000 days)
Each hotel holds about 100 people
10^5 hotels
If everyone behaves randomly (i.e., no collusion), will data mining find any suspicious behavior?
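These numbers match the hotel example used to illustrate Bonferroni's principle in Mining of Massive Datasets (the course book at mmds.org). The sketch below works through the arithmetic under that book's definition, which the slide itself does not spell out: a "suspicious" pair is two people who were at the same hotel on two different days. The function and parameter names are illustrative.

```python
from math import comb

def expected_suspicious_pairs(people, days, hotels, p_hotel):
    """Expected number of (pair of people, pair of days) coincidences
    when everyone picks hotel days and hotels independently at random."""
    # P(two given people both visit a hotel on a given day, and it is the same hotel)
    p_same_hotel_one_day = p_hotel * p_hotel / hotels
    # P(that happens on two given, different days); days are independent
    p_two_days = p_same_hotel_one_day ** 2
    # Multiply by the number of people pairs and day pairs
    return comb(people, 2) * comb(days, 2) * p_two_days

# Sanity check from the slide: 10^5 hotels x 100 beds hold 1% of 10^9 people.
assert 10**5 * 100 == 10**9 // 100

estimate = expected_suspicious_pairs(
    people=10**9, days=1000, hotels=10**5, p_hotel=0.01)
print(f"expected suspicious pairs: {estimate:,.0f}")
```

Under these assumptions, purely random behavior already produces roughly a quarter of a million "suspicious" pairs, so any real signal would be buried under false positives.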