
Big Data

Instructor: Dr. V nh Hiu
Presented by: Phm Cng Thin, LDng B Cng, Nguyn Khc Chung, inh Anh Thi

Contents
- Introduction to Big Data
- Big Data components
- Organizing Big Data storage
- Oracle's Big Data solution

Introduction to Big Data

What is Big Data?
Enormous volumes of customer records, audio, images, and text.


Large volumes of data that need to be stored, such as:
- Traditional data: customer information, transactions
- Data collected automatically by sensors: weather, logs
- Social networks: comments on Facebook and Twitter

Characteristics: Volume, Velocity, Variety, Value

Volume
Storage demand keeps growing:
- 2000: 800,000 PB stored worldwide (*)
- 2020: a projected 35 ZB worldwide (*)
How can this be managed? The larger the data:
- the more processing capability falls short
- the harder data analysis becomes
- the slower data access gets
(*) Figures from IBM. 1 ZB = 10^21 bytes; 1 PB = 10^15 bytes.

Variety
Data comes from many sources: sensors, smart devices, social networks, news.
Data is complex: traditional and non-traditional; structured, semi-structured, and unstructured.

Velocity
The volume of data is very large, so access is slow.
Users require access that is fast, stable, and accurate.

Why Big Data matters
- It gives businesses deeper insight
- It can decide whether a business survives
- It brings new knowledge

Big Data components
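As a quick check of the IBM figures quoted above, the sketch below converts them to one unit using the slide's own definitions (1 PB = 10^15 bytes, 1 ZB = 10^21 bytes). It only restates the quoted numbers; the 2020 value is the projection given on the slide, not a measurement.

```python
# Unit definitions taken from the slide
PB = 10**15  # bytes in a petabyte
ZB = 10**21  # bytes in a zettabyte

stored_2000 = 800_000 * PB   # worldwide storage in 2000 (IBM figure on the slide)
projected_2020 = 35 * ZB     # worldwide projection for 2020 (IBM figure on the slide)

stored_2000_zb = stored_2000 / ZB              # 800,000 PB expressed in ZB
growth_factor = projected_2020 / stored_2000   # how many times larger 2020 is than 2000

print(f"2000: {stored_2000_zb} ZB")  # 2000: 0.8 ZB
print(f"growth: {growth_factor}x")   # growth: 43.75x
```

In other words, the projection implies roughly a 44-fold increase over two decades.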

Components
- Data management: the infrastructure for storing data and the resources for manipulating it.
- Data analysis: the technologies and tools for analyzing data and extracting insight from it.
- Data use: putting analyzed big data to work in business intelligence and end-user applications.

Data management
Structured data systems:
- Relational database management systems (RDBMS): store and manipulate structured data.
- MPP (massively parallel processing) systems: handle ever-growing volumes of digital data and accommodate data growth.
- Data warehouses: collect and store data for later reporting.
Limitations: hard to scale; performance degrades.
[Figure: Data representation]

Data management
Unstructured data systems: suited to storing data with complex structure, and easy to scale out.
The data:
- is both structured and unstructured
- comes from many sources, in many different sizes
- is usually very large and demands high processing speed
A data organization that meets these requirements: Apache Hadoop.

Data analysis
- This is where companies begin to extract value from big data.
- It involves developing and using applications to gain insight into big data.
- It includes building data analysis tools.

Data use
The activities performed on data that has already been analyzed.

Organizing Big Data storage: Hadoop
- Introduction to Hadoop
- Hadoop components
- HDFS (Hadoop Distributed File System)

What is Hadoop?
An application platform that supports distributed applications working with very large data:
- terabytes of data
- thousands of nodes
It provides a way to store data across many nodes and helps optimize network traffic.

Hadoop components
- Processing (MapReduce): a framework that makes it easy to build robust distributed applications following the MapReduce model.
- Storage (HDFS): a distributed file system that can hold huge amounts of data and optimizes bandwidth use between nodes.

Hadoop Distributed File System


HDFS architecture
- NameNode: the master of the HDFS system; it manages file metadata, including the block IDs that make up each file.
- Block: the smallest unit of data storage.
  - Hadoop uses 64 MB per block by default (the Hadoop 1.x default; Hadoop 2.x and later default to 128 MB).
  - A file is split into many blocks.
  - Blocks may reside on any node in the cluster.
- DataNode: stores the blocks.
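The block model above can be illustrated with a toy sketch: a file is cut into fixed-size blocks, and the NameNode's metadata records which DataNodes hold each block. The replication factor of 3 (the usual HDFS default) and the round-robin placement are assumptions for illustration; real HDFS uses a rack-aware placement policy. The node names dn1 to dn4 are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size quoted on the slide
REPLICATION = 3                # assumed replication factor (the usual HDFS default)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_blocks(n_blocks: int, datanodes: list[str],
                 replication: int = REPLICATION) -> dict[int, list[str]]:
    """Toy round-robin placement: block i goes to `replication` consecutive nodes.
    (Real HDFS uses a rack-aware policy instead.)"""
    copies = min(replication, len(datanodes))  # never place two replicas on one node
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(copies)]
        for block_id in range(n_blocks)
    }


# A hypothetical 200 MB file on a hypothetical four-node cluster
sizes = split_into_blocks(200 * 1024 * 1024)
metadata = place_blocks(len(sizes), ["dn1", "dn2", "dn3", "dn4"])  # what the NameNode tracks

print(len(sizes), sizes[-1] // (1024 * 1024))  # 4 8  (three full blocks plus an 8 MB tail)
print(metadata[0])                             # ['dn1', 'dn2', 'dn3']
```

Note that only the metadata (block list and locations) lives on the NameNode; the block bytes themselves stay on the DataNodes.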

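Hadoop architecture: the MapReduce layer
- JobTracker: receives requests to run MapReduce jobs; splits each job and assigns the tasks to TaskTrackers; tracks the status of each node.
- TaskTracker: receives tasks from the JobTracker and executes them.
(The JobTracker and TaskTrackers belong to the MapReduce engine of Hadoop 1.x rather than to HDFS itself; in Hadoop 2.x and later, YARN's ResourceManager and NodeManagers take over these roles.)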
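A toy version of the JobTracker's assignment step: one map task per block, preferring a TaskTracker on a node that already stores the block (data locality), and otherwise the least-loaded tracker. This is a simplified sketch, not Hadoop's actual scheduler; it assumes each TaskTracker runs on, and is named after, a DataNode, and all names are hypothetical.

```python
def assign_map_tasks(block_locations: dict[str, list[str]],
                     trackers: list[str]) -> dict[str, list[str]]:
    """Toy JobTracker: create one map task per block and hand it to a TaskTracker.

    Prefers trackers on nodes that already hold a replica of the block (data locality);
    among the candidates, picks the one with the fewest tasks so far.
    """
    assignment: dict[str, list[str]] = {t: [] for t in trackers}
    for block_id, nodes in block_locations.items():
        local = [t for t in trackers if t in nodes]
        candidates = local or trackers  # fall back to any tracker if none is local
        chosen = min(candidates, key=lambda t: len(assignment[t]))
        assignment[chosen].append(block_id)
    return assignment


# Hypothetical block locations (as reported by the NameNode) and TaskTrackers
blocks = {"b0": ["dn1", "dn2"], "b1": ["dn2", "dn3"], "b2": ["dn1", "dn3"], "b3": ["dn4"]}
print(assign_map_tasks(blocks, ["dn1", "dn2", "dn3"]))
# {'dn1': ['b0', 'b3'], 'dn2': ['b1'], 'dn3': ['b2']}
```

Block b3 lives only on dn4, which runs no TaskTracker here, so it is sent to a remote tracker and its data would have to cross the network.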

How HDFS works: reading
- The client asks the NameNode to read the data; the NameNode returns the locations of the data's blocks.
- The program then requests the data directly from the DataNodes that hold those blocks.
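The read path described above can be modeled in a few lines: the NameNode hands back only metadata, and the client pulls block contents straight from the DataNodes, falling back to another replica if one is missing. The dictionaries standing in for the NameNode and DataNodes, the file path, and the block IDs are hypothetical; this is not the Hadoop client API.

```python
# Hypothetical NameNode metadata: path -> ordered (block_id, DataNodes holding a replica)
NAMENODE = {"/logs/day1.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]}

# Hypothetical DataNode contents: node -> {block_id: bytes}; dn1 has lost its copy of blk_1
DATANODES = {
    "dn1": {},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_2": b"world"},
}


def get_block_locations(path: str) -> list[tuple[str, list[str]]]:
    """Step 1: the NameNode returns only metadata, never file contents."""
    return NAMENODE[path]


def read_file(path: str) -> bytes:
    """Step 2: the client reads each block directly from a DataNode that holds it."""
    chunks = []
    for block_id, nodes in get_block_locations(path):
        for node in nodes:  # try replicas in order
            data = DATANODES.get(node, {}).get(block_id)
            if data is not None:
                chunks.append(data)
                break
        else:  # no replica answered
            raise IOError(f"no available replica for {block_id}")
    return b"".join(chunks)


print(read_file("/logs/day1.txt"))  # b'hello world'
```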

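How HDFS works: writing
Writes are pipelined:
- The client requests a write operation from the NameNode.
- The NameNode checks write permission and ensures the file does not already exist.
- The replicas of each block form a data pipeline, and the data is written through it sequentially.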
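A minimal simulation of the pipelined write, assuming a single block, a replication factor of 3, and a hypothetical allow-list of users with write permission. The client sends the block only to the first DataNode; each node stores it and forwards it to the next, and acknowledgements travel back up the chain. For simplicity the NameNode here always picks the first three nodes, whereas real HDFS chooses nodes per block; class and method names are illustrative, not Hadoop's.

```python
class DataNode:
    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[str, bytes] = {}

    def receive(self, block_id: str, data: bytes, downstream: list["DataNode"]) -> list[str]:
        """Store the block, forward it to the next node, and return the chain of acks."""
        self.blocks[block_id] = data
        acks = downstream[0].receive(block_id, data, downstream[1:]) if downstream else []
        return [self.name] + acks  # acknowledgement travels back toward the client


class NameNode:
    def __init__(self, datanodes: list[DataNode], replication: int = 3, writers=("alice",)):
        self.datanodes = datanodes
        self.replication = replication
        self.writers = set(writers)           # hypothetical users with write permission
        self.namespace: dict[str, list] = {}  # path -> [(block_id, [node names])]

    def create(self, path: str, user: str) -> list[DataNode]:
        """Check permission and non-existence, then return the DataNode pipeline."""
        if user not in self.writers:
            raise PermissionError(f"{user} may not write {path}")
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = []
        return self.datanodes[: self.replication]  # toy choice: the first N nodes


def client_write(namenode: NameNode, path: str, user: str, block_id: str, data: bytes) -> list[str]:
    pipeline = namenode.create(path, user)                    # ask the NameNode first
    acks = pipeline[0].receive(block_id, data, pipeline[1:])  # send only to the first node
    namenode.namespace[path].append((block_id, acks))         # record where replicas landed
    return acks


nodes = [DataNode(n) for n in ("dn1", "dn2", "dn3", "dn4")]
nn = NameNode(nodes)
print(client_write(nn, "/data/a.txt", "alice", "blk_1", b"payload"))  # ['dn1', 'dn2', 'dn3']
```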

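How HDFS works: strengths and weaknesses
Advantages:
- Can store very large files
- Streaming data access
- Simple data coherency model
- Runs on commodity, heterogeneous hardware
- Detects faults automatically and recovers data quickly
Disadvantages:
- Access latency
- A single cluster cannot store too many files (the NameNode keeps all file metadata in memory, so huge numbers of small files exhaust it)

Hadoop Common
A set of libraries that support Hadoop. It includes a set of file system commands:
- cat: copy files to standard output (stdout)
- chmod: change the read and write permissions of a file
- chown: change the owner of a file or a set of files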
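The HDFS disadvantage about storing too many files comes from the NameNode holding every file and block object in memory. The rough estimate below uses the commonly cited rule of thumb of about 150 bytes of NameNode memory per object; that figure is an approximation, and directories are ignored. It compares 10 million 1 MB files with the same data packed into full 64 MB files.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb NameNode memory per file or block object (approximate)


def namenode_memory(n_files: int, blocks_per_file: int = 1) -> int:
    """Rough NameNode heap estimate: one object per file plus one per block.
    Directories are ignored for simplicity."""
    return n_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


# 10 million 1 MB files: every file still needs its own file object and block object
many_small = namenode_memory(10_000_000)

# The same 10 million MB of data packed into full 64 MB files, one block each
few_large = namenode_memory(10_000_000 // 64)

print(many_small / 10**9, "GB")  # 3.0 GB
print(few_large / 10**6, "MB")   # 46.875 MB
```

The data volume is identical in both cases; only the object count changes, which is why HDFS favors fewer, larger files.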

MapReduce
The MapReduce framework takes care of:
- managing parallel, distributed processes and scheduling I/O
- managing data state
- managing large amounts of interdependent data
- handling failures
- abstracting these details away from the programmer

[Figures: MapReduce]
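The MapReduce model can be sketched in plain Python with the classic word-count example: the map step emits (word, 1) pairs, the shuffle groups values by key, and the reduce step sums each group. This runs in a single process and only illustrates the programming model; it does not use Hadoop's Java MapReduce API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values for one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["Big data big", "data Hadoop"]))  # {'big': 2, 'data': 2, 'hadoop': 1}
```

On a cluster, the map calls would run in parallel next to the blocks holding each input split, and the framework would handle the shuffle and any failed tasks.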

Oracle Big Data
Overview

Oracle Big Data
A combination of hardware and software.
Hardware:
- 18 Sun servers
- 648 TB of storage
- 2 CPUs per server, 6 cores per CPU: 216 cores in total
- 48 GB RAM

Oracle Big Data
Software:
- The full Cloudera's Distribution including Apache Hadoop (CDH)
- Cloudera Manager: administers Cloudera CDH
- R: an open-source package for analyzing raw, unprocessed data on Oracle Big Data
- Oracle NoSQL Database
- The Oracle Enterprise Linux operating system with the Oracle Java VM
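The hardware figures are consistent with each other, as a short calculation shows. Reading the 48 GB of RAM as a per-server figure is an assumption, so the cluster-wide RAM total below holds only under that reading.

```python
SERVERS = 18
CPUS_PER_SERVER = 2
CORES_PER_CPU = 6
TOTAL_STORAGE_TB = 648
RAM_PER_SERVER_GB = 48  # assumption: the slide's 48 GB is read as a per-server figure

total_cores = SERVERS * CPUS_PER_SERVER * CORES_PER_CPU  # matches the slide's 216 cores
storage_per_server_tb = TOTAL_STORAGE_TB / SERVERS        # raw storage per server
total_ram_gb = SERVERS * RAM_PER_SERVER_GB                # cluster-wide RAM under the assumption

print(total_cores, storage_per_server_tb, total_ram_gb)  # 216 36.0 864
```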


Oracle Big Data
Main components:
- CDH and Cloudera Manager
- Oracle Big Data Connectors:
  - Oracle Loader for Hadoop
  - Oracle Direct Connector for Hadoop Distributed File System
  - Oracle Data Integrator Application Adapter for Hadoop
  - Oracle R Connector for Hadoop
- Oracle NoSQL Database

Data analysis
Example: an online sales system, in which the entities are clearly defined.
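When the entities are clearly defined, as in this online sales example, analysis can work directly on fixed fields. The sketch below aggregates revenue per product; the Order record, its fields, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """One row of a hypothetical, clearly defined orders table."""
    order_id: int
    customer: str
    product: str
    quantity: int
    unit_price: float


orders = [
    Order(1, "c1", "laptop", 1, 900.0),
    Order(2, "c2", "mouse", 3, 20.0),
    Order(3, "c1", "mouse", 1, 20.0),
]

# Because every record has the same fields, aggregation is a simple group-by
revenue_by_product: dict[str, float] = defaultdict(float)
for order in orders:
    revenue_by_product[order.product] += order.quantity * order.unit_price

top_product = max(revenue_by_product, key=revenue_by_product.get)
print(dict(revenue_by_product), top_product)  # {'laptop': 900.0, 'mouse': 80.0} laptop
```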


Data analysis
Example: data collected from many sources, with no predefined structure.
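With unstructured data from many sources, some structure has to be extracted before anything can be counted. As a minimal sketch, assuming hypothetical free-text records tagged with their source, the code below turns each text into a small structured record (its hashtags and whether it mentions a product) and then tallies across sources.

```python
import re
from collections import Counter

# Hypothetical free-text records from different sources, with no shared schema
raw_records = [
    ("facebook", "Loving the new phone! #Phone #camera"),
    ("twitter", "battery dies too fast #phone"),
    ("news", "Sales of the new phone rose this quarter."),
]

HASHTAG = re.compile(r"#(\w+)")


def extract(source: str, text: str) -> dict:
    """Turn one unstructured text into a small structured record."""
    return {
        "source": source,
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions_phone": "phone" in text.lower(),
    }


records = [extract(src, text) for src, text in raw_records]
tag_counts = Counter(tag for r in records for tag in r["hashtags"])
phone_sources = [r["source"] for r in records if r["mentions_phone"]]

print(tag_counts.most_common())  # [('phone', 2), ('camera', 1)]
print(phone_sources)             # ['facebook', 'twitter', 'news']
```

Unlike the structured sales example, the fields here are derived from the text rather than given in advance.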



