What could Hadoop do for us

Transcript
Page 1: What could hadoop do for us

What can Hadoop do for us? Observation, imagination, and investing in the future

徐瑞興 Simon Hsu

June 20, 2014

Page 2: What could hadoop do for us

Page 3: What could hadoop do for us

Page 4: What could hadoop do for us

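About Me

• 徐瑞興 (Simon Hsu)
  – National Cheng Kung University: Department of Computer Science and Information Engineering, class of 98 / Institute of Computer and Communication Engineering, class of 100
• Started working with Hadoop in graduate school (High Performance Parallel and Distributed Systems Laboratory)
  – "A Transparent Approach to Run MapReduce Programs on Collaborative Hadoops" – IEEE BigData 2014
• Formerly at Hon Hai (鴻海), R&D department of the Central Information Division
  – Responsible for operating and developing the group's Hadoop products
• Currently Technical Manager at Etu (知意圖), SYSTEX (精誠資訊)
  – Working on Hadoop (Etu/Cloudera) solutions and product R&D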

Page 5: What could hadoop do for us

Overview

• Hadoop & Big Data
• Hadoop
  – HDFS / MapReduce Workflow
  – Hadoop Ecosystem Tools Introduction
• Resources

Page 6: What could hadoop do for us

Her (雲端情人)

http://www.huffingtonpost.com/marshall-fine/movie-review-iheri_b_4459420.html

Page 7: What could hadoop do for us

We live in an age of rapid change...

http://b0.rimg.tw/ciltw/22fde0c3.jpg
http://pic.pimg.tw/fyu45/1358000174-162486648.jpg

(Before Steve Jobs released the iPhone in 2007...)

Page 8: What could hadoop do for us

We live in an age of rapid change...

2007

http://www.computerhistory.org/atchm/steve-jobs/
http://www.businessinsider.com.au/yahoo-wants-to-keep-users-engaged-with-one-shop-shop-mobile-video-app-2013-9

Page 9: What could hadoop do for us

Jerry’s Siri Screenshots

Page 10: What could hadoop do for us

What do you imagine when you think of Big Data?

• Etu: The Big Data Era (大數據時代篇)
  – https://www.youtube.com/watch?v=wc2durk8p9o

Page 11: What could hadoop do for us

http://media2.hpcwire.com/datanami/hadoopelephant.jpg

Page 12: What could hadoop do for us

Transcendence (全面進化)

http://moviefloss.com/transcendence-movie-review-human-one-day-computer/

Page 13: What could hadoop do for us

Stephen Hawking's reflections on Transcendence (全面進化)

• Stephen Hawking: The creation of true AI could be the 'greatest event in human history'

http://www.independent.co.uk/news/science/stephen-hawking-transcendence-looks-at-the-implications-of-artificial-intelligence--but-are-we-taking-ai-seriously-enough-9313474.html

Page 14: What could hadoop do for us

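...says a group of leading scientists

• Artificial intelligence may be the biggest event in human history, and it may also be the last
  – "Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks."
• What happens when computers can one day read and understand articles on their own?
• What happens when every object is able to connect to the network?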

http://www.independent.co.uk/news/science/stephen-hawking-transcendence-looks-at-the-implications-of-artificial-intelligence--but-are-we-taking-ai-seriously-enough-9313474.html

Page 15: What could hadoop do for us

Big Data from the Internet of Things perspective

• Science Monthly (科學月刊), June 2014 issue
  – Internet of Things
    • Smart healthcare
    • Smart agriculture

Page 16: What could hadoop do for us

Internet of Things: IoT architecture and key technologies

http://scimonth.blogspot.tw/2014/05/blog-post_3117.html

Page 17: What could hadoop do for us

Smart healthcare

Science Monthly (科學月刊), June 2014 issue

Page 18: What could hadoop do for us

Smart agriculture (1/2)

Science Monthly (科學月刊), June 2014 issue

Page 19: What could hadoop do for us

Smart agriculture (2/2)

Science Monthly (科學月刊), June 2014 issue

Page 20: What could hadoop do for us

FamilyAsyst @Computex 2014

Page 21: What could hadoop do for us

FamilyAsyst Screenshots

Page 22: What could hadoop do for us

3Vs in Big Data

http://www.geektime.com/2013/10/24/the-3-vs-of-big-data-and-their-technologies/

Page 23: What could hadoop do for us

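Brief of Hadoop

• Father of Hadoop – Doug Cutting
• Hadoop
  – Characteristics
    • Built for batch processing and large-scale computation
    • Low storage cost
    • Moves the computation to the data (locality)
  – Main architecture
    • HDFS
      – Distributed file system for storage
    • MapReduce
      – Distributed computation framework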

http://www.cnbc.com/id/100769719

Page 24: What could hadoop do for us

Relations between Hadoop and Google

• The Google File System
  – SOSP 2003
• MapReduce: Simplified Data Processing on Large Clusters
  – OSDI 2004
• Bigtable: A Distributed Storage System for Structured Data
  – OSDI 2006

Hadoop Community

• Hadoop Distributed File System (Storage)
• MapReduce framework (Processing)
• HBase (Database)

Page 25: What could hadoop do for us

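HDFS

• NameNode
  – Components and functions
    • File index (FileSystem Image)
      – File index (with metadata)
      – Mapping of files to blocks
      – Locations of each block
    • Operation log (Journal)
      – Operations on the namespace
  – Periodically connects with the DataNodes to monitor their status
    • Connection status
    • Storage usage

Diagram: the NameNode holds the FileSystem Image and the Journal. The Journal logs creation, deletion, and rename operations on the namespace. The FileSystem Image is a tree: Root → Dir → File → Block.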

Page 26: What could hadoop do for us

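HDFS

• DataNode
  – Stores the contents of blocks
    • Default size of each block: 128 MB
  – Periodically connects with the NameNode to report its status
    • Connection status
    • Storage usage
    • Reports its block list to the NameNode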
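To make the 128 MB default block size above concrete, here is a minimal sketch of how a file's size translates into blocks. The helper name `split_into_blocks` and the 300 MB example file are assumptions for illustration; they are not from the slides or from the HDFS API.

```python
# Hypothetical helper: sizes of the blocks a file would be split into.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB default block size from the slide


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb)
    print(len(blocks), [size // mb for size in blocks])  # 3 [128, 128, 44]
```

Under these assumptions a 300 MB file occupies three blocks, and only the last one is smaller than the block size.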

Page 27: What could hadoop do for us

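A concept to help understand HDFS

Local file system (/home/simon/testinput)
→ DFS Shell / DFS API →
HDFS (/user/simon/testinput), served by the NameNode (NN) and DataNodes (DN)

• To have Hadoop work for you, first upload your files to a file system that Hadoop recognizes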

Page 28: What could hadoop do for us

Hadoop Data Distribution

http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/hdfs-and-mapreduce.html

Logical View Physical View

Page 29: What could hadoop do for us

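HDFS – Write flow

Diagram: an Agent uploads File 1 to a cluster with one NameNode (master node, holds the file index; replication factor: 2) and three DataNodes (DataNode 1, DataNode 2, DataNode 3; each holds data blocks).

1. The Agent tells the NameNode that it is about to upload File 1 (including information such as file size and file type).
2. Following the distributed-storage design, File 1 is split into multiple blocks (Block 1 and Block 2 in this example).
3. Following the block replication policy, DN1 and DN2 are chosen to store Block 1, and DN2 and DN3 are chosen to store Block 2.
4. The NameNode returns an output stream for the Agent to write to (it carries the blocks to be written and the DataNode information).

Result: File 1 = Block 1 + Block 2. DataNode 1 holds Block 1, DataNode 2 holds Block 1 and Block 2, DataNode 3 holds Block 2.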
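The four steps above can be sketched as a toy simulation. This is not HDFS code: the class `ToyNameNode`, the method `plan_write`, and the round-robin placement are all assumptions chosen because they reproduce the layout on the slide. Real HDFS uses its own placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy stand-in for the NameNode: it only tracks metadata, never file data."""

    datanodes: list[str]
    replication: int = 2
    # file name -> list of (block id, DataNodes chosen for that block)
    file_index: dict[str, list[tuple[str, list[str]]]] = field(default_factory=dict)

    def plan_write(self, filename: str, num_blocks: int) -> list[tuple[str, list[str]]]:
        """Steps 2-4: split the file into blocks and choose DataNodes for each."""
        plan = []
        for i in range(num_blocks):
            # Assumed round-robin placement, picked to match the slide's example.
            targets = [
                self.datanodes[(i + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            plan.append((f"Block {i + 1}", targets))
        self.file_index[filename] = plan  # the NameNode remembers the mapping
        return plan


if __name__ == "__main__":
    namenode = ToyNameNode(["DN1", "DN2", "DN3"])
    # Step 1: the agent announces the upload; here File 1 needs two blocks.
    for block, targets in namenode.plan_write("File 1", num_blocks=2):
        print(block, "->", targets)
```

With three DataNodes and a replication factor of 2, the sketch places Block 1 on DN1 and DN2, and Block 2 on DN2 and DN3, as in the diagram.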

Page 30: What could hadoop do for us

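HDFS – Read flow

Diagram: an Agent downloads File 1 from a cluster with one NameNode (master node, holds the file index; replication factor: 2) and three DataNodes (DataNode 1, DataNode 2, DataNode 3; each holds data blocks).

1. The Agent asks the NameNode to download File 1 (through the DFS Shell / DFS API).
2. From the NameNode's information, File 1 is found to be stored as Block 1 and Block 2.
3. From the NameNode's information, the location of each block is found: Block 1 is on DN1 and DN2, Block 2 is on DN2 and DN3.
4. The NameNode returns an input stream (it carries the information above).

Result: File 1 = Block 1 + Block 2. DataNode 1 holds Block 1, DataNode 2 holds Block 1 and Block 2, DataNode 3 holds Block 2.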
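The read path can be sketched the same way. The metadata below is hard-coded to the slide's layout. The function names `locate_blocks` and `choose_replicas`, and the rule of taking the first reachable replica, are assumptions for illustration rather than HDFS behavior.

```python
# Block layout copied from the slide; in a real cluster the NameNode holds this.
FILE_INDEX = {"File 1": ["Block 1", "Block 2"]}
BLOCK_LOCATIONS = {"Block 1": ["DN1", "DN2"], "Block 2": ["DN2", "DN3"]}


def locate_blocks(filename: str) -> list[tuple[str, list[str]]]:
    """Steps 2-3: look up the file's blocks and where each replica lives."""
    return [(block, BLOCK_LOCATIONS[block]) for block in FILE_INDEX[filename]]


def choose_replicas(
    filename: str, down: frozenset[str] = frozenset()
) -> list[tuple[str, str]]:
    """Step 4 (client side): pick one reachable DataNode per block, in order."""
    chosen = []
    for block, replicas in locate_blocks(filename):
        alive = [dn for dn in replicas if dn not in down]
        if not alive:
            raise RuntimeError(f"no reachable replica for {block}")
        chosen.append((block, alive[0]))  # assumption: take the first live replica
    return chosen


if __name__ == "__main__":
    print(choose_replicas("File 1"))
    print(choose_replicas("File 1", down=frozenset({"DN2"})))
```

The second call illustrates why the replication factor matters: with DN2 unreachable, both blocks can still be read from DN1 and DN3.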

Page 31: What could hadoop do for us

Then, how does MapReduce work?

http://joyreactor.com/post/821302

Page 32: What could hadoop do for us

MapReduce

• JobTracker (the coordinator)
  – The JobTracker schedules map and reduce tasks according to locality and heartbeat feedback (failed nodes / faster nodes), then assigns them to map workers or reduce workers on the TaskTrackers
• TaskTracker (the actual worker)
  – By default, one TaskTracker can run 2 workers (map worker or reduce worker)
    • Each worker runs the map function or the reduce function, depending on the task type assigned by the JobTracker

Page 33: What could hadoop do for us

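Explaining the MapReduce concept with an example

瑞興銀行 (Ruixing Bank) wants to total the assets of all its customers island-wide.

• Branches: South, Central, North
• Each branch holds the account details for one range of customers (No. 1–300, No. 301–600, No. 601–900) and runs one map over them
• A single reduce combines the three partial results into the final statistic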
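The bank example can be written as a few lines of plain Python. The branch labels and balances are made-up data, and `map_branch` / `reduce_totals` are hypothetical names. The sketch only mirrors the shape of the computation and is not the Hadoop API.

```python
from functools import reduce

# Made-up balances per branch; each inner dict maps customer number -> assets.
BRANCHES = {
    "branch A": {1: 1000, 2: 2500},
    "branch B": {301: 400, 302: 600},
    "branch C": {601: 3000},
}


def map_branch(accounts: dict[int, int]) -> int:
    """map: each branch totals its own customers independently."""
    return sum(accounts.values())


def reduce_totals(left: int, right: int) -> int:
    """reduce: combine two partial totals into one."""
    return left + right


if __name__ == "__main__":
    partial_totals = [map_branch(accounts) for accounts in BRANCHES.values()]
    print(partial_totals, reduce(reduce_totals, partial_totals, 0))  # [3500, 1000, 3000] 7500
```

Each `map_branch` call touches only one branch's data, so the three calls could run in parallel on different machines. Only the small partial totals need to travel to the reduce step.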

Page 35: What could hadoop do for us

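WordCount Example

Input: "Hi, be a winner, do not be a loser."

map outputs:
• map 1: (Hi, 1), (be, 1), (a, 1)
• map 2: (winner, 1), (do, 1), (not, 1)
• map 3: (be, 1), (a, 1), (loser, 1)

reduce outputs:
• reduce 1: (a, 2), (be, 2), (do, 1), (loser, 1)
• reduce 2: (not, 1), (winner, 1), (Hi, 1)

Final result: (a, 2), (be, 2), (do, 1), (loser, 1), (not, 1), (winner, 1), (Hi, 1)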
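The WordCount flow above can be reproduced in memory with three small functions. This is a sketch in plain Python, not Hadoop's Java API. The names `map_words`, `shuffle`, and `reduce_counts` are illustrative, and the way the sentence is cut into three splits is an assumption that matches the three map boxes.

```python
import re
from collections import defaultdict


def map_words(chunk: str) -> list[tuple[str, int]]:
    """map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", chunk)]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """shuffle: group all emitted values by key before reducing."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """reduce: sum the values collected for one key."""
    return word, sum(counts)


if __name__ == "__main__":
    # The three splits mirror the three map boxes on the slide.
    splits = ["Hi, be a", "winner, do not", "be a loser."]
    emitted = [pair for split in splits for pair in map_words(split)]
    result = dict(reduce_counts(w, c) for w, c in shuffle(emitted).items())
    print(result)
```

The shuffle step is what the slide leaves implicit: it routes every pair with the same key to the same reduce call, which is why "be" and "a" end up with a count of 2.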

Page 36: What could hadoop do for us

Hadoop Ecosystem (Still growing rapidly)

• Hadoop Distributed File System (File System)
• MapReduce (Processing)
• Sqoop / Flume (Data Integration)
• Pig / Hive (Analytical language)
• Mahout (Data mining) ...
• HBase (Database)
• ZooKeeper (Lock service)

Page 37: What could hadoop do for us

Example : MapReduce vs Hive

Map/Reduce

Hive

Page 38: What could hadoop do for us

Example : MapReduce vs Pig

Map/Reduce

Pig

"About 40% of M/R jobs in Yahoo are written using Pig"

Page 39: What could hadoop do for us

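Sqoop

• Sqoop is the tool in the Hadoop ecosystem for moving large volumes of data in and out. Main functions:
  1. Import data from an RDBMS into HDFS / HBase / Hive
  2. Export data from HDFS / HBase / Hive to an RDBMS
• Diagram label: map-only job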

http://blog.cloudera.com/blog/2012/01/apache-sqoop-highlights-of-sqoop-2/

Page 40: What could hadoop do for us

Hadoop at a glance

http://ambuj4bigdata.blogspot.tw/2014/05/hadoop-at-glance.html

Page 41: What could hadoop do for us

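Postscript

• Sharing what I observed from my own rental listing
  – Comparison of cumulative view counts
  – Mobile vs. desktop view counts (5X1 租屋 rental site)
  – View counts by time of day
• For how to crawl the data, see:
  – 資料爬理析 Python 實戰班 (a hands-on Python course on crawling, processing, and analyzing data)
    • http://www.etusolution.com/DSP/edm_dsp_ETL2.html

http://goo.gl/gYNFW1
http://simonhsu.github.io/rent/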

Page 42: What could hadoop do for us

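About Etu

• Etu - Big Data Solution
  – Integrated hardware-and-software appliance product
    • Etu Appliance - a Big Data processing platform that combines compute and storage
  – Customer segments
    • Telecom
      – TQuery - multi-purpose query platform for massive telecom data
    • E-commerce
      – Etu Recommender - platform for precise recommendation and consumer behavior analysis
    • Manufacturing
    • Media
    • Any ...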

Page 43: What could hadoop do for us

Potentially helpful learning resources / information

• Taiwan Hadoop User Group
  – https://www.facebook.com/groups/hadoop.tw
• Hadoop Taiwan (one workshop per quarter)
  – http://www.hadoop.tw/
• Hadoop Weekly (Mailing List)
  – http://www.hadoopweekly.com/
• Experfy (the Big Data version of "5945")
  – https://www.experfy.com/
• Top Coder
  – http://www.topcoder.com/

Page 44: What could hadoop do for us

Resources from Etu

• Etu Taiwan
  – 藍衣人月刊 (monthly newsletter)
    • (e.g.) A gift from our customers: ten common misconceptions about Hadoop applications
  – Hadoop 直通學習地圖 (Hadoop learning map; training)
    • Free for students (call to confirm)
    • http://www.etusolution.com/index.php/tw/product-and-services/etu-services/training-service
  – EHC (Hadoop competition)
    • https://www.youtube.com/watch?v=OWVsmVu_PV8
  – DSP (Data Scientist Program)
    • Co-organized by Etu and CfT (Code for Tomorrow)
    • http://datasci.co/

Page 46: What could hadoop do for us

318, Rueiguang Rd., Taipei 114, Taiwan
Simon Hsu – Technical [email protected]

Thank you

