William Tai, Aug 26, 2016. Learn in 60 minutes how to use Big Data to crawl weather information and work out which vegetables to buy tomorrow

2016 Ideathon Big Data Introduction

Page 1: 2016 Ideathon Big Data Introduction

William Tai, Aug 26, 2016

Learn in 60 minutes how to use Big Data to crawl weather information and work out which vegetables to buy tomorrow

Page 2: 2016 Ideathon Big Data Introduction

WELL… THE PURPOSE MAY BE…

1. Help come up with and develop ideas

2. Help carry out ideas (Prototyping)

Page 3: 2016 Ideathon Big Data Introduction

WELL… THE PURPOSE MAY BE…

1. Help come up with and develop ideas

2. Help carry out ideas (Prototyping)

3. Share My Learned/Learning Interests

Page 4: 2016 Ideathon Big Data Introduction

WELL… THE PURPOSE MAY BE…

1. Help come up with and develop ideas

2. Help carry out ideas (Prototyping)

3. Share YOUR Learned/Learning Interests

Page 5: 2016 Ideathon Big Data Introduction

WHO I AM …

1. William Tai
2. PLS Partner Integration Team
   • Integration
   • Marketplace
   • API Management
3. Interests in Technology
   • AWS/Docker
   • Spark/Machine Learning

Page 6: 2016 Ideathon Big Data Introduction

AGENDA

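1. What is Big Data? What problems does the industry face, and what is Big Data being used for?
2. Web Crawler 101
   - Running on a single machine (Python/Java/R)
   - Developing with Spark on a single machine
   - Parallel computation on a Spark cluster
3. Basic concepts
   - Map and Reduce
   - A word you often run into when reading about Spark: the "Lambda" function
   - The official Spark documentation
4. What other serious uses are there?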

Page 7: 2016 Ideathon Big Data Introduction

WHAT IS BIG DATA? WHAT PROBLEMS DOES THE INDUSTRY FACE, AND WHAT IS BIG DATA BEING USED FOR?

Page 8: 2016 Ideathon Big Data Introduction

http://www.iis.sinica.edu.tw/~swc/talk/data_science_overview.html

Page 9: 2016 Ideathon Big Data Introduction

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Page 11: 2016 Ideathon Big Data Introduction

http://www.iis.sinica.edu.tw/~swc/talk/data_science_overview.html

Page 12: 2016 Ideathon Big Data Introduction

https://www-935.ibm.com/services/multimedia/use_of_big_data.pdf

Page 13: 2016 Ideathon Big Data Introduction

I THINK IT IS ABOUT THE POWER OF…

Distributed storage systems and distributed computing clusters

Page 14: 2016 Ideathon Big Data Introduction

http://www.slideshare.net/chaoyu0513/hadoop-con-2015-hadoop-enables-enterprise-data-lake

HadoopCon 2015: Hadoop Enables Enterprise Data Lake

Page 15: 2016 Ideathon Big Data Introduction

http://fredbigdata.blogspot.tw/2013/06/big-data-lifecycle.html

Page 16: 2016 Ideathon Big Data Introduction

USE CASES

Page 17: 2016 Ideathon Big Data Introduction

http://www.iis.sinica.edu.tw/~swc/talk/data_science_overview.html

Page 18: 2016 Ideathon Big Data Introduction

http://www.iis.sinica.edu.tw/~swc/talk/data_science_overview.html

Page 19: 2016 Ideathon Big Data Introduction

https://www.youtube.com/watch?v=yym4DGfZDt8

Page 20: 2016 Ideathon Big Data Introduction

WHAT IS SPARK?

Page 21: 2016 Ideathon Big Data Introduction

https://weidongzhou.wordpress.com/2015/09/08/hadoop-hdfs-mapreduce-and-spark-on-big-data/

Page 22: 2016 Ideathon Big Data Introduction

https://thestack.com/world/2015/04/29/faster-reporting-with-hadoop/

Page 23: 2016 Ideathon Big Data Introduction

WEB CRAWLER 101: WEATHER AND VEGETABLE PRICES

Page 24: 2016 Ideathon Big Data Introduction

Agricultural products wholesale market trading information site (農產品批發市場交易行情站)

http://amis.afa.gov.tw/veg/VegProdDayTransInfo.aspx

Page 25: 2016 Ideathon Big Data Introduction

1. Organize the data and produce data tables
2. Compute and process

Page 26: 2016 Ideathon Big Data Introduction

http://opendata.cwb.gov.tw/index

Page 27: 2016 Ideathon Big Data Introduction
Page 28: 2016 Ideathon Big Data Introduction
Page 29: 2016 Ideathon Big Data Introduction

http://funtop.tw/vegetable-price/

Page 30: 2016 Ideathon Big Data Introduction

Real-time and long-term vegetable price trends

https://www.taiwanstat.com/realtime/vegetable-price/

Page 31: 2016 Ideathon Big Data Introduction

THE STEPS WE JUST WALKED THROUGH

1. Organize the data and produce data tables
2. Compute and process
3. Store the results
4. Design the presentation
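The four steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the rows in `RAW_CSV` are invented sample values, not data from the market or weather sites, and the function names are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample rows; real data would come from a crawler or an open-data file.
RAW_CSV = """date,crop,price
2016-08-24,cabbage,20.0
2016-08-25,cabbage,30.0
2016-08-24,spinach,40.0
2016-08-25,spinach,50.0
"""

def load_rows(text):
    """Step 1: turn raw text into a list of typed records."""
    return [
        {"date": r["date"], "crop": r["crop"], "price": float(r["price"])}
        for r in csv.DictReader(io.StringIO(text))
    ]

def average_price(rows):
    """Step 2: compute the average price per crop."""
    totals = defaultdict(lambda: [0.0, 0])  # crop -> [sum of prices, row count]
    for row in rows:
        totals[row["crop"]][0] += row["price"]
        totals[row["crop"]][1] += 1
    return {crop: total / n for crop, (total, n) in totals.items()}

def store(result):
    """Step 3: serialize the result (a JSON string stands in for a file or database)."""
    return json.dumps(result, sort_keys=True)

def render(result):
    """Step 4: a plain-text presentation, one line per crop."""
    return "\n".join(f"{crop}: {price:.1f}" for crop, price in sorted(result.items()))

rows = load_rows(RAW_CSV)
averages = average_price(rows)
saved = store(averages)
report = render(averages)
print(report)
```

Keeping each step in its own function makes it easier to later swap one stage, for example replacing step 2 with a Spark job, without touching the others.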

Page 32: 2016 Ideathon Big Data Introduction

I THINK IT IS ABOUT THE POWER OF…

Distributed storage systems and distributed computing clusters

Page 33: 2016 Ideathon Big Data Introduction

WEB CRAWLER 101: COOKPAD

- Demo -
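The demo code itself is not part of these slides. As a rough idea of what a single-machine crawler does, here is a minimal sketch using only the Python standard library. The HTML in `SAMPLE_HTML` and the `/recipes/...` paths are invented for illustration and are not Cookpad's real markup; `fetch` is defined but never called, so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def fetch(url):
    """Download a page and decode it as UTF-8 (not called in this sketch)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

def extract_links(html):
    """Parse an HTML string and return the collected (href, text) pairs."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links

# Hypothetical page fragment standing in for a downloaded recipe listing.
SAMPLE_HTML = """
<ul>
  <li><a href="/recipes/1">Stir-fried cabbage</a></li>
  <li><a href="/recipes/2">Spinach soup</a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

In a real crawl, `extract_links(fetch(url))` would replace the inline sample, and the site's terms of use and robots.txt should be checked first.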

Page 34: 2016 Ideathon Big Data Introduction

REWRITE THE COOKPAD WEB CRAWLER TO USE SPARK

- Exercise -

Page 35: 2016 Ideathon Big Data Introduction

HOW SPARK WORKS: WORD COUNT

- Demo -
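The demo is not reproduced in the slides, so the following is a plain-Python sketch of the stages a word count goes through, not the Spark code that was shown. The comments name the PySpark RDD methods (`flatMap`, `map`, `reduceByKey`) that play the same role; the input lines are made up.

```python
from collections import defaultdict
from functools import reduce

# Made-up input; in Spark this would be an RDD of lines read from a file.
lines = ["to be or not to be", "to see or not to see"]

# Like flatMap: each line produces many words.
words = [word for line in lines for word in line.split()]

# Like map: each word becomes a (word, 1) pair.
pairs = list(map(lambda word: (word, 1), words))

# Shuffle: bring all values with the same key together.
groups = defaultdict(list)
for word, one in pairs:
    groups[word].append(one)

# Like reduceByKey: combine the values of each key with a two-argument function.
counts = {word: reduce(lambda a, b: a + b, ones) for word, ones in groups.items()}

print(sorted(counts.items()))
# [('be', 2), ('not', 2), ('or', 2), ('see', 2), ('to', 4)]
```

On a cluster the same stages run in parallel on partitions of the data, and the grouping step moves pairs between machines, which is why the reduce function has to be associative.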

Page 36: 2016 Ideathon Big Data Introduction

http://www.slideshare.net/chaoyu0513/etu-solution-day-2014-16-9-trackdimpalaandspark

Page 37: 2016 Ideathon Big Data Introduction
Page 38: 2016 Ideathon Big Data Introduction
Page 39: 2016 Ideathon Big Data Introduction
Page 40: 2016 Ideathon Big Data Introduction
Page 41: 2016 Ideathon Big Data Introduction
Page 42: 2016 Ideathon Big Data Introduction
Page 43: 2016 Ideathon Big Data Introduction

BASIC CONCEPTS
- MAP AND REDUCE
- LAMBDA FUNCTION

Page 44: 2016 Ideathon Big Data Introduction

>>> from functools import reduce  # needed on Python 3
>>> reduce(lambda x, y: x + y, [47, 11, 42, 13])
113

http://www.python-course.eu/lambda.php
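To make the one-liner above easier to follow, here is a small sketch that uses the same list with `map` and `reduce`. The variable names are chosen for illustration only.

```python
from functools import reduce

numbers = [47, 11, 42, 13]

# A lambda is an unnamed function; binding it to a name makes it reusable.
add = lambda x, y: x + y

# reduce folds the list from the left: ((47 + 11) + 42) + 13
total = reduce(add, numbers)

# map applies a one-argument function to every element independently.
doubled = list(map(lambda x: x * 2, numbers))

# reduce works with any two-argument function, for example keeping the larger value.
largest = reduce(lambda x, y: x if x > y else y, numbers)

print(total, doubled, largest)
# 113 [94, 22, 84, 26] 47
```

`map` handles each element on its own, so it parallelizes trivially; `reduce` combines results, which is the part a cluster has to coordinate.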

Page 45: 2016 Ideathon Big Data Introduction
Page 46: 2016 Ideathon Big Data Introduction

WORD COUNT GOES TO AWS

- Demo -

Page 47: 2016 Ideathon Big Data Introduction

Confidential © 2013 Trend Micro Inc. 47

Page 48: 2016 Ideathon Big Data Introduction

RUNNING SPARK COMMANDS

spark-submit --driver-memory 2g --verbose --master local --executor-memory 2048m --num-executors 1 WordCount_HarryPoter_S3.py

spark-submit --driver-memory 18g --verbose --master spark://52.197.150.195:7077 --executor-memory 4096m --num-executors 20 WordCount_HarryPoter_S3.py
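The two invocations differ only in the master URL and the resource sizes. As a sketch of how that difference could be parameterized, the hypothetical helper below (not part of the talk) builds the same argument list from Python and prints it without running anything.

```python
import shlex

def spark_submit_args(script, master, driver_memory, executor_memory, num_executors):
    """Build a spark-submit argument list using the flags shown on the slide."""
    return [
        "spark-submit",
        "--driver-memory", driver_memory,
        "--verbose",
        "--master", master,
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
        script,
    ]

# Same values as the two commands on the slide.
local_cmd = spark_submit_args("WordCount_HarryPoter_S3.py", "local", "2g", "2048m", 1)
cluster_cmd = spark_submit_args(
    "WordCount_HarryPoter_S3.py", "spark://52.197.150.195:7077", "18g", "4096m", 20
)

# shlex.join (Python 3.8+) renders the list as a copy-pasteable shell command.
print(shlex.join(local_cmd))
print(shlex.join(cluster_cmd))
```

Passing the list to `subprocess.run(local_cmd, check=True)` would launch the job, assuming `spark-submit` is on the PATH.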

Page 49: 2016 Ideathon Big Data Introduction

OTHER SERIOUS USES

Page 50: 2016 Ideathon Big Data Introduction

http://muyueh.com/seeall/

Page 51: 2016 Ideathon Big Data Introduction

https://www.taiwanstat.com/statistics/

Page 52: 2016 Ideathon Big Data Introduction

http://www.ithome.com.tw/guest-post/107900

Page 53: 2016 Ideathon Big Data Introduction

RESOURCE

Page 54: 2016 Ideathon Big Data Introduction

http://2016.hadoopcon.org/wp/?page_id=8

Page 55: 2016 Ideathon Big Data Introduction

SPARK SUMMIT 2014 TRAINING ARCHIVE

http://spark-summit.org/2014/training

Page 56: 2016 Ideathon Big Data Introduction

https://courses.edx.org/courses/course-v1:BerkeleyX+CS105x+1T2016/info