Data Lake, beyond the Data Warehouse

Data Lake, beyond the Warehouse

Cheow Lan Lake, Thailand

Komes Chandavimol (โกเมษ จันทวิมล)

February 3, 2016

Data Science Thailand Meetup #4

Shifting to the 3rd gen platform with Data Lake

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

The Growth of Data

Can these tools support Big Data?

Spreadsheet? Database? Data Mart? Data Warehouse?

Source: Forrester Research’s James Kobielus

The Emergence of Big Data Tools

http://blogs.forrester.com/category/hadoop

http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf

HADOOP

http://opensource.com/life/14/8/intro-apache-hadoop-big-data

Analytics 3.0

Data Mining Tools

Data Discovery and Visualization Tools

Tableau.com, RapidMiner.com

How do we apply this to the current environment?

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Traditional Data Warehouse

New Data Management Architecture

Data Lake

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Data Lake

A single place to store every type of data in its native format, with no fixed limits on account size or file size, high throughput to increase analytic performance, and native integration with the Hadoop ecosystem.
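The "native format" part of this definition can be illustrated with a minimal sketch. It assumes a local directory stands in for the lake's storage layer; the function name `land_raw`, the `raw/<dataset>` layout, and the `.meta.json` sidecar are hypothetical choices for illustration, not part of any Hadoop API.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def land_raw(source: Path, lake_root: Path, dataset: str) -> Path:
    """Copy a file into the lake byte-for-byte and record basic metadata."""
    target_dir = lake_root / "raw" / dataset  # hypothetical layout
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)  # no parsing, no schema, no conversion

    # Sidecar metadata so the file can be located and verified later.
    meta = {
        "dataset": dataset,
        "file": source.name,
        "bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    (target_dir / (source.name + ".meta.json")).write_text(json.dumps(meta))
    return target


def demo() -> dict:
    """Land a CSV file and a binary file side by side; return their sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        csv_file = tmp_path / "orders.csv"
        csv_file.write_bytes(b"id,amount\n1,9.5\n")
        bin_file = tmp_path / "photo.bin"
        bin_file.write_bytes(bytes(range(16)))

        lake = tmp_path / "lake"
        landed = [
            land_raw(csv_file, lake, "sales"),
            land_raw(bin_file, lake, "images"),
        ]
        return {p.name: p.stat().st_size for p in landed}
```

Both files are stored unchanged, whatever their type; any structure is applied only later, when the data is read for analysis.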

Reference: James Serra's Blog

Data Lake Development with Big Data, Pradeep Pasupuleti (2015)

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Data Lake Processes

www.emc.com

Data Lake and Data Warehouse

Hadoop Distributed Compared, BlazeClan Technology, 2015

Data Lakes

http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html

Data Lake

Type of Data: Raw Data, Derived Data, Aggregated Data

Type of Environment: Discovery Environment, Production Environment
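The three data types can be sketched as successive steps over the same records. This is a toy illustration only: the event format, the sample lines, and the function names `derive` and `aggregate` are assumptions made for the example and do not come from the cited source.

```python
from collections import defaultdict

# Raw data: lines kept exactly as they arrived, including a malformed one.
RAW_EVENTS = [
    "2016-02-03,alice,click",
    "2016-02-03,bob,view",
    "2016-02-03,alice,click",
    "bad line",
]


def derive(raw_lines):
    """Derived data: parse raw lines into records, skipping malformed ones."""
    records = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # the raw copy still holds the bad line for later inspection
        day, user, action = parts
        records.append({"day": day, "user": user, "action": action})
    return records


def aggregate(records):
    """Aggregated data: count actions per (user, action) pair."""
    counts = defaultdict(int)
    for rec in records:
        counts[(rec["user"], rec["action"])] += 1
    return dict(counts)
```

Because the raw lines are never modified, the derived and aggregated views can be rebuilt at any time with different rules.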

The Definition of Data Lake, John O’Brien (2015)

How does the Data Lake work?

http://www.clearpeaks.com/blog/category/tableau

Traditional Enterprise Data Warehouse

New Data Management Architecture

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html

Data Lake Maturity

The Definition of Data Lake, John O’Brien (2015)

4 Maturity Stages of Data Lake

Stage 1 – Pilot Project (Understand the Technology)
Stage 2 – Productionize Hadoop and its capabilities
Stage 3 – Proactively consolidate data for (Big) Data Analytics
Stage 4 – Platform the Data Lake to Core Competency

The Definition of Data Lake, John O’Brien (2015)

Putting the Data Lake to Work, Teradata, Hortonworks (2015)

Stage 1 – Pilot Project

Handling data at scale: this stage involves getting the plumbing in place and learning to acquire and transform data at scale. The analytics may be quite simple, but much is learned about making Hadoop work the way you desire.

Stage 2 – Productionize Hadoop and its capabilities

This stage involves improving the ability to transform and analyze data. Teams find the tools most appropriate to their skill sets, acquire more data, and build applications.

Stage 3 – Proactively consolidate data for (Big) Data Analytics

Involves getting data and analytics into the hands of as many people as possible.

It is in this stage that the data lake and the enterprise data warehouse start to work in unison, each playing its role.

A company that started with a data lake may eventually add an enterprise data warehouse to operationalize its data.
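The division of roles in this stage can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the enterprise data warehouse, and the sample records, table name, and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw records as they might sit in the lake (JSON lines with
# fields that vary from record to record).
LAKE_LINES = [
    '{"order_id": 1, "customer": "A", "amount": 120.0, "note": "gift"}',
    '{"order_id": 2, "customer": "B", "amount": 80.0}',
    '{"order_id": 3, "customer": "A", "amount": 50.0, "coupon": "X1"}',
]


def load_warehouse(lines):
    """Load a fixed-schema table from raw lake records into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = []
    for line in lines:
        rec = json.loads(line)
        # Only the agreed columns are loaded; extra fields stay in the lake.
        rows.append((rec["order_id"], rec["customer"], rec["amount"]))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def revenue_by_customer(conn):
    """A typical operational query served by the warehouse side."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()
```

The lake keeps every field of every record for discovery, while the warehouse table exposes a stable, curated subset for production reporting.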

Big Data Analytics

http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html

Data Lake and Big Data Analytics

http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/

Stage 4 – Platform the Data Lake to Core Competency

Enterprise capabilities are added to the data lake. Few companies have reached this level of maturity, but many will as the use of big data grows. This stage requires data governance, compliance, security, and auditing, all incorporated into the company data strategy.

The Technology of the Business Data Lake, Capgemini (2013)

Business Data Lake

The Technology of the Business Data Lake, Capgemini (2014)

https://shefsite.files.wordpress.com/2014/04/where.jpg

http://image.slidesharecdn.com/mapr-db-in-hadoop-nosql-overview-150929062856-lva1-app6892/95/maprdb-the-first-inhadoop-document-database-12-638.jpg?cb=1443536326

http://www.predictiveanalyticstoday.com/waterline-data-self-service-for-the-hadoop-data-lake/

The Data Lake Unifies Data Discovery, Data Science, and BI 3.0

[Word cloud: Big Data, Self Serve Business, Data Science, Machine Learning, Visual Analytics, Business Discovery, Deep Learning, Hadoop, Feature Engineering, Spark, Business Intelligence 3.0, YARN, Predictive Analytics, Hive, Data Lake, Data Visualization, Graph Analytics]

20+ posts relate to “Data Lake”. To find them, search for “Data Science Thailand” “Data Lake”.

Questions?
