TABLEAU AND HADOOP Tableau’s Place in a Big Data Architecture DAMA, Tableau User Group Meeting November 13, 2014

Architecture patterns for using Tableau and Hadoop.

Page 2: Tableau and hadoop

Agenda

BI/DW Workload Categories & Tableau

Three Integration Models

Capability Models

Architecture Patterns

Summary

Q & A

Page 3: Tableau and hadoop

Workload Categories

Operational BI

• Operational processes
• Reports and dashboards
• Transactional sys integration
• Automatic distribution
• 100s – 1,000s of consumers: front-line staff, data analysts, business leaders, executives
• Production data prep
• High availability
• Report archiving
• Op sys response time & SLA
• Enterprise governance
• Enterprise security
• Self-service: report access & interactivity

Data Exploration

• Decision support processes
• Less strict definition
• Ad hoc reports and dashboards
• Perf mgmt analysis by 100s of users (data analysts, business leaders)
• Production & manual data prep
• Enterprise or div governance
• Corporate security
• Self-service: query, report/analysis authoring, data design, metadata definition

Data Science

• Complex data exploration
• Descriptive analytics
• Predictive statistical models
• Machine learning algorithms
• Large data volumes
• Wide data variety
• 10s of users: data scientists, technologists
• Departmental governance
• Raw data (Bus & IT)
• Derivative data (Bus & IT)
• Self-service: full

Tableau

Page 4: Tableau and hadoop

Three Integration Models

Isolated Exploration Environment (aka Sandbox)

Snapshot of data cached on desktop or server

Frequency of data change is analyst dependent

Integrations occur through analyst, not enterprise, work

Live Interactive Query (aka BI/DW)

Constantly changing data stored in an enterprise data platform.

Frequency of data change is independent of analyst

Integrations occur primarily through enterprise work

Integrated Advanced Analytic Platform

Access to [custom] advanced analytic algorithms through Tableau

Application of analytic algorithms to new datasets
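
The difference between the first two models (a snapshot that changes only when the analyst refreshes it, versus data that changes independently of the analyst) can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the enterprise data platform; the table, values, and function names are assumptions made for illustration and are not part of Tableau or Hadoop.

```python
import sqlite3

# Stand-in for the enterprise data platform (hypothetical table and values).
enterprise = sqlite3.connect(":memory:")
enterprise.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
enterprise.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("east", 100), ("west", 250)]
)

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"


def take_snapshot(conn):
    """Sandbox model: copy the result once; it changes only when re-extracted."""
    return conn.execute(QUERY).fetchall()


def live_query(conn):
    """Live model: every request goes back to the enterprise platform."""
    return conn.execute(QUERY).fetchall()


snapshot = take_snapshot(enterprise)

# The enterprise data changes independently of the analyst.
enterprise.execute("INSERT INTO sales VALUES ('east', 50)")

print("snapshot:", snapshot)             # still reflects the extract time
print("live:", live_query(enterprise))   # reflects the new row
```

The snapshot keeps the totals from extract time, while the live query picks up the later insert.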

Page 5: Tableau and hadoop

Isolated Exploration Environment

Capabilities: Visual Exploration; Prototype Analytical Applications

[Diagram labels: Analyst; Tableau; SAS; Metadata Tool (?); visual navigation, measures, hierarchies; statistical profile; technical & business metadata; integrations, data design, visual organization, granularity]

Page 6: Tableau and hadoop

Live Interactive Query

Capabilities: Dashboarding; Performance Management Analysis

[Diagram labels, Dashboarding: Tableau (visually engaging, KPIs, defined analysis paths); Analyst: Define; Developer: Build; Business Leaders & Staff: Use]

[Diagram labels, Performance Management Analysis: Tableau (KPIs, ad hoc analysis paths, detail records); Analyst: Iterates; Generate: Analysis, Recommendation]

Page 7: Tableau and hadoop

Integrated Advanced Analytic Platform

Enabling a “Clinical Trials” Model for Data Science

Phase I: Model Discovery
Who: Data Science Team (centralized)
• Appropriate modeling technique
• Rapid iterations
• Tool & algorithm variety

Phase II: Confirmation
Who: Data Analysts (decentralized)
• Confirm value
• Wider application
• Tool & data conformity

Phase III: Pilot
Who: Select Business Leaders, Staff or Customers
• Demo business value
• Demo feasibility

Phase IV: Rollout
Who: All Business Leaders, Staff or Customers
• Realized value
• Refine through application

Tableau

Page 8: Tableau and hadoop

Analytic Capabilities & Hadoop

By architecture pattern and capability: suitable for Hadoop? / considerations

Isolated Exploration Environment
• Visual Exploration: Possibly. Dataset has limited joins; dataset is large enough to warrant Hadoop as the “cache”.
• Prototype Analytical Apps: No. Too many joins typically required for a prototype; prototypes can be confirmed on data subsets.

Live Interactive Query
• Dashboards: No. Too many concurrent users; response time requirements are too stringent.
• Performance Mgmt Analysis: Possibly. Dataset has limited joins; dataset is large enough to warrant Hadoop as the repository.

Integrated Advanced Analytic Platform
• “Clinical Trial” approach: Yes. Tableau’s R integration; Hadoop’s UDF, UDAF features.
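
For readers who want the ratings above in a form that can be queried, here is a small sketch that transcribes them into a lookup. The ratings come from the slide; the dictionary layout and the function name are illustrative assumptions.

```python
# Suitability ratings transcribed from the slide; the structure and the
# function name are hypothetical, chosen only for illustration.
HADOOP_SUITABILITY = {
    ("Isolated Exploration Environment", "Visual Exploration"): "Possibly",
    ("Isolated Exploration Environment", "Prototype Analytical Apps"): "No",
    ("Live Interactive Query", "Dashboards"): "No",
    ("Live Interactive Query", "Performance Mgmt Analysis"): "Possibly",
    ("Integrated Advanced Analytic Platform", "Clinical Trial approach"): "Yes",
}


def capabilities_by_rating(rating):
    """Return the capabilities that carry the given rating, sorted by name."""
    return sorted(
        capability
        for (_pattern, capability), value in HADOOP_SUITABILITY.items()
        if value == rating
    )


print(capabilities_by_rating("Possibly"))
```

Both "Possibly" entries share the same two considerations on the slide: limited joins and a dataset large enough to warrant Hadoop.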

Page 9: Tableau and hadoop

Architecture Pattern: Isolated Exploration Environment

[Diagram labels: Data analyst; Business Leader; Tableau Desktop with cache; Private Data; Enterprise Data Asset; on demand; extract; interactive query]

Page 10: Tableau and hadoop

Architecture Pattern: Live Interactive Query

[Diagram labels: Data analyst; Developer; Tableau Desktop; Tableau Server; Tableau Browser & Mobile; cache; Enterprise Data Asset; Cached Live Query; Live Query]

Page 11: Tableau and hadoop

Architecture Pattern: Integrated Advanced Analytic Platform

[Diagram labels: Enterprise Data Asset; Analytic Workbench (python, R, SAS, …); Analytic Model (M); Tableau Server; cache; Data analyst; Data scientist; Live Query; Live Query via SQL extensions & R integration; Interactive Advanced Analytic Platform]

References:
• http://www.tableausoftware.com/about/blog/tableau-and-marklogic
• http://developer.marklogic.com/blog/the-art-of-the-possible-marklogic-tableau-public
• https://cwiki.apache.org/confluence/display/Hive/HivePlugins

SQL Extension Examples

MarkLogic SPARQL

    SELECT name, affiliation
    FROM emails
    WHERE subject MATCH "answer"

HiveQL

    SELECT my_function(…),
           sum(freq)
    FROM myDataTable;
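
The HiveQL example calls a user-defined function, and an earlier slide lists Hadoop's UDF and UDAF features as the reason this pattern suits Hadoop. The sketch below is a conceptual Python stand-in, not Hive's Java API, for how a user-defined aggregate typically works: each task accumulates a partial result, and the partials are merged before the final value is produced. The class and method names are assumptions, loosely modeled on the iterate / merge / terminate steps of a Hive UDAF evaluator.

```python
class MeanUDAF:
    """Conceptual aggregate: keeps (sum, count) so partial results can merge."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        """Consume one input row; NULLs (None) are skipped."""
        if value is not None:
            self.total += value
            self.count += 1

    def terminate_partial(self):
        """Emit the partial state produced by one task."""
        return (self.total, self.count)

    def merge(self, partial):
        """Fold another task's partial state into this one."""
        total, count = partial
        self.total += total
        self.count += count

    def terminate(self):
        """Produce the final aggregate, or None if no rows were seen."""
        return self.total / self.count if self.count else None


def run(partitions):
    """Simulate one aggregator per partition, then a final merge step."""
    partials = []
    for rows in partitions:
        agg = MeanUDAF()
        for value in rows:
            agg.iterate(value)
        partials.append(agg.terminate_partial())

    final = MeanUDAF()
    for partial in partials:
        final.merge(partial)
    return final.terminate()


print(run([[1, 2, 3], [4, None], [5]]))
```

Keeping the sum and count separate, instead of averaging per partition, is what lets the merge step give the same answer regardless of how the rows are split.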

Page 12: Tableau and hadoop

Architecture Pattern: Integrated Advanced Analytic Platform

[Diagram labels: Enterprise Data Asset; Analytic Workbench (python, R, SAS, …); Analytic Model (M); Tableau Server; cache; Data analyst; Data scientist; Live Query; Live Query via SQL extensions & R integration; Interactive Advanced Analytic Platform]

R integration example

• Install the R package called sentiment
• Call the classify_polarity R function using the SCRIPT_STR function

References:
• https://boraberan.wordpress.com/2013/12/24/sentiment-analysis-in-tableau-with-r/
• http://cran.r-project.org/src/contrib/Archive/sentiment/
• http://kb.tableausoftware.com/articles/knowledgebase/r-implementation-notes
• http://www.tableausoftware.com/about/blog/2013/10/tableau-81-and-r-25327

Page 13: Tableau and hadoop

Consolidated Architecture

[Diagram combining the three architecture patterns. Isolated Exploration Environment: Data analyst; Business Leader; Tableau Desktop with cache; Private Data; on demand; extract; interactive query. Live Interactive Query: Data analyst; Developer; Tableau Desktop; Tableau Server with cache; Tableau Browser & Mobile; Cached Live Query; Live Query. Interactive Advanced Analytic Platform: Data analyst; Data scientist; Analytic Workbench (python, R, SAS, …); Analytic Model (M); Tableau Server; cache; Live Query; Live Query via SQL extensions & R integration. Shared: Enterprise Data Asset. Other labels: W]

Page 14: Tableau and hadoop

Summary, Q&A

– Thank you –

Contact Information

Craig Jordan

LinkedIn: www.linkedin.com/in/crjordan/

Email: [email protected]
