HDF Powered by Apache NiFi Intro Milind Pandit Solutions Engineer

HDF Powered by Apache NiFi Introduction

Page 1: HDF Powered by Apache NiFi Introduction

HDF Powered by Apache NiFi Intro

Milind Pandit, Solutions Engineer

Page 2: HDF Powered by Apache NiFi Introduction

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Agenda

HDF 2.0: Flow Management
– NiFi basics
– NiFi use cases
– NiFi demos

Page 3: HDF Powered by Apache NiFi Introduction

Simplistic View of Enterprise Data Flow

Data Flow

Process and Analyze Data

Acquire Data

Store Data

Page 4: HDF Powered by Apache NiFi Introduction

Interacting with different business partners and customers

Realistic View of Enterprise Data Flow

Page 5: HDF Powered by Apache NiFi Introduction

Connected Data Platforms

Page 6: HDF Powered by Apache NiFi Introduction

Stream Processing

Flow Management

Enterprise Services

At the edge

Security

Visualization

On premises In the cloud

Registries/Catalogs Governance (Security/Compliance) Operations

HDF 2.0 – Data in Motion Platform

Page 7: HDF Powered by Apache NiFi Introduction

Hortonworks DataFlow (HDF)

SOURCES
• Constrained
• High-latency
• Localized context

REGIONAL INFRASTRUCTURE

CORE INFRASTRUCTURE
• Hybrid – cloud/on-premises
• Low-latency
• Global context

Page 8: HDF Powered by Apache NiFi Introduction

Apache NiFi: Designed for 8 challenges of global enterprise dataflow

• Visual Command and Control: for agile and immediate creation, configuration, control of dataflows
• Data Lineage (Provenance): ensures trust of your data
• Data Prioritization: because not all data is of equal importance
• Data Buffering/Back-Pressure: since not all senders/receivers/connections work perfectly all the time
• Control Latency vs Throughput: adapt to different situations with different requirements
• Secure Control Plane/Data Plane: security of data, and data access
• Scale out Clustering: scalability
• Extensibility: ecosystem flexibility and growth

Page 9: HDF Powered by Apache NiFi Introduction

Apache NiFi: Three key concepts

• Manage the flow of information

• Data Provenance

• Secure the control plane and data plane

Page 10: HDF Powered by Apache NiFi Introduction

Apache NiFi – Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
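
As a rough illustration of the "Data buffering / Backpressure" feature listed above, the sketch below models a bounded queue that refuses new items once a threshold is reached. It is plain Python, not NiFi code: `BoundedConnection`, `backpressure_threshold`, `offer`, and `poll` are hypothetical names chosen for this example, and refusing an item simply stands in for slowing down the upstream step.

```python
from collections import deque


class BoundedConnection:
    """Toy queue between two processing steps with a back-pressure threshold.

    Hypothetical class for illustration; it is not part of any NiFi API.
    """

    def __init__(self, backpressure_threshold: int):
        self.backpressure_threshold = backpressure_threshold
        self._queue = deque()

    def is_full(self) -> bool:
        # Back-pressure applies once the queue reaches its threshold.
        return len(self._queue) >= self.backpressure_threshold

    def offer(self, item) -> bool:
        """Enqueue the item, or refuse it (return False) under back-pressure."""
        if self.is_full():
            return False
        self._queue.append(item)
        return True

    def poll(self):
        """Dequeue the oldest item, or return None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


def run_demo():
    conn = BoundedConnection(backpressure_threshold=3)
    accepted = [conn.offer(n) for n in range(5)]  # upstream tries to send 5 items
    drained = conn.poll()                         # downstream consumes one item
    retried = conn.offer(99)                      # room again, so this is accepted
    return accepted, drained, retried
```

Calling `run_demo()` shows the upstream side being refused while the queue is full and accepted again after the downstream side drains an item.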

Page 11: HDF Powered by Apache NiFi Introduction

Common Apache NiFi Use Cases

Predictive Analytics
Ensure the highest value data is captured and available for analysis

Compliance
Gain full transparency into provenance and flow of data

IoT Optimization
Secure, Prioritize, Enrich and Trace data at the edge

Fraud Detection
Move sales transaction data in real time to analyze on demand

Big Data Ingest
Easily and efficiently ingest data into Hadoop

Value Resources
Gain visibility into how data sources are used to determine value

Page 12: HDF Powered by Apache NiFi Introduction

Use Cases for Apache NiFi

What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
  – Conversion between formats
  – Extraction/Parsing
  – Routing decisions

What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations

Page 13: HDF Powered by Apache NiFi Introduction

Terminology

FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)

Processor
• Performs the work, can access FlowFiles

Connection
• Links between processors
• Queues that can be dynamically prioritized
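
To make the three terms concrete, here is a minimal sketch in plain Python. It is a toy model under stated assumptions, not NiFi's implementation: the `FlowFile`, `Connection`, and `uppercase_processor` names, the `priority` attribute, and the `set_prioritizer` method are all hypothetical and exist only for this example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue linking two processors; the lowest priority value is delivered first."""

    def __init__(self, prioritizer):
        self.prioritizer = prioritizer
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps arrival order among equals

    def put(self, flowfile):
        entry = (self.prioritizer(flowfile), next(self._order), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

    def set_prioritizer(self, prioritizer):
        """Re-rank everything already queued ("dynamically prioritized")."""
        self.prioritizer = prioritizer
        self._heap = [(prioritizer(ff), order, ff) for _, order, ff in self._heap]
        heapq.heapify(self._heap)


def uppercase_processor(flowfile):
    """Processor: performs the work; here it upper-cases the content."""
    return FlowFile(flowfile.content.upper(), dict(flowfile.attributes))


def run_demo():
    conn = Connection(prioritizer=lambda ff: int(ff.attributes["priority"]))
    conn.put(FlowFile(b"low", {"priority": "5"}))
    conn.put(FlowFile(b"high", {"priority": "1"}))
    conn.put(FlowFile(b"mid", {"priority": "3"}))
    out = []
    while (ff := conn.get()) is not None:
        out.append(uppercase_processor(ff).content)
    return out
```

`run_demo()` drains the connection in priority order and passes each FlowFile through the processor; swapping the ordering function with `set_prioritizer` changes the delivery order of items that are already queued.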

Page 14: HDF Powered by Apache NiFi Introduction

Analogy: FlowFiles are like HTTP Data

HTTP Data

Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
Content-Type: text/html

Content:
Hello world XXXXXXXXXXXXXXXXXXXXXXXXXXXX

FlowFile

Header:
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'

Content:
0101010101110101010101010101 (Binary)
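
The analogy can be sketched by splitting a raw HTTP response into the two parts a FlowFile carries: header fields become attributes and the body becomes content. This is illustrative Python only. The function name `http_response_to_flowfile` and the attribute keys it produces (including `http.status.line`) are assumptions made for this example and are not the attribute names NiFi itself writes.

```python
def http_response_to_flowfile(raw: bytes) -> dict:
    """Split a raw HTTP response into FlowFile-like attributes and content."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Hypothetical attribute key for the status line.
    attributes = {"http.status.line": lines[0]}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    return {"attributes": attributes, "content": body}


# The response shown on the slide, with a shortened body.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 10 Oct 2010 23:26:07 GMT\r\n"
    b"Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g\r\n"
    b"Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world"
)

flowfile = http_response_to_flowfile(RAW_RESPONSE)
```

After this runs, `flowfile["attributes"]` holds the key/value pairs and `flowfile["content"]` holds the opaque bytes, mirroring the Header/Content split on the slide.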

Page 15: HDF Powered by Apache NiFi Introduction

Create, Run, View, Start, Stop, Change, Fix Dataflows in Real-Time

1. Drag and drop processors to build a flow
2. Start, stop, and configure components in real time
3. View errors and corresponding error messages
4. View statistics and health of data flow
5. Create templates of common processors & connections

Page 16: HDF Powered by Apache NiFi Introduction

Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka, Deliver to HDFS
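
The "Route on Content" step of the demo can be approximated in a few lines of plain Python. This is only a sketch of the idea, not the demo flow itself: the route names, the regular expressions, and the `route_on_content` function are hypothetical, and the tail, Kafka, and HDFS steps are out of scope here.

```python
import re
from collections import defaultdict

# Hypothetical routing rules: route name -> pattern searched in each log line.
ROUTES = {
    "errors": re.compile(r"\bERROR\b"),
    "warnings": re.compile(r"\bWARN\b"),
}


def route_on_content(lines, routes=ROUTES):
    """Send each line to the first route whose pattern matches, else 'unmatched'."""
    routed = defaultdict(list)
    for line in lines:
        for name, pattern in routes.items():
            if pattern.search(line):
                routed[name].append(line)
                break
        else:
            # No pattern matched this line.
            routed["unmatched"].append(line)
    return dict(routed)


sample = [
    "2016-06-17 INFO started",
    "2016-06-17 ERROR disk full",
    "2016-06-17 WARN slow response",
]
routed = route_on_content(sample)
```

Each route would then feed its own downstream step, for example a buffer for errors and a separate sink for everything else.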

Page 17: HDF Powered by Apache NiFi Introduction

What is Data Provenance and Why is it Important?

[Diagram: LINEAGE, from BEGIN to END]

IT and Cloud Operators
• Understand traceability, lineage
• Enable recovery and replay

Compliance Regulations
• Provide an audit trail
• Remediation capabilities
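
A small sketch can show how recorded events give an audit trail and a lineage from BEGIN to END. It is plain Python under simplifying assumptions: `ProvenanceEvent` and `ProvenanceLog` are hypothetical classes, the event type strings are used only as labels, the component names are made up, and each FlowFile is assumed to have at most one parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str           # label such as "RECEIVE", "FORK", "SEND"
    flowfile_id: str
    parent_id: Optional[str]  # FlowFile this one was derived from, if any
    component: str            # hypothetical name of the step that emitted the event


class ProvenanceLog:
    """Append-only event log that can be replayed to trace lineage."""

    def __init__(self):
        self._events: List[ProvenanceEvent] = []

    def record(self, event: ProvenanceEvent) -> None:
        self._events.append(event)

    def lineage(self, flowfile_id: str) -> List[ProvenanceEvent]:
        """Return events from the earliest ancestor (BEGIN) to this FlowFile (END)."""
        chain: List[ProvenanceEvent] = []
        current = flowfile_id
        while current is not None:
            events = [e for e in self._events if e.flowfile_id == current]
            chain = events + chain  # ancestors go in front
            parents = {e.parent_id for e in events if e.parent_id is not None}
            current = parents.pop() if parents else None  # assumes a single parent
        return chain


log = ProvenanceLog()
log.record(ProvenanceEvent("RECEIVE", "ff-1", None, "TailLog"))
log.record(ProvenanceEvent("FORK", "ff-2", "ff-1", "SplitLines"))
log.record(ProvenanceEvent("SEND", "ff-2", None, "DeliverToStore"))
trail = [e.event_type for e in log.lineage("ff-2")]
```

Walking the parent links backwards from the delivered FlowFile reconstructs the full path the data took, which is what makes audit and replay possible.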

Page 18: HDF Powered by Apache NiFi Introduction

Provenance Enables Easy Access and Traceability of Changes

Page 19: HDF Powered by Apache NiFi Introduction

Need Fine-Grained Security and Compliance?

Security
• Secured authentication
• Enterprise authorization services – entitlements change often
• Encrypted content, encrypted communications
• People and systems with different roles require different access levels
• Tagged/classified data

Page 20: HDF Powered by Apache NiFi Introduction

Repositories - Pass by reference

Page 21: HDF Powered by Apache NiFi Introduction

Repositories – Copy on Write
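
Only the titles of the two repository slides survive in this transcript. As a hedged sketch of what "pass by reference" and "copy on write" generally mean for a content store, the Python below keeps content immutable, lets FlowFiles share a pointer to it, and writes a new copy only when the content changes. All names here (`ContentRepository`, `FlowFileRef`, `claim_id`, `route`, `transform`) are hypothetical and are not NiFi classes.

```python
class ContentRepository:
    """Append-only store: content is written once and afterwards only referenced."""

    def __init__(self):
        self._claims = {}
        self._next_id = 0

    def write(self, data: bytes) -> int:
        claim_id = self._next_id
        self._next_id += 1
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


class FlowFileRef:
    """FlowFile that holds attributes plus a pointer to its content."""

    def __init__(self, claim_id, attributes=None):
        self.claim_id = claim_id
        self.attributes = dict(attributes or {})


def route(flowfile):
    """Pass by reference: the new FlowFile points at the same stored content."""
    return FlowFileRef(flowfile.claim_id, flowfile.attributes)


def transform(repo, flowfile, fn):
    """Copy on write: changed content goes to a new claim; the original stays intact."""
    new_claim = repo.write(fn(repo.read(flowfile.claim_id)))
    return FlowFileRef(new_claim, flowfile.attributes)


repo = ContentRepository()
original = FlowFileRef(repo.write(b"hello"))
routed_copy = route(original)
changed = transform(repo, original, bytes.upper)
```

Routing moves only the small reference around, while a transformation leaves the earlier content available, for example for replay.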

Page 22: HDF Powered by Apache NiFi Introduction

NiFi Architecture

Page 23: HDF Powered by Apache NiFi Introduction

Edge Intelligence with Apache MiNiFi

Key Features
• Guaranteed delivery
• Data buffering
  ‒ Backpressure
  ‒ Pressure release
• Prioritized queuing
• Flow specific QoS
  ‒ Latency vs. throughput
  ‒ Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension

Different from Apache NiFi
• Design and Deploy
• Warm re-deploys

Page 24: HDF Powered by Apache NiFi Introduction

NiFi vs. MiNiFi Java Agent

MiNiFi
• NiFi Framework
• Components

NiFi
• NiFi Framework
• User Interface
• Components

Page 25: HDF Powered by Apache NiFi Introduction

Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold

Real-Time Insights Require DataFlow Mgmt and Stream Processing

[Diagram labels]
• Company X Cloud Instances 1, 2, 3: Acquire Data
• Acquire Data Across Cloud Instances
• Parse, Filter, Validate, Enrich and Route
• Core Data Center
• Analytics/Pattern Match
• Data Store
• Alerts
• Dashboards/Visualization

Legend: Flow Management, Stream Processing

Page 26: HDF Powered by Apache NiFi Introduction

Data in Motion Needs Dataflow Management and Stream Processing

Flow Management (NiFi, MiNiFi)
• Acquire data from the cloud instances of various wearable devices
• Move data from customer cloud instances to the on-premises instance
• Perform intelligent routing and filtering of data. The routing and filtering rules will often change at run time.
• Deliver the data to various downstream systems. New downstream apps will keep appearing, and data should be fed to each one when it comes online.
• Parse the device data into a standardized format that downstream systems can understand
• Enrich the data with contextual information, including patient/customer info (age, sex, etc.)

Stream Processing (Storm, Kafka)
• Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
• Run an outlier detection model on the incoming heart rate stream. If the score is above a certain threshold, alert on the heart rate.
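
The steps above can be sketched end to end in plain Python: parse and enrich stand in for the flow management side, and the threshold check stands in for the stream processing side. Everything specific here is an assumption for illustration: the JSON record layout, the `RESTING_HR_THRESHOLD` value, the `CUSTOMERS` lookup table, and the function names. None of it comes from NiFi, Storm, or Kafka.

```python
import json

# Hypothetical values for illustration only; the slides give no numbers.
RESTING_HR_THRESHOLD = 100
CUSTOMERS = {"u1": {"age": 42, "sex": "F"}}


def parse(raw: str) -> dict:
    """Parse a device record into a standard dict (assumes JSON input)."""
    record = json.loads(raw)
    return {"user": record["user"], "resting_hr": int(record["hr"])}


def enrich(record: dict, customers=CUSTOMERS) -> dict:
    """Add contextual customer info (age, sex) when it is known."""
    return {**record, **customers.get(record["user"], {})}


def detect_alert(record: dict, threshold=RESTING_HR_THRESHOLD):
    """Return an alert dict when the resting heart rate exceeds the threshold."""
    if record["resting_hr"] > threshold:
        return {
            "user": record["user"],
            "resting_hr": record["resting_hr"],
            "alert": "resting heart rate above threshold",
        }
    return None


def process(raw_records):
    """Run parse -> enrich -> detect over a batch and collect the alerts."""
    alerts = []
    for raw in raw_records:
        alert = detect_alert(enrich(parse(raw)))
        if alert is not None:
            alerts.append(alert)
    return alerts


alerts = process(['{"user": "u1", "hr": 120}', '{"user": "u2", "hr": 60}'])
```

In a real deployment the first two functions would map to flow steps and the last to a streaming job; keeping them as separate functions mirrors that split.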

Page 27: HDF Powered by Apache NiFi Introduction

The Future of Data
Architectural Transformation Enabled By Connected Data Platforms

[Diagram labels]
• Cloud: Data in Motion (Cloud), Data at Rest (Cloud)
• On Prem: Data in Motion (on-premises), Data at Rest (on-premises)
• Edge Data
• Edge Analytics
• Closed Loop Analytics
• Machine Learning
• Deep Historical Analysis

Page 28: HDF Powered by Apache NiFi Introduction

Use Cases for Data in Motion

When Only DataFlow Management is Required

Use Cases for Data-in-Motion Using DataFlow Mgmt
• Data Ingestion
• Edge Intelligence
• First Mile Problem
• Physical Data Movement
• Simple event processing such as Route, Filter, Enrich, Transform, etc.

When Both DataFlow Management and Stream Processing

Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
• Flow Management to deliver data for Stream Processing
• PLUS: Complex pattern matching on unbounded streams of data.

Page 29: HDF Powered by Apache NiFi Introduction

HDF 2.0: Data-in-Motion Platform

[Diagram labels]
• DATA IN MOTION
• DATA AT REST
• IoT Data Sources
• AWS, Azure, Google Cloud, Hadoop
• MiNiFi (multiple instances), NiFi (multiple instances), Kafka, Storm, Others…
• Flow management
• Flow management + Stream Processing
• Enterprise Services: Ambari, Ranger, Other services

Page 30: HDF Powered by Apache NiFi Introduction

New Stream Processing Features HDF 2.0

• New Storm Connectors
• Storm-Kafka Spout using new client APIs
• Storm Distributed Log Search
• Storm Dynamic Worker Profiling
• Kafka Grafana Integration
• Storm Grafana Integration

• Improved Nimbus HA
• Storm Automatic Back Pressure
• Storm Distributed cache
• Storm Windowing and State Management
• Storm Performance improvements
• Improved Kafka SASL

• Storm Topology Event inspector
• Storm Resource Aware Scheduling
• Storm Dynamic Log Levels
• Pacemaker Storm Daemon
• Kafka Rack Awareness

Developer Productivity, Enterprise Readiness, Operational Simplicity

Page 31: HDF Powered by Apache NiFi Introduction

More Information, Resources

Hortonworks Community Connection (Data Ingestion and Streaming): https://community.hortonworks.com

Partnerworks: http://hortonworks.com/partners/

HDF Certification: http://hortonworks.com/partners/product-integration-certification/

Webinars: http://hortonworks.com/events-webcasts/

Sandbox: http://hortonworks.com/events-webcasts/

HDF: http://hortonworks.com/hdf/

HDP: http://hortonworks.com/hdp/