Data Ingestion Platform (DiP) Co-Dev opportunity to ingest any data in near real time www.xavient.com

Xavient - DiP



www.xavient.com Xavient Data Ingestion Platform (DiP)

Introduction

When numerous big data sources exist in diverse formats (the sources may often number in the hundreds and the formats in the dozens), it can be challenging for businesses to ingest data at a reasonable speed and process it efficiently in order to maintain a competitive advantage. To that end, vendors offer software programs that are tailored to specific computing environments or software applications. When data ingestion is automated, the software used to carry out the process may also include data preparation features to structure and organize data so it can be analyzed on the fly or at a later time by business intelligence (BI) and business analytics (BA) programs.

Data Ingestion Platform (DiP) is a system to ingest data into Big Data systems. Data can be streamed in real time or ingested in batches. When data is ingested in real time, each data item is imported as it is emitted by the source. When data is ingested in batches, data items are imported in discrete chunks at periodic intervals of time. An effective data ingestion process begins by prioritizing data sources, validating individual files and routing data items to the correct destination.
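The difference between the two ingestion modes can be shown with a minimal Java sketch. It is an illustration only, not DiP code: the class and method names are hypothetical, and batches are cut by item count rather than by the periodic time interval described above, which keeps the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of the two ingestion modes; not part of DiP.
public class IngestModes {

    // Real-time style: each item is handed to the sink as soon as it is read.
    public static void ingestRealTime(Iterable<String> source, Consumer<String> sink) {
        for (String item : source) {
            sink.accept(item);
        }
    }

    // Batch style: items are collected into chunks of at most batchSize
    // (an assumption; a real pipeline would usually flush on a timer as well).
    public static void ingestInBatches(Iterable<String> source, int batchSize,
                                       Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>();
        for (String item : source) {
            batch.add(item);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>();
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
    }

    public static void main(String[] args) {
        List<String> events = List.of("e1", "e2", "e3", "e4", "e5");
        ingestRealTime(events, e -> System.out.println("item  " + e));
        ingestInBatches(events, 2, b -> System.out.println("batch " + b));
    }
}
```

With five events and a batch size of two, the batch path delivers three chunks, the last holding a single item.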

* This is a co-dev opportunity and provides initial baselines and access to Big Data experts to enhance it further to meet the business requirements

“Every business is an analytics business, every business process is an analytics process, and every business user is an analytics user”

- Gartner

Challenges Faced

Businesses want to get data from various sources into Hadoop or NoSQL databases for faster access in near real time. There is a need for a platform that can help build a scalable and fault-tolerant data pipeline.

This system should support the following:

• High-Speed Filtering and Pattern Matching

• Contextual Enrichment on the Fly

• Real-time KPIs, Analytics, Baselining and Notification

• Predictive Analytics

• Actions and Decisions
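As a rough illustration of the first two capabilities, the Java sketch below filters records with a regular expression and enriches the matches from a lookup table. The log layout, the host-to-region map and all names are assumptions made for the example; none of them come from DiP itself.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of filtering, pattern matching and enrichment; not DiP code.
public class FilterAndEnrich {

    // Assumed log layout: "<LEVEL> <host> <message>". Only ERROR and WARN pass the filter.
    private static final Pattern LOG_LINE =
            Pattern.compile("^(ERROR|WARN)\\s+(\\S+)\\s+(.*)$");

    // Returns the enriched record, or empty if the line is filtered out.
    public static Optional<String> process(String line, Map<String, String> hostToRegion) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        String level = m.group(1);
        String host = m.group(2);
        String message = m.group(3);
        // Contextual enrichment: attach the region looked up for the host.
        String region = hostToRegion.getOrDefault(host, "unknown");
        return Optional.of(level + "|" + host + "|" + region + "|" + message);
    }

    public static void main(String[] args) {
        Map<String, String> regions = Map.of("web01", "us-east");
        System.out.println(process("ERROR web01 disk full", regions));
        System.out.println(process("INFO web01 heartbeat", regions));
    }
}
```

In a streaming engine the same function would typically run per record inside a filter or map step, with the lookup table held in memory or in a fast key-value store.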



Data Ingestion Platform (DiP)

Real-time data ingestion with Data Ingestion Platform (DiP) harnesses the power of Apache Apex, Apache Flink, Apache Spark and Apache Storm to stream data into a Lambda architecture. Apache Kafka plays a key role as the messaging bus from the source to the streaming component.

DiP comes with a UI for users who want to upload data from their desktops; data can also be ingested from other sources such as cloud storage or the local file system. The UI plays a key role in learning about and choosing among the streaming components during the initial phase of understanding the system.

DiP Technology Stack

• Source System: Web Client

• Messaging System: Apache Kafka

• Target System: HDFS, Apache HBase, Apache Hive

• Reporting System: Apache Phoenix (CLI), Apache Zeppelin

• Streaming APIs: Apache Apex, Apache Flink, Apache Spark and Apache Storm

• Programming Language: Java

• IDE: Eclipse

• Build tool: Apache Maven

• Operating System: CentOS 7

DiP Features

Any data source

Any data type

Easy to use UI

Data Visualization

High-Level APIs

Java, Scala, Client bindings

Architecture

• Flume / Client UI ingests data to Kafka Queues

• Platform picks data from subscribed Kafka topics

• Four streaming APIs: Apex Streaming, Flink Streaming, Spark Streaming, Storm Streaming (windowed aggregations to MySQL)

• Data is processed in real time or in micro-batches and written to HBase and HDFS (with Hive external tables), with Phoenix views available in Zeppelin
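The brochure does not say which kind of window the aggregations use. As one common possibility, the Java sketch below counts events per fixed-size (tumbling) window. It is a plain in-memory illustration with hypothetical names; it uses none of the streaming engines listed above and does not write to MySQL.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tumbling-window count; illustrates the idea only, not DiP code.
public class TumblingWindowCount {

    // Maps each window start time (in milliseconds) to the number of events in that window.
    public static Map<Long, Integer> countPerWindow(long[] eventTimesMillis, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesMillis) {
            // Align the timestamp to the start of its window.
            long windowStart = Math.floorDiv(t, windowMillis) * windowMillis;
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] times = {0L, 500L, 999L, 1000L, 2500L};
        System.out.println(countPerWindow(times, 1000L));
    }
}
```

Each entry of the resulting map corresponds to one row that a real pipeline could insert into a reporting table once the window closes.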

[Figure: Data Ingestion Platform. Input from the GUI (XML, JSON, CSV, TXT) passes through the Kafka broker to one of four streaming engines and on to HBase and HDFS (Hive external tables), with reporting through Phoenix and Zeppelin. Components shown per engine: Apex Streaming (Kafka Operator, Classifier Operator, File Operator, HBase Operator); Flink Streaming (Kafka Source, Map Data, HDFS Sink, HBase Sink); Spark Streaming (Kafka Stream, Spark Executors); Storm Topology (Kafka Spout, Filter bolt, HDFS bolt, HBase bolt).]


DiP User Interface (Co-Dev)

DiP comes with an easy-to-use UI that offers the following features:

• Switch easily between the supported streaming engines just by clicking on a radio button

• Supports XML, JSON and TSV data formats

• Use the text area to enter data manually for processing

• Process files in batch mode by simply uploading them
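Since the UI accepts XML, JSON and TSV input, a pipeline needs some way to tell the formats apart before parsing. The Java sketch below shows one simple heuristic based on the first non-blank character and the presence of tabs. It is an assumption made for illustration, not a description of how DiP classifies data, and a production classifier would validate the payload with a real parser.

```java
// Hypothetical format detector; a heuristic sketch, not DiP's classifier.
public class FormatSniffer {

    public enum Format { XML, JSON, TSV, UNKNOWN }

    public static Format detect(String payload) {
        if (payload == null) {
            return Format.UNKNOWN;
        }
        String trimmed = payload.trim();
        if (trimmed.isEmpty()) {
            return Format.UNKNOWN;
        }
        if (trimmed.startsWith("<")) {
            return Format.XML;
        }
        if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
            return Format.JSON;
        }
        // Check the untrimmed payload so a leading empty TSV field still counts.
        if (payload.indexOf('\t') >= 0) {
            return Format.TSV;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(detect("<order id=\"1\"/>"));
        System.out.println(detect("{\"id\": 1}"));
        System.out.println(detect("id\tname"));
    }
}
```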

Use Cases

Sentiment Analysis

Click Stream Analysis

Log Analysis

Social Media and Customer Sentiment

Analyze Machine and Sensor Data


About Xavient

Great Ideas… Simple Solutions is what Xavient thrives on. As a global IT consulting and software services company, we focus on transforming business ideas into effective solutions.

Founded in 2002, the company is led by a passionate team of experts who come with a history of entrepreneurial and management success. Xavient is headquartered in the U.S. with an international network of delivery centers primarily established in India.

• Enabled one of the largest Billing Transformation initiatives in North America

• Powered one of the largest OTT platforms for video-on-demand services

• Designed one of the most engaging high-touch, high-performance Retail UI/UX

• Proven expertise & unflinching focus on the Digital Media & Communication space for over 14 years

• Partner of choice for 4 out of Top 5 CSPs in the US

• Developed the Live Streaming solution for a Weather channel supporting next generation internet connected devices