High Performance Big Data Loading for AWS: Deep Dive and Best Practices from Informatica

DESCRIPTION

Companies are increasingly dealing with large data sets and looking for ways to increase the scale and lower the cost of Big Data analysis with AWS. In this interactive session, you’ll learn how to:

• Integrate massive data volumes from any on-premises or cloud data sources into AWS with Informatica’s high performance cloud integration connectors and Vibe Secure Agent technology.
• Transform and load data into RDS, Redshift, and S3 without the need for coding.
• Automate streaming data collection into Kinesis with built-in high availability and failover features.

Ajay Gandhi, VP Cloud Product Marketing

Nicolas Brisoux, Sr. Cloud Platform Specialist

Roderick Clemente, Product Specialist

July 10th, 2014

Why Are Customers Adopting Cloud and AWS?

1. Cost savings through economies of scale
2. Trade capital expense for variable expense
3. Agility, Speed to market & Flexibility
4. Global in minutes
5. Don’t have to guess on capacity
6. Security and Compliance

So, How Do You Try Redshift – Quickly & Easily?

Amazon Redshift


Use New Cloud & Traditional Data Sources

[Diagram: data sources (ERP, CRM apps; files; legacy, RDBMS; logs, JSONs, social; SaaS apps), a firewall, and Amazon Redshift]

How To Manage Integration In This New World?

[Diagram: data sources (ERP, CRM apps; files; legacy, RDBMS; logs, JSONs, social; SaaS apps), a firewall, and Amazon Redshift]

Experiment. Prototype. Repeat.

AWS RDS Staging, Redshift DW, Infa Cloud

[Diagram: data sources (ERP, CRM apps; files; legacy, RDBMS; logs, JSONs, social; SaaS apps), Amazon RDS, and Amazon Redshift]

Experiment. Prototype. Repeat.

Map Once. Deploy Anywhere.

[Deployment targets shown: on premise, Hadoop, 3rd party applications, cloud]

AWS EMR (Hadoop) and DynamoDB (NoSQL)

[Diagram: data sources (ERP, CRM apps; files; legacy, RDBMS; logs, JSONs, social; SaaS apps), Amazon RDS, Amazon Redshift, Amazon EMR, and DynamoDB]

Growth Path to Hybrid Data Warehouse

[Diagram: data sources (ERP, CRM apps; files; legacy, RDBMS; logs, JSONs, social; SaaS apps), Amazon RDS, Amazon Redshift, Amazon EMR, DynamoDB, a traditional staging DB, and a traditional data warehouse]

Informatica Cloud - Get it right. Go live. Grow flexibly.

• Cloud Data Integration
• Cloud Real-time Integration
• Cloud Test Data Management
• Cloud Data Quality
• Cloud Master Data Management

• Secure Development Data
• Leverage Existing Bulk Data
• Cleanse and De-dupe Data
• Consolidate and Visualize Data
• Instant Access to Actionable Data

“The Informatica Cloud Platform is the only complete solution for cloud integration and data management that allows SaaS application administrators, architects, and developers to easily power optimal processes connected with enterprise-ready data across cloud, on-premises, big data, social, and mobile environments.”

Hundreds of Connectors

JDBC

Technical Innovations for AWS Data Loading

• Out-of-the-box integration for S3, DynamoDB, Kinesis, Redshift and RDS available NOW!
• Agile data loading for cloud data warehousing with Redshift
  • Create target using cloud designer and multiple source objects
• High performance parallel data loading architecture
  • E.g. load data in parallel across all 32 nodes in a Redshift cluster
• Push down optimization for increased throughput
  • Push data transformations down to optimal source/target database engine
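The parallel loading bullet can be illustrated with a minimal sketch. It is not Informatica's implementation, which the slides do not show. It assumes the common Redshift pattern of splitting one data set into several gzip-compressed files and listing them in a COPY manifest, so the cluster can read the parts concurrently. The function names, the bucket and prefix, and the choice of 32 parts (borrowed from the slide's 32-node example) are all hypothetical; in practice the part count is usually chosen as a multiple of the cluster's slice count.

```python
import csv
import gzip
import io
import json


def part_key(prefix, n):
    """Hypothetical naming scheme for the n-th part file."""
    return f"{prefix}.part{n:04d}.gz"


def split_into_gzip_parts(rows, num_parts):
    """Distribute rows round-robin into num_parts gzip-compressed CSV payloads."""
    buckets = [[] for _ in range(num_parts)]
    for i, row in enumerate(rows):
        buckets[i % num_parts].append(row)
    parts = []
    for bucket in buckets:
        text = io.StringIO()
        csv.writer(text).writerows(bucket)
        parts.append(gzip.compress(text.getvalue().encode("utf-8")))
    return parts


def build_manifest(bucket, prefix, num_parts):
    """Build a Redshift COPY manifest (JSON) listing every part file."""
    entries = [
        {"url": f"s3://{bucket}/{part_key(prefix, n)}", "mandatory": True}
        for n in range(num_parts)
    ]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    # Illustrative data only; bucket and prefix are made-up names.
    rows = [[i, f"account-{i}"] for i in range(1000)]
    parts = split_into_gzip_parts(rows, num_parts=32)
    print(len(parts), "parts")
    print(build_manifest("example-bucket", "loads/accounts", 32)[:120])
```

Each part would then be uploaded under its key, and a single COPY statement pointing at the manifest (with the MANIFEST option) would load all parts in one operation.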

©2013 Informatica. Proprietary and Confidential

Loading data into REDSHIFT, DYNAMODB and RDS

Informatica Cloud Architecture Overview - Redshift

[Diagram: Secure Agent in your company or VPC, with Amazon Redshift, Amazon RDS, Amazon S3, and Amazon DynamoDB]

Informatica Cloud Amazon Redshift Architecture

[Diagram: firewall, Informatica Cloud Secure Agent, metadata, mappings, Amazon S3, and Amazon Redshift]

1. Build mapping and execute job
2. Retrieve Account Data
3. Put Account Data into Flat File
4. Transfer compressed Flat File to S3
5. Initiate copy from S3
6. Load data into Amazon Redshift
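The staged load described on this slide (flat file, compressed transfer to S3, then a copy into Redshift) can be sketched as follows. This is a hedged illustration, not the Secure Agent's actual code. The `upload` and `execute_sql` callables are hypothetical stand-ins for an S3 client and a database connection, the table, bucket, key, and role ARN are made up, and the `IAM_ROLE` authorization clause is an assumption about how the cluster is granted access to S3.

```python
import csv
import gzip
import io


def rows_to_gzip_csv(rows):
    """Write rows to an in-memory CSV flat file and gzip-compress it."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_copy_statement(table, s3_url, iam_role):
    """Build the Redshift COPY command that reads the staged file from S3."""
    return (
        f"COPY {table} FROM '{s3_url}' "
        f"IAM_ROLE '{iam_role}' CSV GZIP;"
    )


def load_via_s3(rows, table, bucket, key, iam_role, upload, execute_sql):
    """Stage rows in S3 as a compressed flat file, then issue COPY.

    upload(bucket, key, data) and execute_sql(statement) are caller-supplied
    (hypothetical) hooks, so this sketch needs no AWS libraries to run.
    """
    payload = rows_to_gzip_csv(rows)           # put data into a flat file
    upload(bucket, key, payload)               # transfer compressed file to S3
    statement = build_copy_statement(table, f"s3://{bucket}/{key}", iam_role)
    execute_sql(statement)                     # initiate copy; Redshift loads the data
    return statement


if __name__ == "__main__":
    staged = {}
    issued = []
    load_via_s3(
        rows=[[1, "Acme"], [2, "Globex"]],
        table="accounts",
        bucket="example-bucket",
        key="staging/accounts.csv.gz",
        iam_role="arn:aws:iam::123456789012:role/ExampleRole",
        upload=lambda b, k, data: staged.__setitem__((b, k), data),
        execute_sql=issued.append,
    )
    print(issued[0])
```

Staging through S3 lets Redshift pull the file itself during COPY, which is why the data is not sent row by row through a SQL connection.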

REDSHIFT and RDS DEMO!

REDSHIFT and DYNAMODB DEMO!

Loading data into KINESIS


IoT: Operational Intelligence

[Diagram: data sources (documents and files such as PDF, DOC, XLS, EDI; machine, device, and cloud data; social media and web logs) and Amazon Kinesis]

Streaming Collection: Vibe Data Stream

• Central Monitoring Console for Deployment
• Fault Tolerant
• High Availability
• Vertical & Horizontal Scaling
• Ease of Configuration

[Diagram: sources (industrial systems, HVAC, IoT devices, social media, web logs), three VDS nodes, and Amazon Kinesis]
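To make the streaming collection idea more concrete, here is a minimal sketch of what a collector has to do when writing to Kinesis: group events into batches and find the records that need a retry. It does not represent how Vibe Data Stream works internally, which the slides do not describe. The function names, stream name, and event fields are hypothetical. The payload shape follows the Kinesis PutRecords request (up to 500 records per call), and the retry helper relies on PutRecords returning one result per record in request order, with an `ErrorCode` on failures.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_put_records_batches(stream_name, events, key_field):
    """Group events into payloads shaped like Kinesis PutRecords requests."""
    batches = []
    for start in range(0, len(events), MAX_RECORDS_PER_CALL):
        chunk = events[start:start + MAX_RECORDS_PER_CALL]
        batches.append({
            "StreamName": stream_name,
            "Records": [
                {
                    "Data": json.dumps(event).encode("utf-8"),
                    # Records with the same partition key go to the same shard.
                    "PartitionKey": str(event[key_field]),
                }
                for event in chunk
            ],
        })
    return batches


def failed_records(batch, response):
    """Return the request records whose response entry carries an ErrorCode."""
    return [
        record
        for record, result in zip(batch["Records"], response["Records"])
        if "ErrorCode" in result
    ]


if __name__ == "__main__":
    # Made-up HVAC-style sensor events for illustration.
    events = [{"device_id": i % 7, "temp": 20 + i} for i in range(1201)]
    batches = to_put_records_batches("example-stream", events, "device_id")
    print([len(b["Records"]) for b in batches])
```

With boto3, each batch could be sent as `client.put_records(**batch)`, and anything returned by `failed_records` resent after a short delay.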

KINESIS DEMO!

Try it today: community.informatica.com/solutions/vibe_data_stream_for_kinesis

Next Steps

• Visit us at Booth #107 to see more demos
• Try our 60-Day free trial for Redshift
• www.informaticacloud.com/cloud-trial-for-redshift

Q & A

InformaticaCloud.com