High Performance Big Data Loading for AWS: Deep Dive and Best Practices from Informatica

Ajay Gandhi, VP Cloud Product Marketing
Nicolas Brisoux, Sr. Cloud Platform Specialist
Roderick Clemente, Product Specialist

July 10th, 2014

DESCRIPTION

Companies are increasingly dealing with large data sets and looking for ways to increase the scale and lower the cost of Big Data analysis with AWS. In this interactive session, you’ll learn how to:

• Integrate massive data volumes from any on-premises or cloud data sources into AWS with Informatica’s high-performance cloud integration connectors and Vibe Secure Agent technology.
• Transform and load data into RDS, Redshift, and S3 without the need for coding.
• Automate streaming data collection into Kinesis with built-in high availability and failover features.



Page 2

Why Are Customers Adopting Cloud and AWS?

1. Cost savings through economies of scale
2. Trade capital expense for variable expense
3. Agility, Speed to market & Flexibility
4. Global in minutes
5. Don’t have to guess on capacity
6. Security and Compliance

Page 3

So, How Do You Try Redshift – Quickly & Easily?

[Diagram: Amazon Redshift]

Page 4

Use New Cloud & Traditional Data Sources

[Diagram labels: Amazon Redshift; ERP, CRM Apps; Files; Legacy, RDBMS; Firewall; Logs, JSONs, Social; SaaS Apps]

Page 5

How To Manage Integration In This New World?

Experiment. Prototype. Repeat.

[Diagram labels: Amazon Redshift; ERP, CRM Apps; Files; Legacy, RDBMS; Firewall; Logs, JSONs, Social; SaaS Apps]

Page 6

AWS RDS Staging, Redshift DW, Infa Cloud

Experiment. Prototype. Repeat.

[Diagram labels: ERP, CRM Apps; Files; Legacy, RDBMS; Logs, JSONs, Social; SaaS Apps; Amazon RDS; Amazon Redshift]

Page 7

Map Once. Deploy Anywhere.

[Diagram labels: On Premise; Hadoop; 3rd Party Applications; Cloud]

Page 8

AWS EMR (Hadoop) and DynamoDB (NoSQL)

[Diagram labels: ERP, CRM Apps; Files; Legacy, RDBMS; Logs, JSONs, Social; SaaS Apps; Amazon RDS; Amazon Redshift; Amazon EMR; DynamoDB]

Page 9

Growth Path to Hybrid Data Warehouse

[Diagram labels: ERP, CRM Apps; Files; Legacy, RDBMS; Logs, JSONs, Social; SaaS Apps; Amazon RDS; Amazon Redshift; Amazon EMR; DynamoDB; Traditional Staging DB; Traditional Data Warehouse]

Page 10

Informatica Cloud - Get it right. Go live. Grow flexibly.

• Cloud Data Integration
• Cloud Real-time Integration
• Cloud Test Data Management
• Cloud Data Quality
• Cloud Master Data Management

• Secure Development Data
• Leverage Existing Bulk Data
• Cleanse and De-dupe Data
• Consolidate and Visualize Data
• Instant Access to Actionable Data

“The Informatica Cloud Platform is the only complete solution for cloud integration and data management that allows SaaS application administrators, architects, and developers to easily power optimal processes connected with enterprise-ready data across cloud, on-premises, big data, social, and mobile environments.”

Page 11

Hundreds of Connectors

JDBC

Page 12

Technical Innovations for AWS Data Loading

• Out-of-the-box integration for S3, DynamoDB, Kinesis, Redshift and RDS available NOW!
• Agile data loading for cloud data warehousing with Redshift
  • Create target using cloud designer and multiple source objects
• High performance parallel data loading architecture
  • E.g. load data in parallel across all 32 nodes in a Redshift cluster
• Push down optimization for increased throughput
  • Push data transformations down to optimal source/target database engine
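The parallel-loading bullet can be illustrated with a minimal sketch. It assumes the common Redshift practice of splitting one extract into several compressed files that share a key prefix, so that a single COPY pointed at the prefix can read the parts concurrently. The function names, the `.partNNNN.gz` naming scheme, and the round-robin split are hypothetical choices for illustration; the slides do not say how Informatica Cloud partitions its loads.

```python
import gzip


def split_into_parts(lines, part_count):
    """Distribute lines round-robin into part_count gzip-compressed parts."""
    buckets = [[] for _ in range(part_count)]
    for index, line in enumerate(lines):
        buckets[index % part_count].append(line)
    parts = []
    for bucket in buckets:
        text = "".join(f"{line}\n" for line in bucket)
        parts.append(gzip.compress(text.encode("utf-8")))
    return parts


def part_keys(prefix, part_count):
    """Hypothetical object keys sharing one prefix, one key per part."""
    return [f"{prefix}.part{n:04d}.gz" for n in range(part_count)]


if __name__ == "__main__":
    # 32 matches the node count used as the example on the slide.
    rows = [f"{n},account-{n}" for n in range(1000)]
    parts = split_into_parts(rows, 32)
    keys = part_keys("staging/account", 32)
    print(len(parts), keys[0], keys[-1])
```

Round-robin keeps the parts close to equal in size, which matters because a parallel load finishes only when its largest part does.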

©2013 Informatica. Proprietary and Confidential

Page 13

Loading data into REDSHIFT, DYNAMODB and RDS

Page 14

Informatica Cloud Architecture Overview - Redshift

[Diagram with numbered callouts 1 to 4; labels: Secure Agent; Your Company or VPC; Amazon Redshift; Amazon RDS; Amazon S3; Amazon DynamoDB]

Page 15

Informatica Cloud Amazon Redshift Architecture

1. Build mapping and execute job
2. Retrieve Account Data
3. Put Account Data into Flat File
4. Transfer compressed Flat File to S3
5. Initiate copy from S3
6. Load data into Amazon Redshift

[Diagram labels: Firewall; Informatica Cloud Secure Agent; Metadata Mappings; Amazon S3; Amazon Redshift]
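The flat-file staging steps on this slide (3 to 5) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Secure Agent's actual implementation: the function names, table name, bucket, and role ARN are hypothetical, the S3 upload itself is left out, and the `IAM_ROLE` authorization clause is a choice made here rather than something the slides specify.

```python
import csv
import gzip
import io


def rows_to_gzipped_csv(rows):
    """Step 3: serialize rows to a CSV flat file, compressed for transfer (step 4)."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_copy_statement(table, s3_uri, iam_role_arn):
    """Step 5: build the COPY command that asks Redshift to pull the file from S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV GZIP;"
    )


if __name__ == "__main__":
    payload = rows_to_gzipped_csv([("001", "Acme"), ("002", "Globex")])
    # Uploading `payload` to S3 (step 4) would happen here with an S3 client.
    print(build_copy_statement(
        "account",                                      # hypothetical table
        "s3://example-bucket/staging/account.csv.gz",   # hypothetical bucket and key
        "arn:aws:iam::123456789012:role/ExampleRole",   # hypothetical role
    ))
```

Staging through S3 lets the cluster ingest the file itself in step 6 instead of receiving rows one insert at a time from the agent.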

Page 16

REDSHIFT and RDS DEMO!

Page 17

REDSHIFT and DYNAMODB DEMO!

Page 18

Loading data into KINESIS

Page 19

IoT: Operational Intelligence

[Diagram: repeated binary data streams (1 0 1010) and KINESIS]

Page 20

[Diagram: repeated source icons (Documents and files: PDF, DOC, XLS, EDI; Machine, device, cloud; Social media, web logs) and Amazon Kinesis]

Page 21

[Diagram: repeated source icons (Documents and files: PDF, DOC, XLS, EDI; Machine, device, cloud; Social media, web logs) and Amazon Kinesis]

Page 22

Streaming Collection: Vibe Data Stream

• Central Monitoring Console for Deployment
• Fault Tolerant
• High Availability
• Vertical & Horizontal Scaling
• Ease of Configuration

[Diagram labels: VDS (shown three times); Industrial Systems; IoT devices; HVAC; Social media, web logs; Amazon Kinesis]
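As a rough illustration of what a streaming collector does before it hands data to Kinesis, the sketch below turns device events into record entries shaped like those the Kinesis `PutRecords` API accepts (`Data` plus `PartitionKey`) and groups them into batches. It does not describe how Vibe Data Stream works internally. The function names, the `device_id` field, and the sample HVAC readings are hypothetical, and the 500-record batch size is an assumed per-request limit that should be checked against the current Kinesis documentation.

```python
import json

# Assumed PutRecords per-request record limit; verify against current AWS docs.
MAX_RECORDS_PER_CALL = 500


def to_record(event, key_field):
    """Encode one event as a PutRecords-style entry keyed by key_field."""
    return {
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def batch_records(events, key_field, batch_size=MAX_RECORDS_PER_CALL):
    """Group encoded events into lists of at most batch_size records."""
    records = [to_record(event, key_field) for event in events]
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


if __name__ == "__main__":
    # Hypothetical HVAC readings from three devices.
    readings = [{"device_id": f"hvac-{n % 3}", "temp_c": 20 + n} for n in range(1201)]
    for batch in batch_records(readings, "device_id"):
        # Each batch would be sent in one PutRecords call by a Kinesis client.
        print(len(batch))
```

Using the device identifier as the partition key keeps each device's readings on the same shard, which preserves their relative order.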

Page 23

KINESIS DEMO!

Page 24

Try it today: community.informatica.com/solutions/vibe_data_stream_for_kinesis

Page 25

Next Steps

• Visit us at Booth# 107 to see more demos
• Try our 60-Day free trial for Redshift
• www.informaticacloud.com/cloud-trial-for-redshift

Page 26

Q & A

InformaticaCloud.com
