© 2014, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Amazon Redshift, Amazon EMR, Amazon DynamoDB
Yifeng Jiang, Solutions Architect, Amazon Data Services Japan
TC-03
1. AWS
2.
3. Getting Started with Big Data Services
   Amazon Redshift, Amazon Elastic MapReduce, Amazon DynamoDB
4. Practical Deep Dive
AWS 40
AWS global infrastructure: Regions / Availability Zones / Content Delivery POPs
EC2, Elastic Load Balancing
Auto Scaling, S3, Glacier, EBS, Storage Gateway, RDS, DynamoDB, ElastiCache, Redshift
Kinesis, EMR, Data Pipeline
CloudFront
Virtual Private Cloud, Direct Connect, Route 53
WorkSpaces
SQS, SNS, SES, SWF, Elastic Transcoder, CloudSearch
Management & Administration
CloudWatch, CloudTrail, IAM, Management Console, SDK, CLI
CloudFormation, Elastic Beanstalk, OpsWorks
Ecosystem: Technology Partner / Consulting Partner
Big Data services on AWS
Ingestion: Kinesis
Storage: S3, Glacier
DWH: Redshift
NoSQL: DynamoDB
RDB: RDS
Hadoop: Elastic MapReduce
Workflow Management: Data Pipeline
4
1.
2.
3.
4. BI API
[Figure: end-to-end architecture. Labels: Data, Kinesis, S3, Data Pipeline, EMR (shown twice), ETL, Sum, Redshift, DynamoDB, Glacier, RDS, Web app, Analytics, Dashboard]
Raw web logs in S3:
1.1.1.1, /login, 20140226000101
192.168, /home, 20140226011226
1.1.1.2, /home, 20140226011331

After ETL (EMR, Redshift), queried from BI tools:
USER   PATH    TIMESTAMP
-----------------------------------
USER1  /login  2014-02-26 00:00:01
USER2  /home   2014-02-26 01:13:31
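The ETL step on this slide can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the talk: the `IP_TO_USER` mapping and the `parse_line` helper are hypothetical, since the slide does not say how an IP address is resolved to a user.

```python
from datetime import datetime

# Hypothetical lookup from client IP to user ID (not given on the slide).
IP_TO_USER = {"1.1.1.1": "USER1", "1.1.1.2": "USER2"}


def parse_line(line):
    """Turn 'ip, path, yyyymmddHHMMSS' into (user, path, formatted timestamp)."""
    ip, path, raw_ts = [field.strip() for field in line.split(",")[:3]]
    ts = datetime.strptime(raw_ts, "%Y%m%d%H%M%S")
    user = IP_TO_USER.get(ip, ip)  # fall back to the raw IP when unmapped
    return user, path, ts.strftime("%Y-%m-%d %H:%M:%S")


raw = [
    "1.1.1.1, /login, 20140226000101",
    "1.1.1.2, /home, 20140226011331",
]
rows = [parse_line(line) for line in raw]
```

Each tuple in `rows` matches the USER / PATH / TIMESTAMP layout of the table that BI tools would query.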
Use case: DMP
Per-user event data from the web, stored in S3:
USER1, 20140226000101
USER2, 20140226011226
USER1, 20140226011331

After ETL on EMR, loaded into DynamoDB and served through an API:
USER1: { Interest: [ Car, Home ], ... }
USER2: { Interest: [ Dog, Cat ] }
Amazon Redshift
Data Warehouse as a Service
Scales from 160 GB to 1.6 PB (MPP)
[Figure: Amazon Redshift architecture. Clients (PostgreSQL psql, BI tools) connect over JDBC/ODBC and issue SQL; N nodes are interconnected by a 10GigE mesh; data is loaded from S3, DynamoDB, and EMR]
Row-oriented RDBMS vs. column-oriented Redshift

RDBMS:
orderid  name    price
1        Book    100
2        Pen     50
n        Eraser  70

Redshift:
orderid  name    price
1        Book    100
2        Pen     50
n        Eraser  70
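To make the difference concrete, here is a small Python sketch of the same table held row by row and column by column. It is an illustration only, not how either engine stores data on disk; the third `orderid` is written as 3 where the slide says `n`.

```python
# Row-oriented layout: one tuple per order (orderid, name, price).
rows = [(1, "Book", 100), (2, "Pen", 50), (3, "Eraser", 70)]

# Column-oriented layout: one list per column.
columns = {
    "orderid": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "price": [r[2] for r in rows],
}

# Summing prices from rows walks every whole row...
total_from_rows = sum(r[2] for r in rows)
# ...while the columnar layout reads only the price column.
total_from_columns = sum(columns["price"])
```

Analytic queries that aggregate a few columns over many rows are the case where reading only the needed columns pays off.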
Amazon Redshift node types

Node type    vCPU  ECU  Memory   Storage        Network
dw2.large    2     7    15 GiB   160 GB (SSD)   0.2 GB/s
dw2.8xlarge  32    104  244 GiB  2.56 TB (SSD)  3.7 GB/s
dw1.xlarge   2     4.4  15 GiB   2 TB (HDD)     0.3 GB/s
dw1.8xlarge  16    35   120 GiB  16 TB (HDD)    2.4 GB/s
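The 160 GB to 1.6 PB range quoted earlier follows from per-node storage multiplied by node count. The sketch below works through that arithmetic; the node counts (1 and 100) are assumptions chosen to reproduce the range, since the slides give only the per-node figures, and `cluster_storage_tb` is a hypothetical helper.

```python
# Per-node storage in TB, taken from the node type list above.
STORAGE_TB = {
    "dw2.large": 0.16,
    "dw2.8xlarge": 2.56,
    "dw1.xlarge": 2.0,
    "dw1.8xlarge": 16.0,
}


def cluster_storage_tb(node_type, node_count):
    """Raw cluster capacity in TB: per-node storage times node count."""
    return STORAGE_TB[node_type] * node_count


smallest = cluster_storage_tb("dw2.large", 1)     # 0.16 TB = 160 GB
largest = cluster_storage_tb("dw1.8xlarge", 100)  # 1600 TB = 1.6 PB (assumed 100 nodes)
```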
Loading CSV from Amazon S3

Connect to Redshift:
psql -d mydb -h YOUR_REDSHIFT_ENDPOINT -p 5439 -U awsuser -W

Load with the Redshift COPY command:
COPY customer FROM 's3://data/customer.tbl.'
CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC'
DELIMITER ',' GZIP TIMEFORMAT 'auto';
Create a table in Amazon Redshift:
CREATE TABLE nginx (
  remote_addr char(15),
  time timestamp,
  request varchar(255),
  status integer,
  bytes bigint,
  ua varchar
);
Query with SQL:
SELECT ua, request, COUNT(*)
FROM nginx
GROUP BY ua, request;
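To try this query without a cluster, the sketch below runs it against an in-memory SQLite database. SQLite stands in for Redshift here purely for illustration; the column types are simplified and the three sample rows are invented. An `ORDER BY` is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nginx (remote_addr TEXT, time TEXT, request TEXT, "
    "status INTEGER, bytes INTEGER, ua TEXT)"
)

# Invented sample rows, for illustration only.
sample = [
    ("1.1.1.1", "2014-02-26 00:00:01", "/login", 200, 512, "curl"),
    ("1.1.1.2", "2014-02-26 01:13:31", "/home", 200, 1024, "curl"),
    ("1.1.1.1", "2014-02-26 01:12:26", "/home", 200, 1024, "curl"),
]
conn.executemany("INSERT INTO nginx VALUES (?, ?, ?, ?, ?, ?)", sample)

# Same aggregation as on the slide: hits per (user agent, request).
counts = conn.execute(
    "SELECT ua, request, COUNT(*) FROM nginx "
    "GROUP BY ua, request ORDER BY ua, request"
).fetchall()
conn.close()
```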
Export to S3:
UNLOAD ('SELECT * FROM nginx')
TO 's3://YOUR_BUCKET/PATH/'
CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC';
Tableau + Redshift
Amazon Redshift
SQL RDB ETL
Amazon Elastic MapReduce
Elastic MapReduce
Managed Hadoop on AWS; Hadoop 1, Hadoop 2, and MapR are available
Integrated with CloudWatch and S3
Works with data in S3 and DynamoDB
Launching a Hadoop cluster:
aws emr create-cluster \
  --name bigdata-handson \
  --ami-version 3.2.1 \
  --applications Name=Hive \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.large \
  --log-uri s3://PATH/TO/LOG/ \
  --ec2-attributes SubnetId=subnet-a06474e6,KeyName=YOUR_KEY

--ami-version: the AMI version determines the Hadoop version
--ec2-attributes SubnetId: launches the cluster inside a VPC
Running a Hive step and terminating the cluster when it finishes:
aws emr create-cluster \
  --name bigdata-handson \
  --ami-version 3.2.1 \
  --applications Name=Hive \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.large \
  --log-uri s3://PATH/TO/LOG/ \
  --ec2-attributes SubnetId=subnet-a06474e6,KeyName=YOUR_KEY \
  --steps Type=HIVE,Name='Hive program',Args=[-f,s3://PATH/TO/QUERY.q] \
  --auto-terminate
Schedule recurring Hadoop jobs with cron or Data Pipeline
EMR and S3
S3 can be used in place of HDFS: give s3:// URIs for INPUT and OUTPUT

S3 to S3:
hadoop jar YOUR_JAR.jar \
  --src s3://YOUR_BUCKET/logs/ \
  --dest s3://YOUR_BUCKET/output/

S3 to HDFS:
hadoop jar YOUR_JAR.jar \
  --src s3://YOUR_BUCKET/logs/ \
  --dest hdfs:///output/
Hive: S3 data as an external table
CREATE EXTERNAL TABLE s3_as_external_table (
  user_id INT,
  movie_id INT,
  rating INT,
  unixtime STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://mybucket/tables/';
ETL with Hive: copy table1 into table2 without column3 and column4
INSERT INTO TABLE table2
SELECT
  column1,
  column2,
  column5
FROM table1;
EMR Deep Dive: Redshift and EMR
Amazon Redshift
RDB SQL BI
Amazon Elastic MapReduce
Hadoop MapReduce, Hive, Pig, Hadoop Streaming
Hive (SQL) and Redshift
EMR and Redshift
SQL/Redshift
EMR
Amazon Elastic MapReduce
Hadoop
S3
ETL
Amazon DynamoDB
DynamoDB
NoSQL as a Service
No SPOF (single point of failure); data is replicated in 3 places
Read and Write throughput are provisioned separately, for example:
Read: 1,000 / Write: 100
Read: 500 / Write: 1,000
[Figure: DynamoDB is accessed through an API. Client side: the client application uses the SDK; service side: an HTTP API in front of the database]
Getting started with DynamoDB
1. Decide the Key and Index
2. Decide the Read/Write throughput
That's it, write your code!
DynamoDB
Hash key
Hash key & Range key
Hash key: an ID-based KVS
Look up an Item by UserId

Users Table
UserId (Hash) | Name  | Nicknames          | Mail              | Address      | Interests
aed9d         | Bob   | [ Rob, Bobby ]     | [email protected] | some address | [ Car, Motor Cycle ]
edfg12        | Alice | [ Allie ]          |                   |              |
a8eesd        | Carol | [ Caroline ]       |                   |              |
f42aed        | Dan   | [ Daniel, Danny ]  |                   |              |

DynamoDB has no auto_increment ID; use a UUID or similar instead
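Generating the hash key on the client can be sketched as follows. This is one possible approach, not the method prescribed in the talk: `new_user_item` is a hypothetical helper, and the item is a plain Python dict rather than a call to the DynamoDB API.

```python
import uuid


def new_user_item(name, nicknames):
    """Build a Users item whose hash key is a client-generated UUID."""
    return {
        "UserId": str(uuid.uuid4()),  # random ID instead of auto_increment
        "Name": name,
        "Nicknames": list(nicknames),
    }


bob = new_user_item("Bob", ["Rob", "Bobby"])
alice = new_user_item("Alice", ["Allie"])
```

Because the IDs are random, no central counter is needed and the keys spread evenly across hash values.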
Hash key & Range key

Battle History
User (Hash) | Timestamp (Range)   | Opponent | Result
Alice       | 2014-02-21 12:21:20 | Bob      | Lost
Alice       | 2014-02-21 12:42:01 | Bob      | Won
Alice       | 2014-02-24 09:48:00 | Dan      | Won
Alice       | 2014-02-25 16:21:11 | Charlie  | Won

Query by User (Alice) with a range condition on Timestamp (7)

[Figure: "Your Battle History" screen listing Charlie 02-25 16:21 Won!, Dan 02-24 09:48 Won!, Alice 02-21 12:42 Won!]
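The query pattern on this slide (exact match on the hash key, range condition on the range key, newest first) can be modelled in plain Python. The sketch runs over an in-memory list, not DynamoDB, and the cutoff timestamp is an arbitrary value chosen for illustration; `query` is a hypothetical helper.

```python
battle_history = [
    {"User": "Alice", "Timestamp": "2014-02-21 12:21:20", "Opponent": "Bob", "Result": "Lost"},
    {"User": "Alice", "Timestamp": "2014-02-21 12:42:01", "Opponent": "Bob", "Result": "Won"},
    {"User": "Alice", "Timestamp": "2014-02-24 09:48:00", "Opponent": "Dan", "Result": "Won"},
    {"User": "Alice", "Timestamp": "2014-02-25 16:21:11", "Opponent": "Charlie", "Result": "Won"},
]


def query(items, user, since):
    """Match the hash key exactly, filter the range key, return newest first."""
    matches = [
        item for item in items
        if item["User"] == user and item["Timestamp"] >= since
    ]
    # Timestamps in this fixed-width format sort correctly as strings.
    return sorted(matches, key=lambda item: item["Timestamp"], reverse=True)


recent = query(battle_history, "Alice", "2014-02-21 12:30:00")
opponents = [item["Opponent"] for item in recent]
```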
Secondary indexes
Local Secondary Index: an additional Range key under the same Hash key
Global Secondary Index: a different Hash key
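As a rough mental model, a secondary index is the same set of items regrouped under another key. The sketch below builds such groupings in memory; it says nothing about how DynamoDB maintains indexes internally, and `build_index` as well as the choice of `Opponent` and `Result` as index keys are assumptions made for illustration.

```python
from collections import defaultdict

items = [
    {"User": "Alice", "Timestamp": "2014-02-21 12:21:20", "Opponent": "Bob", "Result": "Lost"},
    {"User": "Alice", "Timestamp": "2014-02-21 12:42:01", "Opponent": "Bob", "Result": "Won"},
    {"User": "Alice", "Timestamp": "2014-02-24 09:48:00", "Opponent": "Dan", "Result": "Won"},
]


def build_index(records, hash_attr, range_attr):
    """Group records by hash_attr and sort each group by range_attr."""
    index = defaultdict(list)
    for record in records:
        index[record[hash_attr]].append(record)
    for bucket in index.values():
        bucket.sort(key=lambda record: record[range_attr])
    return dict(index)


# Local-style index: same hash key (User), alternative range key (Result).
by_user_result = build_index(items, "User", "Result")
# Global-style index: a different hash key (Opponent).
by_opponent = build_index(items, "Opponent", "Timestamp")
```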
Amazon DynamoDB
NoSQL; suited to OLTP, unlike Redshift; no SQL or JOIN
AWS and S3
[Figure: S3 at the center, connected to Elastic MapReduce, DynamoDB, and Redshift]
S3 and EMR
Hadoop
HDFS and S3
S3 and Redshift
Load from S3 into Redshift:
COPY table_name FROM 's3://hoge'
CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC'
DELIMITER ',';
Export from Redshift to S3:
UNLOAD ('SELECT * FROM table_name')
TO 's3://fuga/'
CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC';
S3 and DynamoDB
[Figure: Amazon S3 at the center, connected to Elastic MapReduce, Redshift, EC2, RDS, EBS, CloudFront, Storage Gateway, Elastic Transcoder, Glacier, and Data Pipeline]
[Figure: S3, DynamoDB, RDS, EMR, EC2, Redshift]