Amazon Elastic MapReduce with Hive/Presto
2015/5/13
Ryosuke Iwanaga
Solutions Architect, Amazon Data Services Japan
AWS
POS
KPI
GB, TB, PB, EB, ZB
AWS
M&A
Before
After
M&A
IT
Amazon EMR
Amazon Elastic MapReduce
Amazon Elastic MapReduce
1Hadoop
Application: Hive, Hue, Impala
Bootstrap Action: Presto, Spark
[Figure: Amazon EMR cluster on AWS. Inside security groups: a Master Instance Group (Master Node), a Core Instance Group (Core Nodes, each with HDFS), and Task Instance Groups (Task Nodes). Data stores: HDFS, Amazon S3, DynamoDB, Amazon Kinesis.]
Amazon EMR Master Instance Group
Master Node: 1 node only, no Failover
Runs NameNode and JobTracker
Coordinates Core Nodes and Task Nodes
Master Node
Master Instance Group
Hadoop1: JobTracker
Hadoop2: ResourceManager
HDFS: NameNode
Hive: HiveServer, MetaStore
Presto: Coordinator
Amazon EMR Core Instance Group
One or more Core Nodes; each runs TaskTracker and DataNode (HDFS)
HDFS
Core Node
Core Instance Group
Hadoop1: TaskTracker
Hadoop2: NodeManager
HDFS: DataNode
Presto: Worker
Amazon EMR Core Instance Group
Adding Core Nodes: more HDFS capacity, more CPU/RAM
HDFS
HDFS
Core Node
Core Instance Group
Hadoop1: TaskTracker
Hadoop2: NodeManager
HDFS: DataNode
Presto: Worker
Amazon EMR Task Instance Group
Same as Core, but without HDFS
HDFS data lives on the Core Nodes
HDFS
Task Node
Task Instance Group
Hadoop1: TaskTracker
Hadoop2: NodeManager
Presto: Worker
Amazon EMR Task Instance Group
Per Group: Spot bid, Instance Type
RI, Spot
Task Instance Group 2
Task Instance Group 1
c3.xlarge * 2 bid: $0.1
r3.xlarge * 2 bid: $0.5
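The two Task Instance Groups on this slide can be written down as configuration. The following is a minimal sketch in Python, assuming the dictionary layout that boto3's `run_job_flow` accepts under `Instances.InstanceGroups`. The slide text no longer shows which bid belongs to which group, so the pairing below is an assumption, and `task_group` is a hypothetical helper that only builds the data without calling AWS.

```python
def task_group(name, instance_type, count, bid_usd):
    """Build one Spot Task Instance Group entry (hypothetical helper)."""
    return {
        "Name": name,
        "InstanceRole": "TASK",
        "Market": "SPOT",
        "InstanceType": instance_type,
        "InstanceCount": count,
        "BidPrice": str(bid_usd),  # the bid is passed as a string
    }


# Assumption: the pairing of group number and instance type is illustrative.
task_groups = [
    task_group("Task Instance Group 1", "c3.xlarge", 2, 0.1),
    task_group("Task Instance Group 2", "r3.xlarge", 2, 0.5),
]

for group in task_groups:
    print(group["Name"], group["InstanceType"], "x", group["InstanceCount"],
          "bid $" + group["BidPrice"])
```

Keeping each instance type in its own group lets every group carry its own bid, so losing one Spot market does not take away all Task Nodes at once.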
Spot Instance
Core Instance Group: On-demand for Core Nodes; meet the SLA at On-demand pricing
Task Instance Group: Spot Instances for Task Nodes; exceed the SLA at up to 90% off On-demand pricing
EMRFS: use Amazon S3 like HDFS
EMR
EMR
Amazon S3
EMRFS
s3:// paths point to Amazon S3
: Amazon Glacier Amazon S3 Amazon S3
EMRFS Consistent View
Amazon S3
EMRFS Consistent View: metadata in Amazon DynamoDB
Amazon S3 Amazon DynamoDB
EMRFS
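The idea behind Consistent View can be illustrated with a toy model. This is only a sketch under stated assumptions, not the EMRFS implementation: a plain Python set stands in for the Amazon DynamoDB metadata table, another set stands in for what an Amazon S3 listing currently returns, and the class and method names are hypothetical.

```python
class ConsistentViewSketch:
    """Toy model: metadata records every key written through the file system."""

    def __init__(self):
        self.metadata = set()    # stands in for the Amazon DynamoDB table
        self.s3_visible = set()  # keys the S3 listing currently returns

    def write(self, key):
        # The write is recorded in metadata right away,
        # but the key is not yet visible in the listing.
        self.metadata.add(key)

    def propagate(self):
        # Simulate the listing catching up with earlier writes.
        self.s3_visible |= self.metadata

    def list_consistent(self, max_retries=3):
        # Retry until the listing contains every key the metadata knows about.
        missing = set()
        for _ in range(max_retries + 1):
            missing = self.metadata - self.s3_visible
            if not missing:
                return sorted(self.s3_visible)
            self.propagate()  # in a real system: wait, then list again
        raise RuntimeError("listing still inconsistent: %s" % sorted(missing))


fs = ConsistentViewSketch()
fs.write("out/part-00000")
fs.write("out/part-00001")
listing = fs.list_consistent()
print(listing)
```

The point of the sketch is the comparison step: a listing is only returned once it agrees with the separately stored metadata.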
Amazon EMR
[Figure: Amazon S3 and EMR along a time axis (t)]
Amazon EMR: Bootstrap Action
Runs on every Node; anything is OK
Written in Bash, Ruby, Python, etc.; the script is placed on Amazon S3
AWS
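A Bootstrap Action is essentially a script location plus arguments. The following is a minimal sketch in Python, assuming the dictionary layout that boto3's `run_job_flow` accepts under `BootstrapActions`. The bucket name, script path, and the `bootstrap_action` helper are hypothetical, and nothing here calls AWS.

```python
def bootstrap_action(name, script_s3_path, args=()):
    """Describe one Bootstrap Action: a script on Amazon S3 plus its arguments."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_s3_path,
            "Args": list(args),
        },
    }


# Assumption: bucket and script names are placeholders for illustration only.
actions = [
    bootstrap_action(
        "Install extra packages",
        "s3://my-bucket/bootstrap/install.sh",
        ["--verbose"],
    ),
]

print(actions[0]["ScriptBootstrapAction"]["Path"])
```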
Amazon EMR: Step
Examples: ETL, HiveQL; specify a jar on Amazon S3
Streaming, Hive, Pig: provided by EMR; scripts on S3
script-runner.jar runs bash scripts
Terminate after the last Step (Auto-terminate)
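A Step that runs a bash script through script-runner.jar can be sketched as data. This is a minimal Python sketch, assuming the dictionary layout that boto3's `run_job_flow` accepts under `Steps` and `Instances`. The jar location is assumed to be the commonly documented one (other regions use a region-prefixed bucket), and the bucket, script name, and `bash_step` helper are hypothetical.

```python
# Assumed location of script-runner.jar; verify it for your region.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def bash_step(name, script_s3_path, args=()):
    """Describe a Step that runs a bash script stored on Amazon S3."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            # script-runner.jar takes the script path first, then its arguments.
            "Args": [script_s3_path, *args],
        },
    }


# Assumption: bucket and script names are placeholders for illustration only.
steps = [bash_step("Run ETL script", "s3://my-bucket/steps/etl.sh", ["2015-05-13"])]

# Auto-terminate: do not keep the cluster alive once no Steps remain.
instances = {"KeepJobFlowAliveWhenNoSteps": False}

print(steps[0]["HadoopJarStep"]["Args"])
```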
Amazon EMR: Application
Application: Hive, Pig, Hue, HBase, etc.
Bootstrap Action, Step
MapReduce
Bootstrap Action, Step, Application: Amazon EMR
Amazon EMR
(Spot) Amazon S3
Hive
Hive
SQL-like
Data: HDFS; on Amazon EMR also Amazon S3 (EMRFS), Amazon DynamoDB, Amazon Kinesis
Engines: MapReduce, Tez, Spark
Hive
[Figure: Hive architecture. HiveServer and Metastore in the Cluster; data in HDFS, Amazon S3, Amazon DynamoDB, and Amazon Kinesis; a six-step flow (1 to 6) from CREATE TABLE to SELECT ... FROM.]
Hive
MapReduce MapReduce
Stinger Initiative = Hive 0.13: Tez, SQL, etc.
Stinger.next = Hive 1: Spark, ACID, SQL:2011
Hive on Amazon EMR
Application
Metastore: MySQL on the Master
hive-site.xml: can point to an external MySQL
$ aws emr create-cluster \
    --applications Name=Hive \
YARN
YARN
YARN = Yet-Another-Resource-Negotiator (Hadoop2)
Replaces the JobTracker: YARN manages resources, MapReduce runs on YARN
YARN: ResourceManager
CPU, Memory, etc.
YARN: NodeManager
RM
Container Container
YARN: Container
NM
DefaultContainerExecutor
LinuxContainerExecutor: cgroups
DockerContainerExecutor: Docker
YARN: ApplicationMaster
Runs in a Container; plays the JobTracker role per job: 1 AM per application
Container
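How ResourceManager, NodeManager, Container, and ApplicationMaster fit together can be illustrated with a toy model. This is a sketch under stated assumptions and not the YARN API: the class names mirror the slide titles, the first-fit allocation policy is a simplification, and all node sizes are made up.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node reporting its free resources (toy model)."""
    name: str
    free_mem_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Hands out Containers on the first NodeManager with enough room."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, owner, mem_mb, vcores):
        for nm in self.nodes:
            if nm.free_mem_mb >= mem_mb and nm.free_vcores >= vcores:
                nm.free_mem_mb -= mem_mb
                nm.free_vcores -= vcores
                nm.containers.append((owner, mem_mb, vcores))
                return nm.name
        return None  # no capacity anywhere


rm = ResourceManager([NodeManager("nm1", 4096, 2), NodeManager("nm2", 8192, 4)])

# The ApplicationMaster itself runs in a Container, one per application.
am_host = rm.allocate("app-1/AM", 1024, 1)

# The AM then asks the RM for Containers to run its tasks.
task_hosts = [rm.allocate("app-1/task", 2048, 1) for _ in range(3)]

print(am_host, task_hosts)
```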
YARN
Mesos, Amazon EC2 Container Service
YARN
YARN
Tez: DAG processing on YARN
Twill: YARN
Slider: YARN
Tez
Tez
DAG on YARN
Directed Acyclic Graph (DAG)
Vertex: Map, Reduce, Join
Edge: Producer -> Consumer
SQL -> DAG
DAG: Tez
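The Vertex and Edge vocabulary on this slide can be made concrete with a small example. This is a minimal sketch in Python using the standard library's `graphlib` (Python 3.9+), not Tez itself. The vertex names describe a hypothetical query that joins two inputs and then aggregates.

```python
from graphlib import TopologicalSorter

# Each key is a Vertex; its value is the set of Vertices that produce its input.
# An Edge therefore runs from each Producer in the set to the Consumer key.
dag = {
    "map_orders": set(),
    "map_users": set(),
    "join": {"map_orders", "map_users"},
    "reduce": {"join"},
}

# A valid execution order runs every Producer before its Consumers.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Because the whole query is one graph, the join and the final reduce can be scheduled as stages of a single job rather than as separate MapReduce jobs chained together.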
Tez
Hive, Pig, Cascading (instead of MR)
HDFS
Container Session
ORC File / Parquet
Hive
TEXTFILE: 1 record per line (e.g. CSV)
SEQUENCEFILE: 1 record at a time
SELECTing 1 column still reads every column
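The cost of row-oriented formats when only one column is needed can be shown with a small comparison. This is a sketch with made-up data, assuming a simplified in-memory layout rather than any real file format. The function names are hypothetical.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"col1": "a", "col2": 1, "col3": 1.5},
    {"col1": "b", "col2": 2, "col3": 2.5},
    {"col1": "c", "col2": 3, "col3": 3.5},
]


def select_from_rows(rows, column):
    """Read one column from row storage; every value of every record is touched."""
    values_read = 0
    out = []
    for row in rows:
        values_read += len(row)  # the whole record has to be read
        out.append(row[column])
    return out, values_read


# Column-oriented layout: each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def select_from_columns(columns, column):
    """Read one column from columnar storage; only that column is touched."""
    values = columns[column]
    return list(values), len(values)


row_result = select_from_rows(rows, "col1")
col_result = select_from_columns(columns, "col1")
print(row_result, col_result)
```

Both calls return the same values, but the row layout touches 9 values while the columnar layout touches 3.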
ORC File / Parquet
Optimized Row Columnar (ORC) File: from the Stinger Project; successor to RCFile
Parquet: from Twitter/Cloudera; based on Google's Dremel
ORC File
Each Stripe has an Index and a Footer; the Index holds min/max values and Row positions
Data File
Footer
Hive
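The benefit of per-Stripe min/max statistics can be sketched as follows. This is a simplified illustration with made-up data and not the ORC reader: each stripe is a plain dictionary, and `scan_greater_than` is a hypothetical function for a `value > threshold` filter.

```python
# Each stripe carries min/max statistics next to its row data.
stripes = [
    {"min": 1, "max": 100, "rows": [1, 50, 100]},
    {"min": 101, "max": 200, "rows": [101, 150, 200]},
    {"min": 201, "max": 300, "rows": [201, 250, 300]},
]


def scan_greater_than(stripes, threshold):
    """Return rows with value > threshold, skipping stripes that cannot match."""
    matched = []
    stripes_read = 0
    for stripe in stripes:
        if stripe["max"] <= threshold:
            continue  # no row in this stripe can satisfy value > threshold
        stripes_read += 1
        matched.extend(v for v in stripe["rows"] if v > threshold)
    return matched, stripes_read


result = scan_greater_than(stripes, 150)
print(result)
```

With a threshold of 150, the first stripe is skipped from its statistics alone and only two of the three stripes are read.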
Parquet
ColumnChunk metadata: Thrift
ColumnChunks are grouped into a Row Group
ORC
11
Hive 0.13: ORC/Parquet
Hive: ORC File/Parquet
CREATE TABLE t (
  col1 STRING,
  ...
) STORED AS [ORC/PARQUET];
INSERT INTO t (...);
SELECT col1 FROM t;
Hadoop/Hive
SQL
Hadoop2: YARN
Hive Amazon EMR
HiveSQL
Massively Parallel Processing (MPP) SQL
SQLHive
Hive / MPP: Presto, Impala, Spark SQL, Drill
DWH: Amazon Redshift
MPP: Amazon Redshift
Amazon Redshift
Hive
Metastore: sync/share with Hive
Impala, Spark SQL: HiveQL
Presto, Drill: ANSI SQL
Metastore
HDFS
Amazon S3
Presto
Presto
One of the MPP SQL engines; ANSI SQL
YARN
Can JOIN with data in MySQL and PostgreSQL
Presto
Coordinator
Worker Coordinator
Amazon S3
Presto on Amazon EMR
As of April 2015: not an Application; installed via Bootstrap Action
Provided by AWS: https://github.com/awslabs/emr-bootstrap-actions/tree/master/presto/latest
PrestoYARN PrestoAmazon EMR
Amazon EMR
AWS Amazon EMR Amazon S3