Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Page 1: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Simplified Cluster Operation & Troubleshooting

Alejandro Fernandez + Jayush Luniya

Page 2: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Speakers

Alejandro Fernandez

Sr. Software Engineer @ Hortonworks

Apache Ambari PMC

Jayush Luniya

Staff Engineer @ Hortonworks

Apache Ambari PMC

Page 3: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

What is Apache Ambari?

Apache Ambari is the open-source platform to provision, manage, and monitor Hadoop clusters.

Page 4: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

New Enterprise Features

Ambari 2.4

• New Services: Log Search, Zeppelin, Hive LLAP

• Role Based Access Control

• Management Packs

• Grafana UI for Ambari Metrics System

• New Views: Zeppelin, Storm

Page 5: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Apache Ambari Jiras

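[Chart: Ambari Jiras by release. Releases shown: v2.0 (April 2015), v2.1 (July - Sept 2015), v2.2 (Dec 2015 – Feb 2016), v2.4 (today). Values on the chart, not recoverably mapped to releases: 1690, 1864, 277, 379, 797, 206, 488, and "1542 and growing".]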

Page 6: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Deploy

Secure/LDAP

Smart Configs

Monitor

Upgrade

Scale, Extend, Analyze

Simply Operations - Lifecycle

Ease-of-Use Deploy

Page 7: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Deploy On Premise

The Ambari UI wizard handles all of these combinations and makes recommendations based on host specs.

Page 8: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Deploy On The Cloud

Certified environments

Sysprepped VMs

Hundreds of similar clusters

Page 9: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Deploy with Blueprints

• Systematic way of defining a cluster

• Export an existing cluster into a blueprint:

GET /api/v1/clusters/:clusterName?format=blueprint

Configs Topology Hosts Cluster
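The export call above can be sketched in Python. This is a minimal illustration, not an official client: the server address, cluster name, and credentials are hypothetical, and HTTP Basic authentication is assumed. The function only builds the request, so it can be inspected without a running Ambari server.

```python
import base64
import urllib.request
from urllib.parse import quote


def blueprint_export_request(base_url, cluster_name, user, password):
    """Build (but do not send) a GET request that exports a cluster as a blueprint."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/clusters/"
        f"{quote(cluster_name, safe='')}?format=blueprint"
    )
    # Assumption: the server accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical server address and credentials, for illustration only.
    req = blueprint_export_request(
        "http://ambari.example.com:8080", "MyCluster", "admin", "admin"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

The response body is the blueprint JSON, which can then be registered under a new name and reused for other clusters.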

Page 10: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Create a cluster with Blueprints

1. POST /api/v1/blueprints/my-blueprint

    {
      "configurations" : [
        {
          "hdfs-site" : {
            "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3"
          }
        }
      ],
      "host_groups" : [
        {
          "name" : "master-host",
          "components" : [
            { "name" : "NAMENODE" },
            { "name" : "RESOURCEMANAGER" }
          ],
          "cardinality" : "1"
        },
        {
          "name" : "worker-host",
          "components" : [
            { "name" : "DATANODE" },
            { "name" : "NODEMANAGER" }
          ],
          "cardinality" : "1+"
        }
      ],
      "Blueprints" : {
        "stack_name" : "HDP",
        "stack_version" : "2.5"
      }
    }

2. POST /api/v1/clusters/my-cluster

    {
      "blueprint" : "my-blueprint",
      "host_groups" : [
        {
          "name" : "master-host",
          "hosts" : [
            { "fqdn" : "master001.ambari.apache.org" }
          ]
        },
        {
          "name" : "worker-host",
          "hosts" : [
            { "fqdn" : "worker001.ambari.apache.org" },
            { "fqdn" : "worker002.ambari.apache.org" },
            { "fqdn" : "worker099.ambari.apache.org" }
          ]
        }
      ]
    }
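Before sending the two POST calls, it can help to check that the cluster creation template matches the blueprint. The sketch below is a hypothetical pre-flight check, not part of Ambari: `check_template` and `planned_calls` are illustrative names, and treating `cardinality` as a strict rule (only the forms "N" and "N+" are handled) is an assumption made for the example.

```python
import json

BLUEPRINT = {
    "configurations": [
        {"hdfs-site": {"dfs.datanode.data.dir": "/hadoop/1,/hadoop/2,/hadoop/3"}}
    ],
    "host_groups": [
        {"name": "master-host",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
         "cardinality": "1"},
        {"name": "worker-host",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
         "cardinality": "1+"},
    ],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.5"},
}

TEMPLATE = {
    "blueprint": "my-blueprint",
    "host_groups": [
        {"name": "master-host",
         "hosts": [{"fqdn": "master001.ambari.apache.org"}]},
        {"name": "worker-host",
         "hosts": [{"fqdn": "worker001.ambari.apache.org"},
                   {"fqdn": "worker002.ambari.apache.org"},
                   {"fqdn": "worker099.ambari.apache.org"}]},
    ],
}


def check_template(blueprint, template):
    """Return a list of mismatches between a cluster template and a blueprint."""
    problems = []
    groups = {g["name"]: g for g in blueprint["host_groups"]}
    mapped = {g["name"]: len(g["hosts"]) for g in template["host_groups"]}
    for name, group in groups.items():
        count = mapped.get(name, 0)
        cardinality = group.get("cardinality", "1")
        if cardinality.endswith("+"):
            # "N+" is read here as "at least N hosts" (assumption).
            minimum = int(cardinality[:-1])
            if count < minimum:
                problems.append(f"{name}: needs at least {minimum} host(s), got {count}")
        elif count != int(cardinality):
            problems.append(f"{name}: needs exactly {cardinality} host(s), got {count}")
    for name in mapped:
        if name not in groups:
            problems.append(f"{name}: not defined in the blueprint")
    return problems


def planned_calls(blueprint_name, cluster_name):
    """Return the two REST calls from the slide as (method, path, body) tuples."""
    return [
        ("POST", f"/api/v1/blueprints/{blueprint_name}", json.dumps(BLUEPRINT)),
        ("POST", f"/api/v1/clusters/{cluster_name}", json.dumps(TEMPLATE)),
    ]


if __name__ == "__main__":
    print(check_template(BLUEPRINT, TEMPLATE) or "template matches blueprint")
    for method, path, _body in planned_calls("my-blueprint", "my-cluster"):
        print(method, path)
```

Keeping the blueprint (what to install) separate from the template (where to install it) is what lets one blueprint be reused across many clusters.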

Page 14: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Blueprints for Large Scale

• Kerberos, secure out-of-the-box

• High Availability is set up initially for NameNode, YARN, Hive, Oozie, etc.

• Host Discovery allows Ambari to automatically install services on a host when it comes online

• Stack Advisor recommendations

Page 15: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Blueprint Host Discovery

POST /api/v1/clusters/MyCluster/hosts

    [
      {
        "blueprint" : "single-node-hdfs-test2",
        "host_groups" : [
          {
            "host_group" : "slave",
            "host_count" : 3,
            "host_predicate" : "Hosts/cpu_count>1"
          },
          {
            "host_group" : "super-slave",
            "host_count" : 5,
            "host_predicate" : "Hosts/cpu_count>2&Hosts/total_mem>3000000"
          }
        ]
      }
    ]
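To make the `host_predicate` idea concrete, here is a simplified stand-in for how such predicates could select hosts. It is not Ambari's implementation: the evaluator only understands numeric comparisons joined by `&`, the greedy assignment in `assign` is an assumption, and the host records (including `host_name`) are invented for illustration.

```python
import operator
import re

_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
        "<=": operator.le, "=": operator.eq, "!=": operator.ne}
# One clause looks like "Hosts/cpu_count>2": a slash-separated path,
# a comparison operator, and an integer.
_CLAUSE = re.compile(r"^\s*([\w/]+)\s*(>=|<=|!=|=|>|<)\s*(\d+)\s*$")


def matches(host, predicate):
    """Return True if the host record satisfies every '&'-joined clause."""
    for clause in predicate.split("&"):
        m = _CLAUSE.match(clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        path, op, value = m.groups()
        current = host
        for key in path.split("/"):
            current = current[key]
        if not _OPS[op](current, int(value)):
            return False
    return True


def assign(hosts, requests):
    """Greedily give each host group up to host_count matching, unassigned hosts."""
    free = list(hosts)
    result = {}
    for req in requests:
        picked = [h for h in free if matches(h, req["host_predicate"])]
        picked = picked[: req["host_count"]]
        result[req["host_group"]] = [h["Hosts"]["host_name"] for h in picked]
        free = [h for h in free if h not in picked]
    return result


if __name__ == "__main__":
    # Hypothetical inventory of hosts that have registered with the server.
    inventory = [
        {"Hosts": {"host_name": "a", "cpu_count": 1, "total_mem": 4000000}},
        {"Hosts": {"host_name": "b", "cpu_count": 2, "total_mem": 2000000}},
        {"Hosts": {"host_name": "c", "cpu_count": 4, "total_mem": 8000000}},
    ]
    print(assign(inventory, [
        {"host_group": "slave", "host_count": 3,
         "host_predicate": "Hosts/cpu_count>1"},
    ]))
```

With `host_count` larger than the number of currently matching hosts, the request stays partially filled, which mirrors the idea that remaining hosts are picked up as they come online.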

Page 16: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Kerberos

Available since Ambari 2.0

• Ambari manages Kerberos principals and keytabs

• Works with existing MIT KDC or Active Directory

• Once Kerberized, Ambari handles:

o Adding hosts

o Adding components to existing hosts

o Adding services

o Moving components to different hosts

Page 17: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Management Packs - Motivation

• Release Management

o Ambari core and stacks released together

o Stack changes require Ambari release

o Decouple stack and Ambari core releases

• Add-on Services

o Release vehicle for 3rd party services

o Self-contained release artifacts

Page 18: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Management Packs – Release Trains

Page 19: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Management Packs

• Generalized release artifact for stacks, add-on services, views, etc.

• Decouples stack releases from Ambari core release

• Tarballs with metadata for applicability and content

• Stack is an overlay of multiple management packs
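The overlay idea can be pictured with a small Python sketch. The data layout here is entirely hypothetical (it is not the management pack format), and the rule that a later pack replaces an earlier definition of the same service is an assumption chosen for illustration.

```python
def overlay(packs):
    """Merge packs in order; later packs add services or replace earlier ones."""
    stack = {}
    for pack in packs:
        for service, definition in pack["services"].items():
            # Record which pack supplied the definition that ends up in the stack.
            stack[service] = {"provided_by": pack["name"], **definition}
    return stack


if __name__ == "__main__":
    # Hypothetical packs: a core pack plus a third-party add-on.
    core = {"name": "core-pack",
            "services": {"HDFS": {"version": "1"}, "YARN": {"version": "1"}}}
    addon = {"name": "addon-pack",
             "services": {"MYSERVICE": {"version": "0.1"}}}
    for name, definition in sorted(overlay([core, addon]).items()):
        print(name, definition)
```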

Page 20: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Overlay of Management Packs

Page 21: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Management Pack++

Short Term Goals (Ambari 2.4)

• Retrofit in Stack Processing Framework

• Enable 3rd party to ship add-on services

• Command line support

Long Term Goals (Future)

• Management Pack Framework

• Deliver Views

• REST API support

Page 22: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Role Based Access Control (RBAC)

As Ambari & organizations grow, so do security needs

Ambari integrates with external authentication systems & LDAP

Page 23: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

RBAC Terms

• Roles have permissions, e.g., add services to cluster

• Roles are applied to Resources, e.g., Ambari, particular Cluster, particular View

• Users belong to groups

• A group has a role

• Users can also have additional roles
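The relationships between these terms can be modelled in a few lines of Python. This is an illustrative data model, not Ambari's internal one: the class names, permission strings, and resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset


@dataclass
class Principal:
    """Anything that can be granted a role: here, a group or a user."""
    name: str
    # Maps a resource name (e.g. a cluster or a view) to the roles granted on it.
    grants: dict = field(default_factory=dict)

    def grant(self, resource, role):
        self.grants.setdefault(resource, []).append(role)


@dataclass
class User(Principal):
    groups: list = field(default_factory=list)


def permissions_for(user, resource):
    """Union of permissions from the user's own roles and the roles of their groups."""
    roles = list(user.grants.get(resource, []))
    for group in user.groups:
        roles.extend(group.grants.get(resource, []))
    allowed = set()
    for role in roles:
        allowed |= role.permissions
    return allowed


if __name__ == "__main__":
    read_only = Role("read-only", frozenset({"view"}))
    service_op = Role("service-op", frozenset({"view", "restart service"}))

    ops = Principal("ops")
    ops.grant("cluster:prod", read_only)      # the group has a role

    alice = User("alice", groups=[ops])
    alice.grant("cluster:prod", service_op)   # an additional role for one user

    print(sorted(permissions_for(alice, "cluster:prod")))
```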

Page 24: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

New RBAC Roles

Role             Permissions

Ambari Admin     all

Cluster Admin    except manage permissions

Cluster Op       except add services, Kerberos, manage Alerts, & upgrades

Service Admin    except alter cluster topology or install components

Service Op       except change configs

Read-Only        only view

Page 25: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Background: Upgrade Terminology

Manual Upgrade

The user follows instructions to upgrade the stack

Incurs downtime

Page 26: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Background: Upgrade Terminology

Rolling Upgrade

Automated

Upgrades one component per host at a time

Preserves cluster operation and minimizes service impact

Page 27: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Background: Upgrade Terminology

Express Upgrade

Automated

Runs in parallel across hosts

Incurs downtime

Page 28: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Automated Upgrade: Rolling or Express

1. Check Prerequisites

Review the prereqs to confirm your cluster configs are ready

2. Prepare

Take backups of critical cluster metadata

3. Register + Install

Register the HDP repository and install the target HDP version on the cluster

4. Perform Upgrade

Perform the HDP upgrade. The steps depend on the upgrade method: Rolling or Express

5. Finalize

Finalize the upgrade, making the target version the current version

Page 29: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Process: Rolling Upgrade

ZooKeeper

Ranger

Core Masters

Core Slaves

Hive

Oozie

Falcon

Clients

Kafka

Knox

Storm

Slider

Flume

Finalize or Downgrade

HDFS, YARN, MR, Tez, HBase, Pig, Hive, etc.

HDFS

YARN

HBase
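The difference between the two automated methods can be sketched as two ways of walking the same list of groups. This is a conceptual illustration only: the group order is taken from the slide, the host names are hypothetical, and the assumption that Express follows the same group order is made for the example, not taken from the source.

```python
# Group order as listed on the Rolling Upgrade slide.
ROLLING_ORDER = [
    "ZooKeeper", "Ranger", "Core Masters", "Core Slaves", "Hive", "Oozie",
    "Falcon", "Clients", "Kafka", "Knox", "Storm", "Slider", "Flume",
]


def rolling_plan(placement):
    """Yield (group, host) steps: groups in order, hosts one at a time."""
    for group in ROLLING_ORDER:
        for host in sorted(placement.get(group, [])):
            yield group, host


def express_plan(placement):
    """Yield (group, hosts) steps: all hosts of a group are handled together."""
    for group in ROLLING_ORDER:
        hosts = sorted(placement.get(group, []))
        if hosts:
            yield group, hosts


if __name__ == "__main__":
    # Hypothetical placement of groups on hosts.
    placement = {"ZooKeeper": ["m1"], "Core Slaves": ["w2", "w1"]}
    print(list(rolling_plan(placement)))
    print(list(express_plan(placement)))
```

Handling one host at a time is what lets a rolling upgrade keep the cluster serving requests, at the cost of a longer overall run. Express trades that for speed and accepts downtime.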

Page 30: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Grafana for Ambari Metrics

FEATURES

• Grafana as a “Native UI” for Ambari Metrics

• Pre-built Dashboards: Host-level, Service-level

• Supports HTTPS

DASHBOARDS

• System Home, Servers

• HDFS Home, NameNodes, DataNodes

• YARN Home, Applications, Job History Server

• HBase Home, Performance, Misc

Page 31: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Grafana includes pre-built dashboards for visualizing the most important cluster metrics.

Page 32: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

The HDFS NameNode dashboard highlights file system activity.

Page 33: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Demo

• Grafana

• Log Search

Page 34: Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting

Future of Ambari

• Cloud features

• Multiple instances of same service at different versions, e.g., Spark 1.6 & Spark 2.0

• YARN assemblies

• Component & Patch Upgrades: upgrade individual components in the same stack version, e.g., just DN and RM in HDP 2.4.*.* with zero downtime