Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Ranger Rommel Garcia

Apache Ranger



Who Am I

• Solutions Engineer @hortonworks
• Security SME Lead @hortonworks
• Author, “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture”


5 Pillars of Security

• Authentication
• Authorization
• Audit
• Encryption
• Centralized Administration


Hadoop Security Tools

• AD/LDAP (authentication)
• Apache Knox (authentication)
• Kerberos (authentication)
• Apache Ranger (authorization, audit, KMS)
• HDFS TDE (data encryption)
• Wire Encryption (data protection)


Data Sources


Apache Ranger

• Provides centralized policy definition for authorizing access to resources

• Supported components as of v0.5
– HDFS
– HBase
– Hive
– YARN
– Knox
– Storm
– Solr
– Kafka


Apache Ranger authZ Architecture

[Diagram] Components shown:
• Agents, one embedded in each component: HDFS, HBase, Hive, YARN, Knox, Storm, Solr, Kafka
• Administration Portal, Policy Server, Audit Server, KMS, REST APIs
• Back ends: DB, Solr, HDFS, Log4j
• LDAP/AD (user/group sync)


Sample Simplified Workflow - HDFS

[Diagram: Policy Manager, Agent, User Application, NameNode, Audit Database]

1. Admin sets policies for HDFS files/folders (Policy Manager)
2. Users reach HDFS in one of three ways:
– Data scientist runs a MapReduce job
– Users access HDFS data through an application
– IT users access HDFS through the CLI
3. NameNode uses the Agent for authorization
4. NameNode provides resource access to the user/client
5. Audit logs are pushed to the DB (Audit Database)
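The workflow on this slide can be sketched as a toy model. This is only an illustration of the sequence (policies are set, a request arrives, the agent decides, the decision is audited); the `Policy` and `Agent` classes, their fields, and the prefix-matching rule are assumptions made for this sketch and are not Ranger's actual API or policy format.

```python
from dataclasses import dataclass, field


def _covers(prefix, path):
    """True if path is the prefix itself or lies underneath it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: who may do what under a path prefix."""
    path_prefix: str
    users: frozenset
    permissions: frozenset


@dataclass
class Agent:
    """Toy stand-in for the agent the NameNode consults (not Ranger's API)."""
    policies: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def set_policies(self, policies):
        # Step 1: the admin defines policies and the agent receives a copy.
        self.policies = list(policies)

    def authorize(self, user, path, permission):
        # Step 3: the NameNode asks the agent whether the request is allowed.
        allowed = any(
            _covers(p.path_prefix, path)
            and user in p.users
            and permission in p.permissions
            for p in self.policies
        )
        # Step 5: every decision, allowed or denied, is recorded for audit.
        self.audit_log.append((user, path, permission, allowed))
        return allowed


agent = Agent()
agent.set_policies([Policy("/data/sales", frozenset({"alice"}), frozenset({"read"}))])
print(agent.authorize("alice", "/data/sales/q1.csv", "read"))
print(agent.authorize("bob", "/data/sales/q1.csv", "read"))
```

In this sketch a request with no matching policy is simply denied; the path, user names, and permission strings are made up for the example.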


authZ Best Practice – POSIX + Ranger

• HDFS -> POSIX -> owned by hdfs -> Ranger ACLs

• Hive -> POSIX -> owned by hive -> Ranger ACLs

• HBase -> POSIX -> owned by hbase -> Ranger ACLs

• Solr -> native -> owned by solr -> Ranger ACLs

• Kafka -> owned by kafka -> Ranger ACLs


authZ Best Practice - Ranger


• Set POSIX permissions to 000 on all HDFS files, so that access is granted only through Ranger policies
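One way to apply the 000 setting is through the HDFS shell. The sketch below only builds the `hdfs dfs -chown` and `hdfs dfs -chmod` command lines and prints them; the `/data` path is a hypothetical example, the `hdfs` owner follows the previous slide, and the helper names are made up for illustration. It would be prudent to try this on a test directory first, since a recursive chmod is hard to undo.

```python
import subprocess


def lockdown_commands(path, owner="hdfs"):
    """Build the HDFS shell commands; nothing is executed here."""
    return [
        ["hdfs", "dfs", "-chown", "-R", owner, path],
        ["hdfs", "dfs", "-chmod", "-R", "000", path],
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# "/data" is a placeholder path; dry_run=True only prints the commands.
run(lockdown_commands("/data"))
```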


Ranger UserSync Best Practice


• Use LDAPS when integrating Ranger with LDAP
• Create an OU containing ONLY Hadoop users, for performance
• Only run usersync when necessary
– How many users are being added, and how often
– How many users are changing roles
– Too much syncing can degrade LDAP performance
• Do not sync anonymously
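These recommendations can be turned into a simple pre-flight check. The sketch below is an assumption-laden illustration: the setting names (`ldap_url`, `bind_dn`, `user_search_base`) are hypothetical keys chosen for this example, not Ranger property names, and the DN parsing is deliberately naive (it does not handle escaped commas).

```python
from urllib.parse import urlparse


def check_usersync_settings(settings):
    """Return a list of warnings for a dict of LDAP sync settings.

    The keys used here are hypothetical names for this sketch,
    not Ranger configuration properties.
    """
    warnings = []
    # LDAPS: the connection URL should use the ldaps scheme.
    if urlparse(settings.get("ldap_url", "")).scheme.lower() != "ldaps":
        warnings.append("LDAP URL does not use ldaps://")
    # No anonymous sync: a bind DN should be configured.
    if not settings.get("bind_dn", "").strip():
        warnings.append("no bind DN set, so the sync would be anonymous")
    # Dedicated OU: the search base should be scoped to an OU.
    base = settings.get("user_search_base", "")
    if not any(part.strip().lower().startswith("ou=") for part in base.split(",")):
        warnings.append("user search base is not scoped to an OU")
    return warnings


example = {
    "ldap_url": "ldap://ldap.example.com:389",
    "bind_dn": "",
    "user_search_base": "dc=example,dc=com",
}
for warning in check_usersync_settings(example):
    print(warning)
```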


Ranger Audit Locations


• HDFS
– Long-term storage that can be used to understand user event trends and predict anomalies
• RDBMS
– When SQL is preferred by auditors
– MySQL, Oracle, Postgres, SQL Server
• Solr
– Quick reporting metrics to understand user event trends
• Log4j Appenders


Apache Ranger – ACLs & Audit Demo

Environment
• CentOS 6.6
• 2 VMs
• FreeIPA 2.0
• HDP 2.3
• Apache Ranger v0.5
• Kerberized 2-node cluster


Q&A


Ranger KMS + HDFS TDE

[Diagram: the HDFS Client, the NameNode, and the Key Management System (KMS), drawn across the DATA ACCESS, DATA MANAGEMENT, and SECURITY layers alongside YARN]
• HDFS (Hadoop Distributed File System), HDFS-6134
– Encryption Zone (attributes: EZKey ID, version)
– Encrypted File (attributes: EDEK, IV)
• NameNode: holds EDEKs, uses the KeyProvider API
• HDFS Client: uses the KeyProvider API, exchanges an EDEK for a DEK, Crypto Stream (r/w with DEK)
• Key Management System (KMS), HADOOP-10433: DEKs, EZKs
• KeyProvider API, HADOOP-10141

Acronym   Description
EZ        Encryption Zone (an HDFS directory)
EZK       Encryption Zone Key; master key associated with all files in an EZ
DEK       Data Encryption Key; unique key associated with each file. The EZK is used to encrypt the DEK
EDEK      Encrypted DEK; the NameNode only has access to the encrypted DEK
IV        Initialization Vector
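The relationship between EZK, DEK, and EDEK can be sketched in a few lines. This is a toy illustration of the key-wrapping flow only: `toy_stream_xor` is a made-up SHA-256 keystream cipher standing in for a real cipher, `ToyKMS` and its method names are hypothetical, and none of this reflects the actual KMS or KeyProvider API. It should not be used to protect real data.

```python
import hashlib
import secrets


def toy_stream_xor(key, iv, data):
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    A stand-in for a real cipher, for illustration only. Applying it
    twice with the same key and IV returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyKMS:
    """Holds the EZKs; callers only ever receive EDEKs or unwrapped DEKs."""

    def __init__(self):
        self._ezks = {}

    def create_ezk(self, name):
        self._ezks[name] = secrets.token_bytes(32)

    def generate_edek(self, ezk_name):
        # A fresh random DEK for one file, wrapped with the zone's EZK.
        dek = secrets.token_bytes(32)
        iv = secrets.token_bytes(16)
        return toy_stream_xor(self._ezks[ezk_name], iv, dek), iv

    def decrypt_edek(self, ezk_name, edek, iv):
        # Unwrap the EDEK so an authorized client can read or write the file.
        return toy_stream_xor(self._ezks[ezk_name], iv, edek)


kms = ToyKMS()
kms.create_ezk("zone-key")                      # hypothetical key name
edek, iv = kms.generate_edek("zone-key")        # kept as file attributes
dek = kms.decrypt_edek("zone-key", edek, iv)    # client obtains the DEK
ciphertext = toy_stream_xor(dek, iv, b"hello")  # crypto stream write
assert toy_stream_xor(dek, iv, ciphertext) == b"hello"  # crypto stream read
```

The point of the design is that the party storing file metadata only ever sees the EDEK and IV, while the EZK never leaves the KMS.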


Apache Ranger – KMS + TDE Demo

Exercise
• Create an encryption zone
• Create key for encryption zone
• Create file
• Load to HDFS, into the encryption zone
• List encrypted file
• Print encrypted file
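The exercise maps onto the standard `hadoop key` and `hdfs crypto` commands roughly as follows. This sketch only assembles and prints the command lines; the key name, zone path, and file name are placeholders, and the helper function is made up for illustration. The key is created before the zone here because `hdfs crypto -createZone` expects an existing key and an empty directory.

```python
import posixpath


def tde_demo_commands(key_name, zone_path, local_file):
    """Return the demo steps as argv lists; nothing is executed."""
    remote_file = posixpath.join(zone_path, posixpath.basename(local_file))
    return [
        # Create the key for the encryption zone.
        ["hadoop", "key", "create", key_name],
        # Create an empty directory and turn it into an encryption zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # Load a local file into the zone.
        ["hdfs", "dfs", "-put", local_file, zone_path],
        # List and print the file.
        ["hdfs", "dfs", "-ls", zone_path],
        ["hdfs", "dfs", "-cat", remote_file],
    ]


# Placeholder names; on a Kerberized cluster a valid ticket is needed first.
for argv in tde_demo_commands("demo_key", "/secure", "notes.txt"):
    print(" ".join(argv))
```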


Thank you!
Rommel Garcia
@rommelgarcia
/in/rommelgarcia