Apache Spark: Enterprise Security for Production Deployments
蒋逸峰 (しょういつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush, December 21, 2016
© Hortonworks Inc. 2011–2016. All Rights Reserved
What are the security requirements?
→ Spark users should be authenticated
→ Integrate with corporate LDAP/AD
→ Allow access only to authorized users
→ Audit all access
→ Protect data both in motion & at rest
→ Make all security easy to manage
→ …
Interacting with Spark
[Diagram: ways of interacting with Spark on YARN. Zeppelin, Spark-Shell, the Spark Thrift Server, and a REST server each have their own driver; two further boxes are labelled "Ex".]
Context: Spark Deployment Modes
• Spark on YARN
  – Spark driver (SparkContext) in the YARN AM (yarn-cluster)
  – Spark driver (SparkContext) in the local client (yarn-client)
• Spark Shell & Spark Thrift Server run in yarn-client mode only
[Diagram: YARN-Client and YARN-Cluster panels, each showing the Client, App Master, Spark Driver, and Executor.]
Spark on YARN
[Diagram: John Doe uses Spark Submit to reach a Hadoop cluster. Components shown: YARN RM, NodeManager, Spark AM, Executor, HDFS. Arrows are numbered 1 to 4.]
DEMO
A DATA LAKE WITHOUT SECURITY
Spark – Security – Four Pillars
→ Authentication
→ Authorization
→ Audit
→ Encryption
Spark leverages Kerberos on YARN
Authenticate users with Kerberos/AD
[Diagram: John Doe, the KDC, and a Hadoop cluster containing the Spark AM and the NN. Arrows are numbered 1 to 7; the step labels are:]
→ kinit
→ Get service ticket for Spark
→ Use Spark ST, submit Spark job
→ Spark gets Namenode (NN) service ticket
→ YARN launches Spark Executors using John Doe's identity
→ Executor reads from HDFS using John Doe's delegation token
Spark – Kerberos – Example
kinit -kt /etc/security/keytabs/[email protected]
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  /usr/hdp/current/spark-client/lib/spark-examples*.jar 10
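The two commands above can also be driven from a script. The following is a minimal sketch, assuming Python is available on the client host; the helper names `build_kinit_command` and `build_spark_submit_command`, the keytab path, and the principal are hypothetical and not part of the original example. It only builds the argument lists and leaves actually running them to the caller.

```python
def build_kinit_command(keytab_path, principal):
    """Return the argv for a keytab-based Kerberos login (kinit -kt)."""
    return ["kinit", "-kt", keytab_path, principal]


def build_spark_submit_command(app_jar, main_class, app_args=(),
                               master="yarn-cluster", num_executors=3,
                               driver_memory="512m", executor_memory="512m",
                               executor_cores=1):
    """Return the argv for spark-submit, using the same flags as the slide."""
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app_jar,
        *[str(arg) for arg in app_args],
    ]


# Hypothetical keytab path, principal, and jar path, for illustration only.
kinit_cmd = build_kinit_command("/path/to/user.keytab", "user@EXAMPLE.COM")
submit_cmd = build_spark_submit_command(
    "/path/to/spark-examples.jar",
    "org.apache.spark.examples.SparkPi",
    app_args=[10],
)
print(kinit_cmd)
print(submit_cmd)
```

Each list could be passed to `subprocess.run(cmd, check=True)`, running the login first. Because no shell is involved in that case, a wildcard such as `spark-examples*.jar` would have to be expanded beforehand, for example with `glob.glob`.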
Allow only authorized users access to Spark jobs
[Diagram: John Doe, the KDC, Ranger, and a YARN cluster (nodes A, B, C) on top of HDFS. Step labels:]
→ Client gets service ticket for Spark
→ Use Spark ST, submit Spark job
→ Get Namenode (NN) service ticket
→ Executors read from HDFS
→ Ranger: Can John launch this job? Can John read this file?
Spark SQL: Fine-grained security
Spark SQL Security – Current Status
→ Spark SQL: only coarse-grained access control today
[Diagram: a JDBC client connects to the Spark Thrift Server (driver) in a YARN container; a second YARN container runs the DAG. Other components: Hive Metastore and HDFS /apps/hive/warehouse/…. Label: run as hive user.]
Spark SQL Security
→ Spark Thrift Server & Spark Executors run as the hive user to read all data
  – No authorization support in STS
  – No Ranger integration support
  – Anyone who can authenticate to STS can read ALL data
→ No identity propagation on the 2nd hop (STS to Executors): no equivalent of the doAs in HS2
How Hive Security Works
1. Original request with user ID/password (ODBC/JDBC clients)
2. HS2 authenticates the user/password against LDAP
3. Ranger AuthZ
4. Hive gets a Namenode (NN) service ticket
5. Hive creates the MR/Tez job using the NN ST as proxy user
[Diagram: ODBC/JDBC clients, LDAP, KDC, HiveServer2, Ranger, and YARN & HDFS (nodes A, B, C). Other labels: "Use Hive ST, submit query", "Client gets query result", "Ranger syncs users/groups from LDAP".]
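The order of checks in this flow can be sketched as a toy model. This is only an illustration under stated assumptions: `LDAP_USERS` and `RANGER_POLICIES` are hypothetical in-memory stand-ins for LDAP and Ranger, and all user, group, and table names are made up. It does not call any real Hive, LDAP, or Ranger API.

```python
# Hypothetical in-memory stand-ins for LDAP and Ranger; all names are made up.
LDAP_USERS = {
    "john": {"password": "s3cret", "groups": {"analysts"}},
    "mary": {"password": "pa55", "groups": {"interns"}},
}
RANGER_POLICIES = [
    {"table": "default.customers", "groups": {"analysts"}, "access": {"select"}},
]


def authenticate(user, password):
    """Step 2: check the user/password pair and return the user's groups."""
    entry = LDAP_USERS.get(user)
    if entry is None or entry["password"] != password:
        raise PermissionError(f"authentication failed for {user!r}")
    return entry["groups"]


def is_authorized(groups, table, access):
    """Step 3: allow only if a policy grants this access to one of the groups."""
    return any(
        policy["table"] == table
        and access in policy["access"]
        and bool(groups & policy["groups"])
        for policy in RANGER_POLICIES
    )


def submit_query(user, password, table):
    """Steps 1 to 5: authenticate, authorize, then describe the job to launch."""
    groups = authenticate(user, password)
    if not is_authorized(groups, table, "select"):
        raise PermissionError(f"{user!r} may not select from {table!r}")
    # Steps 4-5: the service obtains the ticket, but the job carries the
    # end user's identity as proxy user.
    return {"service_user": "hive", "proxy_user": user, "table": table}
```

In this sketch, `submit_query("john", "s3cret", "default.customers")` returns a job description whose `proxy_user` is `john`, while a wrong password or a user without a matching policy raises `PermissionError` before any job is described.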
DEMO
HIVE & SPARK SQL AUTHORIZATION
Key Features: Spark Column Security with LLAP
→ Fine-grained column-level access control for Spark SQL.
→ Fully dynamic policies per user. Doesn't require views.
→ Use standard Ranger policies and tools to control access and masking policies.
Flow:
1. Spark SQL gets data locations, known as "splits", from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering/masking is guaranteed by the LLAP server.
[Diagram: Spark Client, HiveServer2 (Authorization), Hive Metastore (Data Locations, View Definitions), LLAP (Data Read, Filter Pushdown), Ranger Server (Dynamic Policies). Arrows are numbered 1 to 4.]
Example: Per-User Row Filtering by Region in Spark SQL
Original Query: SELECT * FROM CUSTOMERS WHERE total_spend > 10000
Query rewrites based on dynamic Ranger policies:
Spark User 2 (East Region)
Dynamic Rewrite: SELECT * FROM CUSTOMERS WHERE total_spend > 10000 AND region = "east"
Spark User 1 (West Region)
Dynamic Rewrite: SELECT * FROM CUSTOMERS WHERE total_spend > 10000 AND region = "west"
LLAP Data Access:
User ID   Region   Total Spend
1         East     5,131
2         East     27,828
3         West     55,493
4         West     7,193
5         East     18,193
Fine-grained Security to Spark SQL
http://bit.ly/2bLghGz
http://bit.ly/2bTX7Pm
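The effect of the per-user rewrite can be illustrated with a small sketch. It is a naive string-level rewrite over an in-memory copy of the customer rows, not how Ranger or LLAP apply policies (the slides describe a modified query plan). The policy table `ROW_FILTERS`, the user names `spark_user_1` and `spark_user_2`, and both helper functions are hypothetical.

```python
# Hypothetical per-user row-filter policies: user name -> (column, value).
ROW_FILTERS = {
    "spark_user_1": ("region", "west"),
    "spark_user_2": ("region", "east"),
}

# The customer rows from the slide: (user_id, region, total_spend).
CUSTOMERS = [
    (1, "East", 5131),
    (2, "East", 27828),
    (3, "West", 55493),
    (4, "West", 7193),
    (5, "East", 18193),
]


def rewrite_query(query, user):
    """Append the user's row-filter predicate to the query text."""
    column, value = ROW_FILTERS[user]
    predicate = f'{column} = "{value}"'
    # Use AND if the query already has a WHERE clause, otherwise start one.
    joiner = "AND" if " where " in query.lower() else "WHERE"
    return f"{query} {joiner} {predicate}"


def visible_rows(user, min_spend):
    """Return the rows the user would see for total_spend > min_spend."""
    _, region = ROW_FILTERS[user]
    return [
        row for row in CUSTOMERS
        if row[2] > min_spend and row[1].lower() == region
    ]
```

With the rows above, `visible_rows("spark_user_2", 10000)` yields customers 2 and 5, and `visible_rows("spark_user_1", 10000)` yields customer 3, even though both users issue the same original query.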
Dynamic Masking and Row Level Filtering
Source data:
Country   National ID   CC No              Name          DOB         MRN          Policy ID
US        232323233     4539067047629850   John Doe      9/12/1969   8233054331   nj23j424
US        333287465     5391304868205600   Jane Doe      9/13/1969   3736885376   cadsd984
Japan     T30007873     4532488639863821   Ben Jackson   73/1975     876392473A   KK-287365
Ranger Policy Enforcement
View for the US customer support groups:
Country   National ID   CC No               MRN    Name
US        xxxxx3233     4539 xxxxxxxxxxxx   null   John Doe
US        xxxxx7465     5391 xxxxxxxxxxxx   null   Jane Doe
View for the Japan Health Policy Admins:
Country   National ID   Name          MRN
Japan     T30007873     Ben Jackson   876392473A
Users from the US customer support groups see row-filtered data for US persons, with CC and SSN shown as masked values and MRN nullified.
Japan Health Policy Admins view the relevant columns of data unmasked, but row filtering policies restrict them to data for Japan persons only.
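The two views can be reproduced with a short sketch that applies row filtering first and column masking second. The helper names and the in-memory `RECORDS` list are hypothetical, and the masking formats simply mimic what the slide shows (last four characters of the national ID, first four digits of the card number, MRN nullified); Ranger's own masking options are not modelled here.

```python
# Source rows from the slide, reduced to the columns the two views use.
RECORDS = [
    {"Country": "US", "National ID": "232323233", "CC No": "4539067047629850",
     "Name": "John Doe", "MRN": "8233054331"},
    {"Country": "US", "National ID": "333287465", "CC No": "5391304868205600",
     "Name": "Jane Doe", "MRN": "3736885376"},
    {"Country": "Japan", "National ID": "T30007873", "CC No": "4532488639863821",
     "Name": "Ben Jackson", "MRN": "876392473A"},
]


def mask_show_last4(value):
    """Replace everything except the last four characters with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def mask_show_first4(value):
    """Keep the first four characters, then a space and 'x' for the rest."""
    return value[:4] + " " + "x" * max(len(value) - 4, 0)


def us_support_view(records):
    """US rows only; national ID and card number masked, MRN nullified."""
    return [
        {"Country": r["Country"],
         "National ID": mask_show_last4(r["National ID"]),
         "CC No": mask_show_first4(r["CC No"]),
         "MRN": None,
         "Name": r["Name"]}
        for r in records if r["Country"] == "US"
    ]


def japan_admin_view(records):
    """Japan rows only; selected columns shown unmasked."""
    return [
        {key: r[key] for key in ("Country", "National ID", "Name", "MRN")}
        for r in records if r["Country"] == "Japan"
    ]
```

Filtering rows before masking keeps the row-level predicate working on the real values, so a policy on `Country` is never affected by how other columns are masked.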
THANK YOU
@uprush