Upgrading from HDP 2.1 to HDP 2.5 (Including podcast CM)
2017/03/03 @wyukawa
Hadoop SCR #hadoopreading https://www.eventbrite.com/e/hadoop-22-tickets-31987821435
About me
• Data Engineer at LINE for more than 4 years
• This is the third Hadoop upgrade operation at LINE
– The second one is described at https://www.slideshare.net/wyukawa/upgrading-fromhdp21tohdp24-59994044
2014/6-2017/1
• Machines
– 40 servers
– CPU 24 processors
– Memory 64GB
– HDD 3.6TB x 12
– Network 1Gbps
– Hardware maintenance deadline is 2017/6
• HDP 2.1 (Ambari 1.6.0)
– Hadoop 2.4.0
  • NameNode HA
  • Ambari 1.6 didn’t support ResourceManager HA
– Hive 0.13.0
  • MapReduce (not Tez)
[Diagram, 2017/1: Hadoop and Hive of HDP 2.1, Azkaban 3.1.0, Presto 0.163, Prestogres, Cognos, Netezza, DB (x2), ETL with Python 2.7.11, InfiniDB, Pentaho, Saiku]
2017/1-
• Machines
– 40 servers
– CPU 40 processors
– Memory 256GB
– HDD 6.1TB x 12
– Network 10Gbps
• HDP 2.5.3 (Ambari 2.4.2)
– Hadoop 2.7.3
  • NameNode HA
  • ResourceManager HA
– Hive 1.2.1
  • MapReduce
  • Tez
How to upgrade
• Set up a new Hadoop cluster on new machines
• Blue-green deployment, all at once
– http://aws.typepad.com/sajp/2015/12/what-is-blue-green-deployment.html
• Migrate data by DistCp (-m 20 -bandwidth 125)
– Copy 500TB (first copy took about 3 days)
• No parallel execution on both Hadoop clusters
• See http://d.hatena.ne.jp/wyukawa/20170131/1485854288 for details
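As a rough sanity check on the DistCp options above, the sketch below computes the aggregate throughput cap implied by -m 20 -bandwidth 125 and the shortest possible time to copy 500TB under that cap. It assumes -bandwidth is the per-map limit in MB/s and that 1TB is 1024 x 1024 MB; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_cap_mb_per_s(num_maps, bandwidth_per_map_mb):
    """Upper bound on DistCp throughput: every map runs at its per-map limit."""
    return num_maps * bandwidth_per_map_mb


def min_copy_days(total_tb, cap_mb_per_s):
    """Shortest possible copy time in days if the cap is sustained throughout."""
    total_mb = total_tb * 1024 * 1024  # assumption: binary units
    return total_mb / cap_mb_per_s / SECONDS_PER_DAY


cap = aggregate_cap_mb_per_s(num_maps=20, bandwidth_per_map_mb=125)
days = min_copy_days(total_tb=500, cap_mb_per_s=cap)
print(f"cap: {cap} MB/s, lower bound: {days:.1f} days")
```

Under these assumptions the lower bound comes out a little under two and a half days, which is in line with the reported "about 3 days" for the first copy.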
DistCp with HDFS Snapshot
• http://qiita.com/bwtakacy/items/fa63cdcdfc05e4043c69 is a good article
• The -update -diff options don’t support webhdfs://orig/...
– Edit hdfs-site.xml in the destination Hadoop and use hdfs://orig/...
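The snapshot-based incremental copy can be sketched as a small command builder. This is a minimal sketch, not the exact procedure used here: it assumes a full copy of snapshot s1 has already been made and that the destination also has a snapshot named s1. The snapshot names, the warehouse path, and the hdfs://new nameservice are hypothetical; only hdfs://orig/... comes from the slide.

```python
from urllib.parse import urlparse


def create_snapshot(path, name):
    """argv for taking an HDFS snapshot of a snapshottable directory."""
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def snapshot_diff_distcp(src, dst, from_snapshot, to_snapshot):
    """argv for copying only the changes between two source snapshots."""
    for uri in (src, dst):
        # Mirrors the limitation noted above: webhdfs:// is rejected.
        if urlparse(uri).scheme != "hdfs":
            raise ValueError(f"-diff needs hdfs:// on both sides, got {uri}")
    return ["hadoop", "distcp", "-update", "-diff",
            from_snapshot, to_snapshot, src, dst]


src = "hdfs://orig/apps/hive/warehouse"  # hypothetical path
dst = "hdfs://new/apps/hive/warehouse"   # hypothetical nameservice and path

steps = [
    create_snapshot(src, "s2"),
    snapshot_diff_distcp(src, dst, "s1", "s2"),
    create_snapshot(dst, "s2"),  # keeps both sides aligned for the next round
]
for argv in steps:
    print(" ".join(argv))
```

The script only prints the commands; running them is left to the operator, since the destination must not have been modified since its last snapshot.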
Migrate Hive schema
• Use the show create table command
• Use the msck repair command to add partitions
– But it didn’t work for tables with too many partitions (for example, 4000)
• Use webhdfs://... in external tables
– can’t use hdfs://…
– but an empty result is returned when you select with Presto
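One possible workaround for tables where msck repair fails is to register partitions explicitly in small batches. The slides do not say this is what was done; the sketch below is an assumption, and the table name, partition key, and batch size are hypothetical. It handles a single string partition key and assumes the values contain no quotes.

```python
def add_partition_statements(table, partition_key, values, batch_size=100):
    """Yield ALTER TABLE statements, each adding up to batch_size partitions."""
    values = list(values)
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        specs = " ".join(f"PARTITION ({partition_key}='{v}')" for v in batch)
        yield f"ALTER TABLE {table} ADD IF NOT EXISTS {specs};"


# Hypothetical table partitioned by day.
days = [f"2017-01-{d:02d}" for d in range(1, 6)]
statements = list(add_partition_statements("access_log", "dt", days, batch_size=2))
for statement in statements:
    print(statement)
```

Keeping each statement small limits how much metastore work a single call has to do, at the cost of more round trips.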
HDFS/YARN/Hive/Sqoop settings
• dfs.datanode.failed.volumes.tolerated=1
• fs.trash.interval=4320
• NameNode heap 64GB
• yarn.nodemanager.resource.memory-mb 100GB
• yarn.scheduler.maximum-allocation-mb 100GB
• Use DominantResourceCalculator
– https://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/
• hive.server2.authentication=NOSASL
• hive.server2.enable.doAs=false
• hive.auto.convert.join=false
• hive.support.sql11.reserved.keywords=false
• org.apache.sqoop.splitter.allow_text_splitter=true
• Sometimes use Tez
– https://community.hortonworks.com/questions/24953/solution-for-hive-runtime-error-while-processing-r.html
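To make the units behind some of these settings explicit, the sketch below converts them and renders Hadoop-style property XML. It assumes "100GB" means 100 x 1024 MB and relies on fs.trash.interval being expressed in minutes. On an Ambari-managed cluster these values would normally be entered through Ambari rather than written to files by hand, so the XML is illustrative only.

```python
import xml.etree.ElementTree as ET

GB_IN_MB = 1024


def to_property_xml(props):
    """Render a dict as a Hadoop <configuration> element."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(config, encoding="unicode")


yarn_site = {
    # Assumption: 100GB expressed in MB, on machines with 256GB of memory.
    "yarn.nodemanager.resource.memory-mb": 100 * GB_IN_MB,
    "yarn.scheduler.maximum-allocation-mb": 100 * GB_IN_MB,
}
core_site = {"fs.trash.interval": 4320}  # minutes

trash_days = core_site["fs.trash.interval"] / (60 * 24)
print(f"trash retention: {trash_days} days")
print(to_property_xml(yarn_site))
print(to_property_xml(core_site))
```

With these values, deleted files stay in the trash for three days, and a single container may request as much memory as a whole NodeManager offers.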
My feeling
• If you upgrade Hadoop with many batches (for example, more than 100 Azkaban flows), many errors will occur the next day
– I highly recommend upgrading in the first half of the week. We upgraded on Tuesday.
– Share the work of addressing batch errors
  • If you do this kind of work alone, you will be overwhelmed
Podcast
• https://itunes.apple.com/jp/podcast/wyukawas-podcast/id1152456701
• http://wyukawa.tumblr.com/