Upgrading from-hdp-21-to-hdp-25

Page 1: Upgrading from-hdp-21-to-hdp-25

Upgrading from HDP 2.1 to HDP 2.5 (Including podcast CM)

2017/03/03 @wyukawa

Hadoop SCR #hadoopreading https://www.eventbrite.com/e/hadoop-22-tickets-31987821435

Page 2: Upgrading from-hdp-21-to-hdp-25

About me

• Data Engineer at LINE for more than 4 years
• This is the third Hadoop upgrade operation at LINE
  – https://www.slideshare.net/wyukawa/upgrading-fromhdp21tohdp24-59994044 covers the second one

Page 3: Upgrading from-hdp-21-to-hdp-25

2014/6 - 2017/1

• Machines
  – 40 servers
  – CPU 24 processors
  – Memory 64GB
  – HDD 3.6TB x 12
  – Network 1Gbps
  – Hardware maintenance deadline is 2017/6
• HDP 2.1 (Ambari 1.6.0)
  – Hadoop 2.4.0
    • NameNode HA
    • Ambari 1.6 didn’t support ResourceManager HA
  – Hive 0.13.0
    • MapReduce (not Tez)

Page 4: Upgrading from-hdp-21-to-hdp-25

[Architecture diagram, labeled 2017/1. Components shown: Hadoop and Hive of HDP 2.1; Azkaban 3.1.0; Presto 0.163; Prestogres; Cognos; Netezza; InfiniDB; Pentaho; Saiku; DB (two boxes); ETL with Python 2.7.11]

Page 5: Upgrading from-hdp-21-to-hdp-25

2017/1 -

• Machines
  – 40 servers
  – CPU 40 processors
  – Memory 256GB
  – HDD 6.1TB x 12
  – Network 10Gbps
• HDP 2.5.3 (Ambari 2.4.2)
  – Hadoop 2.7.3
    • NameNode HA
    • ResourceManager HA
  – Hive 1.2.1
    • MapReduce
    • Tez

Page 6: Upgrading from-hdp-21-to-hdp-25

How to upgrade

• Set up a new Hadoop cluster on new machines
• Blue-green deployment, all at once
  – http://aws.typepad.com/sajp/2015/12/what-is-blue-green-deployment.html
• Migrate data with DistCp (-m 20 -bandwidth 125)
  – Copied 500TB (the first copy took about 3 days)
• No parallel execution on both Hadoop clusters
• See http://d.hatena.ne.jp/wyukawa/20170131/1485854288 for details
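As a rough sanity check on the DistCp numbers above, the sketch below estimates a lower bound for the copy time. DistCp's -bandwidth value is a per-map cap in MB/s, so 20 maps at 125 MB/s give an aggregate ceiling of 2500 MB/s; 125 MB/s also happens to equal 1 Gbps, the NIC speed of the old machines. The decimal units and the idea that every map runs at its cap for the whole copy are assumptions made for illustration, not figures from the slides.

```python
# Back-of-the-envelope check of the DistCp settings on this slide.
# Assumptions (not from the slides): decimal units (1 TB = 10**6 MB) and
# every map task sustaining its full bandwidth cap for the whole copy.

MAPS = 20                 # -m 20: maximum number of simultaneous copy tasks
MB_PER_SEC_PER_MAP = 125  # -bandwidth 125: cap per map, in MB/s
DATA_TB = 500             # data volume from the slide


def min_copy_days(data_tb: float, maps: int, mb_per_sec_per_map: float) -> float:
    """Lower bound on copy time in days when every map runs at its cap."""
    total_mb = data_tb * 10**6
    aggregate_mb_per_sec = maps * mb_per_sec_per_map
    seconds = total_mb / aggregate_mb_per_sec
    return seconds / 86_400


if __name__ == "__main__":
    days = min_copy_days(DATA_TB, MAPS, MB_PER_SEC_PER_MAP)
    print(f"aggregate cap: {MAPS * MB_PER_SEC_PER_MAP} MB/s")
    print(f"lower bound:   {days:.2f} days")
```

Under these assumptions the bound comes out a little over 2 days, which is consistent with the reported "about 3 days" for the first copy.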

Page 7: Upgrading from-hdp-21-to-hdp-25

DistCp with HDFS Snapshot

• http://qiita.com/bwtakacy/items/fa63cdcdfc05e4043c69 is a good article
• The -update -diff options don’t support webhdfs://orig/...
  – Edit hdfs-site.xml on the destination Hadoop cluster and use hdfs://orig/...
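One incremental round of a snapshot-based sync can be sketched as the command lists below. This is only an illustration of the general DistCp -update -diff pattern, not the exact commands used for this migration. The paths and snapshot names are hypothetical placeholders, and the sketch assumes snapshots are already allowed on both directories, that the previous snapshot exists on both sides, and that the destination has not changed since that snapshot.

```python
from typing import List


def incremental_sync_commands(
    src: str, dst: str, prev_snap: str, new_snap: str
) -> List[List[str]]:
    """Build the commands for one incremental DistCp round.

    src and dst are hdfs:// directory URIs (webhdfs:// is not supported
    by -diff, as noted on the slide). Nothing is executed here.
    """
    return [
        # 1. Freeze the current state of the source directory.
        ["hdfs", "dfs", "-createSnapshot", src, new_snap],
        # 2. Copy only what changed between the two source snapshots.
        ["hadoop", "distcp", "-update", "-diff", prev_snap, new_snap, src, dst],
        # 3. Snapshot the destination so the next round has a matching base.
        ["hdfs", "dfs", "-createSnapshot", dst, new_snap],
    ]


if __name__ == "__main__":
    # Hypothetical paths and snapshot names, for illustration only.
    for cmd in incremental_sync_commands(
        "hdfs://orig/path/to/data", "hdfs://dest/path/to/data", "s1", "s2"
    ):
        print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the sketch safe to try; each list could be passed to subprocess.run on a host where both nameservices resolve.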

Page 8: Upgrading from-hdp-21-to-hdp-25

Migrate Hive schema

• Use the show create table command
• Use the msck repair command to add partitions
  – But it didn’t work for tables with too many partitions (for example, 4000)
• Use webhdfs://... in external tables
  – Can’t use hdfs://…
  – But an empty result is returned when you select with Presto
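The slides do not say how the tables with thousands of partitions were handled. One possible workaround, shown here purely as an assumption, is to generate explicit ALTER TABLE ... ADD IF NOT EXISTS PARTITION statements and submit them in small chunks instead of relying on msck repair. The table name, partition column, and chunk size are hypothetical, and the sketch assumes string-typed partition values without quotes in them.

```python
from typing import Dict, Iterator, List


def add_partition_statement(table: str, spec: Dict[str, str]) -> str:
    """Render one Hive DDL statement that registers a single partition."""
    cols = ", ".join(f"{key}='{value}'" for key, value in spec.items())
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({cols});"


def in_chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical table partitioned by a string column named dt.
    specs = [{"dt": f"2017-01-{day:02d}"} for day in range(1, 8)]
    statements = [add_partition_statement("access_log", s) for s in specs]
    for number, chunk in enumerate(in_chunks(statements, 3), start=1):
        print(f"-- chunk {number}")
        print("\n".join(chunk))
```

Each chunk could be saved to its own script and run separately, so a failure affects only a limited set of partitions.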

Page 9: Upgrading from-hdp-21-to-hdp-25

HDFS/YARN/Hive/Sqoop settings

• dfs.datanode.failed.volumes.tolerated=1
• fs.trash.interval=4320
• NameNode heap 64GB
• yarn.nodemanager.resource.memory-mb 100GB
• yarn.scheduler.maximum-allocation-mb 100GB
• Use DominantResourceCalculator
  – https://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/
• hive.server2.authentication=NOSASL
• hive.server2.enable.doAs=false
• hive.auto.convert.join=false
• hive.support.sql11.reserved.keywords=false
• org.apache.sqoop.splitter.allow_text_splitter=true
• Sometimes use Tez
  – https://community.hortonworks.com/questions/24953/solution-for-hive-runtime-error-while-processing-r.html
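To make the units behind these settings concrete, the sketch below groups most of them by the configuration file they normally live in and renders one file as Hadoop-style XML. fs.trash.interval is expressed in minutes, so 4320 corresponds to 3 days. Writing "100GB" as 102400 MB is an assumption about how the value was entered, and on an Ambari-managed cluster these values would normally be set through Ambari rather than by editing files by hand.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Settings from the slide, grouped by their usual configuration file.
# Assumption: "100GB" is entered as 100 * 1024 MB.
SETTINGS: Dict[str, Dict[str, str]] = {
    "hdfs-site.xml": {"dfs.datanode.failed.volumes.tolerated": "1"},
    "core-site.xml": {"fs.trash.interval": "4320"},  # minutes
    "yarn-site.xml": {
        "yarn.nodemanager.resource.memory-mb": str(100 * 1024),
        "yarn.scheduler.maximum-allocation-mb": str(100 * 1024),
    },
    "capacity-scheduler.xml": {
        "yarn.scheduler.capacity.resource-calculator":
            "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
    },
    "hive-site.xml": {
        "hive.server2.authentication": "NOSASL",
        "hive.server2.enable.doAs": "false",
        "hive.auto.convert.join": "false",
        "hive.support.sql11.reserved.keywords": "false",
    },
}


def render(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    trash_minutes = int(SETTINGS["core-site.xml"]["fs.trash.interval"])
    print(f"trash retention: {trash_minutes / (60 * 24)} days")
    print(render(SETTINGS["yarn-site.xml"]))
```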

Page 10: Upgrading from-hdp-21-to-hdp-25

My feeling

• If you upgrade Hadoop with many batches (for example, more than 100 Azkaban flows), many errors will occur the next day
  – I highly recommend upgrading in the first half of the week. We upgraded on Tuesday.
  – Share the work of addressing batch errors
    • If you do this kind of work alone, you will be overwhelmed

Page 11: Upgrading from-hdp-21-to-hdp-25

Podcast

• https://itunes.apple.com/jp/podcast/wyukawas-podcast/id1152456701

• http://wyukawa.tumblr.com/