CHAPTER 8: Hadoop Setup and Configuration


Outline

Prerequisites
Hadoop installation and configuration
HBase cluster installation
Hadoop basic operations
HBase basic operations
Web interfaces


Prerequisites (1/5)

Hadoop can be installed on GNU/Linux and Win32 platforms; this walkthrough uses GNU/Linux as the installation platform.
Hadoop requires Java and ssh. A Java Runtime Environment (JRE) is enough to run Hadoop, while compiling Java MapReduce programs also requires a JDK.

CentOS 5.5 ships with OpenJDK. Check the installed Java version with java -version:

~# java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386)
OpenJDK Client VM (build 14.0-b16, mixed mode)

If OpenJDK is not installed, it can be installed with yum:

~# yum -y install java-1.6.0-openjdk

Prerequisites (2/5)

Hadoop may run into problems on OpenJDK, so it is recommended to replace OpenJDK with the Oracle (Sun) Java JDK. Download the Oracle (Sun) Java JDK from the Oracle website (http://www.oracle.com).

Prerequisites (3/5)

Download jdk-6u25-linux-i586.bin, place it under /usr, make it executable and run the installer:

~# chmod +x jdk-6u25-linux-i586.bin
~# ./jdk-6u25-linux-i586.bin

The JDK is unpacked under /usr (into the directory jdk1.6.0_25). Use alternatives to make the Oracle (Sun) Java JDK take precedence over OpenJDK:

~# alternatives --install /usr/bin/java java /usr/jdk1.6.0_25/bin/java 20000
~# alternatives --install /usr/bin/javac javac /usr/jdk1.6.0_25/bin/javac 20000

Prerequisites (4/5)

Verify the Java version again:

~# java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
~# javac -version
javac 1.6.0_25

Install ssh and rsync and restart the ssh daemon:

~# yum -y install openssh rsync
~# /etc/init.d/sshd restart

All Hadoop installation steps in this walkthrough are performed as root.
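If java still resolves to OpenJDK after registering the new alternatives, the active alternative can also be picked interactively; this is a supplementary step that is not shown on the original slides:

~# alternatives --config java          # choose the /usr/jdk1.6.0_25/bin/java entry
~# alternatives --config javac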

Hadoop Installation and Configuration

Hadoop can be run in three modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode

Local (Standalone) Mode (1/7)

Download Hadoop from the Apache Hadoop website (http://hadoop.apache.org/). Although the newest release at the time of writing is Hadoop 0.21.0, this walkthrough uses Hadoop 0.20.2. Fetch hadoop-0.20.2.tar.gz with wget and unpack it:

~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
~# tar zxvf hadoop-0.20.2.tar.gz

Local (Standalone) Mode (2/7)

Move the unpacked hadoop-0.20.2 directory to /opt and rename it hadoop, then point Hadoop at the Java installation by editing conf/hadoop-env.sh:

~# mv hadoop-0.20.2 /opt/hadoop
~# cd /opt/hadoop/
/hadoop# vi conf/hadoop-env.sh

Local (Standalone) Mode (3/7)

In hadoop-env.sh, set JAVA_HOME (export JAVA_HOME=/usr/jdk1.6.0_25). If the machine has IPv6 enabled, also add export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true so that Hadoop uses IPv4:

# Command specific options appended to HADOOP_OPTS when specified
.........
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export JAVA_HOME=/usr/jdk1.6.0_25                      # <- set JAVA_HOME
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true     # <- prefer IPv4

Local (Standalone) Mode (4/7)

Hadoop runs in Local (Standalone) Mode out of the box; no further configuration is needed.

As long as JAVA_HOME is set correctly in conf/hadoop-env.sh, the hadoop command works:

/hadoop# bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
.........
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Local (Standalone) Mode (5/7)

Hadoop ships with example jobs in hadoop-0.20.2-examples.jar. To try the grep example, create an input directory and copy the conf/*.xml files into it:

/hadoop# mkdir input
/hadoop# cp conf/*.xml input

Local (Standalone) Mode (6/7)

Run the grep example in hadoop-0.20.2-examples.jar to count the strings matching the pattern 'config[a-z.]+':

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
/hadoop# cat output/*
13 configuration
4 configuration.xsl
1 configure

Local (Standalone) Mode (7/7)

hadoop-0.20.2-examples.jar grep writes its results to the output directory; remove it before running the example again:

/hadoop# rm -rf output
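The other jobs bundled in hadoop-0.20.2-examples.jar can be run the same way. As a sketch (not part of the original slides), the wordcount example counts word occurrences in the same input directory; the output directory name here is arbitrary:

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
/hadoop# cat output/*
/hadoop# rm -rf output                 # clean up before the next run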

Pseudo-Distributed Mode (1/9)

Pseudo-Distributed Mode builds on the Local (Standalone) Mode setup; in addition, three files under conf must be edited: core-site.xml, hdfs-site.xml and mapred-site.xml. Start with core-site.xml:

/hadoop# vi conf/core-site.xml

Pseudo-Distributed Mode (2/9)

core-site.xml should read:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Pseudo-Distributed Mode (3/9)

Next, edit hdfs-site.xml:

/hadoop# vi conf/hdfs-site.xml

hdfs-site.xml should read:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Pseudo-Distributed Mode (4/9)

Finally, edit mapred-site.xml:

/hadoop# vi conf/mapred-site.xml

mapred-site.xml should read:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Pseudo-Distributed Mode (5/9)

Hadoop uses ssh to start its daemons, so ssh login to localhost must work. Test it first (answer yes, press Enter, then type the password):

~# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
root@localhost's password:

Pseudo-Distributed Mode (6/9)

Press Ctrl + C to abort, then set up passphrase-less ssh login:

~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

ssh localhost should now log in without asking for a password; type exit to leave:

~# ssh localhost
Last login: Mon May 16 10:04:39 2011 from localhost
~# exit

Pseudo-Distributed Mode (7/9)

Before starting Hadoop for the first time, format HDFS with bin/hadoop namenode -format:

/hadoop# bin/hadoop namenode -format
11/05/16 10:20:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
.........
11/05/16 10:20:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

Pseudo-Distributed Mode (8/9)

bin/start-all.sh starts all daemons, including the Namenode, Datanode, Secondary Namenode, Jobtracker and Tasktracker:

/hadoop# bin/start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host01.out
localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Host01.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host01.out

Pseudo-Distributed Mode (9/9)

To run the hadoop-0.20.2-examples.jar grep example again, first upload the conf directory to HDFS as input with bin/hadoop fs -put, then submit the job:

/hadoop# bin/hadoop fs -put conf input
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'

Fully-Distributed Mode (1/14)

A fully distributed Hadoop cluster consists of a Master and one or more Slaves; every node needs Java and ssh. This walkthrough uses two hosts:

Host     Roles                     IP
Host01   Namenode + Jobtracker     192.168.1.1
Host02   Datanode + Tasktracker    192.168.1.2

Fully-Distributed Mode (2/14)

Stop Hadoop with stop-all.sh and clean up the previous installation (the Hadoop directory, the ssh keys and the files under /tmp):

/hadoop# /opt/hadoop/bin/stop-all.sh
~# rm -rf /opt/hadoop
~# rm -rf ~/.ssh
~# rm -rf /tmp/*

On Host01, download Hadoop 0.20.2 again and install it under /opt/hadoop:

~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
~# tar zxvf hadoop-0.20.2.tar.gz
~# mv hadoop-0.20.2 /opt/hadoop

Fully-Distributed Mode (3/14)

Set JAVA_HOME for Hadoop again by editing conf/hadoop-env.sh under /opt/hadoop:

~# cd /opt/hadoop/
/hadoop# vi conf/hadoop-env.sh

In hadoop-env.sh, set JAVA_HOME:

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export JAVA_HOME=/usr/jdk1.6.0_25     # <- set JAVA_HOME

Fully-Distributed Mode (4/14)

Edit conf/core-site.xml with vi:

/hadoop# vi conf/core-site.xml

Fully-Distributed Mode (5/14)

conf/core-site.xml should read:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Host01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>

Fully-Distributed Mode (6/14)

Edit conf/hdfs-site.xml with vi:

/hadoop# vi conf/hdfs-site.xml

conf/hdfs-site.xml should read:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Fully-Distributed Mode (7/14)

Edit conf/mapred-site.xml with vi:

/hadoop# vi conf/mapred-site.xml

conf/mapred-site.xml should read:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Host01:9001</value>
  </property>
</configuration>

Fully-Distributed Mode (8/14)

Edit conf/masters with vi (it lists the host that runs the Secondary Namenode), then edit conf/slaves and replace localhost with Host02:

/hadoop# vi conf/masters
/hadoop# vi conf/slaves

Fully-Distributed Mode (9/14)

Set up passphrase-less ssh between the hosts and copy the keys to Host02 with scp:

~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
~# scp -r ~/.ssh Host02:~/
~# ssh Host02        # <- log in from Host01 to Host02
~# ssh Host01        # <- log in from Host02 back to Host01
~# exit              # <- log out of Host01
~# exit              # <- log out of Host02 (back to Host01)

Fully-Distributed Mode (10/14)

Copy the configured Hadoop directory from Host01 to the Slave, Host02 (a shared NFS mount could be used instead), then format HDFS:

~# scp -r /opt/hadoop Host02:/opt/
/hadoop# bin/hadoop namenode -format

Fully-Distributed Mode (11/14)

11/05/16 21:52:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Host01/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/05/16 21:52:13 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
......
11/05/16 21:52:13 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Host01/127.0.0.1
************************************************************/

Fully-Distributed Mode (12/14)

Start Hadoop:

/hadoop# bin/start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
Host02: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host02.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
Host02: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host02.out

Fully-Distributed Mode (13/14)

In Fully-Distributed Mode, bin/hadoop dfsadmin -report shows the state of HDFS:

/hadoop# bin/hadoop dfsadmin -report
Configured Capacity: 9231007744 (8.6 GB)
......
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
......
DFS Remaining%: 41.88%
Last contact: Mon May 16 22:15:03 CST 2011

Fully-Distributed Mode (14/14)

To run the hadoop-0.20.2-examples.jar grep example, create an input directory on HDFS, upload Hadoop's conf/ files into it, and submit the job:

/hadoop# bin/hadoop fs -mkdir input
/hadoop# bin/hadoop fs -put conf/* input/
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
/hadoop# bin/hadoop fs -cat output/part-00000
19 configuration
6 configuration.xsl
1 configure
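Not part of the original slides, but a quick way to confirm which daemons ended up on each host after bin/start-all.sh is the JDK's jps tool; the exact daemon list depends on conf/masters and conf/slaves:

/hadoop# /usr/jdk1.6.0_25/bin/jps                 # on Host01: expect NameNode and JobTracker
/hadoop# ssh Host02 /usr/jdk1.6.0_25/bin/jps      # on Host02: expect DataNode and TaskTracker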

HBase Cluster Installation (1/9)

HBase runs on top of the Hadoop cluster installed above. HBase 0.20 and later rely on ZooKeeper, and the clocks of all nodes should be kept in sync (for example with NTP).

HBase Cluster Installation (2/9)

Download HBase from the HBase website (http://hbase.apache.org/). This walkthrough uses hbase-0.90.2.tar.gz and installs HBase under /opt/hbase:

~# wget http://apache.cs.pu.edu.tw//hbase/hbase-0.90.2/hbase-0.90.2.tar.gz
~# tar zxvf hbase-0.90.2.tar.gz
~# mv hbase-0.90.2 /opt/hbase
~# cd /opt/hbase/

Edit conf/hbase-env.sh with vi:

/hbase# vi conf/hbase-env.sh

HBase Cluster Installation (3/9)

Add the following settings to conf/hbase-env.sh:

export JAVA_HOME=/usr/jdk1.6.0_25/
export HBASE_MANAGES_ZK=true
export HBASE_LOG_DIR=/tmp/hadoop/hbase-logs
export HBASE_PID_DIR=/tmp/hadoop/hbase-pids

Next, edit conf/hbase-site.xml, HBase's main configuration file:

/hbase# vi conf/hbase-site.xml

HBase Cluster Installation (4/9)

conf/hbase-site.xml should contain:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://Host01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>Host01,Host02</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/hadoop/hbase-data</value>
  </property>
  <!-- HBase Cluster Installation (5/9): conf/hbase-site.xml, continued -->
  <property>
    <name>hbase.tmp.dir</name>
    <value>/var/hadoop/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>Host01:60000</value>
  </property>
</configuration>

HBase Cluster Installation (6/9)

Edit conf/regionservers with vi. Like Hadoop's conf/slaves, it lists the Slave hosts, so it should contain Host02:

/hbase# vi conf/regionservers

Then copy Hadoop's configuration files into HBase's conf/ directory:

/hbase# cp /opt/hadoop/conf/core-site.xml conf/
/hbase# cp /opt/hadoop/conf/mapred-site.xml conf/
/hbase# cp /opt/hadoop/conf/hdfs-site.xml conf/

HBase Cluster Installation (7/9)

The hadoop-core jar bundled with HBase (lib/hadoop-core-0.20-append-r1056497.jar) must match the Hadoop version in use, so remove it and replace it with Hadoop's hadoop-0.20.2-core.jar, then copy the whole HBase directory to the Slave:

/hbase# rm lib/hadoop-core-0.20-append-r1056497.jar
/hbase# cp /opt/hadoop/hadoop-0.20.2-core.jar ./lib
/hbase# scp -r /opt/hbase Host02:/opt/hbase

HBase Cluster Installation (8/9)

Start HBase:

/hbase# bin/start-hbase.sh
Host02: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host02.out
Host01: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host01.out
starting master, logging to /tmp/hadoop/hbase-logs/hbase-root-master-Host01.out
Host02: starting regionserver, logging to /tmp/hadoop/hbase-logs/hbase-root-regionserver-Host02.out

HBase Cluster Installation (9/9)

Enter the HBase shell with bin/hbase shell and use list to show the tables currently stored in HBase:

/hbase# bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0> list          # type list and press Enter
TABLE
0 row(s) in 0.3950 seconds
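Before creating any tables it can be worth confirming that the region servers registered with the master; status is a standard HBase shell command, though this check is an addition to the original slides and its exact output varies:

hbase(main):002:0> status              # reports live/dead region servers and average load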


Hadoop Basic Operations (1/7)

bin/start-all.sh                                               start all Hadoop daemons
bin/stop-all.sh                                                stop all Hadoop daemons
bin/hadoop version                                             show the Hadoop version
bin/hadoop dfsadmin -report                                    report the status of HDFS
bin/hadoop namenode -format                                    format HDFS
bin/hadoop fs -ls                                              list files on HDFS
bin/hadoop fs -ls /user/root/input                             list the HDFS directory /user/root/input
bin/hadoop fs -mkdir /user/root/tmp                            create an HDFS directory
bin/hadoop fs -put conf/* /user/root/tmp                       upload local files to HDFS
bin/hadoop fs -cat /user/root/tmp/core-site.xml                print an HDFS file
bin/hadoop fs -get /user/root/tmp/core-site.xml /opt/hadoop/   download an HDFS file to the local filesystem
bin/hadoop fs -rm /user/root/tmp/core-site.xml                 delete an HDFS file
bin/hadoop fs -rmr /user/root/tmp                              delete an HDFS directory recursively

Hadoop Basic Operations (2/7)

All HDFS commands are invoked through bin/hadoop fs; running it without arguments prints the usage:

/hadoop# bin/hadoop fs
Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
.........
-files     specify comma separated files to be copied to the map reduce cluster
-libjars   specify comma separated jar files to include in the classpath.
-archives  specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Hadoop Basic Operations (3/7)

MapReduce jobs are packaged as jar files and submitted to Hadoop with:

bin/hadoop jar [MapReduce job jar] [job name] [job arguments]

Hadoop ships with example jobs in hadoop-0.20.2-examples.jar, including grep, wordcount and pi. Running the jar without arguments lists the available jobs:

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar

Hadoop Basic Operations (4/7)

Hadoop itself is packaged as several jar files:

hadoop-0.20.2-core.jar      Hadoop common, HDFS and MapReduce
hadoop-0.20.2-test.jar      Hadoop test code
hadoop-0.20.2-ant.jar       Ant tasks

Hadoop Basic Operations (5/7)

bin/hadoop job manages MapReduce jobs. List all jobs with:

/hadoop# bin/hadoop job -list all
5 jobs submitted
States are: Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId                  State  StartTime      UserName  Priority  SchedulingInfo
job_201105162211_0001  2      1305555169692  root      NORMAL    NA
job_201105162211_0002  2      1305555869142  root      NORMAL    NA
job_201105162211_0003  2      1305555912626  root      NORMAL    NA
job_201105162211_0004  2      1305633307809  root      NORMAL    NA
job_201105162211_0005  2      1305633347357  root      NORMAL    NA

Hadoop Basic Operations (6/7)

Show the status of a single job with bin/hadoop job -status [JobID]:

/hadoop# bin/hadoop job -status job_201105162211_0001

Show the history of the jobs that wrote to a given output directory with bin/hadoop job -history [output directory]:

/hadoop# bin/hadoop job -history /user/root/output

Hadoop job: job_201105162211_0007
=====================================
Job tracker host name: Host01
job tracker start time: Mon May 16 22:11:01 CST 2011
User: root
JobName: grep-sort

Hadoop Basic Operations (7/7)

Running bin/hadoop job without arguments prints the full list of job options:

/hadoop# bin/hadoop job
Usage: JobClient
        [-submit <job-file>]
        [-status <job-id>]
        [-counter <job-id> <group-name> <counter-name>]
        [-kill <job-id>]
        [-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
.........

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
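To tie the job commands together, here is a small sketch (not from the original slides) that submits another job from hadoop-0.20.2-examples.jar, the pi estimator, and then lists it; the two numeric arguments (number of map tasks and samples per map) are illustrative values:

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar pi 4 1000
/hadoop# bin/hadoop job -list          # the new job appears here while it is running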

HBase Basic Operations (1/10)

The following operations build this example table in HBase:

name   student ID   course:math   course:history
John   1            80            85
Adam   2            75            90

Enter the HBase shell with bin/hbase shell:

/hbase# bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0>

HBase Basic Operations (2/10)

Create a table named scores with two column families, studentid and course. The syntax is:

> create [table name], [column family 1], [column family 2], ...

hbase(main):001:0> create 'scores', 'studentid', 'course'
0 row(s) in 1.8970 seconds

list shows the tables now present in HBase:

hbase(main):002:0> list
TABLE
scores
1 row(s) in 0.0170 seconds

HBase Basic Operations (3/10)

describe shows a table's settings; the syntax is > describe [table name]:

hbase(main):003:0> describe 'scores'
DESCRIPTION                                                    ENABLED
 ......... BLOCKCACHE => 'true'}]}
1 row(s) in 0.0260 seconds

Insert the first value into scores: row John, column studentid, value 1. The put syntax is > put [table name], [row], [column], [value]:

hbase(main):004:0> put 'scores', 'John', 'studentid:', '1'
0 row(s) in 0.0600 seconds

HBase Basic Operations (4/10)

Set John's course:math column to 80:

hbase(main):005:0> put 'scores', 'John', 'course:math', '80'
0 row(s) in 0.0100 seconds

Set John's course:history column to 85:

hbase(main):006:0> put 'scores', 'John', 'course:history', '85'
0 row(s) in 0.0080 seconds

HBase Basic Operations (5/10)

Insert Adam's data the same way: studentid 2, course:math 75, course:history 90:

hbase(main):007:0> put 'scores', 'Adam', 'studentid:', '2'
0 row(s) in 0.0130 seconds

hbase(main):008:0> put 'scores', 'Adam', 'course:math', '75'
0 row(s) in 0.0100 seconds

hbase(main):009:0> put 'scores', 'Adam', 'course:history', '90'
0 row(s) in 0.0080 seconds

HBase Basic Operations (6/10)

scan lists the whole scores table; the syntax is > scan [table name]:

hbase(main):011:0> scan 'scores'
ROW      COLUMN+CELL
 Adam     column=course:history, timestamp=1305704304053, value=90
 Adam     column=course:math, timestamp=1305704282591, value=75
 Adam     column=studentid:, timestamp=1305704186916, value=2
 John     column=course:history, timestamp=1305704046378, value=85
 John     column=course:math, timestamp=1305703949662, value=80
 John     column=studentid:, timestamp=1305703742527, value=1
2 row(s) in 0.0420 seconds

HBase Basic Operations (7/10)

get retrieves a single row, here John's row in scores; the syntax is > get [table name], [row]:

hbase(main):010:0> get 'scores', 'John'
COLUMN            CELL
 course:history    timestamp=1305704046378, value=85
 course:math       timestamp=1305703949662, value=80
 studentid:        timestamp=1305703742527, value=1
3 row(s) in 0.0440 seconds

HBase Basic Operations (8/10)

scan can also be limited to one column family of scores, here course; the syntax is > scan [table name], {COLUMNS => [column family]}:

hbase(main):011:0> scan 'scores', {COLUMNS => 'course:'}
ROW      COLUMN+CELL
 Adam     column=course:history, timestamp=1305704304053, value=90
 Adam     column=course:math, timestamp=1305704282591, value=75
 John     column=course:history, timestamp=1305704046378, value=85
 John     column=course:math, timestamp=1305703949662, value=80
2 row(s) in 0.0250 seconds

HBase Basic Operations (9/10)

Several columns can be scanned at once; the syntax is > scan [table name], {COLUMNS => [[column 1], [column 2], ...]}:

hbase(main):012:0> scan 'scores', {COLUMNS => ['studentid','course:']}
ROW      COLUMN+CELL
 Adam     column=course:history, timestamp=1305704304053, value=90
 Adam     column=course:math, timestamp=1305704282591, value=75
 Adam     column=studentid:, timestamp=1305704186916, value=2
 John     column=course:history, timestamp=1305704046378, value=85
 John     column=course:math, timestamp=1305703949662, value=80
 John     column=studentid:, timestamp=1305703742527, value=1
2 row(s) in 0.0290 seconds

HBase Basic Operations (10/10)

To remove a table, first disable it and then drop it:

hbase(main):003:0> disable 'scores'
0 row(s) in 2.1510 seconds

hbase(main):004:0> drop 'scores'
0 row(s) in 1.7780 seconds
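For completeness, a single cell can also be removed without dropping the whole table; delete is a standard HBase shell command. The sketch below is an addition to the original slides: it continues the scan session above and would be run while the scores table still exists, before disable/drop:

hbase(main):013:0> delete 'scores', 'John', 'course:math'
hbase(main):014:0> get 'scores', 'John'     # course:math no longer appears in John's row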

Web Interfaces (1/2)

Hadoop provides web interfaces that can be opened in a browser (for example Mozilla Firefox):

http://localhost:50070        HDFS (Namenode) status
http://localhost:50030        MapReduce Jobtracker status

Web Interfaces (2/2)

HBase provides its own web interfaces:

http://localhost:60010/           HBase Master (on the Master host)
http://localhost:60030/           Region Server (on each Slave host)
http://localhost:60010/zk.jsp     ZooKeeper (on the Master host)
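As a quick reachability check (a supplementary sketch, not part of the original slides), the same ports can be probed from the command line with curl:

~# curl -I http://localhost:50070/        # HDFS Namenode web UI
~# curl -I http://localhost:50030/        # MapReduce Jobtracker web UI
~# curl -I http://Host01:60010/           # HBase Master web UI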