CHAPTER 8
Hadoop Setup and Configuration

Outline
- Prerequisites
- Installing and configuring Hadoop
- Installing an HBase cluster
- Basic Hadoop operations
- Basic HBase operations
- Web interfaces
Prerequisites (1/5)
Hadoop can be installed on GNU/Linux or Win32; this walkthrough uses GNU/Linux as the installation platform. Hadoop depends on Java and ssh, and it needs a full Java JDK, not just the runtime environment (JRE).

Prerequisites (2/5)
The platform here is CentOS 5.5. Check whether OpenJDK's Java is already installed with java -version:

~# java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386)
OpenJDK Client VM (build 14.0-b16, mixed mode)

If it is not installed, install OpenJDK with yum:

~# yum -y install java-1.6.0-openjdk

Hadoop runs on OpenJDK, but the Oracle (Sun) Java JDK is preferred; download it from the Oracle website (http://www.oracle.com).

Prerequisites (3/5)
Place jdk-6u25-linux-i586.bin under /usr, make it executable, and run it:

~# chmod +x jdk-6u25-linux-i586.bin
~# ./jdk-6u25-linux-i586.bin

This unpacks the JDK into /usr (here, the directory jdk1.6.0_25). Use alternatives to give the Oracle (Sun) JDK priority over OpenJDK:

~# alternatives --install /usr/bin/java java /usr/jdk1.6.0_25/bin/java 20000
~# alternatives --install /usr/bin/javac javac /usr/jdk1.6.0_25/bin/javac 20000

Prerequisites (4/5)
Verify the Java installation:

~# java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)

~# javac -version
javac 1.6.0_25

Prerequisites (5/5)
Install ssh and rsync and restart the ssh daemon:

~# yum -y install openssh rsync
~# /etc/init.d/sshd restart

This walkthrough runs Hadoop as root.
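The version banners checked above can also be validated programmatically. This is an illustrative sketch (parse_java_version is our helper, not part of the chapter's toolchain):

```python
import re

def parse_java_version(banner):
    """Extract (major, minor, patch, update) from a `java -version` banner line."""
    m = re.search(r'java version "(\d+)\.(\d+)\.(\d+)(?:_(\d+))?"', banner)
    if not m:
        raise ValueError("unrecognized java -version output")
    major, minor, patch = (int(x) for x in m.group(1, 2, 3))
    update = int(m.group(4) or 0)  # OpenJDK banners may omit the _NN update part
    return (major, minor, patch, update)

# Hadoop 0.20 needs Java 1.6 or newer; tuple comparison makes the check simple.
assert parse_java_version('java version "1.6.0_25"') >= (1, 6, 0, 0)
```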
Installing and Configuring Hadoop

Hadoop can run in three modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode

Local (Standalone) Mode (1/7)
Download Hadoop from the Apache Hadoop site (http://hadoop.apache.org/). The newest release at the time of writing is Hadoop 0.21.0, but this walkthrough uses Hadoop 0.20.2. Fetch hadoop-0.20.2.tar.gz with wget and unpack it:

~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
~# tar zxvf hadoop-0.20.2.tar.gz

Local (Standalone) Mode (2/7)
Move the unpacked hadoop-0.20.2 directory to /opt under the name hadoop:

~# mv hadoop-0.20.2 /opt/hadoop

Hadoop needs to know where Java is installed; edit conf/hadoop-env.sh with vi:

~# cd /opt/hadoop/
/hadoop# vi conf/hadoop-env.sh

Local (Standalone) Mode (3/7)
In hadoop-env.sh, set JAVA_HOME (export JAVA_HOME=/usr/jdk1.6.0_25). If IPv6 causes trouble, also add export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true so Hadoop sticks to IPv4:

# Command specific options appended to HADOOP_OPTS when specified
...
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export JAVA_HOME=/usr/jdk1.6.0_25
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Local (Standalone) Mode (4/7)
Once JAVA_HOME is set in conf/hadoop-env.sh, Hadoop runs in Local (Standalone) Mode with no further configuration. Running bin/hadoop with no arguments prints its usage:

/hadoop# bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  ...
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Local (Standalone) Mode (5/7)
Hadoop ships example jobs in hadoop-0.20.2-examples.jar. To try its grep job, create an input directory and copy the conf/*.xml files into it:

/hadoop# mkdir input
/hadoop# cp conf/*.xml input

Local (Standalone) Mode (6/7)
Run the grep job from hadoop-0.20.2-examples.jar to count strings matching config[a-z.]+ in input, writing the result to output:

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
/hadoop# cat output/*
13 configuration
4 configuration.xsl
1 configure

Local (Standalone) Mode (7/7)
To run the hadoop-0.20.2-examples.jar grep job again, first remove the output directory, since the job will not overwrite it:

/hadoop# rm -rf output

Pseudo-Distributed Mode (1/9)
Pseudo-Distributed Mode starts from the Local (Standalone) Mode setup; in addition, three files under conf must be edited: core-site.xml, hdfs-site.xml, and mapred-site.xml. Start with core-site.xml.
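What the grep example job computes can be mimicked in plain Python: count every match of the pattern across the input files, then sort by descending count. This is an illustrative stand-in (grep_counts is our name), not Hadoop code:

```python
import re
from collections import Counter

def grep_counts(texts, pattern):
    """Count regex matches across all input texts, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(pattern, text))
    return sorted(counts.items(), key=lambda kv: -kv[1])

# Hypothetical stand-ins for the conf/*.xml input files.
docs = ["<configuration/> configured via configuration.xsl",
        "configure step reads configuration"]
for count, word in ((n, w) for w, n in grep_counts(docs, r'config[a-z.]+')):
    print(count, word)
```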
/hadoop# vi conf/core-site.xml

Pseudo-Distributed Mode (2/9)
In core-site.xml, set the default filesystem URI:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Pseudo-Distributed Mode (3/9)
Next, edit hdfs-site.xml:

/hadoop# vi conf/hdfs-site.xml

In hdfs-site.xml, set the block replication factor to 1 (there is only one node):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Pseudo-Distributed Mode (4/9)
Finally, edit mapred-site.xml:

/hadoop# vi conf/mapred-site.xml

In mapred-site.xml, set the Jobtracker address:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
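The *-site.xml files above all share the same <configuration>/<property> layout, so such a file can be generated mechanically. An illustrative sketch with Python's standard library (make_site_xml is our helper, not part of Hadoop):

```python
import xml.etree.ElementTree as ET

def make_site_xml(props):
    """Render a Hadoop-style *-site.xml <configuration> block from a dict."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")

xml = make_site_xml({"fs.default.name": "hdfs://localhost:9000"})
print(xml)
```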
Pseudo-Distributed Mode (5/9)
Hadoop starts its daemons over ssh, so logging in to localhost with ssh must work. On the first attempt, ssh asks whether to trust the host key (type yes and press Enter) and then prompts for a password:

~# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
root@localhost's password:

Pseudo-Distributed Mode (6/9)
Press Ctrl + C at the password prompt, then set up passwordless ssh with an empty-passphrase RSA key:

~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

ssh localhost should now log in without a password; exit ends the session:

~# ssh localhost
Last login: Mon May 16 10:04:39 2011 from localhost
~# exit

Pseudo-Distributed Mode (7/9)
Before starting Hadoop for the first time, format HDFS with bin/hadoop namenode -format:

/hadoop# bin/hadoop namenode -format
11/05/16 10:20:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
...
11/05/16 10:20:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

Pseudo-Distributed Mode (8/9)
bin/start-all.sh starts all the daemons, including the Jobtracker and Tasktracker:

/hadoop# bin/start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host01.out
localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Host01.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host01.out

Pseudo-Distributed Mode (9/9)
To run the hadoop-0.20.2-examples.jar grep job in this mode, upload the conf directory into HDFS as input with bin/hadoop fs -put, then submit the job:

/hadoop# bin/hadoop fs -put conf input
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'

Fully-Distributed Mode (1/14)
Fully-Distributed Mode runs Hadoop across a Master and one or more Slaves; every host needs Java and ssh. This walkthrough uses two hosts:

Host    Roles                    IP
Host01  Namenode + Jobtracker    192.168.1.1
Host02  Datanode + Tasktracker   192.168.1.2

Fully-Distributed Mode (2/14)
Stop the pseudo-distributed Hadoop with stop-all.sh, then remove the old installation, the .ssh directory, and everything under /tmp:

/hadoop# /opt/hadoop/bin/stop-all.sh
~# rm -rf /opt/hadoop
~# rm -rf ~/.ssh
~# rm -rf /tmp/*

On Host01, download Hadoop 0.20.2 again and install it to /opt/hadoop:

~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
~# tar zxvf hadoop-0.20.2.tar.gz
~# mv hadoop-0.20.2 /opt/hadoop

Fully-Distributed Mode (3/14)
As before, Hadoop needs JAVA_HOME set in conf/hadoop-env.sh; change into /opt/hadoop and open the file with vi.
~# cd /opt/hadoop/
/hadoop# vi conf/hadoop-env.sh

In hadoop-env.sh, set JAVA_HOME:

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export JAVA_HOME=/usr/jdk1.6.0_25

Fully-Distributed Mode (4/14)
Edit conf/core-site.xml with vi:

/hadoop# vi conf/core-site.xml

Fully-Distributed Mode (5/14)
In core-site.xml, point the default filesystem at Host01 and set Hadoop's temporary directory:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Host01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>
Fully-Distributed Mode (6/14)
Edit conf/hdfs-site.xml with vi:

/hadoop# vi conf/hdfs-site.xml

In hdfs-site.xml, set the replication factor to 2:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
Fully-Distributed Mode (7/14)
Edit conf/mapred-site.xml with vi:

/hadoop# vi conf/mapred-site.xml

In mapred-site.xml, point the Jobtracker at Host01:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Host01:9001</value>
  </property>
</configuration>
Fully-Distributed Mode (8/14)
Edit conf/masters with vi; it should contain the master, Host01:

/hadoop# vi conf/masters

Then edit conf/slaves, changing localhost to the slave, Host02:

/hadoop# vi conf/slaves

Fully-Distributed Mode (9/14)
Generate an ssh key pair on Host01 and copy the whole .ssh directory to Host02 with scp so that both hosts accept each other without a password, then test the logins in both directions:

~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
~# scp -r ~/.ssh Host02:~/
~# ssh Host02     (log in from Host01 to Host02)
~# ssh Host01     (log in from Host02 back to Host01)
~# exit           (back on Host02)
~# exit           (back on Host01)

Fully-Distributed Mode (10/14)
Every node needs the same Hadoop installation; it can be shared over NFS or simply copied. Copy Host01's Hadoop to Host02:

~# scp -r /opt/hadoop Host02:/opt/

Then format HDFS:

/hadoop# bin/hadoop namenode -format

Fully-Distributed Mode (11/14)

11/05/16 21:52:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Host01/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/05/16 21:52:13 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
...
11/05/16 21:52:13 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Host01/127.0.0.1
************************************************************/

Fully-Distributed Mode (12/14)
Start Hadoop:

/hadoop# bin/start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
Host02: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host02.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
Host02: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host02.out

Fully-Distributed Mode (13/14)
In Fully-Distributed Mode, bin/hadoop dfsadmin -report shows the state of HDFS, including the capacity and the available datanodes:

/hadoop# bin/hadoop dfsadmin -report
Configured Capacity: 9231007744 (8.6 GB)
...
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
...
DFS Remaining%: 41.88%
Last contact: Mon May 16 22:15:03 CST 2011

Fully-Distributed Mode (14/14)
To run the hadoop-0.20.2-examples.jar grep job, create an input directory in HDFS, upload Hadoop's conf/ files into it, run the job, and print the result:

/hadoop# bin/hadoop fs -mkdir input
/hadoop# bin/hadoop fs -put conf/* input/
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
/hadoop# bin/hadoop fs -cat output/part-00000
19 configuration
6 configuration.xsl
1 configure
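For scripted health checks, the report's key: value lines are easy to pull apart. A sketch assuming the format stays as shown above (parse_report is a hypothetical helper, not a Hadoop API):

```python
# Sample dfsadmin -report output, abbreviated from the transcript above.
REPORT = """Configured Capacity: 9231007744 (8.6 GB)
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
DFS Remaining%: 41.88%"""

def parse_report(text):
    """Collect 'key: value' lines from dfsadmin -report output into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            fields[key.strip()] = value.strip()
    return fields

info = parse_report(REPORT)
assert info["Missing blocks"] == "0"
assert info["Datanodes available"].startswith("1")
```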
Installing an HBase Cluster

HBase Cluster Installation (1/9)
HBase runs on top of Hadoop. From version 0.20 on, HBase also relies on ZooKeeper, and the cluster's clocks must be kept in sync (for example with NTP).

HBase Cluster Installation (2/9)
Download HBase from the HBase site (http://hbase.apache.org/); this walkthrough uses hbase-0.90.2.tar.gz and installs it to /opt/hbase:

~# wget http://apache.cs.pu.edu.tw//hbase/hbase-0.90.2/hbase-0.90.2.tar.gz
~# tar zxvf hbase-0.90.2.tar.gz
~# mv hbase-0.90.2 /opt/hbase
~# cd /opt/hbase/

Edit conf/hbase-env.sh with vi:

/hbase# vi conf/hbase-env.sh

HBase Cluster Installation (3/9)
In conf/hbase-env.sh, set:
export JAVA_HOME=/usr/jdk1.6.0_25/
export HBASE_MANAGES_ZK=true
export HBASE_LOG_DIR=/tmp/hadoop/hbase-logs
export HBASE_PID_DIR=/tmp/hadoop/hbase-pids

Next, edit conf/hbase-site.xml, HBase's main configuration file:

/hbase# vi conf/hbase-site.xml

HBase Cluster Installation (4/9, 5/9)
In conf/hbase-site.xml, set the following properties:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://Host01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>Host01,Host02</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/hadoop/hbase-data</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/var/hadoop/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>Host01:60000</value>
  </property>
</configuration>
HBase Cluster Installation (6/9)
Edit conf/regionservers with vi; like Hadoop's conf/slaves, it lists the slave hosts, so replace its contents with Host02:

/hbase# vi conf/regionservers

Then copy the Hadoop configuration files into HBase's conf/ directory:

/hbase# cp /opt/hadoop/conf/core-site.xml conf/
/hbase# cp /opt/hadoop/conf/mapred-site.xml conf/
/hbase# cp /opt/hadoop/conf/hdfs-site.xml conf/

HBase Cluster Installation (7/9)
The Hadoop core jar bundled in hbase's lib/ (hadoop-core-0.20-append-r1056497.jar) must be replaced by the jar of the running Hadoop, hadoop-0.20.2-core.jar:

/hbase# rm lib/hadoop-core-0.20-append-r1056497.jar
/hbase# cp /opt/hadoop/hadoop-0.20.2-core.jar ./lib

Then copy the whole hbase directory to the Slave:

/hbase# scp -r /opt/hbase Host02:/opt/hbase

HBase Cluster Installation (8/9)
Start HBase:
/hbase# bin/start-hbase.sh
Host02: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host02.out
Host01: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host01.out
starting master, logging to /tmp/hadoop/hbase-logs/hbase-root-master-Host01.out
Host02: starting regionserver, logging to /tmp/hadoop/hbase-logs/hbase-root-regionserver-Host02.out

HBase Cluster Installation (9/9)
Enter the HBase shell with bin/hbase shell and run list to confirm that HBase is up:
/hbase# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

Type list and press Enter:

hbase(main):001:0> list
TABLE
0 row(s) in 0.3950 seconds

hbase(main):002:0>
Basic Hadoop Operations

Basic Hadoop Operations (1/7)
Frequently used Hadoop commands:

bin/start-all.sh                                     start all Hadoop daemons
bin/stop-all.sh                                      stop all Hadoop daemons
bin/hadoop version                                   show the Hadoop version
bin/hadoop dfsadmin -report                          report HDFS status
bin/hadoop namenode -format                          format HDFS
bin/hadoop fs -ls /user/root/input                   list an HDFS directory
bin/hadoop fs -mkdir /user/root/tmp                  create an HDFS directory
bin/hadoop fs -put conf/* /user/root/tmp             upload local files into HDFS
bin/hadoop fs -cat /user/root/tmp/core-site.xml      print an HDFS file
bin/hadoop fs -get /user/root/tmp/core-site.xml /opt/hadoop/   download an HDFS file
bin/hadoop fs -rm /user/root/tmp/core-site.xml       delete an HDFS file
bin/hadoop fs -rmr /user/root/tmp                    delete an HDFS directory recursively

Basic Hadoop Operations (2/7)
Running bin/hadoop fs without arguments prints every HDFS shell option:

/hadoop# bin/hadoop fs
Usage: java FsShell
        [-ls <path>]
        [-lsr <path>]
        [-du <path>]
        ...
-files specify comma separated files to be copied to the map reduce cluster
-libjars specify comma separated jar files to include in the classpath.
-archives specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Basic Hadoop Operations (3/7)
MapReduce jobs are packaged as jars and submitted to Hadoop with:

bin/hadoop jar [MapReduce job jar] [job name] [job arguments]

Hadoop's bundled hadoop-0.20.2-examples.jar contains example jobs such as grep, wordcount, and pi; running it without a job name lists them:

/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar

Basic Hadoop Operations (4/7)
Hadoop ships several jars:

hadoop-0.20.2-core.jar   the Hadoop common, HDFS, and MapReduce classes
hadoop-0.20.2-test.jar   Hadoop's tests
hadoop-0.20.2-ant.jar    Ant tasks

Basic Hadoop Operations (5/7)
bin/hadoop job -list all lists submitted jobs:

/hadoop# bin/hadoop job -list all
5 jobs submitted
States are: Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName Priority SchedulingInfo
job_201105162211_0001 2 1305555169692 root NORMAL NA
job_201105162211_0002 2 1305555869142 root NORMAL NA
job_201105162211_0003 2 1305555912626 root NORMAL NA
job_201105162211_0004 2 1305633307809 root NORMAL NA
job_201105162211_0005 2 1305633347357 root NORMAL NA
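The -list output is whitespace-delimited and straightforward to post-process. A sketch assuming the format shown above (parse_job_list and STATES are our names, not a Hadoop API):

```python
# Abbreviated `bin/hadoop job -list all` output from the transcript above.
LISTING = """JobId State StartTime UserName Priority SchedulingInfo
job_201105162211_0001 2 1305555169692 root NORMAL NA
job_201105162211_0002 2 1305555869142 root NORMAL NA"""

# State codes as printed by JobClient ("Succeded" is JobClient's own spelling).
STATES = {1: "Running", 2: "Succeded", 3: "Failed", 4: "Prep"}

def parse_job_list(text):
    """Turn the header + rows into a list of dicts keyed by the header names."""
    header, *rows = text.splitlines()
    keys = header.split()
    return [dict(zip(keys, row.split())) for row in rows]

jobs = parse_job_list(LISTING)
assert jobs[0]["JobId"] == "job_201105162211_0001"
assert STATES[int(jobs[0]["State"])] == "Succeded"
```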
Basic Hadoop Operations (6/7)
Check a job's status with bin/hadoop job -status [JobID]:

/hadoop# bin/hadoop job -status job_201105162211_0001

Review a finished job from its output directory with bin/hadoop job -history [output directory]:

/hadoop# bin/hadoop job -history /user/root/output
Hadoop job: job_201105162211_0007
=====================================
Job tracker host name: Host01
job tracker start time: Mon May 16 22:11:01 CST 2011
User: root
JobName: grep-sort

Basic Hadoop Operations (7/7)
Running bin/hadoop job without arguments lists every job option:

/hadoop# bin/hadoop job
Usage: JobClient <command> <args>
        [-submit <job-file>]
        [-status <job-id>]
        [-counter <job-id> <group-name> <counter-name>]
        [-kill <job-id>]
        [-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
        ...

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Basic HBase Operations

Basic HBase Operations (1/10)
This section builds the following table of student scores in HBase:

name   student ID   course:math   course:history
John   1            80            85
Adam   2            75            90

Start the HBase shell with bin/hbase shell:

/hbase# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0>

Basic HBase Operations (2/10)
Create a table named scores with the column families studentid and course:

> create [table name], [column family 1], [column family 2], ...
hbase(main):001:0> create 'scores', 'studentid', 'course'
0 row(s) in 1.8970 seconds

list now shows the new table:

hbase(main):002:0> list
TABLE
scores
1 row(s) in 0.0170 seconds

Basic HBase Operations (3/10)
describe prints a table's schema:

> describe [table name]

hbase(main):003:0> describe 'scores'
DESCRIPTION ENABLED
... BLOCKCACHE => 'true'}]}
1 row(s) in 0.0260 seconds

Insert John's student ID into the scores table: the value 1 goes into row John, column studentid::

> put [table name], [row], [column], [value]

hbase(main):004:0> put 'scores', 'John', 'studentid:', '1'
0 row(s) in 0.0600 seconds

Basic HBase Operations (4/10)
Put John's math score, 80, into the course:math column:

hbase(main):005:0> put 'scores', 'John', 'course:math', '80'
0 row(s) in 0.0100 seconds

And John's history score, 85, into course:history:

hbase(main):006:0> put 'scores', 'John', 'course:history', '85'
0 row(s) in 0.0080 seconds

Basic HBase Operations (5/10)
Insert Adam's row the same way: studentid 2, course:math 75, course:history 90:

hbase(main):007:0> put 'scores', 'Adam', 'studentid:', '2'
0 row(s) in 0.0130 seconds

hbase(main):008:0> put 'scores', 'Adam', 'course:math', '75'
0 row(s) in 0.0100 seconds

hbase(main):009:0> put 'scores', 'Adam', 'course:history', '90'
0 row(s) in 0.0080 seconds

Basic HBase Operations (6/10)
scan browses the whole scores table:

> scan [table name]
hbase(main):011:0> scan 'scores'
ROW COLUMN+CELL
 Adam column=course:history, timestamp=1305704304053, value=90
 Adam column=course:math, timestamp=1305704282591, value=75
 Adam column=studentid:, timestamp=1305704186916, value=2
 John column=course:history, timestamp=1305704046378, value=85
 John column=course:math, timestamp=1305703949662, value=80
 John column=studentid:, timestamp=1305703742527, value=1
2 row(s) in 0.0420 seconds

Basic HBase Operations (7/10)
Read a single row of scores, here John's, with get:

> get [table name], [row]
hbase(main):010:0> get 'scores', 'John'
COLUMN CELL
 course:history timestamp=1305704046378, value=85
 course:math timestamp=1305703949662, value=80
 studentid: timestamp=1305703742527, value=1
3 row(s) in 0.0440 seconds

Basic HBase Operations (8/10)
Scan only the course column family of scores with the COLUMNS option:

> scan [table name], {COLUMNS => [column family]}
hbase(main):011:0> scan 'scores', {COLUMNS => 'course:'}
ROW COLUMN+CELL
 Adam column=course:history, timestamp=1305704304053, value=90
 Adam column=course:math, timestamp=1305704282591, value=75
 John column=course:history, timestamp=1305704046378, value=85
 John column=course:math, timestamp=1305703949662, value=80
2 row(s) in 0.0250 seconds

Basic HBase Operations (9/10)
Scan several columns at once by passing a list:

> scan [table name], {COLUMNS => [[column1], [column2], ...]}
hbase(main):012:0> scan 'scores', {COLUMNS => ['studentid','course:']}
ROW COLUMN+CELL
 Adam column=course:history, timestamp=1305704304053, value=90
 Adam column=course:math, timestamp=1305704282591, value=75
 Adam column=studentid:, timestamp=1305704186916, value=2
 John column=course:history, timestamp=1305704046378, value=85
 John column=course:math, timestamp=1305703949662, value=80
 John column=studentid:, timestamp=1305703742527, value=1
2 row(s) in 0.0290 seconds

Basic HBase Operations (10/10)
A table must be disabled before it can be dropped:

hbase(main):003:0> disable 'scores'
0 row(s) in 2.1510 seconds

hbase(main):004:0> drop 'scores'
0 row(s) in 1.7780 seconds
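Conceptually, the table built in this section is a map from row key to columns to values, and scan with a COLUMNS filter is a prefix filter over column names. A toy in-memory sketch (plain Python dicts, not the real HBase client API):

```python
# Toy model of the 'scores' table; put/scan mimic the shell commands above.
def put(table, row, column, value):
    table.setdefault(row, {})[column] = value

def scan(table, column_prefix=""):
    """Yield (row, column, value), sorted by row then column, matching the prefix."""
    for row in sorted(table):
        for col in sorted(table[row]):
            if col.startswith(column_prefix):
                yield row, col, table[row][col]

scores = {}
put(scores, "John", "studentid:", "1")
put(scores, "John", "course:math", "80")
put(scores, "John", "course:history", "85")
put(scores, "Adam", "studentid:", "2")
put(scores, "Adam", "course:math", "75")
put(scores, "Adam", "course:history", "90")

full = list(scan(scores))                 # like: scan 'scores'
courses = list(scan(scores, "course:"))   # like: scan 'scores', {COLUMNS => 'course:'}
assert len(full) == 6 and len(courses) == 4
```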
Web Interfaces

Web Interfaces (1/2)
Hadoop serves web interfaces for HDFS and for the MapReduce Jobtracker; open them in a browser such as Mozilla Firefox:

HDFS (Namenode): http://localhost:50070
Jobtracker (MapReduce): http://localhost:50030

Web Interfaces (2/2)
HBase serves its own web interfaces:

Master (on the Master host): http://localhost:60010/
Region Server (on each Slave): http://localhost:60030/
ZooKeeper (via the Master's interface): http://localhost:60010/zk.jsp
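For reference, the ports above can be gathered into a single lookup table; this is an illustrative convenience (the WEB_UIS names and the ui_url helper are ours, not part of Hadoop or HBase):

```python
# Default web UI ports from this chapter's setup.
WEB_UIS = {
    "HDFS (Namenode)": 50070,
    "Jobtracker (MapReduce)": 50030,
    "HBase Master": 60010,
    "HBase Region Server": 60030,
}

def ui_url(service, host="localhost"):
    """Build the URL for a service's web interface."""
    return "http://%s:%d/" % (host, WEB_UIS[service])

print(ui_url("HBase Master"))
```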