
Hadoop 2.2.0 Multi-node cluster Installation on Ubuntu

Department of Computer Science, Tunghai University

Hadoop 2.2.0

Multi-node Installation on Ubuntu

康志強 G02357004 2014/1/3

Hadoop 2.2.0 (multi-node) Installation on Ubuntu

I. Preface
II. Installation Environment
III. Installation Steps
    1. Environment Overview
    2. Settings
    3. Map the IP addresses of the three machines to their hostnames
    4. Set up passwordless SSH login from cloud001 to cloud002 and cloud003
    5. Install the JDK
    6. Disable the firewall
    7. Install Hadoop 2.2
    8. Start Hadoop 2.2
V. References

I. Preface

II. Installation Environment

CPU           Intel Core i7-4470 3.40GHz
RAM           8 GB * 2
HD            128 GB SSD + 1 TB HD
Network       100/1000 Mbps Ethernet
OS            Windows 7 64-bit
VM Platform   VMware® Workstation 10.0.0 build-1295980
VM Guest OS   ubuntu-12.04.3-desktop-amd64
VM RAM        2.0 GB
VM HD         40 GB

III. Installation Steps

1. Environment Overview

Here we build a cluster made up of three machines.

Hostname   User/Password   Cluster role                                    OS
cloud001   hduser/adm123   NameNode, Secondary NameNode, ResourceManager   ubuntu-12.04.3 64-bit
cloud002   hduser/adm123   DataNode, NodeManager                           ubuntu-12.04.3 64-bit
cloud003   hduser/adm123   DataNode, NodeManager                           ubuntu-12.04.3 64-bit

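2. Settings

(1) Change the hostname to cloud001 (the file is owned by root, so edit it with sudo):

sudo vim /etc/hostname

(2) Adjust the privileges of hduser (edit this file as root):

vim /etc/sudoers

(3) Upgrade the system to the latest packages:

sudo apt-get update

sudo apt-get upgrade

In practice, set up cloud001 completely first, then clone it into cloud002 and cloud003 and change the hostname on each clone.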

3. Map the IP addresses of the three machines to their hostnames

hduser@cloud001:~$ sudo vim /etc/hosts
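The contents of /etc/hosts are not shown in this document. As a rough sketch, the mapping could look like the entries below. The three 192.168.1.x addresses are placeholders assumed for illustration, not the values used on the original machines, and HOSTS_FILE and lookup are hypothetical names. HOSTS_FILE defaults to a scratch file so the sketch can be tried without touching /etc/hosts.

```shell
# HOSTS_FILE is a hypothetical variable; on a real node the entries go into
# /etc/hosts (as root). Here it defaults to a scratch file.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# Placeholder addresses: replace them with the real IPs of the three VMs.
cat >> "$HOSTS_FILE" <<'EOF'
192.168.1.101 cloud001
192.168.1.102 cloud002
192.168.1.103 cloud003
EOF

# Print the address recorded for one hostname (hypothetical helper).
lookup() {
    awk -v h="$1" '$2 == h { print $1 }' "$HOSTS_FILE"
}

lookup cloud002
```

All three nodes need the same entries. Since cloud002 and cloud003 are cloned from cloud001, editing the file before cloning should carry it over.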

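4. Set up passwordless SSH login from cloud001 to cloud002 and cloud003

(1) Install SSH:

sudo apt-get install ssh

(2) Set up passwordless login to the local machine. In the home directory, run the following commands to create the .ssh directory and enter it:

hduser@ubuntu:~$ mkdir .ssh
hduser@ubuntu:~$ cd .ssh

Generate the key pair (just press Enter at every prompt):

hduser@ubuntu:~/.ssh$ ssh-keygen -t rsa

Append id_rsa.pub to the authorized keys:

hduser@ubuntu:~/.ssh$ cat id_rsa.pub >> authorized_keys

Restart the SSH service:

hduser@ubuntu:~/.ssh$ sudo service ssh restart

Test:

ssh localhost

Because cloud002 and cloud003 are later cloned from cloud001, they carry the same key pair and authorized_keys file, so the login from cloud001 to both of them should also work without a password.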
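With its default StrictModes setting, sshd checks the ownership and modes of the user's key files before accepting a login, so it is common to tighten the permissions as part of this step. The sketch below repeats the steps above non-interactively under that assumption. SSH_DIR is a hypothetical variable that defaults to a scratch directory so that the real ~/.ssh is left alone, and the key generation is skipped if ssh-keygen is not installed.

```shell
# SSH_DIR is a hypothetical variable; set SSH_DIR="$HOME/.ssh" for a real run.
SSH_DIR="${SSH_DIR:-$(mktemp -d)/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"              # only the owner may enter the directory

# Generate an RSA key pair with an empty passphrase unless one already exists.
if command -v ssh-keygen >/dev/null 2>&1 && [ ! -f "$SSH_DIR/id_rsa" ]; then
    ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
fi

# Authorize the public key for logins to this account.
touch "$SSH_DIR/authorized_keys"
if [ -f "$SSH_DIR/id_rsa.pub" ]; then
    cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
fi
chmod 600 "$SSH_DIR/authorized_keys"   # readable and writable by the owner only
```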

5. Install the JDK

Download jdk-7u45-linux-x64.tar.gz, copy it to /usr/lib/jvm, and run chmod:

hduser@ubuntu:/usr/lib/jvm$ chmod 755 jdk-7u45-linux-x64.tar.gz

Extract it:

hduser@ubuntu:/usr/lib/jvm$ sudo tar zxvf ./jdk-7u45-linux-x64.tar.gz -C /usr/lib/jvm

Set the environment variables:

hduser@ubuntu:/usr/lib/jvm$ vim ~/.bashrc

Append the following at the end of the file:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Run the following command to make the changes take effect:

hduser@ubuntu:/usr/lib/jvm$ source ~/.bashrc

Test:

hduser@ubuntu:/usr/lib/jvm$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
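If this step is repeated, for example after a failed attempt, editing ~/.bashrc by hand can leave duplicate export lines behind. A minimal sketch of an append that only adds a line when it is missing is shown below. RC_FILE and add_line are hypothetical names, and RC_FILE defaults to a scratch file rather than the real ~/.bashrc.

```shell
# RC_FILE is a hypothetical variable; set RC_FILE="$HOME/.bashrc" for a real run.
RC_FILE="${RC_FILE:-$(mktemp)}"

# Append the line given as $1 unless an identical line is already present.
add_line() {
    grep -qxF -- "$1" "$RC_FILE" 2>/dev/null || printf '%s\n' "$1" >> "$RC_FILE"
}

# Single quotes keep ${JAVA_HOME} literal so it is expanded when .bashrc runs.
add_line 'export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45'
add_line 'export JRE_HOME=${JAVA_HOME}/jre'
add_line 'export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib'
add_line 'export PATH=${JAVA_HOME}/bin:$PATH'

# Calling it again with a line that is already there changes nothing.
add_line 'export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45'
```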

6. Disable the firewall

hduser@ubuntu:/usr/lib/jvm$ sudo ufw disable
Firewall stopped and disabled on system startup

Reboot for the change to take effect.

7. Install Hadoop 2.2

(1) Download hadoop-2.2.0.tar.gz and extract it under /home/hduser:

hduser@ubuntu:~$ chmod 755 hadoop-2.2.0.tar.gz
hduser@ubuntu:~$ tar zxvf hadoop-2.2.0.tar.gz

(2) Configure Hadoop

Before configuring, create the following directories on cloud001:

/home/hduser/dfs/name
/home/hduser/dfs/data
/home/hduser/temp
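The three directories can be created with a single mkdir call, as sketched below. BASE is a hypothetical variable standing in for /home/hduser. It defaults to a scratch directory so the sketch can be run anywhere.

```shell
# BASE is a hypothetical variable; set BASE=/home/hduser on cloud001.
BASE="${BASE:-$(mktemp -d)}"

# -p creates missing parent directories and is harmless if they already exist.
mkdir -p "$BASE/dfs/name" "$BASE/dfs/data" "$BASE/temp"
```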

Edit the following configuration files:

~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
~/hadoop-2.2.0/etc/hadoop/yarn-env.sh
~/hadoop-2.2.0/etc/hadoop/slaves
~/hadoop-2.2.0/etc/hadoop/core-site.xml
~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
~/hadoop-2.2.0/etc/hadoop/mapred-site.xml (does not exist by default; rename mapred-site.xml.template)
~/hadoop-2.2.0/etc/hadoop/yarn-site.xml

Edit hadoop-env.sh and set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45

Edit yarn-env.sh and set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45

Edit slaves (this file lists all slave nodes) so that it contains:

cloud002
cloud003

Edit core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud001:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hduser/temp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>
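A typo in one of these files is easy to miss, so it can help to read a value back before starting any daemon. The sketch below is one rough way to do that with awk. get_prop is a hypothetical helper, not part of Hadoop, and it assumes that each <name> and <value> element sits on a single line, as in the listings in this document. It runs against a scratch copy of one property so that it does not depend on an installed Hadoop.

```shell
# Print the value of property $2 from the Hadoop-style config file $1.
# Hypothetical helper: assumes <name> and <value> each fit on one line.
get_prop() {
    awk -v key="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == key) { print v; exit } }
    ' "$1"
}

# Scratch file holding one property copied from core-site.xml.
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud001:9000</value>
  </property>
</configuration>
EOF

get_prop "$CONF" fs.defaultFS
```

On the cluster the same call could point at ~/hadoop-2.2.0/etc/hadoop/core-site.xml instead of the scratch file.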

Edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>cloud001:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hduser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hduser/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Edit mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>cloud001:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>cloud001:19888</value>
  </property>
</configuration>

Edit yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cloud001:8040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cloud001:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cloud001:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cloud001:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>cloud001:8088</value>
  </property>
</configuration>

Set the environment variables:

hduser@cloud001:~$ vim ~/.bashrc

Append the following at the end of the file:

export HADOOP_HOME=/home/hduser/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

(3) Clone the cloud001 image to cloud002 and cloud003, then change the hostname on each clone.

8. Start Hadoop 2.2

(1) Enter the installation directory and format the NameNode:

cd ~/hadoop-2.2.0/
./bin/hdfs namenode -format

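(2) Start HDFS:

./sbin/start-dfs.sh

At this point the processes running on cloud001 are NameNode and SecondaryNameNode; cloud002 and cloud003 each run DataNode.

(3) Start YARN:

./sbin/start-yarn.sh

At this point the processes running on cloud001 are NameNode, SecondaryNameNode and ResourceManager; cloud002 and cloud003 each run DataNode and NodeManager.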
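The process lists above can be checked with jps, the JVM process lister that ships with the JDK. The sketch below compares jps-style output against the daemons expected on a node. check_daemons is a hypothetical helper, and SAMPLE is made-up output (the PIDs are invented) that stands in for a real jps run on cloud001 so the sketch can be tried without a cluster.

```shell
# Report every expected daemon that is absent from jps-style output.
# $1: output text, remaining arguments: expected daemon names.
# Returns 0 when all are present and 1 otherwise. Hypothetical helper.
check_daemons() {
    out="$1"
    shift
    missing=0
    for d in "$@"; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$out" | grep -qw -- "$d"; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}

# Made-up sample of what jps might print on cloud001 after both start scripts.
SAMPLE='2401 NameNode
2633 SecondaryNameNode
2790 ResourceManager
3105 Jps'

check_daemons "$SAMPLE" NameNode SecondaryNameNode ResourceManager && echo "cloud001 looks complete"
```

On a real node the first argument would be "$(jps)", and on cloud002 or cloud003 the expected names would be DataNode and NodeManager.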

(4) Check the cluster status:

./bin/hdfs dfsadmin -report

(5) Check the file and block layout:

./bin/hdfs fsck / -files -blocks

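(6) View HDFS in the NameNode web UI (http://cloud001:50070 when the Hadoop 2.x default port is unchanged)

(7) View the ResourceManager (RM) web UI (http://cloud001:8088, the yarn.resourcemanager.webapp.address set in yarn-site.xml)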

V. References

1. http://blog.csdn.net/licongcong_0224/article/details/12972889

2. http://blog.csdn.net/focusheart/article/details/14005893 (single-node version)

3. http://dawndiy.com/archives/155/ (installing and configuring JDK 7 on Linux)

4. http://www.ithome.com.tw/itadm/article.php?c=73978&s=1 (introduction to Hadoop)

5. http://www.runpc.com.tw/content/cloud_content.aspx?id=105318 (introduction to Hadoop)
