Installing Apache Hadoop the Easy Way

Enrique Davila, Big Data Instructor, enrique.davila@gmail.com


Installing Hadoop on Ubuntu 16

Install OpenJDK

10/24/2016


Install Java

Do I have Java? In a terminal, type:

java -version

If the output reports that java is not installed, follow the instructions on the next slide.
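If you would rather script this check, the sketch below (Python 3.7 or later assumed) runs java -version and extracts the quoted version string. It relies on java -version writing to stderr in a form such as openjdk version "1.8.0_91". The helper names parse_java_version and installed_java_version are hypothetical and do not belong to any tool.

```python
import re
import subprocess
from typing import Optional

# `java -version` prints e.g. 'openjdk version "1.8.0_91"' (on stderr).
_VERSION_RE = re.compile(r'version "([^"]+)"')


def parse_java_version(output: str) -> Optional[str]:
    """Return the quoted version string from `java -version` output, or None."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Run `java -version`; return None when java is not on the PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # Check both streams, since the version banner normally goes to stderr.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = installed_java_version()
    print(f"Java {version} found" if version else "Java is not installed")
```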



Install Java

Type:

sudo apt-get install openjdk-8-jdk

Type Y to continue the installation (it will take a while to complete).



Do I have Java?

To confirm that Java is installed on your Ubuntu system, type:

java -version

The output should now report an OpenJDK 1.8 version.



Install OpenSSH

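Installing the OpenSSH server is mandatory, because Hadoop's start and stop scripts use SSH to launch the daemons, even on a single node:

sudo apt-get install openssh-server

Once the SSH server is installed, generate a key pair by running:

ssh-keygen -t rsa

When prompted for the file, press Enter to accept the default. When prompted for a passphrase, press Enter to leave it empty, then press Enter again to confirm. An empty passphrase lets Hadoop log in without prompting.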



SSH Keys

Now copy the public key to the target user and host. In my case the user is hadoop and the host is hadoopdev:

ssh-copy-id hadoop@hadoopdev
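To confirm that the key was copied, a minimal Python sketch (Python 3.7+ assumed, hypothetical helper names) can run a no-op command over SSH using the standard OpenSSH options BatchMode=yes and ConnectTimeout=5. With these options, ssh fails instead of prompting for a password. If the check returns False, Hadoop's start scripts would also stop and ask for a password.

```python
import subprocess
from typing import List


def ssh_check_command(user: str, host: str) -> List[str]:
    """Build an ssh command that runs `true` remotely and never prompts.

    BatchMode=yes makes ssh fail instead of asking for a password or
    passphrase; ConnectTimeout=5 bounds the wait for unreachable hosts.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            f"{user}@{host}", "true"]


def passwordless_ssh_works(user: str, host: str) -> bool:
    """Return True if `ssh user@host true` succeeds without any prompt."""
    try:
        result = subprocess.run(ssh_check_command(user, host),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:  # the ssh client itself is not installed
        return False
    return result.returncode == 0
```

With the user and host from this slide, passwordless_ssh_works("hadoop", "hadoopdev") should return True once ssh-copy-id has succeeded.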


Download and Install Hadoop

Download Hadoop from the Apache web page



Download Apache Hadoop

In a terminal, create a new folder inside your Linux home folder (in this case /home/hadoop/) by typing:

mkdir hadoop_install

Then go into the new folder:

cd hadoop_install

And run the command below:

wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
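The same download can be sketched in Python with the standard library's urllib.request.urlretrieve. The URL pattern comes from the wget command above. Assuming the same layout for other versions is an assumption, and the function names are hypothetical. Older releases are often removed from the regular mirrors. In that case, passing mirror="https://archive.apache.org/dist/hadoop/common" should work, because the Apache archive keeps past releases.

```python
import urllib.request
from pathlib import Path

# Base URL taken from the wget command on this slide.
MIRROR = "http://www-eu.apache.org/dist/hadoop/common"


def hadoop_tarball_url(version: str, mirror: str = MIRROR) -> str:
    """Build e.g. <mirror>/hadoop-2.7.3/hadoop-2.7.3.tar.gz."""
    return f"{mirror}/hadoop-{version}/hadoop-{version}.tar.gz"


def download_hadoop(version: str, dest_dir: Path, mirror: str = MIRROR) -> Path:
    """Download the tarball into dest_dir (created if missing); return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"hadoop-{version}.tar.gz"
    urllib.request.urlretrieve(hadoop_tarball_url(version, mirror), str(target))
    return target
```

Usage, mirroring the slide: download_hadoop("2.7.3", Path.home() / "hadoop_install").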





Download Apache Hadoop

wget shows the progress of the download in the terminal.



Extract the Hadoop archive

Once the download is complete, type:

tar -xvf hadoop-2.7.3.tar.gz

The folder now contains two entries: the downloaded archive and the new directory, hadoop-2.7.3.
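A Python equivalent of the tar command, using the standard tarfile module, is sketched below. extract_hadoop is a hypothetical helper. It assumes the archive unpacks into a single directory named after the tarball (hadoop-2.7.3), as this one does. On Python versions that provide it, the helper also applies the "data" extraction filter.

```python
import tarfile
from pathlib import Path


def extract_hadoop(tarball: Path, dest_dir: Path) -> Path:
    """Extract hadoop-X.Y.Z.tar.gz into dest_dir and return the new directory.

    Roughly equivalent to running `tar -xvf hadoop-2.7.3.tar.gz` in dest_dir.
    """
    with tarfile.open(tarball, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can reject absolute paths and ".." entries.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    # "hadoop-2.7.3.tar.gz" -> "hadoop-2.7.3"
    return dest_dir / tarball.name[: -len(".tar.gz")]
```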



Set up ~/.bashrc

Note the Java installation location; it is very important for the next steps.

Then edit ~/.bashrc by typing:

gedit ~/.bashrc

(sudo is not needed, because the file belongs to your own user.)
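One way to find the Java location is to follow the java command's symlinks. On Ubuntu, /usr/bin/java is a chain of update-alternatives symlinks. For OpenJDK 8 this chain typically ends at /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java, and /usr/lib/jvm/java-1.8.0-openjdk-amd64 is normally a symlink to the same directory. The sketch below (hypothetical helper names) strips the trailing bin/java, plus jre when present, to produce a candidate JAVA_HOME.

```python
import os
import shutil
from pathlib import PurePosixPath
from typing import Optional


def java_home_from_binary(java_binary: str) -> str:
    """Strip the trailing bin/java (and jre/, if present) from a java path."""
    path = PurePosixPath(java_binary)
    if path.name == "java" and path.parent.name == "bin":
        path = path.parent.parent  # drop "bin/java"
    if path.name == "jre":
        path = path.parent  # JDK 8 keeps the runtime under jre/
    return str(path)


def detect_java_home() -> Optional[str]:
    """Resolve the java on PATH through its symlinks to the real JDK folder."""
    java = shutil.which("java")
    if java is None:
        return None
    return java_home_from_binary(os.path.realpath(java))


if __name__ == "__main__":
    print(detect_java_home())
```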



Set up ~/.bashrc

Add the lines below to ~/.bashrc. The previous slide shows the Java path, and JAVA_HOME must point to that actual path. HADOOP_INSTALL must point to the extracted hadoop-2.7.3 directory, because the bin and sbin folders are located there:

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_INSTALL=/home/hadoop/hadoop_install/hadoop-2.7.3
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
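If you prefer not to edit ~/.bashrc by hand, the Python sketch below appends the same exports once. It uses the #HADOOP VARIABLES START comment as a marker, so running it twice does not duplicate the block. add_hadoop_variables is a hypothetical helper. Both paths are assumptions that you should adjust to your machine.

```python
from pathlib import Path

MARKER = "#HADOOP VARIABLES START"

# Assumed paths: JAVA_HOME from the Java location found earlier, and
# HADOOP_INSTALL at the extracted hadoop-2.7.3 directory (holds bin/, sbin/).
HADOOP_BLOCK = f"""{MARKER}
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_INSTALL=/home/hadoop/hadoop_install/hadoop-2.7.3
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
"""


def add_hadoop_variables(bashrc: Path) -> bool:
    """Append the Hadoop exports once; return False if already present."""
    existing = bashrc.read_text() if bashrc.exists() else ""
    if MARKER in existing:
        return False
    # Start the block on its own line if the file lacks a trailing newline.
    prefix = "\n" if existing and not existing.endswith("\n") else ""
    with bashrc.open("a") as f:
        f.write(prefix + HADOOP_BLOCK)
    return True
```

Usage: add_hadoop_variables(Path.home() / ".bashrc"), then reload the file with source ~/.bashrc.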



Testing the Hadoop installation

Type the following command to load the ~/.bashrc changes (no need to restart):

source ~/.bashrc

Then type the command below. If it prints the Hadoop version (Hadoop 2.7.3) followed by build details, you're doing well:

hadoop version
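This check can also be scripted. The first line printed by hadoop version has the form Hadoop 2.7.3, so the hypothetical helpers below look for a line that starts with Hadoop and return the version that follows it. A None result usually means ~/.bashrc was not reloaded or PATH is wrong. Python 3.7 or later is assumed.

```python
import re
import subprocess
from typing import Optional


def parse_hadoop_version(output: str) -> Optional[str]:
    """Return "2.7.3" from output containing a line "Hadoop 2.7.3"."""
    match = re.search(r"^Hadoop (\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def installed_hadoop_version() -> Optional[str]:
    """Run `hadoop version`; return None if hadoop is not on the PATH."""
    try:
        result = subprocess.run(["hadoop", "version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_hadoop_version(result.stdout)
```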


Set Up a Single Node



Point the Hadoop configuration to Java

Go to the folder /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop and edit the file hadoop-env.sh:

cd /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop
gedit hadoop-env.sh



Modifying hadoop-env.sh

In hadoop-env.sh, change the JAVA_HOME line to the Java path you found earlier, for example:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
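The same edit can be made with a regular expression, as sketched below. In Hadoop 2.7.3, hadoop-env.sh ships with the line export JAVA_HOME=${JAVA_HOME}. Setting an explicit path is commonly recommended, because daemons started over SSH do not necessarily see the JAVA_HOME exported in an interactive ~/.bashrc. set_java_home is a hypothetical helper. It replaces the first uncommented export JAVA_HOME= line, or appends one if the file has none.

```python
import re
from pathlib import Path

# Only uncommented lines match, because the pattern is anchored at line start.
_JAVA_HOME_LINE = re.compile(r"^export JAVA_HOME=.*$", re.MULTILINE)


def set_java_home(env_file: Path, java_home: str) -> None:
    """Point hadoop-env.sh's JAVA_HOME export at an explicit path."""
    text = env_file.read_text()
    new_line = f"export JAVA_HOME={java_home}"
    if _JAVA_HOME_LINE.search(text):
        # A function replacement avoids treating backslashes as escapes.
        text = _JAVA_HOME_LINE.sub(lambda _m: new_line, text, count=1)
    else:
        text = text.rstrip("\n") + "\n" + new_line + "\n"
    env_file.write_text(text)
```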



Modify core-site.xml

Create a folder called tmp in /home/hadoop/hadoop_install:

mkdir /home/hadoop/hadoop_install/tmp

Then replace the empty <configuration></configuration> element in core-site.xml with the following text. The file is located in /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_install/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
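core-site.xml, mapred-site.xml and hdfs-site.xml share the same <configuration>/<property>/<name>/<value> layout, so they can be generated from a plain dictionary. The Python sketch below uses xml.etree.ElementTree. hadoop_config_xml is a hypothetical helper, and it leaves out the optional <description> elements, which serve only as comments for humans. In Hadoop 2.x, fs.default.name is the older, deprecated name of fs.defaultFS, but Hadoop still accepts it.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_config_xml(properties: Dict[str, str]) -> str:
    """Render {name: value} pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# The two properties from this slide's core-site.xml.
core_site = hadoop_config_xml({
    "hadoop.tmp.dir": "/home/hadoop/hadoop_install/tmp",
    "fs.default.name": "hdfs://localhost:54310",
})
print(core_site)
```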



Modify mapred-site.xml

By default the folder /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop contains a file called mapred-site.xml.template. Rename it to mapred-site.xml:

mv mapred-site.xml.template mapred-site.xml

Then add the code below:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>
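The rename can also be done with pathlib, as sketched below. activate_mapred_site is a hypothetical helper. It leaves an existing mapred-site.xml untouched, so running it again neither fails nor overwrites your edits.

```python
from pathlib import Path


def activate_mapred_site(conf_dir: Path) -> Path:
    """Rename mapred-site.xml.template to mapred-site.xml unless it exists."""
    target = conf_dir / "mapred-site.xml"
    template = conf_dir / "mapred-site.xml.template"
    if not target.exists():
        template.rename(target)
    return target
```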



Modify hdfs-site.xml

We need to create two new folders to hold the NameNode and DataNode data. I placed them in /home/hadoop/hadoop_install/:

mkdir /home/hadoop/hadoop_install/namenode
mkdir /home/hadoop/hadoop_install/datanode



Modify hdfs-site.xml

Add the code below to hdfs-site.xml, which is located in /home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop. The namenode and datanode paths are the two folders you created on the previous slide.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoop_install/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoop_install/datanode</value>
  </property>
</configuration>
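Hadoop does not reject unknown property names. A misspelled name is silently ignored and the default value is used instead, which is easy to miss. The hypothetical helpers below read a *-site.xml file into a dictionary and report which expected names are missing. The REQUIRED_HDFS list is an assumption based on the hdfs-default.xml names dfs.replication, dfs.namenode.name.dir and dfs.datanode.data.dir.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Names this slide's hdfs-site.xml is expected to define (assumed list).
REQUIRED_HDFS = ["dfs.replication", "dfs.namenode.name.dir", "dfs.datanode.data.dir"]


def read_hadoop_config(xml_text: str) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", "").strip(): p.findtext("value", "").strip()
            for p in root.iter("property")}


def missing_properties(config: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required names that the configuration does not define."""
    return [name for name in required if name not in config]
```

Usage: missing_properties(read_hadoop_config(Path("/home/hadoop/hadoop_install/hadoop-2.7.3/etc/hadoop/hdfs-site.xml").read_text()), REQUIRED_HDFS) should return an empty list (with from pathlib import Path).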



Format the NameNode

Run the following command:

hadoop namenode -format



Format the NameNode (part 2)

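If everything is OK, the output includes a message saying that the storage directory (/home/hadoop/hadoop_install/namenode) has been successfully formatted.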
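To check the result from a script, search the captured output for the success line. The sketch assumes the line has the form "Storage directory <path> has been successfully formatted", as Hadoop 2.x logs it. It also assumes you capture stderr (for example with hadoop namenode -format 2>&1), because the log lines typically go there. formatted_directory is a hypothetical helper.

```python
import re
from typing import Optional

_FORMATTED = re.compile(r"Storage directory (\S+) has been successfully formatted")


def formatted_directory(format_output: str) -> Optional[str]:
    """Return the directory reported as formatted, or None if absent."""
    match = _FORMATTED.search(format_output)
    return match.group(1) if match else None
```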



Running Hadoop on a Single Node

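Start all the daemons with:

start-all.sh

Then run:

jps

The output should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself), each preceded by its process ID.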
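The jps check can be automated, because jps prints one "<pid> <MainClassName>" line per Java process. The sketch below assumes the five daemons that start-all.sh launches on a single Hadoop 2.x node (HDFS plus YARN). missing_daemons is a hypothetical helper that lists any of them that are absent.

```python
from typing import List, Set

# Assumed set: HDFS and YARN daemons started by start-all.sh in Hadoop 2.x.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "ResourceManager", "NodeManager"}


def running_daemons(jps_output: str) -> Set[str]:
    """Collect the class names from jps lines of the form "<pid> <name>"."""
    names: Set[str] = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> List[str]:
    """Return the expected daemons that jps did not report, sorted."""
    return sorted(EXPECTED_DAEMONS - running_daemons(jps_output))
```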



Stop the Cluster

To stop all the daemons, run:

stop-all.sh



Web Interface: localhost:50070

In a browser, go to http://localhost:50070 to open the NameNode web interface.
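A scripted version of this check is sketched below using urllib.request. It treats the UI as up when the page answers with HTTP 200. The helper names are hypothetical. Port 50070 is the Hadoop 2.x NameNode HTTP default used on this slide.

```python
import urllib.request


def namenode_ui_url(host: str = "localhost", port: int = 50070) -> str:
    """Build the NameNode web UI address, e.g. http://localhost:50070/."""
    return f"http://{host}:{port}/"


def namenode_ui_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False
```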



Applies To

This installation was done on:

Ubuntu 16
Hadoop 2.7.3
Virtual machine with 2 processors, 2 GB RAM and 2 network interfaces (the first bridged, the second NAT)



Need Help?

Contact: Enrique Davila Gutierrez, enrique.davila@gmail.com


Data & Analytics
