Hadoop Complete Reference


  • To change the alternatives, add an entry in the java.dpkg-tmp file. To check whether the Java path has changed, run the commands below:

    arun@ubuntu:~$ update-alternatives --config java
    arun@ubuntu:~$ update-alternatives --config javac
    arun@ubuntu:~$ update-alternatives --config javaws

  • If update-alternatives does not allow changing a past entry, change it by editing the /etc/alternatives/java.dpkg-tmp file.

  • To remove an entry, use update-alternatives --remove; to remove all entries, use update-alternatives --remove-all. After removing all entries, update-alternatives --config java reports that there are no alternatives for java.

    Use which java to see which Java binary is currently in use.
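A sketch of those removal commands, assuming the JDK path used earlier in this document (these modify system configuration, so run them only when you actually want to unregister the alternatives):

```shell
# Remove one registered alternative for java (path is the JDK used above)
sudo update-alternatives --remove java /home/hdadmin/java/jdk1.7.0_79/bin/java
# Remove every alternative registered under the name "java"
sudo update-alternatives --remove-all java
# Afterwards this reports that there are no alternatives for java
update-alternatives --config java
# And this shows which java binary (if any) is now in use
which java
```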

  • Correct the file ownership and the permissions of the executables:

    NOTE: prefix the commands below with sudo if the user is in the sudoers list; otherwise run them without sudo.

    arun@ubuntu$ chmod a+x /home/hdadmin/java/jdk1.7.0_79/bin/java
    arun@ubuntu$ chmod a+x /home/hdadmin/java/jdk1.7.0_79/bin/javac
    arun@ubuntu$ chmod a+x /home/hdadmin/java/jdk1.7.0_79/bin/javaws
    arun@ubuntu$ chown -R root:root /home/hdadmin/java/jdk1.7.0_79

    N.B.: Remember, the Java JDK has many more executables that you can install similarly as above. java, javac, and javaws are probably the most frequently required.

    NOTE: if it still shows an error, the problem may be compatibility between Linux and the JDK: if the Linux installation is 64-bit, the JDK should be 64-bit too. Confirm that you are using a 64-bit architecture before installing the 64-bit JDK.

    To test whether it is working properly, first run:

    arun@ubuntu$ export PATH=$PATH:/home/hdadmin/java/jdk1.7.0_79/bin

    Running java -version will then return the version:

    java version "1.7.0_79"
    Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

    You can also tell from the java -version output whether the JDK is 64-bit or 32-bit.
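Both sides of the compatibility question can be checked with standard tools (the java check is skipped gracefully if java is not yet installed):

```shell
# OS architecture: x86_64 means 64-bit Linux, i686/i386 means 32-bit
arch=$(uname -m)
echo "$arch"
# If java is installed, its banner shows the JDK bitness ("64-Bit Server VM")
{ command -v java >/dev/null && java -version 2>&1 | head -n 3; } || true
```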


  • Setting up bashrc: open the bashrc file and add the following export lines:

    arun@ubuntu$ nano /etc/bash.bashrc

    export HADOOP_HOME=/home/hdadmin/bigdata/hadoop/hadoop-2.5.0
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

    This is the minimal set needed.

  • If anything is commented out or missing as above (for example, HADOOP_PREFIX and the other entries are absent and only JAVA_HOME and HADOOP_HOME are set), the above error will be thrown, so it is better to add all of the values below. If it says it could not find or load JAVA_HOME, as in the error "could not find or load main class JAVA_HOME=.home.hadmn.java.jdk.1.7.0_79" above, check whether the SET keyword was used by mistake anywhere; Linux shells do not support it.

    # Set Hadoop-related environment variables
    export HADOOP_PREFIX=/home/hdadmin/bigdata/hadoop/hadoop-2.5.0
    export HADOOP_HOME=/home/hdadmin/bigdata/hadoop/hadoop-2.5.0
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    # Native path
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
    # Java path (always at the bottom, after the Hadoop entries)
    export JAVA_HOME='/home/hdadmin/Java/jdk1.7.0_79'
    # Add Hadoop bin/ directories to PATH
    export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
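A quick sanity check that the variables expand as intended (using the same paths as above) can be run in any shell before editing the profile:

```shell
# Set the two key variables as in the profile above and confirm expansion
export HADOOP_HOME=/home/hdadmin/bigdata/hadoop/hadoop-2.5.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
echo "$HADOOP_CONF_DIR"
# prints /home/hdadmin/bigdata/hadoop/hadoop-2.5.0/etc/hadoop
```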

    Prerequisites:

    1. Installing Java v1.7

    2. Adding a dedicated Hadoop system user.

    3. Configuring SSH access.

    4. Disabling IPv6.

    Before starting to install any applications or software, please make sure your list of packages from all

    repositories and PPAs is up to date; if it is not, update it by using this command:

    sudo apt-get update

  • 1. Installing Java v1.7:

    Running Hadoop requires Java v1.7+.

    a. Download the latest Oracle Java Linux version from the Oracle website by using this command:

    wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz

    If it fails to download, please check with this given command, which helps to avoid passing the username and password:

    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz"


    b. Unpack the compressed Java binaries in the directory:

    sudo tar xvzf jdk-7u45-linux-x64.tar.gz

    c. Create a Java directory using mkdir under /usr/local/ and change to /usr/local/java by using these commands:

    sudo mkdir -p /usr/local/java

    cd /usr/local/java


  • d. Copy the Oracle Java binaries into the /usr/local/java directory.

    sudo cp -r jdk1.7.0_45 /usr/local/java

    e. Edit the system PATH file /etc/profile and add the following system variables to your system path

    sudo nano /etc/profile or sudo gedit /etc/profile

    f. Scroll down to the end of the file using your arrow keys and add the following lines to the end of your

    /etc/profile file:

    JAVA_HOME=/usr/local/java/jdk1.7.0_45
    PATH=$PATH:$JAVA_HOME/bin

    export JAVA_HOME

    export PATH

    g. Inform your Ubuntu Linux system where your Oracle Java JDK/JRE is located. This will tell the system that the

    new Oracle Java version is available for use.

    sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1

    sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_45/bin/javac
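The same pattern applies to the other executables; a sketch for java and javaws, mirroring the javac commands above (paths assume the same JDK location):

```shell
# Register and select java and javaws the same way as javac
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_45/bin/java" 1
sudo update-alternatives --set java /usr/local/java/jdk1.7.0_45/bin/java
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.7.0_45/bin/javaws" 1
sudo update-alternatives --set javaws /usr/local/java/jdk1.7.0_45/bin/javaws
```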


  • This command notifies the system that Oracle Java JDK is available for use

    h. Reload your system wide PATH /etc/profile by typing the following command:

    . /etc/profile

    Test to see if Oracle Java was installed correctly on your system.

    java -version

    2. Adding dedicated Hadoop system user.

    We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is recommended,

    because it helps to separate the Hadoop installation from other software applications and user accounts running on

    the same machine.

    a. Adding group:

    sudo addgroup hadoop

    b. Creating a user and adding the user to a group:

    sudo adduser --ingroup hadoop hduser

    It will ask you to provide a new UNIX password and user information.
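Once the adduser step has run, the account and its group membership can be checked (group and user names as above):

```shell
# Verify the new user exists and belongs to the hadoop group
id hduser             # lists uid, gid and supplementary groups
getent group hadoop   # shows the group and its members
```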


  • 3. Configuring SSH access:

    SSH key based authentication is required so that the master node can log in to slave nodes (and

    the secondary node) to start/stop them, and also to the local machine if you want to use Hadoop on it. For our single-

    node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in

    the previous section.

    Before this step, you have to make sure that SSH is up and running on your machine and is configured to allow SSH

    public key authentication.

    Generating an SSH key for the hduser user.

    a. Log in as hduser (using sudo if needed).

    b. Run this Key generation command:

    ssh-keygen -t rsa -P ""


  • c. It will ask for the file name in which to save the key; just press Enter so that it will generate the key

    at /home/hduser/.ssh

    d. Enable SSH access to your local machine with this newly created key.

    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

    e. The final step is to test the SSH setup by connecting to your local machine with the hduser user.

    ssh hduser@localhost

    This will add localhost permanently to the list of known hosts
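The key-generation and authorization steps (b-d) can be rehearsed safely in a throwaway directory before touching the real ~/.ssh; the directory here is a scratch location, not the path the real setup uses:

```shell
# Rehearse the key setup in a throwaway directory (the real steps use ~/.ssh)
DEMO_SSH=$(mktemp -d)
ssh-keygen -q -t rsa -P "" -f "$DEMO_SSH/id_rsa"           # step b: empty passphrase
cat "$DEMO_SSH/id_rsa.pub" >> "$DEMO_SSH/authorized_keys"  # step d: authorize the key
chmod 600 "$DEMO_SSH/authorized_keys"                      # sshd requires strict permissions
```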


  • 4. Disabling IPv6.

    We need to disable IPv6 because Ubuntu uses the 0.0.0.0 address for different Hadoop configurations, and it can

    resolve to IPv6. You will need to run the following commands using a root account:

    sudo gedit /etc/sysctl.conf

    Add the following lines to the end of the file and reboot the machine, to update the configurations correctly.

    #disable ipv6

    net.ipv6.conf.all.disable_ipv6 = 1

    net.ipv6.conf.default.disable_ipv6 = 1

    net.ipv6.conf.lo.disable_ipv6 = 1
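After rebooting, the setting can be verified by reading the kernel flag back (a value of 1 means IPv6 is disabled):

```shell
# 1 = IPv6 disabled, 0 = IPv6 still enabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```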


  • Hadoop Installation: Go to the Apache downloads page and download Hadoop version 2.2.0 (prefer to download a stable version).

    i. Run this following command to download Hadoop version 2.2.0

    wget http://apache.mirrors.pair.com/hadoop/common/stable2/hadoop-2.2.0.tar.gz

    ii. Unpack the compressed hadoop file by using this command:

    tar xvzf hadoop-2.2.0.tar.gz

    iii. Move hadoop-2.2.0 to a hadoop directory by using the given command:

    mv hadoop-2.2.0 hadoop

    iv. Move the hadoop package to a location of your choice; I picked /usr/local.
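Steps iii and iv can be rehearsed in a scratch directory to see the resulting layout; the real commands operate on the unpacked tarball and /usr/local (and the final move typically needs sudo):

```shell
# Rehearse steps iii-iv in a scratch directory to see the final layout
WORK=$(mktemp -d)
mkdir "$WORK/hadoop-2.2.0"                 # stands in for the unpacked tarball
mv "$WORK/hadoop-2.2.0" "$WORK/hadoop"     # step iii: version-agnostic name
mkdir -p "$WORK/usr/local"
mv "$WORK/hadoop" "$WORK/usr/local/"       # step iv: move to the chosen prefix
ls "$WORK/usr/local"
# prints: hadoop
```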