Apache Hadoop Daniel Lust, Anthony Taliercio. What is Apache Hadoop? Allows applications to utilize...

Apache Hadoop

Daniel Lust, Anthony Taliercio

What is Apache Hadoop?

Allows applications to use thousands of nodes and work with thousands of terabytes of data to complete a task

Supports distributed applications under a free license (the Apache License 2.0)

Used by many popular companies, such as Facebook, Twitter, eBay, IBM, Apple, Microsoft, Hewlett-Packard, and many others…

Continued…

Written in Java

Scales well

Can be used with thousands of nodes

Can be used with just a few nodes and inexpensive hardware

A typical Hadoop cluster consists of two major parts:

A single master node and multiple worker nodes. The master node runs four daemons: the JobTracker, TaskTracker, NameNode, and DataNode.

A worker node, also known as a slave node, runs both a DataNode and a TaskTracker, or just one of the two.
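The master/worker split above can be modeled as a small data structure. This is a hypothetical Python sketch: `Node`, `build_cluster`, `hosts_running`, and the host names are illustrative and are not part of any Hadoop API. It only encodes the rule stated on the slide, using the Hadoop 1.x daemon names (JobTracker/TaskTracker era) that this guide describes.

```python
from dataclasses import dataclass, field

# Daemon names from the Hadoop 1.x (JobTracker/TaskTracker) architecture.
JOBTRACKER, TASKTRACKER = "JobTracker", "TaskTracker"
NAMENODE, DATANODE = "NameNode", "DataNode"
WORKER_DAEMONS = {DATANODE, TASKTRACKER}


@dataclass
class Node:
    hostname: str
    daemons: set[str] = field(default_factory=set)


def build_cluster(master: str, workers: dict[str, set[str]]) -> list[Node]:
    """One master running all four daemons, plus worker (slave) nodes."""
    nodes = [Node(master, {JOBTRACKER, TASKTRACKER, NAMENODE, DATANODE})]
    for host, daemons in workers.items():
        # A worker runs a DataNode, a TaskTracker, or both, and nothing else.
        if not daemons or not daemons <= WORKER_DAEMONS:
            raise ValueError(f"{host}: a worker must run DataNode and/or TaskTracker only")
        nodes.append(Node(host, set(daemons)))
    return nodes


def hosts_running(nodes: list[Node], daemon: str) -> list[str]:
    """Host names of the nodes that run the given daemon, in cluster order."""
    return [node.hostname for node in nodes if daemon in node.daemons]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    cluster = build_cluster("master", {"worker1": {DATANODE, TASKTRACKER},
                                       "worker2": {DATANODE}})
    print("store HDFS data:", hosts_running(cluster, DATANODE))
    print("run map/reduce tasks:", hosts_running(cluster, TASKTRACKER))
```

Here `worker2` is a storage-only node: it holds HDFS data but never receives map or reduce tasks.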

Overview of Hadoop

- Hadoop uses what is called HDFS, the Hadoop Distributed File System

HDFS splits files into blocks and stores copies of each block on several nodes across the cluster

This redundancy protects against data loss when a node fails
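The split-and-replicate idea can be sketched in a few lines. This is a simplified illustration, not HDFS's real placement logic: real HDFS blocks are tens to hundreds of megabytes, the default replication factor is 3 (the `dfs.replication` setting), and the NameNode uses a rack-aware placement policy. The round-robin placement and the function names `split_into_blocks`, `place_replicas`, and `survives_failure` are assumptions made for the example.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin, not rack-aware)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def survives_failure(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every block still has at least one copy on a node other than `failed`."""
    return all(any(node != failed for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"0123456789", block_size=4)  # three blocks
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
    print(placement)
    print("no data lost if node3 fails:", survives_failure(placement, "node3"))
```

With three copies of each block on different nodes, losing any single node still leaves at least two copies of every block.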

MapReduce

MapReduce: a programming model first developed at Google for processing massive amounts of unstructured data in parallel across a distributed cluster of machines; Hadoop provides an open-source implementation of it
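The three stages of the MapReduce model (map, shuffle, reduce) can be shown on a single machine with a word count, the same task used later in this guide. This is a local simulation for illustration, not Hadoop code: the function names are hypothetical, and in a real cluster the map calls would run on different nodes and the framework would perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated word."""
    for word in document.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values emitted for the same key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster, each document (or block) would be mapped on a different node;
    # here every stage simply runs locally, one after another.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
```

Because each map call only sees its own input and each reduce call only sees one key, both stages can be spread across many nodes without coordination beyond the shuffle.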

MapReduce (continued)

Offers a clean abstraction for data analysis tasks and coordinates the jobs that run over data stored in HDFS, keeping track of completed work so that it is not unnecessarily repeated.

- If a task fails on one node, it can be reassigned to a different node to complete
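In Hadoop 1.x, the JobTracker reschedules a failed task on another node. The toy scheduler below sketches only that retry behavior; `schedule`, the `failing` set, and the `max_attempts=4` limit are assumptions for illustration. The real JobTracker also weighs data locality and other factors that this sketch ignores.

```python
from collections import deque


def schedule(tasks: list[str], nodes: list[str], failing: set[str],
             max_attempts: int = 4) -> dict[str, str]:
    """Return {task: node that completed it}, retrying failed attempts on other nodes."""
    completed: dict[str, str] = {}
    pending = deque((task, []) for task in tasks)  # (task, nodes already tried)
    turn = 0
    while pending:
        task, tried = pending.popleft()
        candidates = [node for node in nodes if node not in tried]
        if not candidates or len(tried) >= max_attempts:
            raise RuntimeError(f"task {task} failed on every attempt: {tried}")
        # Rotate through the untried nodes so work spreads across the cluster.
        node = candidates[turn % len(candidates)]
        turn += 1
        if node in failing:
            # The attempt fails; requeue the task so another node picks it up.
            pending.append((task, tried + [node]))
        else:
            completed[task] = node
    return completed


if __name__ == "__main__":
    # Hypothetical task and node names; node "n2" fails every attempt.
    print(schedule(["t0", "t1", "t2"], ["n1", "n2", "n3"], failing={"n2"}))
```

Task `t1` first lands on the failing node `n2`, is requeued, and completes on `n3`; the job as a whole still finishes.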

Running Hadoop

First run of Hadoop on the master computer

Various processes are started, including:

TaskTracker

JobTracker

DataNode

SecondaryNameNode

NameNode

It also connects through SSH to the other slave computers to start a DataNode and a TaskTracker on each.
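In Hadoop 1.x, the start scripts read worker host names from the `conf/slaves` file (one host per line) and run `bin/hadoop-daemon.sh start datanode` and `start tasktracker` on each host over SSH, which is why passwordless SSH from the master is required. The sketch below is a dry run that only builds those commands; it is not the actual start script, and the install path `/usr/local/hadoop` and the function names are hypothetical.

```python
import shlex


def read_slaves(text: str) -> list[str]:
    """Parse a slaves-style file: one host name per line, blank lines and # comments ignored."""
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def worker_start_commands(hosts: list[str],
                          hadoop_home: str = "/usr/local/hadoop") -> list[list[str]]:
    """Build (but do not run) one ssh command per worker daemon."""
    commands = []
    for host in hosts:
        for daemon in ("datanode", "tasktracker"):
            remote = f"{hadoop_home}/bin/hadoop-daemon.sh start {daemon}"
            commands.append(["ssh", host, remote])
    return commands


if __name__ == "__main__":
    slaves_file = "slave1\nslave2  # second worker\n\n"  # hypothetical host names
    for cmd in worker_start_commands(read_slaves(slaves_file)):
        print(shlex.join(cmd))  # dry run: print instead of executing
```

Returning argument lists rather than shell strings keeps the commands ready for `subprocess.run` without extra quoting, while `shlex.join` makes the printed form safe to paste into a shell.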

Running Hadoop

We used Hadoop to run a word count on six different books.

HDFS copied the books to different nodes in the cluster, and we ran a pre-written word count program on them.

Each node returned its results, using the DataNode process to store them in HDFS.

When a node failed, its task was reissued to another node.
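The slides do not name the pre-written program. As an illustration only, the same word count could be written for Hadoop Streaming, which runs any executable as a mapper or reducer and passes records through stdin/stdout as tab-separated key/value lines, sorted by key before they reach the reducer. The script name `wordcount_streaming.py` and its `map`/`reduce` argument are hypothetical; the streaming jar location and command line vary by Hadoop version, so they are not shown.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word<TAB>1" for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; input must be sorted by word, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Invoked as "wordcount_streaming.py map" or "wordcount_streaming.py reduce".
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in stage(sys.stdin):
        print(record)
```

Locally, `sorted()` can stand in for Hadoop's sort step: `reducer(sorted(mapper(lines)))` produces the same totals the cluster would.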

Example Output of Job Processes

Word Count Output

Tested on 1 to 3 nodes

1 node: job completion 00:01:45

2 nodes: job completion 00:01:28

3 nodes: job completion 00:01:00
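The timings can be turned into the usual scaling measures: speedup S(n) = T(1) / T(n) and per-node efficiency E(n) = S(n) / n, where T(n) is the completion time on n nodes. The short sketch below uses only the three measured times; the function names are illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scaling(timings: dict[int, str]) -> list[tuple[int, int, float, float]]:
    """(nodes, seconds, speedup vs. 1 node, speedup per node) for each run."""
    baseline = to_seconds(timings[1])
    rows = []
    for nodes in sorted(timings):
        seconds = to_seconds(timings[nodes])
        speedup = baseline / seconds
        rows.append((nodes, seconds, round(speedup, 2), round(speedup / nodes, 2)))
    return rows


if __name__ == "__main__":
    # Job completion times from the word count runs above.
    measured = {1: "00:01:45", 2: "00:01:28", 3: "00:01:00"}
    for row in scaling(measured):
        print(row)  # (nodes, seconds, speedup, efficiency)
```

Three nodes finished 1.75 times faster than one (105 s versus 60 s), about 58% of ideal linear scaling. With an input as small as six books, fixed per-job startup costs are one plausible reason for the gap, though the slides do not measure this.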

Conclusion

Our guide covers the basics you need to get started with Apache Hadoop

However, you may run into many problems along the way

Troubleshooting was a large part of our project
