Shortest Path Series, Part 1: Learning Hadoop from Scratch. The author's three main sites: CSDN home page: http://blog.csdn.net/lizhe_dashuju; Douban home page: http://www.douban.com/people/79049984/; Sina blog: http://blog.sina.com.cn/u/2436144507. Thanks for following!

Shortest Path Series, Part 1: Learning Hadoop from Scratch


Contents

Preface
Chapter 1: Installing Hadoop
    1. The operating system  2. The Hadoop version  3. Downloading Hadoop
    4. Installing the Java JDK  5. Installing Hadoop  6. Installing rsync and ssh
    7. Starting Hadoop  8. Running a Hadoop job  9. Summary
Chapter 2: A First MapReduce Program
    1. How Hadoop came about  2. The word-count problem
    3. A MapReduce program in three files (TokenizerMapper.java, IntSumReducer.java, WordCount.java)
    4. A MapReduce program in one file (WordCount.java)
Chapter 3: HDFS
    1. File systems  2. Hadoop's file systems
    3. Copying a file to HDFS (FileCopy.java)  4. Deleting a file on HDFS (FileDelete.java)
    5. Reading a file on HDFS (FileCat.java)  6. Querying file status (FileInfo.java)
    7. Listing a directory (FileList.java)
Chapter 4: Serialization
    1. Serialization  2. Hadoop's serialization interfaces
    3. Serializing an IntWritable (IntSer.java)
Chapter 5: Compression
    1. Why compression  2. Compressing an HDFS file to an HDFS file (CprsF2F.java)
    3. Compressing standard input to an HDFS file (CprsIn2F.java)
    4. Decompressing an HDFS file (DcprsF2F.java)
Chapter 6: MapReduce Input and Output
    1. InputFormat  2. OutputFormat
Chapter 7: How Hadoop Runs a MapReduce Job
Chapter 8: Estimating Pi
    1. The Monte Carlo method  2. The old-API Pi program (PiEstimator.java)
    3. The new-API Pi program (NewPiEst.java)

Preface

This book teaches Hadoop programming in Java: installing Hadoop, using it, and
working with the Hadoop API.

The prerequisites are modest: basic Linux and basic Java. If you can work at a
Linux command line and write a Java HelloWorld, you can follow along.

Hadoop itself is written in Java. Besides the Java API it offers Streaming,
which lets programs written in Python, C++, Ruby and other languages run on
Hadoop; this book sticks to the Java interface.

When you run into problems, the API documentation and a Google search are the
fastest ways out.

The approach throughout is demo-first: each topic starts from a small runnable
program, and the explanation follows the code.

Chapter 1: Installing Hadoop

This chapter installs the stable release of Hadoop on a single machine, in
pseudo-distributed mode.

The Linux user name used throughout is brian, so the home directory is
/home/brian.

1. The operating system

Ubuntu 10.04

This book uses Ubuntu Linux as the platform for Hadoop.

2. The Hadoop version

The stable release, Hadoop 1.2.1.

3. Downloading Hadoop

3.1 The Hadoop download page is http://www.apache.org/dyn/closer.cgi/hadoop/common/.
3.2 Pick a mirror, for example http://mirror.esocc.com/apache/hadoop/common/.
3.3 Enter the stable directory; the stable release is 1.2.1:
http://mirror.esocc.com/apache/hadoop/common/stable/. The packages there are
Linux tarballs.
3.4 Download hadoop-1.2.1.tar.gz and its checksum file hadoop-1.2.1.tar.gz.mds
from the stable directory, saving both under ~/setup/hadoop, i.e.
/home/brian/setup/hadoop.

3.4.1 "mkdir -p ~/setup/hadoop"
mkdir creates a directory; -p also creates any missing parent directories (see
"man mkdir"). The tilde ~ stands for the current user's home directory: for
user brian it is /home/brian; for a user john it would be /home/john.
3.4.2 "cd ~/setup/hadoop"
cd (change directory) moves into the download directory.
3.4.3 "wget http://mirror.esocc.com/apache/hadoop/common/stable/hadoop-1.2.1.tar.gz.mds"
wget downloads a file from a URL (see "man wget").
3.4.4 "wget http://mirror.esocc.com/apache/hadoop/common/stable/hadoop-1.2.1.tar.gz"
3.4.5 "md5sum hadoop-1.2.1.tar.gz"
md5sum computes the MD5 digest of the tarball, a 32-hex-digit fingerprint of
its contents. It prints something like "8D 79 04 80 56 17 C1 6C B2 27 D1 CC BF
E9 38 5A hadoop-1.2.1.tar.gz".
3.4.6 "cat hadoop-1.2.1.tar.gz.mds"
cat (from catenate) prints the file. Find the MD5 line in
hadoop-1.2.1.tar.gz.mds and check that it matches the digest printed by
md5sum; if the two agree, the download is intact.

4. Installing the Java JDK

4.1 Download JDK 1.7 from
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html.
For a 32-bit CPU choose the Linux x86 package; for a 64-bit CPU choose Linux
x64. To check which you have, run "uname -a"; output such as "Linux brian-i3
2.6.32-51-generic #113-Ubuntu SMP Wed Aug 21 19:46:35 UTC 2013 x86_64
GNU/Linux" contains x86_64, meaning a 64-bit system. Select "Java SE
Development Kit 7u40", click "Accept License Agreement", and save the package,
java-jdk-7u40-linux-i586.tar.gz, to /home/brian/setup/java-jdk-1.7/. The "40"
is the update number of JDK 7.
4.2 "sudo su -"
Switch to the root user (see "man sudo"); the JDK will be installed under
/usr/local, which only root may write to.
4.3 "cd /usr/local/lib"
4.4 "tar -zxvf /home/brian/setup/java-jdk-1.7/java-jdk-7u40-linux-i586.tar.gz"
tar unpacks the archive (see "man tar"). Afterwards "ls" shows a new
directory, jdk1.7.0_40.
4.5 Set the environment variables.
4.5.1 "gedit /etc/profile"
gedit is a Linux text editor, comparable to Notepad on Windows; /etc/profile
is the system-wide Linux profile script.
4.5.2 Append the following lines to /etc/profile:

    export JAVA_HOME=/usr/local/lib/jdk1.7.0_40 export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar export PATH=$PATH:$JAVA_HOME/bin

Linux reads /etc/profile at login, so the exported JAVA_HOME, CLASSPATH and
PATH become visible to every shell.
4.5.3 "chown root:root -R /usr/local/lib/jdk1.7.0_40"
chown changes ownership: the jdk1.7.0_40 tree now belongs to user root, group
root; -R applies the change recursively.
4.5.4 "chmod 755 -R /usr/local/lib/jdk1.7.0_40"
chmod sets the permissions of the jdk1.7.0_40 tree; -R again works
recursively.
4.5.5 "source /etc/profile"
Re-reads /etc/profile so the variables from 4.5.2 take effect in the current
shell. An equivalent form is ". /etc/profile".
4.5.6 "java -version"
Verifies that the JDK is installed and on the PATH.

5. Installing Hadoop

5.1 "su brian"
The JDK was installed as root; switch back to user brian.
5.2 "mkdir -p ~/usr/hadoop"
Create the directory Hadoop will live in.
5.3 "cd ~/usr/hadoop"
5.4 "tar -xvzf ~/setup/hadoop/hadoop-1.2.1.tar.gz"
Unpacks Hadoop into ~/usr/hadoop/hadoop-1.2.1.
5.5 Configure Hadoop in pseudo-distributed mode, following
http://hadoop.apache.org/docs/stable/single_node_setup.pdf. In this mode the
five daemons NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker
all run on one machine.
5.5.1 The configuration files live in ~/usr/hadoop/hadoop-1.2.1/conf/.
5.5.2 In core-site.xml, set:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

5.5.3 In mapred-site.xml, set:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

5.5.4 In hdfs-site.xml, set:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

5.5.5 In hadoop-env.sh, set:

export JAVA_HOME=/usr/local/lib/jdk1.7.0_40

6. Installing rsync and ssh

6.1 "sudo apt-get install ssh rsync"
ssh is the Secure Shell protocol, used to log into machines; rsync
synchronizes files between machines. Hadoop's scripts use both.
6.2 Set up passphrase-free ssh to localhost:
6.2.1 "ssh-keygen -t dsa -f ~/.ssh/id_dsa"
Generates an ssh key pair.
6.2.2 "cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys"
Appends the public key to the authorized keys, so logging into this machine
over ssh needs no password.
6.2.3 "ssh localhost"
The first connection asks for confirmation; answer yes.

7. Starting Hadoop

7.1 "cd ~/usr/hadoop/hadoop-1.2.1"
7.2 "./bin/hadoop namenode -format"
Formats the NameNode; this is done once, before the first start.
7.3 "./bin/start-all.sh"

Starts the five daemons: NameNode, SecondaryNameNode, JobTracker, TaskTracker
and DataNode.
7.4 "jps"
jps lists the running Java processes. There should be six: Jps itself plus
NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker. If the jps
command is not found, run:

source /etc/profile

8. Running a Hadoop job

8.1 "cd ~/usr/hadoop/hadoop-1.2.1"
8.2 "./bin/hadoop fs -put README.txt readme.txt"
Copies the local README.txt that ships with Hadoop into HDFS, under the name
readme.txt.
8.3 "./bin/hadoop jar hadoop-examples-1.2.1.jar wordcount readme.txt output"
Runs the wordcount program from Hadoop's bundled examples jar, counting the
words in readme.txt on HDFS and writing the result to the HDFS directory
output.
8.4 "./bin/hadoop fs -cat output/part-r-00000"
part-r-00000 is the output file wordcount produced; cat prints the word
counts.

9. Summary

With the packages already downloaded, installing the Java JDK and Hadoop takes
well under an hour; the only extra tools required are wget, ssh and rsync.

Chapter 2: A First MapReduce Program

1. How Hadoop came about

1.1 Google's three papers

Google published three famous systems papers: GFS
(http://labs.google.com/papers/gfs-sosp2003.pdf), MapReduce
(http://labs.google.com/papers/mapreduce-osdi04.pdf) and Bigtable
(http://labs.google.com/papers/bigtable-osdi06.pdf). Google did not release
the systems themselves. Doug Cutting, then working on the Nutch search engine,
implemented the ideas as open source; that work became Hadoop, and after Doug
Cutting joined Yahoo, Hadoop grew into a top-level Apache project. Hadoop's
MapReduce corresponds to Google's MapReduce, the Hadoop Distributed File
System (HDFS) corresponds to GFS, and HBase corresponds to BigTable.

The correspondence between the two stacks:

    Hadoop        Google
    ---------     ---------
    MapReduce     MapReduce
    HDFS          GFS
    HBase         BigTable

1.2 What MapReduce and HDFS are for

Suppose you must count word frequencies in a 1000 GB log file, and one machine
can only handle about 1 GB at a time. You could split the file into 1000
pieces of 1 GB and process them one after another, or spread them over 1000
machines, but then you must distribute the data, schedule the work and merge
the results yourself. Hadoop does this for you: HDFS stores the 1000 GB across
the cluster, cut into blocks (64 MB by default) with several replicas of each,
and MapReduce splits the computation into a Map phase and a Reduce phase.

Take the text "This is a small cat. That is a small dog." The Map phase emits
a (word, 1) pair for every word: (this, 1), (is, 1), (a, 1), (small, 1),
(cat, 1), (that, 1), (is, 1), (a, 1), (small, 1), (dog, 1). The Reduce phase
groups the pairs by word and sums them: (this, 1), (is, 2), (a, 2),
(small, 2), (cat, 1), (that, 1), (dog, 1). Each Map task handles roughly 1 GB,
the Map tasks run in parallel, and the Reduce tasks merge the partial counts
into the final answer. A run on Hadoop therefore looks like this:
1) upload the 1000 GB input to HDFS;
2) write the Map and Reduce logic in Java and package it as a jar;
3) submit the jar to Hadoop, which runs it over the 1000 GB;
4) read the result back from HDFS.
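Before any Hadoop code appears, here is a minimal, Hadoop-free sketch of that
same data flow in plain Java. The class name WordCountFlow and the use of a
List plus a HashMap are invented for illustration; only the map-then-reduce
shape comes from the text above.

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountFlow {
    public static void main(String[] args) {
        String text = "This is a small cat. That is a small dog.";

        // "Map" step: emit one (word, 1) pair per whitespace-separated token.
        List<Map.Entry<String, Integer>> pairs =
            new ArrayList<Map.Entry<String, Integer>>();
        for (String token : text.split("\\s+")) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
        }

        // "Shuffle" + "reduce" step: group the pairs by word and sum the 1s.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Map.Entry<String, Integer> p : pairs) {
            Integer sum = counts.get(p.getKey());
            counts.put(p.getKey(), sum == null ? p.getValue() : sum + p.getValue());
        }
        System.out.println(counts);  // e.g. {That=1, This=1, a=2, is=2, ...}
    }
}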

1.3 The API documentation

Two API references are used throughout this book: the Hadoop 1.2.1 API at
http://hadoop.apache.org/docs/r1.2.1/api/index.html and the Java JDK 1.7 API
at http://docs.oracle.com/javase/7/docs/api/.

2. The problem

Word Count is the canonical MapReduce example, the "hello world" of Hadoop:
given a body of text, count how many times each word occurs.

3. A MapReduce program in three files

A MapReduce program has at least three parts: a Map class, a Reduce class, and
a driver that wires them into a job, all written in Java. This first version
of WordCount keeps the three parts in three separate files.

3.1 The directory layout

Create wordcount_01, i.e. /home/brian/wordcount_01, with two subdirectories:
src for the Java sources and classes for the compiled class files. The src
directory holds IntSumReducer.java, TokenizerMapper.java and WordCount.java:
TokenizerMapper.java is the Map class, IntSumReducer.java is the Reduce class,
and WordCount.java is the driver that submits the job.

3.2 TokenizerMapper.java

    package com.brianchen.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    IntWritable one = new IntWritable(1);
    Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

Keep the API documentation from 1.3 at hand while reading the code.

1) "package com.brianchen.hadoop;"
Java code is organized into packages, named by the reversed-domain convention:
this book's code lives in com.brianchen.hadoop, just as Sun's classes live
under com.sun and Hadoop's own classes under org.apache.hadoop.

2) "import java.io.IOException;"
IOException lives in the JDK's java.io package (look it up in the JDK 1.7
API). It is the exception Java throws on input/output errors; the I and O in
the name stand for Input and Output.

3) "import java.util.StringTokenizer;"
StringTokenizer, from java.util, splits a string into tokens. With the default
delimiters it splits "This is a cat" into the four tokens This, is, a, cat;
with ";" as the delimiter it splits "14;229;37" into 14, 229 and 37.
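A quick, self-contained demonstration of both splits; the class name TokenDemo
is invented for illustration:

import java.util.StringTokenizer;

public class TokenDemo {
    public static void main(String[] args) {
        // Default delimiters: space, tab, newline.
        StringTokenizer itr = new StringTokenizer("This is a cat");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken());   // This / is / a / cat
        }
        // A custom delimiter set splits "14;229;37" into 14, 229, 37.
        StringTokenizer nums = new StringTokenizer("14;229;37", ";");
        while (nums.hasMoreTokens()) {
            System.out.println(nums.nextToken());
        }
    }
}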

4) "import org.apache.hadoop.io.IntWritable;"
IntWritable is in Hadoop's org.apache.hadoop.io package (see the Hadoop 1.2.1
API). Java has the primitive type int (for values such as 15) and its object
wrapper Integer; IntWritable is Hadoop's own serializable wrapper around an
int, used wherever values must travel over the network or to disk.

5) "import org.apache.hadoop.io.Text;"
Text, also in hadoop.io, is Hadoop's serializable string type.

6) "import org.apache.hadoop.mapreduce.Mapper;"
Mapper is the base class every MapReduce Map class extends.

7) "public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>"
This declares the Map class. The four type parameters are the input key type
(Object), the input value type (Text), the output key type (Text) and the
output value type (IntWritable). The input key is the position of a line in
the file, which word count ignores, hence Object; the input value is the line
itself. For the line "This is a cat." the mapper emits pairs such as
("This", 1), where "This" is a Text and 1 an IntWritable.

8) "IntWritable one = new IntWritable(1);"
A reusable constant 1, written as the value of every output pair.

9) "Text word = new Text();"
A reusable Text object that holds the current word.

10) "public void map(Object key, Text value, Context context) throws
IOException, InterruptedException"
The map method receives one key/value pair per call and may throw IOException
or InterruptedException. Context is defined inside Mapper; it is the channel
through which map emits its output.

11) "StringTokenizer itr = new StringTokenizer(value.toString());"
value is a Text; value.toString() converts it to a Java String, which the
StringTokenizer then splits into words.

12) The Map loop:

while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
}

hasMoreTokens asks whether any token is left and nextToken returns the next
one; each token is placed in word and the pair (word, one) is written to the
context, which delivers it to the Reduce phase.

    3.3 IntSumReducer.java

package com.brianchen.hadoop;

    import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

1) "import org.apache.hadoop.mapreduce.Reducer;"
Reducer is the base class every MapReduce Reduce class extends.

2) "public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>"
The Reduce class. The four type parameters are the input key/value types and
the output key/value types; both pairs are (Text, IntWritable) here.

3) "IntWritable result = new IntWritable();"
A reusable object holding the count that will be written out.

4) "public void reduce(Text key, Iterable<IntWritable> values, Context
context) throws IOException, InterruptedException"
Each reduce call receives one key together with all the values emitted for
that key, as an Iterable<IntWritable>; Context plays the same role as in
Mapper. Suppose the input text was "This is a That is a". After the Map phase,
reduce is called once per distinct word: key="This" with values=(1); key="is"
with values=(1, 1); key="a" with values=(1, 1); key="That" with values=(1).

5) The Reduce body:

int sum = 0;
for (IntWritable val : values) {
    sum += val.get();
}
result.set(sum);
context.write(key, result);

It sums the values, stores the total in result, and writes the (key, result)
pair to the context, producing the final count for each word.

    3.4 WordCount.java

    package com.brianchen.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

1) "import org.apache.hadoop.conf.Configuration;"
Configuration holds the Hadoop configuration for a job.
2) "import org.apache.hadoop.fs.Path;"
Path names a file or directory.
3) "import org.apache.hadoop.mapreduce.Job;"
Job represents a MapReduce job: what to run, with which Mapper and Reducer,
over which input, into which output. It is how a program hands work to Hadoop.
4) "import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;"
FileInputFormat reads the input files and splits them for the Map tasks.
5) "import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;"
FileOutputFormat writes the job's output files.
6) "import org.apache.hadoop.util.GenericOptionsParser;"
GenericOptionsParser separates Hadoop's own command-line options from the
application's.
7) "public class WordCount"
The driver class: it builds the Job and wires in the Mapper and Reducer.
8) "Configuration conf = new Configuration();"
Creates the job's Hadoop configuration.
9) "String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();"
Strips the Hadoop options from args, leaving the application's own arguments.
10) "if (otherArgs.length != 2)"
The program needs exactly two arguments, the input and output paths;
otherwise it prints a usage line and exits.
11) "Job job = new Job(conf, "wordcount");"
Creates the Job, named wordcount.
12) "job.setJarByClass(WordCount.class);"
A job ships to the cluster as a jar; setJarByClass tells Hadoop to use the
jar that contains the WordCount class.
13) "job.setMapperClass(TokenizerMapper.class);"
Sets the Mapper.
14) "job.setReducerClass(IntSumReducer.class);"
Sets the Reducer.
15) "job.setOutputKeyClass(Text.class);"
Sets the output key type.
16) "job.setOutputValueClass(IntWritable.class);"
Sets the output value type.
17) "FileInputFormat.addInputPath(job, new Path(otherArgs[0]));"
The first remaining argument is the input path.
18) "FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));"
The second remaining argument is the output path.
19) "System.exit(job.waitForCompletion(true) ? 0 : 1);"
Submits the job, waits for it to finish, and exits with 0 on success.

3.5 Compiling

javac, the "Java programming language compiler", ships with the Java JDK.
Compile from the wordcount_01 directory:
3.5.1 "cd ~/wordcount_01"
3.5.2 "javac -classpath /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar:/home/brian/usr/hadoop/hadoop-1.2.1/lib/commons-cli-1.2.jar -d ./classes/ ./src/*.java"
-classpath lists the jars the sources depend on, separated by colons; -d names
the directory that receives the generated class files.

3.6 Packaging

3.6.1 "jar -cvf wordcount.jar -C ./classes/ ."
jar, also part of the JDK, packages class files into a jar archive much as tar
packages files; -C first changes into the classes directory, so wordcount.jar
contains the class files at its top level.

3.7 Running

3.7.1 Make sure Hadoop is running, as set up in Chapter 1.
3.7.2 "cd ~/usr/hadoop/hadoop-1.2.1"
Change into the Hadoop installation directory.
3.7.3 "./bin/hadoop fs -put README.txt readme.txt"
Copies README.txt into HDFS as readme.txt.
3.7.4 "./bin/hadoop fs -rmr output"
Removes the HDFS directory output if it exists; Hadoop refuses to overwrite
an existing output directory.
3.7.5 "./bin/hadoop jar /home/brian/wordcount_01/wordcount.jar com.brianchen.hadoop.WordCount readme.txt output"
Runs our jar over readme.txt, writing to output. The class must be named in
full: com.brianchen.hadoop is the package, WordCount the class.

3.8 Checking the result

3.8.1 "./bin/hadoop fs -cat output/part-r-00000"
Prints the word counts from part-r-00000, the file wordcount wrote into the
output directory.

4. A MapReduce program in one file

The three-file WordCount shows the moving parts; in practice Hadoop programs
usually keep the Mapper and Reducer as nested classes of the driver, in a
single file. This section rewrites WordCount that way.

4.1 The directory layout

Create ~/wordcount_02 with the subdirectories src and classes, as before; src
holds the single source file WordCount.java.

    4.2 WordCount.java

package com.brianchen.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

1) "public static class TokenizerMapper"
TokenizerMapper is now a nested class of WordCount: public so the framework
can reach it, and static so it can be instantiated without an enclosing
WordCount object.
2) "private final static IntWritable one = new IntWritable(1);"
private restricts access to the enclosing class, final makes the reference
unchangeable, and static means one is shared by the class rather than
duplicated per instance. Note also that this version sets a Combiner
(job.setCombinerClass), which pre-sums counts on the map side before the
shuffle.

4.3 Compiling

4.3.1 "cd ~/wordcount_02"
4.3.2 "javac -classpath /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar:/home/brian/usr/hadoop/hadoop-1.2.1/lib/commons-cli-1.2.jar -d ./classes/ ./src/WordCount.java"

4.4 Packaging

4.4.1 "jar -cvf wordcount.jar -C ./classes/ ."

4.5 Running

4.5.1 "cd ~/usr/hadoop/hadoop-1.2.1"
4.5.2 "./bin/hadoop fs -rmr output"
4.5.3 "./bin/hadoop jar /home/brian/wordcount_02/wordcount.jar com.brianchen.hadoop.WordCount readme.txt output"

4.6 Checking the result

4.6.1 "./bin/hadoop fs -cat output/part-r-00000"

Chapter 3: HDFS

1. File systems

A file system manages storage: it decides how files are laid out on a device
(a disk, an SD card) and offers programs a uniform way to create, read, write
and delete them.

Linux began with the Minix file system, which was limited (64 MB per file
system, 14-character file names). To support better file systems without
rewriting every program, Linux introduced VFS, the Virtual File System: a
kernel layer that presents one common interface while concrete file systems
such as ext, ext2 and ext3 plug in underneath. On Windows, FAT and NTFS play
the corresponding role.

Network file systems extend the idea across machines. With NFS (the Network
File System, originally from Sun), machine A can mount a directory that
physically lives on machine B. When a program on A reads such a file, the
request flows from A's VFS to A's NFS client, across the network to B's NFS
server, down through B's VFS to B's local file system, and the data returns
the same way:

    A: application -> VFS -> NFS client
                                 |  (network)
    B: NFS server -> VFS -> local file system

2. Hadoop's file systems

HDFS plays for Hadoop roughly the role VFS plays for Linux: Hadoop defines an
abstract file system interface, and HDFS is its most important implementation,
alongside others such as HFTP (read-only access over HTTP) and Amazon S3.

HDFS is a distributed file system designed to scale to petabytes of data on
clusters of commodity machines.

HDFS has three kinds of daemons: the NameNode, the SecondaryNameNode and the
DataNodes. The NameNode keeps the file system's metadata: the directory tree
and, for each file, where its blocks live. The SecondaryNameNode periodically
merges the NameNode's edit log into its image so the NameNode can restart
quickly; it is a helper, not a hot standby. The DataNodes store the actual
data. Files on HDFS are cut into large blocks, 64 MB by default (128 MB and
256 MB are common alternatives), and each block is replicated on several
DataNodes.

"hadoop fs -help" lists the shell commands for working with HDFS from the
command line.

Programs reach HDFS through the Hadoop API, centered on the FileSystem class;
the rest of this chapter drives that API directly.
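As a first taste of that API, here is a minimal sketch; the class name FsProbe
and the choice of readme.txt are invented for illustration, and it assumes the
pseudo-distributed setup from Chapter 1.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);      // the default file system, HDFS here
        Path p = new Path("readme.txt");
        System.out.println(p + " exists: " + fs.exists(p));
    }
}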

3. Copying a file to HDFS

3.1 The program

This program does what "hadoop fs -put" does: copy a local file to HDFS.
Create ~/filecopy with the subdirectories src (for the Java source) and
classes (for the class files); src holds FileCopy.java.

    3.2 FileCopy.java

package com.brianchen.hadoop;

import java.net.URI;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.BufferedInputStream;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCopy {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: filecopy <src> <target>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
        OutputStream out = fs.create(new Path(args[1]));
        IOUtils.copyBytes(in, out, 4096, true);
    }
}

3.3 Compiling

cd ~/filecopy
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes ./src/*.java

3.4 Packaging

    jar -cvf filecopy.jar -C ./classes/ .

3.5 Running

cd /home/brian/usr/hadoop/hadoop-1.2.1
./bin/hadoop jar ~/filecopy/filecopy.jar com.brianchen.hadoop.FileCopy README.txt readme.txt

Run from the Hadoop directory with Hadoop started; this copies the local
README.txt to HDFS as readme.txt.

3.6 Checking the result

./bin/hadoop fs -ls readme.txt
./bin/hadoop fs -cat readme.txt

-ls confirms that readme.txt now exists on HDFS, and -cat prints its contents,
which should match README.txt.

4. Deleting a file on HDFS

4.1 The program

This program does what "hadoop fs -rm" does. Create ~/filedelete with src and
classes subdirectories; src holds FileDelete.java.

    4.2 FileDelete.java

package com.brianchen.hadoop;

    import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileDelete {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: filedelete <target>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        fs.delete(new Path(args[0]), false);  // false: do not delete recursively
    }
}

4.3 Compiling

cd ~/filedelete
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes ./src/*.java

4.4 Packaging

    jar -cvf filedelete.jar -C ./classes/ .

4.5 Running

cd /home/brian/usr/hadoop/hadoop-1.2.1
./bin/hadoop jar ~/filedelete/filedelete.jar com.brianchen.hadoop.FileDelete readme.txt

With Hadoop running, this deletes readme.txt from HDFS.

4.6 Checking the result

./bin/hadoop fs -ls readme.txt

The listing reports that readme.txt no longer exists on HDFS.

5. Reading a file on HDFS

5.1 The program

This program does what "hadoop fs -cat" does. Create ~/filecat with src and
classes subdirectories; src holds FileCat.java.

    5.2 FileCat.java

package com.brianchen.hadoop;

import java.net.URI;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCat {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: filecat <target>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(args[0]));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

5.3 Compiling

cd ~/filecat
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes ./src/*.java

5.4 Packaging

    jar -cvf filecat.jar -C ./classes/ .

5.5 Running

cd /home/brian/usr/hadoop/hadoop-1.2.1
./bin/hadoop fs -put README.txt readme.txt
./bin/hadoop jar ~/filecat/filecat.jar com.brianchen.hadoop.FileCat readme.txt

With Hadoop running, put readme.txt back onto HDFS (it was deleted in the
previous section), then print it with FileCat.

6. Querying file status

6.1 The program

This program reports what "hadoop fs -ls" reports about a single file. Create
~/fileinfo with src and classes subdirectories; src holds FileInfo.java.

    6.2 FileInfo.java

package com.brianchen.hadoop;

    import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class FileInfo {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: fileinfo <target>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        FileStatus stat = fs.getFileStatus(new Path(args[0]));

        System.out.println(stat.getPath());                   // full path
        System.out.println(stat.getLen());                    // length in bytes
        System.out.println(stat.getModificationTime());       // last modified
        System.out.println(stat.getOwner());                  // owner
        System.out.println(stat.getReplication());            // replica count
        System.out.println(stat.getBlockSize());              // block size
        System.out.println(stat.getGroup());                  // group
        System.out.println(stat.getPermission().toString());  // permissions
    }
}

6.3 Compiling

cd ~/fileinfo
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes ./src/*.java

6.4 Packaging

    jar -cvf fileinfo.jar -C ./classes/ .

6.5 Running

cd /home/brian/usr/hadoop/hadoop-1.2.1
./bin/hadoop fs -put README.txt readme.txt
./bin/hadoop jar ~/fileinfo/fileinfo.jar com.brianchen.hadoop.FileInfo readme.txt

With Hadoop running, this prints the status fields of readme.txt.

7. Listing a directory

7.1 The program

This program lists a directory, as "hadoop fs -ls" does. Create ~/filelist
with src and classes subdirectories; src holds FileList.java.

7.2 FileList.java

    package com.brianchen.hadoop;

    import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileUtil;

public class FileList {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: filelist <path>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        FileStatus[] status = fs.listStatus(new Path(args[0]));
        Path[] listedPaths = FileUtil.stat2Paths(status);

        for (Path p : listedPaths) {
            System.out.println(p);
        }
    }
}

7.3 Compiling

cd ~/filelist
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes ./src/*.java

7.4 Packaging

    jar -cvf filelist.jar -C ./classes/ .

7.5 Running

cd /home/brian/usr/hadoop/hadoop-1.2.1
./bin/hadoop fs -put README.txt readme.txt
./bin/hadoop fs -put README.txt readme2.txt
./bin/hadoop jar ~/filelist/filelist.jar com.brianchen.hadoop.FileList hdfs://localhost:9000/user/brian

With Hadoop running, put two copies of README.txt onto HDFS and list the home
directory /user/brian; the listing shows readme.txt and readme2.txt.

Chapter 4: Serialization

1. Serialization

Serialization turns an in-memory object into a byte stream that can be stored
or sent over the network, and deserialization turns the bytes back into an
object. In standard Java a class opts in by implementing the Serializable
interface:

public class Building implements Serializable

Java's built-in serialization is convenient but heavyweight: the byte stream
carries class metadata that a tightly controlled system does not need, and
Hadoop's RPC layer serializes enormous numbers of small values. Hadoop
therefore defines its own compact serialization mechanism instead of using
Java's.

2. Hadoop's serialization interfaces

The core interface is Writable, in org.apache.hadoop.io: a Hadoop-serializable
type implements its two methods, write (serialize to a DataOutput) and
readFields (deserialize from a DataInput).

The JDK's java.lang.Comparable interface supplies compareTo, which defines an
ordering; MapReduce keys must be orderable so they can be sorted.

WritableComparable combines the two: it extends both Writable and Comparable.

Hadoop's built-in box types, such as IntWritable, DoubleWritable and
ByteWritable, all implement WritableComparable, and therefore also Writable.

You can define your own Writable types for values, and WritableComparable
types for keys, as sketched below.
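For instance, a user-defined key type might look like the following sketch. The
PointWritable class and its fields are invented for illustration; only the
write/readFields/compareTo contract comes from Hadoop.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class PointWritable implements WritableComparable<PointWritable> {
    private int x;
    private int y;

    public PointWritable() {}  // Hadoop needs a no-arg constructor
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    public void write(DataOutput out) throws IOException {  // serialize
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {  // deserialize
        x = in.readInt();
        y = in.readInt();
    }

    public int compareTo(PointWritable o) {  // ordering for the sort/shuffle
        return x != o.x ? Integer.compare(x, o.x) : Integer.compare(y, o.y);
    }
}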

3. Serializing an IntWritable

3.1 The program

To see the raw bytes, this program serializes an IntWritable by hand, prints
the bytes in hex, then deserializes them into a second IntWritable. Create
~/intser with src and classes subdirectories; src holds IntSer.java.

3.2 IntSer.java

package com.brianchen.hadoop;

import java.io.DataOutputStream;
import java.io.DataInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.util.StringUtils;

public class IntSer {
    // Both helpers are static because they are called from the static main.
    public static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataout = new DataOutputStream(out);
        w.write(dataout);
        dataout.close();
        return out.toByteArray();
    }

    public static byte[] deserialize(Writable w, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream datain = new DataInputStream(in);
        w.readFields(datain);
        datain.close();
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        IntWritable intw = new IntWritable(7);
        byte[] bytes = serialize(intw);
        String bytes_str = StringUtils.byteToHexString(bytes);
        System.out.println(bytes_str);

        IntWritable intw2 = new IntWritable(0);
        deserialize(intw2, bytes);
        System.out.println(intw2);
    }
}

3.3 Compiling

cd ~/intser
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes src/*.java

3.4 Packaging

    jar -cvf intser.jar -C ./classes .

3.5 Running

cd ~/usr/hadoop/hadoop-1.2.1
./bin/hadoop jar /home/brian/intser/intser.jar com.brianchen.hadoop.IntSer

With Hadoop running, the output is:

00000007
7

The first line is the serialized form: an IntWritable is written as four
big-endian bytes, so the value 7 becomes the hex string 00000007. The second
line is the value recovered by deserialization.

Chapter 5: Compression

1. Why compression

Compression saves disk space on HDFS and bandwidth when data moves between
machines, which matters at MapReduce scale; the price is CPU time.

Compression must be symmetric. If machine A compresses a file with one
algorithm and sends it to machine B, B must decompress it with the same
algorithm; Hadoop represents each algorithm as a CompressionCodec class so
that both sides can name the codec they agree on.

2. Compressing an HDFS file to an HDFS file

2.1 The program

CprsF2F reads a file from HDFS and writes a compressed copy back to HDFS.
Create ~/cprsf2f with src and classes subdirectories; src holds CprsF2F.java.

2.2 CprsF2F.java

    package com.brianchen.hadoop;

import java.net.URI;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.ReflectionUtils;

public class CprsF2F {
    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: CprsF2F cmps_name src target");
            System.exit(2);
        }

        Class<?> codecClass = Class.forName(args[0]);
        Configuration conf = new Configuration();
        CompressionCodec codec =
            (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        InputStream in = null;
        OutputStream out = null;
        FileSystem fs = FileSystem.get(URI.create(args[1]), conf);

        try {
            in = fs.open(new Path(args[1]));
            out = codec.createOutputStream(fs.create(new Path(args[2])));
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}

2.3 Compiling

cd ~/cprsf2f
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes/ src/*.java

2.4 Packaging

jar -cvf cprsf2f.jar -C ./classes/ .

2.5 Running

cd ~/usr/hadoop/hadoop-1.2.1
echo "hello world" >> a.txt
./bin/hadoop fs -put a.txt a.txt
./bin/hadoop jar /home/brian/cprsf2f/cprsf2f.jar com.brianchen.hadoop.CprsF2F org.apache.hadoop.io.compress.GzipCodec a.txt b.txt
./bin/hadoop fs -cat b.txt
./bin/hadoop fs -cat b.txt | gunzip

With Hadoop running: echo writes "hello world" into the local file a.txt,
-put copies a.txt to HDFS, and the jar compresses the HDFS file a.txt into
b.txt using GzipCodec. A plain -cat on b.txt prints gibberish, because the
contents are gzip-compressed; piping it through gunzip recovers "hello world".

3. Compressing standard input to an HDFS file

3.1 The program

CprsIn2F compresses whatever arrives on standard input into an HDFS file.
Create ~/cprsin2f with src and classes subdirectories; src holds
CprsIn2F.java.

    3.2 CprsIn2F.java

package com.brianchen.hadoop;

import java.net.URI;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.ReflectionUtils;

public class CprsIn2F {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: CprsIn2F cmps_name target");
            System.exit(2);
        }

        Class<?> codecClass = Class.forName(args[0]);
        Configuration conf = new Configuration();
        CompressionCodec codec =
            (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        OutputStream out = null;
        FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
        try {
            out = codec.createOutputStream(fs.create(new Path(args[1])));
            IOUtils.copyBytes(System.in, out, 4096, false);
        } finally {
            IOUtils.closeStream(out);
        }
    }
}

3.3 Compiling

cd ~/cprsin2f
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes/ src/*.java

3.4 Packaging

jar -cvf cprsin2f.jar -C ./classes/ .

3.5 Running

cd ~/usr/hadoop/hadoop-1.2.1
echo "hello, world!" | ./bin/hadoop jar /home/brian/cprsin2f/cprsin2f.jar com.brianchen.hadoop.CprsIn2F org.apache.hadoop.io.compress.GzipCodec hello.txt
./bin/hadoop fs -cat hello.txt
./bin/hadoop fs -cat hello.txt | gunzip

With Hadoop running: echo feeds "hello, world!" to the program's standard
input, and the jar gzips it into the HDFS file hello.txt.
org.apache.hadoop.io.compress.GzipCodec is Hadoop's Gzip codec. A plain -cat
prints compressed bytes; piping through gunzip prints "hello, world!".

4. Decompressing an HDFS file

4.1 The program

DcprsF2F decompresses an HDFS file into an uncompressed HDFS file. Create
~/dcprsf2f with src and classes subdirectories; src holds DcprsF2F.java.

    4.2 DcprsF2F.java

package com.brianchen.hadoop;

import java.net.URI;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.ReflectionUtils;

public class DcprsF2F {
    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: DcprsF2F cmps_name src target");
            System.exit(2);
        }

        Class<?> codecClass = Class.forName(args[0]);
        Configuration conf = new Configuration();
        CompressionCodec codec =
            (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        InputStream in = null;
        OutputStream out = null;
        FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
        try {
            in = codec.createInputStream(fs.open(new Path(args[1])),
                                         codec.createDecompressor());
            out = fs.create(new Path(args[2]));
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}

4.3 Compiling

cd ~/dcprsf2f
javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar -d ./classes/ src/*.java

4.4 Packaging

jar -cvf dcprsf2f.jar -C ./classes/ .

4.5 Running

cd ~/usr/hadoop/hadoop-1.2.1
./bin/hadoop jar /home/brian/dcprsf2f/dcprsf2f.jar com.brianchen.hadoop.DcprsF2F org.apache.hadoop.io.compress.GzipCodec hello.txt hello_dec.txt
./bin/hadoop fs -cat hello.txt
./bin/hadoop fs -cat hello_dec.txt

With Hadoop running, this decompresses the gzipped hello.txt created in 3.5
into hello_dec.txt, again naming GzipCodec as the codec. -cat on hello.txt
still prints compressed bytes, while -cat on hello_dec.txt prints:

hello, world!

Chapter 6: MapReduce Input and Output

This chapter looks at how a MapReduce job reads its input and writes its
output.

Files on HDFS are stored in large blocks, 64 MB by default, with 128 MB and
256 MB as common alternatives.

Chapter 2's Word Count showed the Mapper and Reducer; between them and HDFS
sit two more pieces. On the way in, an InputFormat turns the files on HDFS
into the key/value pairs the Mapper consumes. On the way out, an OutputFormat
turns the Reducer's key/value pairs back into files on HDFS.

A note on terminology: in Java, an abstract method such as do_it is declared
in a base class without a body; a concrete subclass B must provide the
implementation of do_it before it can be used. The InputFormat and
OutputFormat classes below are exactly such abstract classes.

1. InputFormat

InputFormat is an abstract class with two abstract methods (see
InputFormat.java):

abstract List<InputSplit> getSplits(JobContext context)
abstract RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context)

getSplits cuts the input on HDFS into splits. A split is the unit of work
MapReduce hands to one Map task, so the number of splits determines the number
of Map tasks.

createRecordReader returns a RecordReader for one split. The RecordReader
iterates over the split's contents and produces the key/value pairs that are
fed, one per call, to the Mapper's map method.

InputFormat has three direct subclasses: DBInputFormat (reads from a
database), DelegatingInputFormat (delegates to other InputFormats on a
per-path basis) and FileInputFormat (reads files).

FileInputFormat implements getSplits but leaves createRecordReader abstract:
how to split files is generic, while how to parse a split into records depends
on the file format. Its getSplits, lightly annotated, looks like this:

// generate splits
List<InputSplit> splits = new ArrayList<InputSplit>();
// the FileStatus of every input file on HDFS
List<FileStatus> files = listStatus(job);
for (FileStatus file : files) {
    Path path = file.getPath();
    FileSystem fs = path.getFileSystem(job.getConfiguration());
    long length = file.getLen();
    // where this file's blocks live on HDFS
    BlockLocation[] blkLocations = fs.getFileBlockLocations(file, 0, length);

    if ((length != 0) && isSplitable(job, path)) {
        long blockSize = file.getBlockSize();
        // the split size, computed from the block size and the job's min/max
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);

        // carve off one split of splitSize bytes at a time
        long bytesRemaining = length;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
            splits.add(new FileSplit(path, length - bytesRemaining, splitSize,
                blkLocations[blkIndex].getHosts()));
            bytesRemaining -= splitSize;
        }

        // the remainder, smaller than splitSize, becomes the last split
        if (bytesRemaining != 0) {
            splits.add(new FileSplit(path, length - bytesRemaining, bytesRemaining,
                blkLocations[blkLocations.length - 1].getHosts()));
        }
    } else if (length != 0) {
        // not splitable: the whole file is a single split
        splits.add(new FileSplit(path, 0, length, blkLocations[0].getHosts()));
    } else {
        // a zero-length file still gets an (empty) split
        splits.add(new FileSplit(path, 0, length, new String[0]));
    }
}

FileInputFormat in turn has five subclasses: CombineFileInputFormat,
KeyValueTextInputFormat, NLineInputFormat, SequenceFileInputFormat and
TextInputFormat.

TextInputFormat is the default. Its createRecordReader returns a
LineRecordReader.

LineRecordReader is a RecordReader over text: it walks through an InputSplit
line by line, handing the Mapper one record per line, with the byte offset of
the line as the key and the line's text as the value.

2. OutputFormat

OutputFormat is the mirror image: it connects the output of the Reducer (or
of the Mapper, in a map-only job) back to storage.

OutputFormat declares three abstract methods:

public abstract RecordWriter<K,V> getRecordWriter(TaskAttemptContext context)
public abstract void checkOutputSpecs(JobContext context)
public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)

getRecordWriter returns the RecordWriter that writes the output key/value
pairs; checkOutputSpecs validates the job's output specification (for example,
that the output directory does not already exist); getOutputCommitter returns
the OutputCommitter that manages temporary task output and commits it on
success.

OutputFormat has four subclasses: DBOutputFormat (writes to a database),
FileOutputFormat (writes files), FilterOutputFormat (wraps another
OutputFormat, in the spirit of java.io's Filter streams) and NullOutputFormat
(discards everything, like /dev/null).

FileOutputFormat's notable subclasses are SequenceFileOutputFormat, which
writes Hadoop's binary SequenceFile format to HDFS, and TextOutputFormat, the
default, which writes plain text to HDFS.
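Symmetrically, a custom FileOutputFormat only has to supply getRecordWriter.
The following sketch (the class name KeyEqualsValueOutputFormat is invented
for illustration) writes each pair as "key=value" on its own line:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeyEqualsValueOutputFormat<K, V> extends FileOutputFormat<K, V> {
    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // One output file per task, named by the framework.
        Path file = getDefaultWorkFile(context, ".txt");
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        final FSDataOutputStream out = fs.create(file, false);
        return new RecordWriter<K, V>() {
            public void write(K key, V value) throws IOException {
                out.writeBytes(key + "=" + value + "\n");
            }
            public void close(TaskAttemptContext c) throws IOException {
                out.close();
            }
        };
    }
}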

Chapter 7: How Hadoop Runs a MapReduce Job

This chapter reviews, end to end, what happens when Hadoop runs a MapReduce
job.

1. Hadoop has five kinds of daemons:
1.1 The NameNode, which keeps HDFS's metadata.
1.2 The DataNodes, which store HDFS's data blocks.
1.3 The SecondaryNameNode, which periodically merges the NameNode's edit log.
1.4 The JobTracker, which schedules Hadoop's MapReduce jobs.
1.5 The TaskTrackers, which run the individual map and reduce tasks.

2. On the pseudo-distributed setup all five daemons run on one machine, and
jps should show all five.

3. In a real cluster the NameNode, SecondaryNameNode and JobTracker run on
master machines, while every worker machine runs a DataNode and a TaskTracker.

4. "./bin/hadoop namenode -format" formats the NameNode, wiping any previous
HDFS metadata; it is run once, before the first start.

5. After step 4, "./bin/start-all.sh" starts all of Hadoop's daemons.

6. Where things live:
6.1 A MapReduce job's code and configuration travel as a jar, much as a tar
archive carries a program.
6.2 HDFS data lives on the Linux file systems of the DataNodes, with the
NameNode tracking which blocks make up which file.
6.3 The SecondaryNameNode's merged copies of the NameNode's state support
recovery.

7. A job is submitted from the Linux command line, for example
"./bin/hadoop jar hadoop-examples-1.2.1.jar wordcount readme.txt output"; the
client process then drives the MapReduce submission.

8. On submission, the client:
8.1 asks the JobTracker for a new job ID;
8.2 checks the job's output specification;
8.3 uses the InputFormat to compute the InputSplits;
8.4 copies the job resources (the jar, the configuration, the InputSplit
metadata) into a job directory on HDFS, named by the job ID;
8.5 tells the JobTracker the job is ready to run;
8.6 polls the job's progress until it completes.

9. The JobTracker then:
9.1 queues the job;
9.2 initializes it;
9.3 creates one map task per InputSplit, and the configured number of reduce
tasks (the mapred.reduce.tasks setting);
9.4 assigns tasks to TaskTrackers as they report free slots, preferring
TaskTrackers close to a split's data;
9.5 tracks each task's status;
9.6 marks the job complete when all tasks finish.

10. Each TaskTracker:
10.1 sends the JobTracker a periodic heartbeat, saying it is alive and whether
it can accept a new task;
10.2 receives a task from the JobTracker in the heartbeat's reply;
10.3 fetches the job's jar and resources from HDFS;
10.4 launches a child JVM (a TaskRunner) to execute the task, reporting
progress back to the JobTracker;
10.5 repeats until no tasks remain.

[Figure: the startup sequence. "./bin/hadoop namenode -format" formats the
NameNode; "./bin/start-all.sh" then starts the NameNode, SecondaryNameNode and
DataNode (the HDFS side) and the JobTracker and TaskTracker (the MapReduce
side).]

Chapter 8: Estimating Pi

1. Estimating Pi with Monte Carlo sampling

Hadoop ships with a Pi estimator; this chapter reads its old-API source and
then rewrites it with the new API.

The method: take a square of side 1 and inscribe a circle of radius 0.5 in
it. The circle's area is 0.25*Pi; the square's is 1. Throw n points uniformly
at random into the square and count the number nc that land inside the circle.
Then nc/n approximates the ratio of the areas: nc/n = 0.25*Pi/1, hence
Pi = 4*nc/n.

Instead of pseudo-random x and y coordinates in [0, 1], the Hadoop example
uses the Halton sequence, a deterministic low-discrepancy sequence that covers
the square more evenly; see
http://orion.math.iastate.edu/reu/2001/voronoi/halton_sequence.html and
http://www.aae.wisc.edu/dphaneuf/AAE%20875/Halton%20sequences.pdf.

The work parallelizes naturally: to throw 1000 points with 10 Map tasks, each
Map throws 100 points and reports its own counts, and the Reduce adds them up.
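Before reading the distributed version, it may help to see the whole estimate
in one plain-Java program. The following sketch (the class name LocalPi is
invented for illustration) uses java.util.Random in place of the Halton
sequence:

import java.util.Random;

public class LocalPi {
    public static void main(String[] args) {
        long n = 10000000L;  // total sample points
        long nc = 0;         // points inside the inscribed circle
        Random rnd = new Random(42);
        for (long i = 0; i < n; i++) {
            double x = rnd.nextDouble() - 0.5;  // center the square at the origin
            double y = rnd.nextDouble() - 0.5;
            if (x * x + y * y <= 0.25) {        // radius 0.5, so r^2 = 0.25
                nc++;
            }
        }
        System.out.println("pi ~= " + 4.0 * nc / n);  // Pi = 4*nc/n
    }
}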

2. The old-API Pi program

Suppose we run it with 10 Map tasks and 1000 points per Map. The program,
PiEstimator, proceeds in five steps:

1) It configures a MapReduce job with 10 Maps, 1000 points each.

2) It writes 10 small input files to HDFS, one per Map task. File i holds the
pair (offset, size) = (i*1000, 1000): Map 0 generates Halton points 0 through
999, Map 1 points 1000 through 1999, Map 2 points 2000 through 2999, and so
on, so no two Maps reuse the same points.

3) It submits the MapReduce job.

4) The Maps count how many of their points fall inside and outside the circle,
and the single Reduce sums the two counts.

5) From the summed counts it computes Pi as above.

The file PiEstimator.java contains the whole program: the nested classes
HaltonSequence (the point generator), PiMapper (the Map) and PiReducer (the
Reduce), plus the driver code.

The listing, PiEstimator.java:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.math.BigDecimal;
import java.util.Iterator;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class PiEstimator extends Configured implements Tool {
    // temporary directory for the job's input and output
    static private final Path TMP_DIR = new Path(
        PiEstimator.class.getSimpleName() + "_TMP_3_141592654");

    // 2-D Halton sequence generator

    private static class HaltonSequence {
        static final int[] P = {2, 3};
        static final int[] K = {63, 40};

        private long index;
        private double[] x;
        private double[][] q;
        private int[][] d;

        // initialize the sequence at the given start index
        HaltonSequence(long startindex) {
            index = startindex;
            x = new double[K.length];
            q = new double[K.length][];
            d = new int[K.length][];
            for (int i = 0; i < K.length; i++) {
                q[i] = new double[K[i]];
                d[i] = new int[K[i]];
            }

            for (int i = 0; i < K.length; i++) {
                long k = index;
                x[i] = 0;
                for (int j = 0; j < K[i]; j++) {
                    q[i][j] = (j == 0 ? 1.0 : q[i][j-1]) / P[i];
                    d[i][j] = (int)(k % P[i]);
                    k = (k - d[i][j]) / P[i];
                    x[i] += d[i][j] * q[i][j];
                }
            }
        }

        // return the next point in the sequence
        double[] nextPoint() {
            index++;
            for (int i = 0; i < K.length; i++) {
                for (int j = 0; j < K[i]; j++) {
                    d[i][j]++;
                    x[i] += q[i][j];
                    if (d[i][j] < P[i]) {
                        break;
                    }
                    d[i][j] = 0;
                    x[i] -= (j == 0 ? 1.0 : q[i][j-1]);
                }
            }
            return x;
        }
    }

    // PiMapper: the Map task
    public static class PiMapper extends MapReduceBase
            implements Mapper<LongWritable, LongWritable, BooleanWritable, LongWritable> {

        public void map(LongWritable offset, LongWritable size,
                OutputCollector<BooleanWritable, LongWritable> out,
                Reporter reporter) throws IOException {
            /* The input pair: offset is where this Map's Halton points start,
               size is how many points this Map generates. */
            final HaltonSequence haltonsequence = new HaltonSequence(offset.get());
            long numInside = 0;
            long numOutside = 0;

            for (long i = 0; i < size.get(); ) {
                // generate the next point
                final double[] point = haltonsequence.nextPoint();
                // center the square at the origin and test against r^2 = 0.25
                final double x = point[0] - 0.5;
                final double y = point[1] - 0.5;
                if (x*x + y*y > 0.25) {
                    numOutside++;
                } else {
                    numInside++;
                }
                i++;
                // report progress every 1000 points
                if (i % 1000 == 0) {
                    reporter.setStatus("Generated " + i + " samples.");
                }
            }

            // the Map's output: the counts inside and outside the circle
            out.collect(new BooleanWritable(true), new LongWritable(numInside));
            out.collect(new BooleanWritable(false), new LongWritable(numOutside));
        }
    }

      // PiReducer implements the Reduce phase
      public static class PiReducer extends MapReduceBase
          implements Reducer<BooleanWritable, LongWritable, WritableComparable<?>, Writable> {

        private long numInside = 0;
        private long numOutside = 0;
        private JobConf conf; // kept for file system access in close()

        // save the job configuration
        @Override
        public void configure(JobConf job) {
          conf = job;
        }

        // accumulate the inside/outside counts emitted by the mappers
        public void reduce(BooleanWritable isInside,
                           Iterator<LongWritable> values,
                           OutputCollector<WritableComparable<?>, Writable> output,
                           Reporter reporter) throws IOException {
          if (isInside.get()) {
            for (; values.hasNext(); numInside += values.next().get());
          } else {
            for (; values.hasNext(); numOutside += values.next().get());
          }
        }

        // when the reduce task is done, write the two totals to HDFS
        @Override
        public void close() throws IOException {
          Path outDir = new Path(TMP_DIR, "out");
          Path outFile = new Path(outDir, "reduce-out");
          FileSystem fileSys = FileSystem.get(conf);
          SequenceFile.Writer writer = SequenceFile.createWriter(fileSys, conf,
              outFile, LongWritable.class, LongWritable.class,
              CompressionType.NONE);
          writer.append(new LongWritable(numInside), new LongWritable(numOutside));
          writer.close();
        }
      }

      /* estimate() sets up and runs the job, then computes Pi from its
         output. numMaps is the number of map tasks, numPoints the number
         of samples per map. */
      public static BigDecimal estimate(int numMaps, long numPoints, JobConf jobConf)
          throws IOException {
        // set up the job configuration
        jobConf.setJobName(PiEstimator.class.getSimpleName());

        jobConf.setInputFormat(SequenceFileInputFormat.class);

        jobConf.setOutputKeyClass(BooleanWritable.class);
        jobConf.setOutputValueClass(LongWritable.class);
        jobConf.setOutputFormat(SequenceFileOutputFormat.class);

        jobConf.setMapperClass(PiMapper.class);
        jobConf.setNumMapTasks(numMaps);

        jobConf.setReducerClass(PiReducer.class);
        jobConf.setNumReduceTasks(1);

        /* Turn off speculative execution. With speculative execution
           enabled, Hadoop may launch duplicate copies of a slow task and
           keep whichever finishes first. Because our single reduce task
           writes its result straight to a fixed file in HDFS, a
           speculative duplicate would try to write the same file, so
           speculation must be disabled. */
        jobConf.setSpeculativeExecution(false);
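        // Aside (added remark, not in the original listing): the same
        // switch also exists per phase as Hadoop 1.x configuration
        // properties:
        //   mapred.map.tasks.speculative.execution
        //   mapred.reduce.tasks.speculative.execution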

        // input and output directories
        final Path inDir = new Path(TMP_DIR, "in");
        final Path outDir = new Path(TMP_DIR, "out");
        FileInputFormat.setInputPaths(jobConf, inDir);
        FileOutputFormat.setOutputPath(jobConf, outDir);

        // refuse to run if the temporary directory already exists
        final FileSystem fs = FileSystem.get(jobConf);
        if (fs.exists(TMP_DIR)) {
          throw new IOException("Tmp directory " + fs.makeQualified(TMP_DIR)
              + " already exists. Please remove it first.");
        }

        if (!fs.mkdirs(inDir)) {
          throw new IOException("Cannot create input directory " + inDir);
        }

        try {
          // generate one input file per map task, each holding that map's
          // (offset, size) record
          for (int i = 0; i < numMaps; ++i) {
            final Path file = new Path(inDir, "part" + i);
            final LongWritable offset = new LongWritable(i * numPoints);
            final LongWritable size = new LongWritable(numPoints);
            final SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, jobConf, file,
                LongWritable.class, LongWritable.class, CompressionType.NONE);
            try {
              writer.append(offset, size);
            } finally {
              writer.close();
            }
            System.out.println("Wrote input for Map #" + i);
          }

          // run the MapReduce job and time it
          System.out.println("Starting Job");
          final long startTime = System.currentTimeMillis();
          JobClient.runJob(jobConf);
          final double duration = (System.currentTimeMillis() - startTime) / 1000.0;
          System.out.println("Job Finished in " + duration + " seconds");

          // read back the totals the reducer wrote to HDFS
          Path inFile = new Path(outDir, "reduce-out");
          LongWritable numInside = new LongWritable();
          LongWritable numOutside = new LongWritable();
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, inFile, jobConf);
          try {
            reader.next(numInside, numOutside);
          } finally {
            reader.close();
          }

          // compute Pi = 4 * numInside / (numMaps * numPoints)
          return BigDecimal.valueOf(4).setScale(20)
              .multiply(BigDecimal.valueOf(numInside.get()))
              .divide(BigDecimal.valueOf(numMaps))
              .divide(BigDecimal.valueOf(numPoints));
        } finally {
          fs.delete(TMP_DIR, true);
        }
      }

      // the run method required by the Tool interface
      public int run(String[] args) throws Exception {
        if (args.length != 2) {
          System.err.println("Usage: " + getClass().getName()
              + " <nMaps> <nSamples>");
          ToolRunner.printGenericCommandUsage(System.err);
          return -1;
        }

        final int nMaps = Integer.parseInt(args[0]);
        final long nSamples = Long.parseLong(args[1]);

        System.out.println("Number of Maps  = " + nMaps);
        System.out.println("Samples per Map = " + nSamples);

        final JobConf jobConf = new JobConf(getConf(), getClass());
        System.out.println("Estimated value of Pi is "
            + estimate(nMaps, nSamples, jobConf));
        return 0;
      }

      // the entry point of PiEstimator
      public static void main(String[] argv) throws Exception {
        System.exit(ToolRunner.run(null, new PiEstimator(), argv));
      }
    }

    To summarize, PiEstimator executes in three stages: 1) main() hands control to ToolRunner, which 2) calls PiEstimator's run() method to parse the command-line arguments and build the JobConf, and 3) run() calls estimate(), which prepares the input, runs the job, and computes Pi from its output.
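    If you only want to run this program rather than study it, there is no need to compile anything: Hadoop ships it in its examples jar under the driver name pi. With a Hadoop 1.2.1 installation as the current directory (the jar name follows the version):

    ./bin/hadoop jar hadoop-examples-1.2.1.jar pi 10 1000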

    3. Estimating Pi with a MapReduce program written against the new API

    The new API differs from the old one in several ways: Mapper and Reducer become classes to extend rather than interfaces to implement, jobs are configured and submitted through the Job class, and so on. Rewriting PiEstimator against the new API is a good exercise for getting used to these differences, which are summarized below.
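    For reference, the concrete changes this rewrite has to deal with (both APIs ship with Hadoop 1.x):

    1) Package: the old API lives in org.apache.hadoop.mapred, the new one in org.apache.hadoop.mapreduce.
    2) Mapper/Reducer: old-API interfaces implemented on top of MapReduceBase become abstract classes you extend directly.
    3) Emitting output: OutputCollector.collect() plus a separate Reporter become a single Context object with context.write().
    4) Job control: JobConf plus JobClient.runJob() become Job and job.waitForCompletion().
    5) Per-task setup and teardown: configure()/close() become setup()/cleanup() on Mapper and Reducer.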

    3.1 Source code: NewPiEst.java

    package com.brianchen.hadoop;

    import java.lang.Exception;
    import java.lang.Integer;
    import java.math.BigDecimal;
    import java.io.IOException;
    import java.lang.InterruptedException;
    import java.lang.Iterable;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class NewPiEst extends Configured implements Tool {
      // temporary directory for the job's input and output files
      static private final Path TMP_DIR = new Path("pitmp");
      // log handle, used to trace execution
      static final Log LOG = LogFactory.getLog(NewPiEst.class);

      // Halton sequence generator, identical to the old-API version
      private static class HaltonSequence {
        // bases
        static final int[] P = {2, 3};
        // maximum number of digits allowed
        static final int[] K = {63, 40};

        private long index;
        private double[] x;
        private double[][] q;
        private int[][] d;

        HaltonSequence(long startindex) {
          index = startindex;
          x = new double[K.length];
          q = new double[K.length][];
          d = new int[K.length][];
          for (int i = 0; i < K.length; i++) {
            q[i] = new double[K[i]];
            d[i] = new int[K[i]];
          }

          for (int i = 0; i < K.length; i++) {
            long k = index;
            x[i] = 0;

            for (int j = 0; j < K[i]; j++) {
              q[i][j] = (j == 0 ? 1.0 : q[i][j-1]) / P[i];
              d[i][j] = (int)(k % P[i]);
              k = (k - d[i][j]) / P[i];
              x[i] += d[i][j] * q[i][j];
            }
          }
        }

        double[] nextPoint() {
          index++;
          for (int i = 0; i < K.length; i++) {
            for (int j = 0; j < K[i]; j++) {
              d[i][j]++;
              x[i] += q[i][j];
              if (d[i][j] < P[i]) {
                break;
              }
              d[i][j] = 0;
              x[i] -= (j == 0 ? 1.0 : q[i][j-1]);
            }
          }
          return x;
        }
      }

      // Mapper written against the new API: extend the Mapper class
      // instead of implementing the old interface; output goes through
      // the Context object
      public static class PiMapper
          extends Mapper<LongWritable, LongWritable, LongWritable, LongWritable> {

        public void map(LongWritable offset, LongWritable size, Context context)
            throws IOException, InterruptedException {
          final HaltonSequence hs = new HaltonSequence(offset.get());
          long nInside = 0;
          long nOutside = 0;

          for (long i = 0; i < size.get(); i++) {
            final double[] point = hs.nextPoint();
            // the point lies in [0,1] x [0,1]; x^2 + y^2 <= 1 puts it
            // inside the quarter circle of radius 1, whose area is Pi/4
            if (point[0]*point[0] + point[1]*point[1] > 1) {
              nOutside++;
            } else {
              nInside++;
            }
          }

          // emit the totals once, after the loop (emitting them inside the
          // loop would make the reducer sum running totals and give a
          // wrong answer); key 1 = outside, key 2 = inside
          context.write(new LongWritable(1), new LongWritable(nOutside));
          context.write(new LongWritable(2), new LongWritable(nInside));
        }
      }

      // Reducer written against the new API
      public static class PiReducer
          extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

        long nInside = 0;
        long nOutside = 0;

        public void reduce(LongWritable isInside, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
          // key 2 carries inside counts, key 1 carries outside counts
          if (isInside.get() == 2) {
            for (LongWritable val : values) {
              nInside += val.get();
            }
          } else {
            for (LongWritable val : values) {
              nOutside += val.get();
            }
          }

          LOG.info("reduce-log:" + "isInside = " + isInside.get()
              + ", nInside = " + nInside + ", nOutside = " + nOutside);
        }

        /* cleanup() runs once after all reduce() calls have finished;
           write the final nInside and nOutside totals to a SequenceFile
           in HDFS so the driver can read them back. */
        @Override
        protected void cleanup(Context context)
            throws IOException, InterruptedException {
          Path outDir = new Path(TMP_DIR, "out");
          Path outFile = new Path(outDir, "reduce-out");
          Configuration conf = new Configuration(); // context.getConfiguration() would also work
          FileSystem fs = FileSystem.get(conf);
          SequenceFile.Writer writer = SequenceFile.createWriter(
              fs, conf, outFile,
              LongWritable.class, LongWritable.class, CompressionType.NONE);
          writer.append(new LongWritable(nInside), new LongWritable(nOutside));
          writer.close();
        }
      }
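      // Note (added remark, not in the original listing): writing the
      // result from cleanup() only works because the job runs a single
      // reduce task (setNumReduceTasks(1) below); with several reducers,
      // every task would try to create the same reduce-out file and the
      // writes would collide.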

      public static BigDecimal estimate(int nMaps, int nSamples, Job job)
          throws Exception {
        LOG.info("\n\n estimate \n\n");

        // set the job jar, the Mapper and Reducer classes, and a single
        // reduce task
        job.setJarByClass(NewPiEst.class);
        job.setMapperClass(PiMapper.class);
        job.setReducerClass(PiReducer.class);
        job.setNumReduceTasks(1);

        // input and output are SequenceFiles
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // output key and value types
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);

        // disabled for the same reason as in the old-API version
        job.setSpeculativeExecution(false);

        Path inDir = new Path(TMP_DIR, "in");
        Path outDir = new Path(TMP_DIR, "out");

        // set the input and output paths
        FileInputFormat.addInputPath(job, inDir);
        FileOutputFormat.setOutputPath(job, outDir);

        // refuse to run if the temporary directory already exists
        FileSystem fs = FileSystem.get(job.getConfiguration());
        if (fs.exists(TMP_DIR)) {
          throw new IOException("Tmp directory " + fs.makeQualified(TMP_DIR)
              + " already exists, pls remove it.");
        }

        // create the input directory
        if (!fs.mkdirs(inDir)) {
          throw new IOException("Cannot create input directory " + inDir);
        }

        try {
          // generate one input file per map task
          for (int i = 0; i < nMaps; i++) {
            final Path file = new Path(inDir, "part" + i);
            final LongWritable offset = new LongWritable(i * nSamples);
            final LongWritable size = new LongWritable(nSamples);
            final SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, job.getConfiguration(), file,
                LongWritable.class, LongWritable.class, CompressionType.NONE);
            writer.append(offset, size);
            writer.close();
            System.out.println("wrote input for Map #" + i);
          }

          // run the MapReduce job and time it
          System.out.println("starting mapreduce job");
          final long startTime = System.currentTimeMillis();
          boolean ret = job.waitForCompletion(true);
          final double duration = (System.currentTimeMillis() - startTime) / 1000.0;
          System.out.println("Job finished in " + duration + " seconds.");

          // read back from HDFS the totals written by the reduce task
          Path inFile = new Path(outDir, "reduce-out");
          LongWritable nInside = new LongWritable();
          LongWritable nOutside = new LongWritable();
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, inFile,
              job.getConfiguration());
          reader.next(nInside, nOutside);
          reader.close();
          LOG.info("estimate-log: " + "nInside = " + nInside.get()
              + ", nOutside = " + nOutside.get());

          // compute Pi = 4 * nInside / (nInside + nOutside)
          return BigDecimal.valueOf(4).multiply(
              BigDecimal.valueOf(nInside.get())
          ).divide(
              BigDecimal.valueOf(nInside.get() + nOutside.get()),
              20, BigDecimal.ROUND_HALF_DOWN
          );
        } finally {
          fs.delete(TMP_DIR, true);
        }
      }
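      // Note on the divide() call above (added remark): unlike the old-API
      // version, it passes an explicit scale (20) and rounding mode.
      // BigDecimal.divide() without a rounding mode throws
      // ArithmeticException when the quotient has a non-terminating
      // decimal expansion, which is the common case here.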

      public int run(String[] args) throws Exception {
        LOG.info("\n\n run \n\n");
        if (args.length != 2) {
          System.err.println("Usage: NewPiEst <nMaps> <nSamples>, e.g. NewPiEst 10 10000");
          System.exit(1);
        }

        // parse the command-line arguments
        int nMaps = Integer.parseInt(args[0]);
        int nSamples = Integer.parseInt(args[1]);
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "Pi estimating job");

        System.out.println("Pi = " + estimate(nMaps, nSamples, job));

        return 0;
      }

      public static void main(String[] argv) throws Exception {
        LOG.info("\n\n main \n\n");
        System.exit(ToolRunner.run(null, new NewPiEst(), argv));
      }
    }

    3.2 Compiling and running

    javac -cp /home/brian/usr/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar:/home/brian/usr/hadoop/hadoop-1.2.1/lib/commons-logging-1.1.1.jar:/home/brian/usr/hadoop/hadoop-1.2.1/lib/commons-cli-1.2.jar -d ./classes/ src/*.java

    jar -cvf newpiest.jar -C ./classes/ .

    ./bin/hadoop jar ~/all/work/ifitus/ifitus-dev-svn/hadoop_from_zero/src/chp08/newpiest/newpiest.jar com.brianchen.hadoop.NewPiEst 10 1000
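    The same three steps with the machine-specific paths factored out (this variant is mine, not from the original; it assumes HADOOP_HOME points at the Hadoop 1.2.1 installation and the jar is in the current directory):

    export HADOOP_HOME=/home/brian/usr/hadoop/hadoop-1.2.1
    javac -cp $HADOOP_HOME/hadoop-core-1.2.1.jar:$HADOOP_HOME/lib/commons-logging-1.1.1.jar:$HADOOP_HOME/lib/commons-cli-1.2.jar -d ./classes/ src/*.java
    jar -cvf newpiest.jar -C ./classes/ .
    $HADOOP_HOME/bin/hadoop jar newpiest.jar com.brianchen.hadoop.NewPiEst 10 1000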

    Having come this far, you have the basics of Hadoop in hand. Some suggestions on how to keep going:

    1. Solidify your Java. Hadoop is written in Java and everything in this book goes through the Java API, so strong Java fundamentals pay off directly; working carefully through a classic Java text or two is time well spent.

    2. Get thoroughly familiar with Hadoop itself: its operating modes, its two generations of API, and its configuration. The fastest way is hands-on practice, writing Hadoop programs and running them on your own installation.

    3. Read the Hadoop source code. Start with the MapReduce code, then move on to Hadoop Common and HDFS; the source is the final authority on how Hadoop behaves.

    4. Read books on Hadoop, such as Hadoop in Action, to cover the parts of the Hadoop ecosystem this book does not.

    Why learn this at all? Two articles are worth reading on the demand for big-data skills. The first, http://www.geekonomics10000.com/639, discusses systems such as IBM's Watson and Apple's Siri. The second, http://www.36kr.com/p/202520.html, cites hiring figures at Facebook, Twitter, and Google. (The figures quoted there date from around August 2013.)


    A closing word on this book's approach to Hadoop. The "shortest path" style means learning Hadoop by doing:

    1. It assumes only general programming background (Unix, C, Java, JavaScript) and introduces the Hadoop API from zero.

    2. It concentrates on the two components of Hadoop that matter most, MapReduce and HDFS, rather than surveying the whole ecosystem.

    3. It is project-driven: every chapter is built around complete Java programs written against the Hadoop API that you compile and run yourself.
