Large-Scale Data Processing / Cloud Computing (大规模数据处理 / 云计算), Lecture 2 – "Hello World" in Hadoop
Bo Peng (彭波), School of Electronics Engineering and Computer Science, Peking University
7/3/2014, http://net.pku.edu.cn/~course/cs402/
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Jimmy Lin, University of Maryland; SEWM Group
• The code given in the PDF doesn't seem to work; the code behind the "source code here" link does work... er... but the results I get differ from the ones in the PDF...
• The method setInputPath(Path) is undefined for the type JobConf WordCount/srcWordCount.java line 21 1404272734726310. I don't know why.
• It won't compile, please help.
• FileInputPath cannot be resolved
FileOutputPath cannot be resolved. What is going on here?
• Exception in thread "main" java.io.IOException: Cannot run program "chmod": CreateProcess error=2, ????????? This is the error I get when I run it.
Historical background
• The C programming language
  – early 1970s
  – UNIX
• The C++ programming language
  – early 1980s
  – object-oriented
  – a wide variety of application programming
• The Java programming language
  – early 1990s
  – originally for consumer electronic devices
  – enterprise application development
Java SDK
• Software Development Kit
  – a group of command-line tools and packages that you will need to write and run Java programs
  – base classes (library)
Working with the SDK
• Factorial
  – input: a value given as a command-line argument
  – output: the factorial of that number, or an exception
• Java Specification
  – every Java source file must have exactly the same name as the public class defined inside it (plus the .java extension)
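The Factorial exercise above could look roughly like the following. This is a minimal sketch, not the course's reference solution: the class name `Factorial`, the helper method `factorial`, and the choice of exceptions are assumptions made for illustration, and the file is assumed to be saved as `Factorial.java`.

```java
class Factorial {
    // Computes n! iteratively. Rejects negative input, and throws
    // ArithmeticException if the result no longer fits in a long.
    static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, i);
        }
        return result;
    }

    public static void main(String[] args) {
        // args[0] is the first command-line argument. A missing or
        // non-numeric argument surfaces as an uncaught exception.
        int n = Integer.parseInt(args[0]);
        System.out.println(factorial(n));
    }
}
```

It can be compiled with `javac Factorial.java` and run with `java Factorial 5`.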
Operators
• + is overloaded
• If you use the + operator with a String and another operand that is not a String, the other operand is converted into a String
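A small sketch of the overloaded + operator. The class and method names (`PlusDemo`, `describe`) are made up for this example. Note that evaluation runs left to right, so the position of the first String operand matters.

```java
class PlusDemo {
    // The int operand is converted to its String form before concatenation.
    static String describe(int score) {
        return "score = " + score;
    }

    public static void main(String[] args) {
        System.out.println(describe(42));   // prints: score = 42
        System.out.println(1 + 2 + "3");    // prints: 33  (1 + 2 is added first)
        System.out.println("1" + 2 + 3);    // prints: 123 (concatenation throughout)
    }
}
```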
C/C++ functions versus Java methods
• In Java terminology, functions are called methods.
• Methods can only be declared as members of a class; you can't define a method outside of a Java class
Arrays
• arrays are objects, so they are created using the new operator
• scores.length gives the number of elements
• the bracket characters ([ ]) that indicate an array are bound to the array type, not the array name
• out-of-range access throws java.lang.ArrayIndexOutOfBoundsException
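The array points above can be illustrated with a short sketch. The names `ArrayDemo`, `sum`, and `isOutOfBounds` are hypothetical helpers chosen for the example.

```java
class ArrayDemo {
    // Sums the elements, using the array's length field as the loop bound.
    static int sum(int[] scores) {
        int total = 0;
        for (int i = 0; i < scores.length; i++) {
            total += scores[i];
        }
        return total;
    }

    // Returns true if reading a[index] throws ArrayIndexOutOfBoundsException.
    static boolean isOutOfBounds(int[] a, int index) {
        try {
            int unused = a[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The brackets belong to the type, so both variables are int arrays.
        int[] scores, bonus;
        scores = new int[] {90, 85, 70};
        bonus = new int[3];              // elements default to 0
        System.out.println(sum(scores));
        System.out.println(sum(bonus));
        System.out.println(isOutOfBounds(scores, 3));
    }
}
```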
Strings
• objects of the String class
• String objects are immutable
• identical string literals refer to the same String object
• String class has a rich interface
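A brief sketch of these String properties. `StringDemo` and `shout` are illustrative names, not part of any library.

```java
class StringDemo {
    // Returns a new String; the argument itself is never modified.
    static String shout(String s) {
        return s.toUpperCase() + "!";
    }

    public static void main(String[] args) {
        String a = "hadoop";
        String b = "hadoop";             // same literal, same object
        String c = new String("hadoop"); // explicitly a distinct object

        System.out.println(a == b);      // true: reference comparison
        System.out.println(a == c);      // false: different objects
        System.out.println(a.equals(c)); // true: same characters

        String loud = shout(a);
        System.out.println(loud);        // HADOOP!
        System.out.println(a);           // hadoop (unchanged, Strings are immutable)
    }
}
```

For comparing contents, `equals()` is the safe choice; `==` only tells you whether two references point at the same object.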
The main() method
• a strict naming convention: public static void main(String[] args)
• the first element in the array is the first argument, not the name of the program
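To see the difference from C's argv, consider this minimal sketch (the class name `Echo` and the helper `firstArgument` are assumptions for the example):

```java
class Echo {
    // In Java, args[0] is already the first user-supplied argument.
    static String firstArgument(String[] args) {
        return args.length > 0 ? args[0] : "(no arguments)";
    }

    public static void main(String[] args) {
        System.out.println(firstArgument(args));
    }
}
```

Running `java Echo hello world` prints `hello`, whereas the equivalent C program would find the program name in `argv[0]`.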
Other differences
• Pointers:
  – Java references are pointers to Java objects
  – cannot be incremented or decremented
  – no address-of operator
• Global variables
  – no way to declare global variables (or methods)
• no struct, union, or typedef (enum was added in Java 5)
• Freely placed methods (methods can appear in any order, no forward declarations)
• Garbage collection
  – no malloc() and free()
Defining a Java class
• Access modifiers such as public or private are written on each member individually, not on labeled sections as in C++
• You don't use semicolons (;) after the closing braces in class and method definitions.
• The main() method is a member of the class
• You call the constructor using the new keyword
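A minimal class definition showing these points together. `Counter` is a made-up example class.

```java
class Counter {
    // Each member carries its own access modifier.
    private int count;

    public Counter(int start) {
        this.count = start;
    }

    public void increment() {
        count++;
    }

    public int get() {
        return count;
    }

    // main() is itself a member of the class.
    public static void main(String[] args) {
        Counter c = new Counter(10);   // constructor invoked through new
        c.increment();
        System.out.println(c.get());   // prints 11
    }
}   // no semicolon after the closing brace
```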
The Object class
• All Java classes are ultimately subclasses of class Object
• a centrally rooted class hierarchy
• usage
  – toString()
  – a data structure that holds objects of class Object can hold any Java object (compare with C++ templates)
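Both uses of Object can be sketched as follows. `Point` and `ObjectDemo` are hypothetical classes written for this example.

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Overrides the default toString() inherited from Object.
    @Override
    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}

class ObjectDemo {
    // An Object[] can hold any Java object, whatever its class.
    static String describeAll(Object[] items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append("; ");
            }
            sb.append(items[i].toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The int 3 is autoboxed to an Integer, which is also an Object.
        Object[] items = { new Point(1, 2), "text", 3 };
        System.out.println(describeAll(items));   // prints: (1, 2); text; 3
    }
}
```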
Interfaces
• All interfaces are implicitly abstract
• All members of an interface are implicitly public
• All fields defined in an interface are implicitly static and final
• A Java class can extend only one class, but it can implement any number of interfaces
• Best practice for polymorphism
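A short sketch of interfaces used for polymorphism. All type names here (`Shape`, `Named`, `Square`, `Rect`, `ShapeDemo`) are invented for illustration.

```java
interface Shape {
    String UNIT = "cm^2";   // implicitly public static final
    double area();          // implicitly public abstract
}

interface Named {
    String name();
}

// One class may implement several interfaces.
class Square implements Shape, Named {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    public double area() {
        return side * side;
    }

    public String name() {
        return "square";
    }
}

class Rect implements Shape {
    private final double width;
    private final double height;

    Rect(double width, double height) {
        this.width = width;
        this.height = height;
    }

    public double area() {
        return width * height;
    }
}

class ShapeDemo {
    // Works with any Shape implementation: the caller never needs the concrete class.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Rect(3, 4) };
        System.out.println(totalArea(shapes) + " " + Shape.UNIT);
    }
}
```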
Using Library(Java API)
• Java API, classes are grouped into packages
• you have already been using a class from java.lang, which is imported by default, whenever you call System.out.println()
• either import java.util.ArrayList; or use the fully qualified name: java.util.ArrayList<xx> list = ....
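Both ways of referring to a library class are shown in this sketch. `ImportDemo` and `countWords` are illustrative names.

```java
import java.util.ArrayList;

class ImportDemo {
    // Splits a line on whitespace and counts the non-empty tokens.
    static int countWords(String line) {
        // Imported class: the short name is enough.
        ArrayList<String> words = new ArrayList<String>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        // Fully qualified name: no import statement required.
        java.util.List<String> view = words;
        return view.size();
    }

    public static void main(String[] args) {
        // String and System come from java.lang, which needs no import.
        System.out.println(countWords("hello hadoop world"));
    }
}
```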
Deploying your application
• A Java program is a bunch of classes.
• A JAR file is a Java Archive
  – create a manifest.txt stating which class has the main() method
    • Main-Class: MyApp
  – use the jar tool to package all class files and manifest.txt
  – $ jar -cvmf manifest.txt app.jar *.class
  – $ java -jar app.jar
Package
• put your classes in packages
  – java.util, java.net, java.text ....
• prefix your package name with your reversed domain name
• set up a matching directory structure
What is MapReduce?
• Programming model for expressing distributed computations at a massive scale
• Execution framework for organizing and performing such computations
• Open-source implementation called Hadoop
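To make the programming model concrete, here is a single-process word count written in plain Java. It is only a sketch of the idea: it does not use the Hadoop API, and the class `WordCountSketch` with its `map`, `reduce`, and `run` methods is invented for this example. The `run` method plays the role of the execution framework by grouping map output by key before calling reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // map: one input line -> a list of (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // reduce: a word plus all counts emitted for it -> the total
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Stand-in for the framework: run map, group by key, run reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

In a real cluster the map and reduce calls run on different machines and the grouping step moves data across the network; the user still writes only the two functions.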
Brief History of Hadoop
• Hadoop was created by Doug Cutting, the creator of Apache Lucene/Nutch
• 2003: Google published GFS
• 2004: Google published MapReduce
• 2005: Nutch ported to MapReduce/HDFS
• 2006: Cutting joined Yahoo!
• 2008.1: Hadoop became a top-level project at Apache
• 2008.2: Hadoop ran on a 10,000-core cluster
New MapReduce API
• favors abstract classes over interfaces
• new API in org.apache.hadoop.mapreduce, old in org.apache.hadoop.mapred
• new Context class
  – takes over the roles of JobConf, OutputCollector, Reporter
• new Job class
  – takes over the role of JobClient
• reduce() method passes values
  – new: java.lang.Iterable, for (VALUEIN value : values) { ... }
  – old: java.util.Iterator, hasNext(), next()
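The difference in how reduce() receives its values can be shown without Hadoop at all. In the sketch below, plain `Integer` values stand in for Hadoop's writable types, and the class `ValuesIterationDemo` and both method names are assumptions made for the example; only the iteration style mirrors the two APIs.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ValuesIterationDemo {
    // Old-API style: the values arrive as a java.util.Iterator.
    static int sumOldStyle(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // New-API style: the values arrive as a java.lang.Iterable,
    // so the enhanced for loop can be used.
    static int sumNewStyle(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 1, 1);
        System.out.println(sumOldStyle(counts.iterator()));   // prints 3
        System.out.println(sumNewStyle(counts));              // prints 3
    }
}
```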
Hadoop Streaming & Pipes
• Streaming
  – supports any programming language, even shell scripts
  – uses standard input and output to communicate with the map and reduce code
• Pipes
  – C++ interface to Hadoop MapReduce
  – uses sockets as the communication channel
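Because Streaming only needs a program that reads standard input and writes standard output, a mapper can be very small. The following is a sketch of a word-count mapper in that style; the class name `StreamingWordMapper` and the helper `mapLine` are hypothetical, and the tab between key and value follows Streaming's default separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class StreamingWordMapper {
    // Turns one input line into "word<TAB>1" records, one per word.
    static List<String> mapLine(String line) {
        List<String> records = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                records.add(word + "\t1");
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Input lines arrive on stdin; key/value records leave on stdout.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            for (String record : mapLine(line)) {
                System.out.println(record);
            }
        }
    }
}
```

The same logic could be written in a few lines of shell or Python, which is the point of Streaming: the framework does not care what language produced the records.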
Changping Cluster
• 28 nodes, 12 cores / 48 GB RAM / 10 TB disk
  – Namenode/JobTracker server: changping11
  – ip: 222.29.134.11
  – hdfs port: 9000
  – mapreduce port: 9001
How to use ChangpingCluster
• 1. Add a host name mapping
  – Windows: edit the file C:\WINDOWS\system32\drivers\etc\hosts
  – Linux: edit /etc/hosts
  – add the following line: 222.29.134.11 changping11
• Otherwise, running a job will report a name resolution error
How to use ChangpingCluster
• 2. Identity settings
• 1) All output files go under the "/cs402/YourName" directory
• In code: FileOutputFormat.setOutputPath(conf, new Path("/cs402/YourName"));
• 2) In Mapred Location, set hadoop.job.ugi = YourName, cs402
• The user name must match the name used in the file path above
• The group name must be cs402
• Alternatively, set it directly in the driver program:
• Configuration conf = new Configuration();
• conf.set("hadoop.job.ugi", "YourName,cs402");