
Practical Applications (Part 1)

V 0.20

Examples of Hadoop Use (1)

Freestylers - Image retrieval engine: uses Hadoop for image processing.

Hosting Habitat: gathers software information from all clients, then analyzes it and tells clients which software is missing or out of date.

Journey Dynamics: uses Hadoop MapReduce to analyze billions of lines of GPS data and produce traffic route information.

Krugle: uses Hadoop and Nutch to build a source-code search engine.

Examples of Hadoop Use (2)

SEDNS - Security Enhanced DNS Group: collects DNS data from around the world to explore distributed content on the network.

Technical analysis and Stock Research: analyzes stock information.

University of Nebraska-Lincoln, Research Computing Facility: uses Hadoop to run analyses over roughly 200 TB of CMS experiment data. The Compact Muon Solenoid (CMS) is one of the two general-purpose particle detectors in the Large Hadron Collider program at CERN, the European Organization for Nuclear Research in Switzerland.

Examples of Hadoop Use (3)

Yahoo!: used to support research for Ad Systems and Web Search, and uses the Hadoop platform to discover botnets that send spam.

Trend Micro: filters web content such as phishing sites and malicious links.

What Hadoop Provides

Automatic parallelization & distribution
Fault-tolerance
Status and monitoring tools
A clean abstraction and API for programmers

Overview of Hadoop Computation (diagram)

Hadoop Programming

I/O programs
MapReduce programs
How to compile and run Hadoop programs
Conclusions

Hadoop Programming

Basic Hadoop I/O programs

Uploading a File to HDFS

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutToHdfs {
  // Copy a file from the local file system to HDFS;
  // src is the local source path, dst is the HDFS destination path.
  static boolean putToHdfs(String src, String dst, Configuration conf) {
    Path dstPath = new Path(dst);
    try {
      // Obtain a FileSystem handle for HDFS
      FileSystem hdfs = dstPath.getFileSystem(conf);
      // Upload
      hdfs.copyFromLocalFile(false, new Path(src), new Path(dst));
    } catch (IOException e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }
}
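A minimal driver sketch for calling the helper above; the driver class name and the paths are illustrative assumptions (not part of the original slides), and it assumes PutToHdfs is in the same package.

import org.apache.hadoop.conf.Configuration;

public class PutToHdfsDriver {
  public static void main(String[] args) {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    Configuration conf = new Configuration();
    boolean ok = PutToHdfs.putToHdfs("/tmp/local.txt", "/user/hadooper/input/a.txt", conf);
    System.out.println(ok ? "upload succeeded" : "upload failed");
  }
}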


Downloading a File from HDFS

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetFromHdfs {
  // Copy a file from HDFS back to the local file system;
  // src is the HDFS source path, dst is the local destination path.
  static boolean getFromHdfs(String src, String dst, Configuration conf) {
    Path srcPath = new Path(src);
    try {
      // Obtain a FileSystem handle for HDFS
      FileSystem hdfs = srcPath.getFileSystem(conf);
      // Download
      hdfs.copyToLocalFile(false, new Path(src), new Path(dst));
    } catch (IOException e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }
}

Checking for and Deleting a Path

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckAndDelete {
  // checkAndDelete: check whether the given path exists on HDFS and, if so, delete it (recursively).
  static boolean checkAndDelete(final String path, Configuration conf) {
    Path dstPath = new Path(path);
    try {
      // Obtain a FileSystem handle for HDFS
      FileSystem hdfs = dstPath.getFileSystem(conf);
      // Check whether the path exists
      if (hdfs.exists(dstPath)) {
        // If it does, delete it
        hdfs.delete(dstPath, true);
      }
    } catch (IOException e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }
}
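These three helpers are typically combined when re-running a job: clear any stale copy, upload the input, and later fetch results back. A minimal sketch under that assumption (class name and paths are illustrative):

import org.apache.hadoop.conf.Configuration;

public class HdfsIoDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String local = "/tmp/a.txt";
    String remote = "/user/hadooper/input/a.txt";
    // Remove any previous copy, then upload and read it back.
    CheckAndDelete.checkAndDelete(remote, conf);
    PutToHdfs.putToHdfs(local, remote, conf);
    GetFromHdfs.getFromHdfs(remote, "/tmp/a.copy.txt", conf);
  }
}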


Hadoop Programming

MapReduce programs

Program Prototype (v 0.20)

class MR {
  // Map region: Map code goes here
  static public class MyMapper extends Mapper<...> {
    ...
  }
  // Reduce region: Reduce code goes here
  static public class MyReducer extends Reducer<...> {
    ...
  }
  // Configuration region: other job settings go here
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "job name");
    job.setJarByClass(thisMainClass.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
  }
}

Class Mapper (v 0.20)

import org.apache.hadoop.mapreduce.Mapper;

class MyMap extends Mapper<INPUT_KEY_Class, INPUT_VALUE_Class, OUTPUT_KEY_Class, OUTPUT_VALUE_Class> {
  // global (per-task) variables go here
  public void map(INPUT_KEY_Class key, INPUT_VALUE_Class value, Context context)
      throws IOException, InterruptedException {
    // local variables and processing logic go here
    context.write(NewKey, NewValue);
  }
}

The four type parameters are, in order, the input key class, the input value class, the output key class, and the output value class.

Class Reducer (v 0.20)

import org.apache.hadoop.mapreduce.Reducer;

class MyRed extends Reducer<INPUT_KEY_Class, INPUT_VALUE_Class, OUTPUT_KEY_Class, OUTPUT_VALUE_Class> {
  // global (per-task) variables go here
  public void reduce(INPUT_KEY_Class key, Iterable<INPUT_VALUE_Class> values, Context context)
      throws IOException, InterruptedException {
    // local variables and processing logic go here
    context.write(NewKey, NewValue);
  }
}

The four type parameters are, in order, the input key class, the input value class, the output key class, and the output value class.

Example (1)

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: hadoop jar WordCount.jar <input> <output>");
    System.exit(2);
  }
  Job job = new Job(conf, "Word Count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  // Delete the output path if it already exists (uses the CheckAndDelete helper shown earlier)
  CheckAndDelete.checkAndDelete(otherArgs[1], conf);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Example (2)

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

Worked trace (from the slide): for the input file /user/hadooper/input/a.txt containing the line "No news is a good news.", the input key is the byte offset of the line and the input value is the line itself. StringTokenizer (itr) splits the line into the tokens no, news, is, a, good, news, and each token is emitted as a <word, one> pair: <no,1>, <news,1>, <is,1>, <a,1>, <good,1>, <news,1>.

Example (3)

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  IntWritable result = new IntWritable();
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values)
      sum += val.get();
    result.set(sum);
    context.write(key, result);
  }
}

Worked trace (from the slide): the reducer receives each key together with all of its values, e.g. <a,[1]>, <good,[1]>, <is,[1]>, <news,[1,1]>, <no,[1]>. The for-each loop over values sums the counts one by one (conceptually, for (int i = 0; i < values.length; i++) { sum += values[i].get(); }), so for the key "news" the two 1s add up and the reducer emits <news,2> as its <key, sum> output.

Compiling and Running Hadoop Programs

1. Edit WordCount.java
   http://trac.nchc.org.tw/cloud/attachment/wiki/jazz/Hadoop_Lab6/WordCount.java?format=raw
2. mkdir MyJava
3. javac -classpath hadoop-*-core.jar -d MyJava WordCount.java
4. jar -cvf wordcount.jar -C MyJava .
5. bin/hadoop jar wordcount.jar WordCount [program arguments]

• The working directory is Hadoop_Home (so that hadoop-*-core.jar is found).
• javac needs the -classpath option; "hadoop jar" does not.
• wordcount.jar is the packaged build of the compiled classes; when running it you must still specify the main class name.
• When Hadoop runs a job, only the input files have to be placed on HDFS so Hadoop can process them; the jar (wordcount.jar) does not need to be uploaded or copied to every node, since loading the program is handled by Java.

Conclusions

The example code above covers Hadoop's key/value model, operations on the HDFS file system, and the MapReduce computation style.

When running a Hadoop job, the program file does not have to be uploaded to Hadoop, but the data must be in HDFS.

The program in Example 7 can be used to chain computations together.

Some APIs differ slightly between Hadoop 0.20 and Hadoop 0.18; rewrite code for the new API as completely as possible.

Introduction to HBase

This part introduces the concepts behind HBase.

HBase

HBase is a storage system with the following characteristics: a table-like data structure (a multi-dimensional map), distributed, highly available and high-performance, and easy to scale in both capacity and throughput.

HBase is designed to store petabytes of data across thousands of commodity servers.

HBase is built on top of the Hadoop Distributed File System (HDFS) and provides Bigtable-like functionality.

HBase also supports Hadoop MapReduce programming.

Why HBase? (1)

HBase is not a relational database system.

A table has only one primary index, the row key. There are no joins and no SQL syntax.

It provides a Java library, plus REST and Thrift interfaces, and data is accessed with getRow() and Scan().

getRow() retrieves the data of a single row, and a specific version (timestamp) can also be requested.

Scan() retrieves the whole table or a row range (by setting a start key and an end key); a minimal sketch of both kinds of access follows below.

HBase offers only limited atomicity and transaction support, and has only one data type (bytes).
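A minimal sketch of single-row and range access with the 0.20-era Java client; the table, family, and qualifier names are illustrative assumptions, not part of the original slides.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class AccessSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "tablename");
    // Single-row access (the getRow()-style lookup): fetch one row by key.
    Get g = new Get(Bytes.toBytes("row1"));
    Result r = table.get(g);
    System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("family1"), Bytes.toBytes("qua1"))));
    // Range access: scan rows from "row1" (inclusive) up to "row9" (exclusive).
    Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row9"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result row : scanner) {
      System.out.println(Bytes.toString(row.getRow()));
    }
    scanner.close();
    table.close();
  }
}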


Why HBase? (2)

Relational databases are well suited to transactional C.R.U.D. (create, retrieve, update, delete) operations, largely because those operations run in memory.

For analysis over very large data sets, with the data spread across many nodes, a relational database system is no longer a good fit.

Relational database products that can analyze data at that scale are expensive and still suit only specific use cases.

"Very large data analysis" here means big queries (access to an entire table) and big databases (more than 100 terabytes of data).

A Bigtable-style store can work together with the MapReduce framework to run complex analyses and queries.

Who Uses HBase

Adobe: internal use (structured data)

Kalooga: image search engine, http://www.kalooga.com/

Meetup: social event site, http://www.meetup.com/

Streamy: successfully migrated from MySQL to HBase, http://www.streamy.com/

Trend Micro: cloud-based virus-scanning architecture, http://trendmicro.com/

Yahoo!: stores document fingerprints to avoid duplicates, http://www.yahoo.com/

More: http://wiki.apache.org/hadoop/Hbase/PoweredBy

Data Model

Tables are automatically sorted by row key. A table schema only needs to define its column families.

Each column family can hold an unlimited number of columns, and each column value can have an unlimited number of timestamped versions. Columns can be added dynamically, and different rows may have different numbers of columns. Columns within the same column family are stored together in one physical storage unit, sorted by column name.

byte[] is the only data type. Conceptually the model is a map from (row key, column family:column qualifier, timestamp) to a value; a sketch of that mapping follows below.
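A rough sketch of the data model as nested Java maps, purely to illustrate the (row, family:qualifier, timestamp) -> value structure described above; this is not how HBase is implemented, and the names are placeholders.

import java.util.NavigableMap;
import java.util.TreeMap;

public class DataModelSketch {
  public static void main(String[] args) {
    // row key -> ( "family:qualifier" -> ( timestamp -> value ) )
    NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> table =
        new TreeMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>>();

    NavigableMap<Long, byte[]> versions = new TreeMap<Long, byte[]>();
    versions.put(1L, "value1".getBytes());
    NavigableMap<String, NavigableMap<Long, byte[]>> columns =
        new TreeMap<String, NavigableMap<Long, byte[]>>();
    columns.put("family1:qualifier1", versions);
    table.put("row1", columns);

    // Reading a cell: look up the row, then the column, then pick a version.
    byte[] cell = table.get("row1").get("family1:qualifier1").get(1L);
    System.out.println(new String(cell));
  }
}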


Data Model

When HBase actually stores a table, the data is physically grouped and stored per column family.

HTable Members

Example table with column families "Contents" and "Department"; each cell value is versioned by a timestamp.

Row key              Timestamp  Contents (news / bid / sport)                         Department
com.yahoo.news.tw    t1         news: "I developed a 6,000-meter underwater robot"    "tech"
                     t2         news: "How mosquitoes find human flesh"               "tech"
                     t3         news: "Speaking with brain waves"                     "tech"
com.yahoo.bid.tw     t1         bid: "... ipad ..."                                   "3C"
com.yahoo.sport.tw   t1         sport: "... Wang 40..."                               "MBA"

Concepts involved: Table, Family, Column Qualifier, Row, TimeStamp.

Regions

A table is made up of one or more regions. A region is identified by its startKey and endKey. Each region may live on several different nodes and is made up of several HDFS files and blocks, which Hadoop is responsible for replicating.

How HBase Fits Together with Hadoop (architecture diagram)

HBase Programming

Overview of the commonly used HBase API
Hands-on I/O operations
Combining HBase with MapReduce

1. Compiling and Running Java Programs

1. Copy all the .jar files from the hbase_home directory into the hadoop_home/lib/ folder.
2. Compile:
   javac -classpath hadoop-*-core.jar:hbase-*.jar -d MyJava MyCode.java
3. Package:
   jar -cvf MyJar.jar -C MyJava .
4. Run:
   bin/hadoop jar MyJar.jar MyCode {Input/ Output/}

• The working directory is Hadoop_Home.
• ./MyJava = the directory holding the compiled classes.
• MyJar.jar = the packaged build of the compiled classes.
• Put some text files into the input directory on HDFS first.
• ./input and ./output are not necessarily the HDFS input and output directories.


Commonly Used HBase Classes

HBaseAdmin, HBaseConfiguration, HTable, HTableDescriptor, Put, Get, Scanner

These operate at the levels of: Database, Table, Family, Column Qualifier.

HBaseConfiguration

Adds HBase configuration files to a Configuration.
  = new HBaseConfiguration()
  = new HBaseConfiguration(Configuration c)
Inherits from org.apache.hadoop.conf.Configuration.

Return type / Method / Parameters:
  void     addResource(Path file)
  void     clear()
  String   get(String name)
  boolean  getBoolean(String name, boolean defaultValue)
  void     set(String name, String value)
  void     setBoolean(String name, boolean value)

Configuration entries have the form:
  <property> <name> name </name> <value> value </value> </property>
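A small sketch of creating a configuration and overriding one property programmatically, which is equivalent to a <property> entry in hbase-site.xml; the quorum address is an illustrative assumption.

import org.apache.hadoop.hbase.HBaseConfiguration;

public class ConfSketch {
  public static void main(String[] args) {
    // Loads hbase-default.xml and hbase-site.xml from the classpath.
    HBaseConfiguration conf = new HBaseConfiguration();
    // Override a single property in code.
    conf.set("hbase.zookeeper.quorum", "localhost");
    System.out.println(conf.get("hbase.zookeeper.quorum"));
  }
}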


HBaseAdmin

HBase's administration interface.
  = new HBaseAdmin(HBaseConfiguration conf)

Return type / Method / Parameters:
  void                addColumn(String tableName, HColumnDescriptor column)
  void                checkHBaseAvailable(HBaseConfiguration conf)
  void                createTable(HTableDescriptor desc)
  void                deleteTable(byte[] tableName)
  void                deleteColumn(String tableName, String columnName)
  void                enableTable(byte[] tableName)
  void                disableTable(String tableName)
  HTableDescriptor[]  listTables()
  void                modifyTable(byte[] tableName, HTableDescriptor htd)
  boolean             tableExists(String tableName)

Ex:
  HBaseAdmin admin = new HBaseAdmin(config);
  admin.disableTable("tablename");

HTableDescriptor

An HTableDescriptor contains the name of an HTable and its column families.
  = new HTableDescriptor()
  = new HTableDescriptor(String name)
Constant values: org.apache.hadoop.hbase.HTableDescriptor.TABLE_DESCRIPTOR_VERSION

Return type / Method / Parameters:
  void               addFamily(HColumnDescriptor family)
  HColumnDescriptor  removeFamily(byte[] column)
  byte[]             getName()  = the table name
  byte[]             getValue(byte[] key)  = the value for the given key
  void               setValue(String key, String value)

Ex:
  HTableDescriptor htd = new HTableDescriptor(tablename);
  htd.addFamily(new HColumnDescriptor("Family"));

HColumnDescriptor

An HColumnDescriptor contains information about a column family.
  = new HColumnDescriptor(String familyname)
Constant values: org.apache.hadoop.hbase.HTableDescriptor.TABLE_DESCRIPTOR_VERSION

Return type / Method / Parameters:
  byte[]  getName()  = the family name
  byte[]  getValue(byte[] key)  = the value for the given key
  void    setValue(String key, String value)

Ex:
  HTableDescriptor htd = new HTableDescriptor(tablename);
  HColumnDescriptor col = new HColumnDescriptor("content:");
  htd.addFamily(col);

HTable

Used to communicate with a single HBase table.
  = new HTable(HBaseConfiguration conf, String tableName)

Return type / Method / Parameters:
  boolean           checkAndPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put)
  void              close()
  boolean           exists(Get get)
  Result            get(Get get)
  byte[][]          getEndKeys()
  ResultScanner     getScanner(byte[] family)
  HTableDescriptor  getTableDescriptor()
  byte[]            getTableName()
  static boolean    isTableEnabled(HBaseConfiguration conf, String tableName)
  void              put(Put put)

Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  ResultScanner scanner = table.getScanner(family);

Put

Used to perform Put operations for a single row.
  = new Put(byte[] row)
  = new Put(byte[] row, RowLock rowLock)

Return type / Method / Parameters:
  Put      add(byte[] family, byte[] qualifier, byte[] value)
  Put      add(byte[] column, long ts, byte[] value)
  byte[]   getRow()
  RowLock  getRowLock()
  long     getTimeStamp()
  boolean  isEmpty()
  Put      setTimeStamp(long timestamp)

Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Put p = new Put(brow);
  p.add(family, qualifier, value);
  table.put(p);

Get

Used to perform Get operations on a single row.
  = new Get(byte[] row)
  = new Get(byte[] row, RowLock rowLock)

Return type / Method / Parameters:
  Get        addColumn(byte[] column)
  Get        addColumn(byte[] family, byte[] qualifier)
  Get        addColumns(byte[][] columns)
  Get        addFamily(byte[] family)
  TimeRange  getTimeRange()
  Get        setTimeRange(long minStamp, long maxStamp)
  Get        setFilter(Filter filter)

Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
  Result result = table.get(g);

Result

Single-row result of a Get or Scan query.
  = new Result()

Return type / Method / Parameters:
  boolean                      containsColumn(byte[] family, byte[] qualifier)
  NavigableMap<byte[],byte[]>  getFamilyMap(byte[] family)
  byte[]                       getValue(byte[] column)
  byte[]                       getValue(byte[] family, byte[] qualifier)
  int                          size()

Ex:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
  Result rowResult = table.get(g);
  byte[] ret = rowResult.getValue(Bytes.toBytes(family + ":" + column));

Scanner (Scan)

All operations are identical to Get. Rather than specifying a single row, an optional startRow and stopRow may be defined. If rows are not specified, the Scanner will iterate over all rows.
  = new Scan()
  = new Scan(byte[] startRow, byte[] stopRow)
  = new Scan(byte[] startRow, Filter filter)

Return type / Method / Parameters:
  Scan       addColumn(byte[] column)
  Scan       addColumn(byte[] family, byte[] qualifier)
  Scan       addColumns(byte[][] columns)
  Scan       addFamily(byte[] family)
  TimeRange  getTimeRange()
  Scan       setTimeRange(long minStamp, long maxStamp)
  Scan       setFilter(Filter filter)
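Unlike the other API slides, this one has no "Ex:" snippet; a minimal sketch of a full-table scan restricted to one column, where the table, family, and column names are illustrative assumptions.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "tablename");
    // Full-table scan restricted to one column (family1:column1).
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("column1"));
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        byte[] v = r.getValue(Bytes.toBytes("family1"), Bytes.toBytes("column1"));
        System.out.println(Bytes.toString(r.getRow()) + " = " + Bytes.toString(v));
      }
    } finally {
      scanner.close();  // scanners hold server-side resources; always close them
      table.close();
    }
  }
}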


Interface ResultScanner

Interface for client-side scanning. Obtain instances from an HTable:
  HTable.getScanner(Bytes.toBytes(family));

Return type / Method / Parameters:
  void    close()
  Result  next()

Ex:
  ResultScanner scanner = table.getScanner(Bytes.toBytes(family));
  for (Result rowResult : scanner) {
    byte[] str = rowResult.getValue(Bytes.toBytes(family), Bytes.toBytes(column));
  }

HBase Key/Value Format

org.apache.hadoop.hbase.KeyValue exposes getRow(), getFamily(), getQualifier(), getTimestamp(), and getValue().

The KeyValue blob format inside the byte array is:
  <keylength> <valuelength> <key> <value>

The key format is:
  <row-length> <row> <column-family-length> <column-family> <column-qualifier> <timestamp> <key-type>

The row length may be at most Short.MAX_VALUE, the column family length at most Byte.MAX_VALUE, and the column qualifier plus key length must be less than Integer.MAX_VALUE.
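A small sketch of reading those fields back through the KeyValue getters listed above; the table, row, and column names are illustrative assumptions.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "tablename");
    Result r = table.get(new Get(Bytes.toBytes("row1")));
    // Each cell of the row comes back as one KeyValue.
    for (KeyValue kv : r.raw()) {
      System.out.println(Bytes.toString(kv.getRow()) + " "
          + Bytes.toString(kv.getFamily()) + ":"
          + Bytes.toString(kv.getQualifier()) + " @ "
          + kv.getTimestamp() + " = "
          + Bytes.toString(kv.getValue()));
    }
    table.close();
  }
}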


HBase Programming

Hands-on I/O operations

Example 1: Creating a Table

<Shell command>  create '<table name>', {<family>, ...}

  $ hbase shell
  > create 'tablename', 'family1', 'family2', 'family3'
  0 row(s) in 4.0810 seconds
  > list
  tablename
  1 row(s) in 0.0190 seconds

Example 1: Creating a Table

<Code>
  public static void createHBaseTable(String tablename, String familyname)
      throws IOException {
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    HTableDescriptor htd = new HTableDescriptor(tablename);
    HColumnDescriptor col = new HColumnDescriptor(familyname);
    htd.addFamily(col);
    // Do nothing if the table already exists
    if (admin.tableExists(tablename)) {
      return;
    }
    admin.createTable(htd);
  }

Example 2: Putting Data into a Column

<Shell command>  put '<table name>', '<row>', '<column>', '<value>'[, <timestamp>]

  > put 'tablename', 'row1', 'family1:qua1', 'value'
  0 row(s) in 0.0030 seconds

Example 2: Putting Data into a Column

<Code>
  static public void putData(String tablename, String row, String family,
      String column, String value) throws IOException {
    HBaseConfiguration config = new HBaseConfiguration();
    HTable table = new HTable(config, tablename);
    byte[] brow = Bytes.toBytes(row);
    byte[] bfamily = Bytes.toBytes(family);
    byte[] bcolumn = Bytes.toBytes(column);
    byte[] bvalue = Bytes.toBytes(value);
    Put p = new Put(brow);
    p.add(bfamily, bcolumn, bvalue);
    table.put(p);
    table.close();
  }

Example 3: Getting a Column Value

<Shell command>  get '<table name>', '<row>'

  > get 'tablename', 'row1'
  COLUMN              CELL
  family1:column1     timestamp=1265169495385, value=value
  1 row(s) in 0.0100 seconds

Example 3: Getting a Column Value

<Code>
  String getColumn(String tablename, String row, String family,
      String column) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, Bytes.toBytes(tablename));
    Get g = new Get(Bytes.toBytes(row));
    Result rowResult = table.get(g);
    return Bytes.toString(rowResult.getValue(Bytes.toBytes(family + ":" + column)));
  }

Example 4: Scanning All Columns

<Shell command>  scan '<table name>'

  > scan 'tablename'
  ROW     COLUMN+CELL
  row1    column=family1:column1, timestamp=1265169415385, value=value1
  row2    column=family1:column1, timestamp=1263534411333, value=value2
  row3    column=family1:column1, timestamp=1263645465388, value=value3
  row4    column=family1:column1, timestamp=1264654615301, value=value4
  row5    column=family1:column1, timestamp=1265146569567, value=value5
  5 row(s) in 0.0100 seconds

Example 4: Scanning All Columns

<Code>
  static void ScanColumn(String tablename, String family, String column)
      throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, Bytes.toBytes(tablename));
    ResultScanner scanner = table.getScanner(Bytes.toBytes(family));
    int i = 1;
    for (Result rowResult : scanner) {
      byte[] by = rowResult.getValue(Bytes.toBytes(family), Bytes.toBytes(column));
      String str = Bytes.toString(by);
      System.out.println("row " + i + " is \"" + str + "\"");
      i++;
    }
  }

Example 5: Dropping a Table

<Shell commands>  disable '<table name>'  then  drop '<table name>'

  > disable 'tablename'
  0 row(s) in 6.0890 seconds
  > drop 'tablename'
  0 row(s) in 0.0090 seconds
  0 row(s) in 0.0090 seconds
  0 row(s) in 0.0710 seconds

Example 5: Dropping a Table

<Code>
  static void drop(String tablename) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(conf);
    if (admin.tableExists(tablename)) {
      admin.disableTable(tablename);
      admin.deleteTable(tablename);
    } else {
      System.out.println(" [" + tablename + "] not found!");
    }
  }

HBase Programming

Combining MapReduce with HBase

Example 6: WordCountHBase

Description:
  This program reads the strings in the files under the input path, counts the words, and writes the results back into an HTable.

How to run:
  Run the program on the Hadoop 0.20 platform; add the HBase parameters using the method from (reference 2), then package the code into XX.jar.

Result:
  > scan 'wordcount'
  ROW     COLUMN+CELL
  am      column=content:count, timestamp=1264406245488, value=1
  chen    column=content:count, timestamp=1264406245488, value=1
  hi,     column=content:count, timestamp=1264406245488, value=2

Notes:
  1. The source files on HDFS live under "/user/$YOUR_NAME/input". You must put the data into this HDFS folder first, and the folder may contain only files, not subfolders.
  2. After the job finishes, the program stores its results in the HBase table "wordcount".

Example 6: WordCountHBase <1>

public class WordCountHBase {

  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private IntWritable i = new IntWritable(1);
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String s[] = value.toString().trim().split(" ");
      for (String m : s) {
        context.write(new Text(m), i);
      }
    }
  }

  public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable i : values) {
        sum += i.get();
      }
      Put put = new Put(Bytes.toBytes(key.toString()));
      // Write the count into family "content", qualifier "count"
      put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
          Bytes.toBytes(String.valueOf(sum)));
      context.write(NullWritable.get(), put);
    }
  }

Example 6: WordCountHBase <2>

  public static void createHBaseTable(String tablename) throws IOException {
    HTableDescriptor htd = new HTableDescriptor(tablename);
    HColumnDescriptor col = new HColumnDescriptor("content:");
    htd.addFamily(col);
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    if (admin.tableExists(tablename)) {
      admin.disableTable(tablename);
      admin.deleteTable(tablename);
    }
    System.out.println("create new table: " + tablename);
    admin.createTable(htd);
  }

  public static void main(String args[]) throws Exception {
    String tablename = "wordcount";
    Configuration conf = new Configuration();
    conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
    createHBaseTable(tablename);
    String input = args[0];
    Job job = new Job(conf, "WordCount " + input);
    job.setJarByClass(WordCountHBase.class);
    job.setNumReduceTasks(3);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(input));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Example 7: LoadHBaseMapper

Description:
  This program reads data out of HBase and writes the results back onto HDFS.

How to run:
  Run the program on the Hadoop 0.20 platform; add the HBase parameters using the method from (reference 2), then package the code into XX.jar.

Result:
  $ hadoop fs -cat <hdfs_output>/part-r-00000
  ---------------------------
  54 30 31 GunLong
  54 30 32 Esing
  54 30 33 SunDon
  54 30 34 StarBucks
  ---------------------------

Notes:
  1. The table must already exist in HBase and must already contain data.
  2. After the job finishes, the program puts its results into the <hdfs_output> directory you specify on HDFS; make sure <hdfs_output> does not already exist.

Example 7: LoadHBaseMapper <1>

public class LoadHBaseMapper {

  public static class HtMap extends TableMapper<Text, Text> {
    public void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      String res = Bytes.toString(value.getValue(Bytes.toBytes("Detail"),
          Bytes.toBytes("Name")));
      context.write(new Text(key.toString()), new Text(res));
    }
  }

  public static class HtReduce extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      String str = new String("");
      Text final_key = new Text(key);
      Text final_value = new Text();
      for (Text tmp : values) {
        str += tmp.toString();
      }
      final_value.set(str);
      context.write(final_key, final_value);
    }
  }

Example 7: LoadHBaseMapper <2>

  public static void main(String args[]) throws Exception {
    String output = args[0];  // args[0] is the HDFS output directory
    String tablename = "tsmc";
    Configuration conf = new Configuration();
    Job job = new Job(conf, tablename + " hbase data to hdfs");
    job.setJarByClass(LoadHBaseMapper.class);
    Scan myScan = new Scan();  // scan the whole source table
    TableMapReduceUtil.initTableMapperJob(tablename, myScan, HtMap.class,
        Text.class, Text.class, job);
    job.setMapperClass(HtMap.class);
    job.setReducerClass(HtReduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setInputFormatClass(TableInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(output));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
