Building applications with HBase
Amandeep Khurana | Solutions Architect
Data Days Texas, Austin, March 2013
Monday, April 1, 13
About me
• Solutions Architect, Cloudera Inc
• Amazon Web Services
• Interested in large scale distributed systems
• Co-author, HBase In Action
• Twitter: amansk
I’m a wanna-be (author)
www.hbaseinaction.com
Nick Dimiduk, Amandeep Khurana (Manning)
About the workshop
• Install HBase
• Basic interactions with the shell
• Use of HBase Java API
• Data model
• HBase and Hadoop integration
Get HBase and tutorial instructions
• Get it here -> http://hbase.apache.org/
  • Downloads -> Choose a mirror -> 0.94.1
  • Download the file hbase-0.94.1.tar.gz
• Get the tutorial handout -> http://people.apache.org/~ak/ddtx13
... except another marketing term
What’s Big Data?
Data management
Data has changed
[Figure: data growth, 1980-2012 - driven by end-user applications, the Internet, mobile devices, and sophisticated machines; structured data is ~10% of the total, unstructured data ~90%]
“Big Data”
Hadoop - Big Data poster child

HDFS and MapReduce at the core, with an ecosystem on top:
• Coordination - Apache ZooKeeper
• Data integration - Apache Flume, Apache Sqoop
• Fast read/write access - Apache HBase
• Languages / compilers - Apache Pig, Apache Hive
• Workflow / scheduling - Apache Oozie
• Metadata - Apache Hive
• File system mount - FUSE-DFS
• UI framework - HUE
• Machine learning - Apache Mahout
What is HBase?
• Scalable, distributed data store
• Sorted map of maps
• Open source avatar of Google’s Bigtable
• Sparse
• Multi dimensional
• Tightly integrated with Hadoop
• Not an RDBMS
So, what’s the big deal?
• Give up system complexity and sophisticated features for ease of scale and manageability
• Simple system != easy to build applications on top
• Goal: predictable read and write performance at scale
  • Effective API usage
  • Optimal schema design
  • Understand internals and leverage at scale
Use cases
• Canonical use case: storing crawl data and indices for search
[Figure: indexing and searching the Internet with Bigtable]
1. Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable.
2. A MapReduce job runs over the entire table, generating search indexes for the Web Search application.
3. The user initiates a Web Search request.
4. The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable.
5. Search results are presented to the user.
Use cases (continued)
• Capturing incremental data
• Content serving
• Information exchange
Ok, enough of high level stuff
• Big data is a buzzword but there is more to it
  • New products, applications, types of data
• Hadoop ecosystem is the poster child of big data architectures
  • Consists of several components
• HBase is a part of the ecosystem
  • Scalable database
  • Lots of production deployments
Installation and basic interaction
HBase - Getting Started
Installing HBase
• Get it here -> http://hbase.apache.org/
  • Downloads -> Choose a mirror -> 0.94.1
  • Download the file hbase-0.94.1.tar.gz
• Untar it
  • mkdir ~/ddtx
  • cp hbase-0.94.1.tar.gz ~/ddtx
  • cd ~/ddtx
  • tar xvf hbase-0.94.1.tar.gz
Basics
• Modes of operation
  • Standalone
  • Pseudo-distributed
  • Fully distributed
• Start up in standalone mode
  • cd hbase-0.94.1
  • bin/start-hbase.sh
  • Logs are in ./logs/
• Check out http://localhost:60010 in your browser
Shell
• Basic interaction tool
• Written in JRuby
• Uses the Java client library
• Good place to start poking around
Shell (continued)
• bin/hbase shell
• Create table
  • create ‘users’, ‘info’
• List tables
  • list
• Describe table
  • describe ‘users’
Shell (continued)
• Put a row
  • put ‘users’, ‘u1’, ‘info:name’, ‘Mr. Rogers’
• Get a row
  • get ‘users’, ‘u1’
• Put more
  • put ‘users’, ‘u1’, ‘info:email’, ‘[email protected]’
  • put ‘users’, ‘u1’, ‘info:password’, ‘foobar’
• Get a row
  • get ‘users’, ‘u1’
• Scan table
  • scan ‘users’
... can’t really build an application just with the shell
Java API
Important terms
• Table
  • Consists of rows and columns
• Row
  • Has a bunch of columns
  • Identified by a rowkey (primary key)
• Column Qualifier
  • Dynamic column name
• Column Family
  • Column groups - logical and physical (similar access pattern)
• Cell
  • The actual element that contains the data for a row-column intersection
• Version
  • Every cell can have multiple versions
Introducing TwitBase
• Twitter clone built on HBase
• Designed to scale
• It’s an example, not a production app
• Start with users and twits for now
• Users
  • Have profiles
  • Post twits
• Twits
  • Have text posted by users
• Get sample code at: https://github.com/hbaseinaction/twitbase
Connecting to HBase
• Using default confs:
  HTableInterface usersTable = new HTable("users");
• Passing a custom conf:
  Configuration myConf = HBaseConfiguration.create();
  myConf.set("parameter_name", "parameter_value");
  HTableInterface usersTable = new HTable(myConf, "users");
• Connection pooling:
  HTablePool pool = new HTablePool();
  HTableInterface usersTable = pool.getTable("users");
  // work with the table
  usersTable.close();
Inserting data
• Put call on HTable:
  Put p = new Put(Bytes.toBytes("TheRealMT"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mark Twain"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("[email protected]"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes("Langhorne"));
  usersTable.put(p);
Data coordinates
• Row is addressed using rowkey• Cell is addressed using [rowkey + family + qualifier]
Updating data
• Update == Write
  Put p = new Put(Bytes.toBytes("TheRealMT"));
  p.add(Bytes.toBytes("info"),
        Bytes.toBytes("password"),
        Bytes.toBytes("abc123"));
  usersTable.put(p);
Write path
[Figure: write path - a write goes to the write-ahead log (WAL) as well as the in-memory write buffer called the Memstore. Clients don't interact directly with the underlying HFiles during writes. When filled up with writes, the Memstore gets flushed to disk to form a new HFile.]
Reading data
• Get the entire row:
  Get g = new Get(Bytes.toBytes("TheRealMT"));
  Result r = usersTable.get(g);
• Get selected columns:
  Get g = new Get(Bytes.toBytes("TheRealMT"));
  g.addColumn(Bytes.toBytes("info"),
              Bytes.toBytes("password"));
  Result r = usersTable.get(g);
• Extract the value out of the Result object:
  byte[] b = r.getValue(Bytes.toBytes("info"),
                        Bytes.toBytes("email"));
  String email = Bytes.toString(b);
Reading data (continued)
• Scan:
  Scan s = new Scan(startRow, stopRow);
  ResultScanner rs = twits.getScanner(s);
  for (Result r : rs) {
    // iterate over scanner results
  }
• Single threaded iteration over a defined set of rows. Could be the entire table too.
What happens when you read data?
[Figure: read path - the client is served from the Memstore, the BlockCache, and the HFiles on disk.]
Deleting data
• Another simple API call
• Delete entire row:
  Delete d = new Delete(Bytes.toBytes("TheRealMT"));
  usersTable.delete(d);
• Delete selected columns:
  Delete d = new Delete(Bytes.toBytes("TheRealMT"));
  d.deleteColumns(Bytes.toBytes("info"),
                  Bytes.toBytes("email"));
  usersTable.delete(d);
Delete internals
• Tombstone markers
• Deletion only happens during major compactions
• Compactions
  • Minor
  • Major
Compactions
[Figure: compaction - two or more HFiles get combined to form a single HFile during a compaction.]
Hands-on: Java client to HBase
• Create a table from the HBase shell
  • create ‘users’, ‘info’
• Write a program to do the following
  • Add 10 users to the ‘users’ table
    • userid as rowkey
    • user’s name in the column ‘info:name’
    • userid could just be user + serial number, e.g. user1, user2, user3
  • Get a user from the ‘users’ table and display on screen
  • Scan the entire ‘users’ table and display on screen
DAO
• Like any other application
• Makes data access management easy
  • Cleaner code
  • Table info at a single place
• Optimizations like connection pooling etc.
• Easier management of database configs
Admin
• HBaseAdmin - administrative API to the cluster
• Provides an interface to manage HBase table metadata and general administrative functions
Table Operations
• createTable()
• deleteTable()
• addColumn()
• modifyColumn()
• deleteColumn()
• listTables()
It’s not a relational database system
Data Model
HBase is ...
• Column family oriented database
  • Tables consisting of rows and columns
• Persisted Map
  • Sparse
  • Multi dimensional
  • Sorted
  • Indexed by rowkey, column and timestamp
• Key Value store
  • [rowkey, col family, col qualifier, timestamp] -> cell value
HBase is not ...
• A relational database
  • No SQL query language
  • No joins
  • No secondary indexing
  • No transactions
Tabular representation
Column Family: info

Rowkey       | name                   | email                 | password
GrandpaD     | Fyodor Dostoyevsky     |                       | abc123
HMS_Surprise | Patrick O'Brien        |                       | abc123
SirDoyle     | Sir Arthur Conan Doyle | [email protected] | abc123
TheRealMT    | Mark Twain             |                       | Langhorne (ts1), abc123 (ts2)

The table is lexicographically sorted on the rowkeys. Each cell can have multiple versions, typically represented by the timestamp of when they were inserted into the table (ts2 > ts1), e.g. ts1=1329088321289, ts2=1329088818321. The coordinates used to identify data in an HBase table are: (1) rowkey, (2) column family, (3) column qualifier, (4) version.
Key-Value store
Keys -> Values (each line is a single KeyValue instance):
[TheRealMT, info, password, 1329088818321] -> abc123
[TheRealMT, info, password, 1329088321289] -> Langhorne
Key-Value store
1. Start with coordinates of full precision:
[TheRealMT, info, password, 1329088818321] -> "abc123"

2. Drop the version and you're left with a map of versions to values:
[TheRealMT, info, password] -> {
  1329088818321 : "abc123",
  1329088321289 : "Langhorne"
}

3. Omit the qualifier and you have a map of qualifiers to the previous maps:
[TheRealMT, info] -> {
  "name" : { 1329088321289 : "Mark Twain" },
  "email" : { 1329088321289 : "[email protected]" },
  "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" }
}

4. Finally, drop the column family and you have a row, a map of maps:
[TheRealMT] -> {
  "info" : {
    "name" : { 1329088321289 : "Mark Twain" },
    "email" : { 1329088321289 : "[email protected]" },
    "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" }
  }
}
Sorted map of maps
{
  "TheRealMT" : {
    "info" : {
      "name" : { 1329088321289 : "Mark Twain" },
      "email" : { 1329088321289 : "[email protected]" },
      "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" }
    }
  }
}
(rowkey -> column family -> column qualifiers -> versions -> values)
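As a rough sketch (not how HBase is actually implemented), the sorted map of maps can be modeled with nested TreeMaps - TreeMap keeps keys sorted, mirroring how HBase sorts rowkeys and qualifiers lexicographically and versions newest-first:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class MapOfMaps {
    // rowkey -> family -> qualifier -> timestamp -> value.
    // Timestamps are sorted descending so the newest version comes first.
    static TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> table =
        new TreeMap<>();

    static void put(String row, String fam, String qual, long ts, String val) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(fam, f -> new TreeMap<>())
             .computeIfAbsent(qual, q -> new TreeMap<>(Comparator.reverseOrder()))
             .put(ts, val);
    }

    static String get(String row, String fam, String qual) {
        // The newest version (largest timestamp) wins.
        return table.get(row).get(fam).get(qual).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("TheRealMT", "info", "password", 1329088321289L, "Langhorne");
        put("TheRealMT", "info", "password", 1329088818321L, "abc123");
        put("TheRealMT", "info", "name", 1329088321289L, "Mark Twain");
        System.out.println(get("TheRealMT", "info", "password")); // newest password version
    }
}
```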
HFiles and physical data model
• HFiles are
  • Immutable
  • Sorted on rowkey + qualifier + timestamp
  • In the context of a column family per region
HFile for the info column family in the users table:
"TheRealMT", "info", "email",    1329088321289, "[email protected]"
"TheRealMT", "info", "name",     1329088321289, "Mark Twain"
"TheRealMT", "info", "password", 1329088818321, "abc123"
"TheRealMT", "info", "password", 1329088321289, "Langhorne"
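To illustrate that ordering, here is a simplified sketch of HBase's KeyValue comparison (column family and key type omitted): rowkey and qualifier sort ascending, timestamps sort descending so newer versions come first.

```java
import java.util.Arrays;
import java.util.Comparator;

public class HFileOrder {
    // A simplified key: rowkey, qualifier, timestamp (column family omitted).
    static class Key {
        final String row, qualifier;
        final long ts;
        Key(String row, String qualifier, long ts) {
            this.row = row; this.qualifier = qualifier; this.ts = ts;
        }
        public String toString() { return row + "/" + qualifier + "/" + ts; }
    }

    // Rowkey and qualifier ascending, timestamp descending.
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts); // newer timestamps first
    };

    public static void main(String[] args) {
        Key[] keys = {
            new Key("TheRealMT", "password", 1329088321289L),
            new Key("TheRealMT", "name", 1329088321289L),
            new Key("TheRealMT", "password", 1329088818321L),
            new Key("TheRealMT", "email", 1329088321289L),
        };
        Arrays.sort(keys, ORDER);
        // email, name, then password with the newer timestamp first
        for (Key k : keys) System.out.println(k);
    }
}
```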
... what really goes on under the hood
HBase in distributed mode
Distributed mode
• Table is split into Regions
[Figure: a full table T1 (rows 00001-00012) split into 3 regions - R1 (00001-00004), R2 (00005-00008), R3 (00009-00012)]
Distributed mode (continued)
• Regions are served by RegionServers
• RegionServers are Java processes, typically collocated with the DataNodes
[Figure: regions T1R1, T1R2 and T1R3 served by RegionServers collocated with DataNodes on Host1, Host2 and Host3]
Finding your region
• Two special tables: -ROOT- and .META.
[Figure: the -ROOT- table, consisting of only 1 region hosted on RegionServer RS1; the .META. table, consisting of 3 regions (M1, M2, M3) hosted on RS1, RS2 and RS3; user tables T1 and T2, consisting of 4 and 3 regions respectively, distributed amongst RS1, RS2 and RS3]
Regions distributed across the cluster
[Figure: region assignment - RegionServer 1 (RS1) hosts region R1 of the user table T1 and region M2 of .META.; RegionServer 2 (RS2) hosts R2 and R3 of the user table and M1 of .META.; RegionServer 3 (RS3) hosts just -ROOT-. -ROOT- maps .META. regions to servers (M:T100001-M:T100009 -> M1 on RS2, M:T100010-M:T100012 -> M2 on RS1), and .META. maps table regions to servers (T1:00001-T1:00004 -> R1 on RS1, T1:00005-T1:00008 -> R2 on RS2, T1:00009-T1:00012 -> R3 on RS2).]
What the client does

[Figure: region lookup - to read rowkey TheRealMT from MyTable, the client asks ZooKeeper for the location of -ROOT-, reads -ROOT- (a row per .META. region) to find the right .META. region, reads .META. (a row per table region) to find the region of MyTable holding that rowkey, and then talks to the RegionServer hosting that region]
... they play well together
HBase and Hadoop
HBase and Hadoop
• HBase has tight integration with Hadoop
• Two separate concerns
  • Access via MapReduce
  • Storage on HDFS
Parallel processing
MapReduce
HBase and MapReduce
• HBase is tightly integrated with the Hadoop stack
  • MapReduce
  • HDFS
• Source for MapReduce jobs
  • Read from an HBase table with a MapReduce job (Export, RowCount, ...)
• Sink for MapReduce jobs
  • Writing to an HBase table from a MapReduce job
• Lookups during MapReduce jobs
  • Joins between HBase and/or another source
HBase as MR source
• One mapper per region
[Figure: one map task per region - the tasks take the key range of the region as their input split and scan over it; the regions are served by the RegionServers]
HBase as MR source (continued)
• Mapper:
  protected void map(ImmutableBytesWritable rowkey, Result result, Context context) {
    // mapper logic goes here
  }
• Job setup:
  Scan scan = new Scan();
  scan.addColumn(Bytes.toBytes("twits"), Bytes.toBytes("twit"));
  TableMapReduceUtil.initTableMapperJob("twits", scan, Map.class,
      ImmutableBytesWritable.class, Result.class, job);
HBase as MR sink
• Write to regions via MR job
[Figure: reduce tasks writing to HBase regions - reduce tasks don't necessarily write to a region on the same physical host; they write to whichever region contains the key range they are writing into, which could potentially mean that all reduce tasks talk to all regions on the cluster]
HBase as MR sink (continued)
• Reducer:
  protected void reduce(ImmutableBytesWritable rowkey, Iterable<Put> values, Context context) {
    // reduce logic goes here
  }
• Job setup:
  TableMapReduceUtil.initTableReducerJob("users", Reducer.class, job);
HBase as a shared resource
• Lookups from MR jobs
• Map-side joins
• Simple Get/Scan calls
[Figure: map tasks reading files from HDFS as their source and doing random Gets or short Scans from HBase; the regions are served by the RegionServers]
Integration with HDFS
• HDFS is a filesystem
• Underlying storage fabric for HBase
• HBase is tightly integrated with HDFS and leverages its semantics
  • LSM trees fit well into a write-once filesystem
  • Availability (fault tolerance) and reliability (durability) at scale
Availability and fault tolerance
[Figure: fault tolerance - RegionServers collocated with DataNodes on physical hosts, with HFile replicas spread across HDFS. When disaster strikes the host serving region R1, another server picks up the region and starts serving it from the persisted HFile on HDFS; the lost copy of the HFile gets replicated to another DataNode.]
Durability
• The WAL keeps writes intact even if an RS fails
• HDFS is reliable and doesn’t lose data if individual nodes fail
Summary
• Big data = new products, applications
• Hadoop ecosystem - poster child
• HBase serves many use cases
• Allows you to solve aspects of big data problems
• Simple Java API
• Tight Hadoop integration
• Low latency access to large amounts of data
• Large scale computation using MR
... it’s a database after all
Schema design
But isn’t HBase schema-less?
• Number of tables
• Rowkey design
• Number of column families per table; what goes into what column family
• Column qualifier names
• What goes into the cells
• Number of versions
Rowkeys
• Rowkey design is the single most important aspect of HBase table design
• The rowkey is the only way to address rows in HBase
TwitBase relationships
• Users follow users
• Relationships need to be persisted for usage later on
• Model tables for the expected access patterns
• Read pattern
  • Who does A follow?
  • Who follows A?
  • Does A follow B?
• Write pattern
  • A follows B
  • A unfollows B
Start simple
• Adjacency list
Column Family: follows (rowkey: userid, column qualifier: followed user number, cell value: followed userid)

TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers
TheRealMT | 1:HRogers, 2:Olivia
Optimizing the adjacency list
• We need a count
• Where does a new followed user go?
follows
TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers, count:4
TheRealMT | 1:HRogers, 2:Olivia, count:2
Adding a new user
follows (the TheFakeMT row needs to be updated)
TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers, count:4
TheRealMT | 1:HRogers, 2:Olivia, count:2

Client code:
Step 1: Get current count -> TheFakeMT : follows : {count -> 4}
Step 2: Increment count -> TheFakeMT : follows : {count -> 5}
Step 3: Add new entry -> TheFakeMT : follows : {5 -> MTFanBoy2, count -> 5}
Step 4: Write the new data to HBase:
TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers, 5:MTFanBoy2, count:5
TheRealMT | 1:HRogers, 2:Olivia, count:2
Transactions == not good
• HBase doesn’t have native support (think scale)
• Don’t want to complicate client side logic
• Only solution -> simplify the schema
follows (column qualifier: followed userid, cell value: 1)
TheFakeMT | TheRealMT:1, MTFanBoy:1, Olivia:1, HRogers:1
TheRealMT | HRogers:1, Olivia:1
Revisit the questions
• Read pattern
  • Who all does A follow?
  • Who all follows A?
  • Does A follow B?
• Write pattern
  • A follows B
  • A unfollows B
Denormalization
• Second table for the reverse relationship
• Otherwise you scan across the entire table and affect read performance
[Figure: read performance vs. write performance - normalization sits in the write-optimized corner, denormalization in the read-optimized corner; good at both is dreamland, bad at both is poor design]
More optimizations
• Convert into a tall-narrow table
• Leverage rowkey indexing better
• Gets -> short Scans
CF: f
rowkey: follower + followed; column qualifier: followed user's name; cell value: 1
The + in the rowkey refers to concatenating the two values. You could delimit using any character you like, e.g. A-B or A,B. Keeping the column family and column qualifier names short reduces the data transferred over the network back to the client; the KeyValue objects become smaller.
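A minimal sketch of building such composite rowkeys (plain Java with hypothetical helper names; HBase's Bytes utility offers similar conversions to byte[]):

```java
public class FollowsKey {
    // Tall-narrow rowkey: follower + delimiter + followed.
    // The "+" delimiter is arbitrary - any character that can't
    // appear inside a userid works.
    static String rowkey(String follower, String followed) {
        return follower + "+" + followed;
    }

    // Scan prefix for "who does this user follow?"
    static String scanPrefix(String follower) {
        return follower + "+";
    }

    public static void main(String[] args) {
        System.out.println(rowkey("TheFakeMT", "TheRealMT"));
        // "Does A follow B?" becomes a Get on the exact key;
        // "who does A follow?" becomes a short Scan over scanPrefix(A).
    }
}
```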
Tall-narrow table example
• Denormalization is the way to go
f
TheFakeMT+TheRealMT | Mark Twain:1
TheFakeMT+MTFanBoy | Amandeep Khurana:1
TheFakeMT+Olivia | Olivia Clemens:1
TheFakeMT+HRogers | Henry Rogers:1
TheRealMT+HRogers | Henry Rogers:1
TheRealMT+Olivia | Olivia Clemens:1

Putting the user name in the column qualifier saves you from looking up the users table for the name of the user given an id. You can simply list out names or ids while looking at relationships just from this table. The downside is that you need to update the name in all the cells if the user updates their name in their profile. This is classic denormalization.
Uniform rowkey length
• MD5 the userids -> 16 byte + 16 byte rowkeys
• Better distribution of load
CF: f
rowkey: md5(follower) + md5(followed); column qualifier: followed userid; cell value: followed user's name
Using MD5 of the userids gives you fixed lengths instead of variable length userids. You don't need concatenation logic anymore.
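A sketch of the fixed-length rowkey construction using the JDK's MessageDigest (HBase itself isn't needed for this part; the class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Rowkey {
    // md5(follower) + md5(followed): always 16 + 16 = 32 bytes,
    // regardless of how long the userids are.
    static byte[] rowkey(String follower, String followed) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] a = md5.digest(follower.getBytes(StandardCharsets.UTF_8));
            byte[] b = md5.digest(followed.getBytes(StandardCharsets.UTF_8));
            byte[] key = new byte[32];
            System.arraycopy(a, 0, key, 0, 16);
            System.arraycopy(b, 0, key, 16, 16);
            return key;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(rowkey("TheFakeMT", "TheRealMT").length); // always 32
    }
}
```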
Uniform rowkey length (continued)
f
MD5(TheFakeMT) MD5(TheRealMT) | TheRealMT:Mark Twain
MD5(TheFakeMT) MD5(MTFanBoy) | MTFanBoy:Amandeep Khurana
MD5(TheFakeMT) MD5(Olivia) | Olivia:Olivia Clemens
MD5(TheFakeMT) MD5(HRogers) | HRogers:Henry Rogers
MD5(TheRealMT) MD5(HRogers) | HRogers:Henry Rogers
MD5(TheRealMT) MD5(Olivia) | Olivia:Olivia Clemens
Tall vs. wide tables: storage footprint
Logical representation of an HBase table (we'll look at what it means to Get() row r5):

   | CF1          | CF2
r1 | c1:v1        | c1:v9, c6:v2
r2 | c1:v2, c3:v6 |
r3 | c2:v3        | c5:v6
r4 | c2:v4        |
r5 | c1:v1, c3:v5 | c7:v8

Actual physical storage of the table:
HFile for CF1: r1:CF1:c1:t1:v1, r2:CF1:c1:t2:v2, r2:CF1:c3:t3:v6, r3:CF1:c2:t1:v3, r4:CF1:c2:t1:v4, r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5
HFile for CF2: r1:CF2:c1:t1:v9, r1:CF2:c6:t4:v2, r3:CF2:c5:t4:v6, r5:CF2:c7:t3:v8

Result object returned for a Get() on row r5 (a list of KeyValue objects):
r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5, r5:CF2:c7:t3:v8

Structure of a KeyValue object: Key = [Row Key, Col Fam, Col Qual, TimeStamp], Value = Cell Value
Rowkey design
• Single most important aspect of designing tables
• Depends on expected access patterns
• HFiles are sorted on the Key part of KeyValue objects
HFile for the info column family in the users table:
"TheRealMT", "info", "email",    1329088321289, "[email protected]"
"TheRealMT", "info", "name",     1329088321289, "Mark Twain"
"TheRealMT", "info", "password", 1329088818321, "abc123"
"TheRealMT", "info", "password", 1329088321289, "Langhorne"
Write optimized
• Distribute writes across the cluster
• Issue most pronounced with time series data
• Hashing:
  hash("TheRealMT") -> random byte[]
• Salting:
  int salt = new Integer(new Long(timestamp).hashCode()).shortValue() % <number of region servers>;
  byte[] rowkey = Bytes.add(Bytes.toBytes(salt), Bytes.toBytes("|"), Bytes.toBytes(timestamp));
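The same salting idea as a runnable sketch in plain Java (no HBase Bytes utility; the 5-server count is an illustrative assumption):

```java
import java.nio.ByteBuffer;

public class SaltedKey {
    static final int REGION_SERVERS = 5; // illustrative value

    // Prefix the timestamp with a salt derived from its hash so that
    // consecutive timestamps spread across REGION_SERVERS buckets
    // instead of all landing on one region.
    static byte[] rowkey(long timestamp) {
        short salt = (short) Math.floorMod(Long.hashCode(timestamp), REGION_SERVERS);
        return ByteBuffer.allocate(2 + 1 + 8) // salt | delimiter | timestamp
                         .putShort(salt)
                         .put((byte) '|')
                         .putLong(timestamp)
                         .array();
    }

    public static void main(String[] args) {
        for (long ts = 1329088818321L; ts < 1329088818326L; ts++) {
            // Low byte of the salt varies across consecutive timestamps.
            System.out.println(rowkey(ts)[1]);
        }
    }
}
```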
Read optimized
• Data to be accessed together should be stored together
• e.g. twit streams - last 10 twits by the users I follow
Rowkeys sorted by user (user + twit number): Olivia1, Olivia2, Olivia5, Olivia7, Olivia9, TheFakeMT2, TheFakeMT3, TheFakeMT4, TheFakeMT5, TheFakeMT6, TheRealMT1, TheRealMT2, TheRealMT5, TheRealMT8
vs. keyed for time-ordered access (twit number + user): 1Olivia, 1TheRealMT, 2Olivia, 2TheFakeMT, 2TheRealMT, 3TheFakeMT, 4TheFakeMT, 5Olivia, 5TheFakeMT, 5TheRealMT, 6TheFakeMT, 7Olivia, 8TheRealMT, 9Olivia
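A common trick for "latest first" access - a standard extension of this idea, not something shown on the slide - is to embed a reverse timestamp in the rowkey so the newest twits sort to the front of a scan:

```java
import java.util.TreeMap;

public class ReverseTimestamp {
    // Rowkey sketch: user + (Long.MAX_VALUE - timestamp), zero-padded so
    // lexicographic order matches numeric order. Newer entries get smaller
    // keys, so "last 10 twits" becomes a short scan from the user prefix.
    static String rowkey(String user, long timestamp) {
        return String.format("%s%019d", user, Long.MAX_VALUE - timestamp);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>(); // stands in for a sorted HBase table
        table.put(rowkey("TheRealMT", 1329088321289L), "older twit");
        table.put(rowkey("TheRealMT", 1329088818321L), "newer twit");
        System.out.println(table.firstEntry().getValue()); // newer twit sorts first
    }
}
```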
Relational to non-relational
• Relational concepts
  • Entities
  • Attributes
  • Relationships
• Entities
  • Table is a table. Not much going on there
  • Users table contains... users. Those are entities
• Good place to start. Denormalization bends that rule a bit
Relational to non-relational (continued)
• Attributes
  • Identifying
    • Primary keys. Compound keys
    • Map to rowkeys
  • Non-identifying
    • Other columns
    • Map to column qualifiers and cells
• Relationships
Nested Entities
• Column qualifiers can contain data instead of just a column name
[Figure: nested entities - an HBase table row: rowkey -> column family -> repeating entity, with fixed qualifier -> timestamp -> value and variable qualifier -> timestamp -> value]
Schema design summary
• Schema can make or break the performance you get
• Rowkey is the single most important thing
  • Use tricks like hashing and salting
• Denormalize to your advantage
  • There are no joins
• Isolate access patterns
  • Separate CFs or even separate tables
• Shorter names -> lower storage footprint
• Column qualifiers can be used to store data, not just column names
  • Big difference as compared to an RDBMS
... for some real stuff, beyond your laptop
Deployments and Operations
Components
• HDFS (DataNodes)
  • Storage
• ZooKeeper
  • Membership management
• RegionServers
  • Serve the regions
• HBase Masters
  • Janitorial work
Hardware
• DataNodes and RegionServers collocated
• Plenty of RAM and disk
  • 8-12 cores
  • 48 GB RAM
  • 12 x 1 TB disks
Hardware (continued)
• HBase Master and ZooKeeper collocated
  • Don’t necessarily need to be though
• Keep an odd number of ZKs
  • ~3 is typically good enough for up to a 50-70 node cluster
• Give ZK an independent spindle for log writing
  • I/O latency sensitive
• 4-6 cores
• 24 GB RAM
Types of deployments
• Standalone for dev purposes
• Prototype
  • < 10 nodes
  • No real SLAs
  • 1 ZK + Master, rest are RS
• Small production cluster
  • 10-20 nodes
  • 1 ZK + Master, maybe NN HA, rest are RS
• Medium production cluster
  • 20-50 nodes
  • 3 ZK + Master, NN HA, rest are RS
• Large production cluster
  • 50+ nodes
  • 3 or 5 ZK + Master, NN HA, rest are RS
• Hardware choices remain mostly the same
Deploying software and configuration
• Automated deployment of software bits highly recommended
  • Chef
  • Puppet
  • Cloudera Manager
• Centralized configuration management highly recommended
  • Consistency
  • Ease of management
Configurations
• $HBASE_HOME/conf
  • hbase-env.sh
    • Environment configurations
  • hbase-site.xml
    • HBase system configurations
• Linux configurations
  • Number of open files
  • swappiness
• HDFS configurations
  • $HADOOP_HOME/hdfs-site.xml
Monitoring
• Metrics contexts from Hadoop
  • Ganglia
  • File context
  • JMX
• Cacti
• Several metrics available
  • Write path specific
  • Read path specific
Monitoring (continued)
Performance tuning
• Performance testing
  • Use real (or synthetic) workloads
  • YCSB
• Tuning
  • Write optimized
    • Memstore
  • Random-read optimized
    • Block cache
    • Smaller block size
  • Sequential-read optimized
    • Block cache
    • Larger block size
  • GC tuning