Building applications with HBase
Amandeep Khurana | Solutions Architect
Data Days Texas, Austin, March 2013
Monday, April 1, 13
About me
• Solutions Architect, Cloudera Inc
• Amazon Web Services
• Interested in large scale distributed systems
• Co-author, HBase In Action
• Twitter: amansk
I’m a wanna-be (author)
www.hbaseinaction.com
Nick Dimiduk, Amandeep Khurana (Manning)
About the workshop
• Install HBase
• Basic interactions with the shell
• Use of HBase Java API
• Data model
• HBase and Hadoop integration
Get HBase and tutorial instructions
• Get it here -> http://hbase.apache.org/
  • Downloads -> Choose a mirror -> 0.94.1
  • Download the file hbase-0.94.1.tar.gz
• Get the tutorial handout -> http://people.apache.org/~ak/ddtx13
... except another marketing term
What’s Big Data?
Data management
Data has changed
[Figure: data growth, 1980-2012 - driven by end-user applications, the Internet, mobile devices, and sophisticated machines; structured data is ~10% of the total, unstructured data ~90%]
“Big Data”
Hadoop - Big Data poster child

HDFS and MapReduce at the core, with an ecosystem on top:
• Coordination - Apache ZooKeeper
• Data integration - Apache Flume, Apache Sqoop
• Fast read/write access - Apache HBase
• Languages / compilers - Apache Pig, Apache Hive
• Workflow / scheduling - Apache Oozie
• Metadata - Apache Hive
• File system mount - FUSE-DFS
• UI framework - HUE
• Machine learning - Apache Mahout
What is HBase?
• Scalable, distributed data store
• Sorted map of maps
• Open source avatar of Google’s Bigtable
• Sparse
• Multi dimensional
• Tightly integrated with Hadoop
• Not an RDBMS
So, what’s the big deal?
• Give up system complexity and sophisticated features for ease of scale and manageability
• Simple system != easy to build applications on top
• Goal: predictable read and write performance at scale
  • Effective API usage
  • Optimal schema design
  • Understand internals and leverage at scale
Use cases
• Canonical use case: storing crawl data and indices for search
[Figure: indexing and searching the Internet with Bigtable]
1. Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable.
2. A MapReduce job runs over the entire table, generating search indexes for the Web Search application.
3. The user initiates a Web Search request.
4. The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable.
5. Search results are presented to the user.
Use cases (continued)
• Capturing incremental data
• Content serving
• Information exchange
Ok, enough of high level stuff
• Big data is a buzzword but there is more to it
  • New products, applications, types of data
• Hadoop ecosystem is the poster child of big data architectures
  • Consists of several components
• HBase is a part of the ecosystem
  • Scalable database
  • Lots of production deployments
Installation and basic interaction
HBase - Getting Started
Installing HBase
• Get it here -> http://hbase.apache.org/
  • Downloads -> Choose a mirror -> 0.94.1
  • Download the file hbase-0.94.1.tar.gz
• Untar it
  • mkdir ~/ddtx
  • cp hbase-0.94.1.tar.gz ~/ddtx
  • cd ~/ddtx
  • tar xvf hbase-0.94.1.tar.gz
Basics
• Modes of operation
  • Standalone
  • Pseudo-distributed
  • Fully distributed
• Start up in standalone mode
  • cd hbase-0.94.1
  • bin/start-hbase.sh
  • Logs are in ./logs/
• Check out http://localhost:60010 in your browser
Shell
• Basic interaction tool
• Written in JRuby
• Uses the Java client library
• Good place to start poking around
Shell (continued)
• bin/hbase shell
• Create table
  • create ‘users’, ‘info’
• List tables
  • list
• Describe table
  • describe ‘users’
Shell (continued)
• Put a row
  • put ‘users’, ‘u1’, ‘info:name’, ‘Mr. Rogers’
• Get a row
  • get ‘users’, ‘u1’
• Put more
  • put ‘users’, ‘u1’, ‘info:email’, ‘[email protected]’
  • put ‘users’, ‘u1’, ‘info:password’, ‘foobar’
• Get a row
  • get ‘users’, ‘u1’
• Scan table
  • scan ‘users’
... can’t really build an application just with the shell
Java API
Important terms
• Table
  • Consists of rows and columns
• Row
  • Has a bunch of columns
  • Identified by a rowkey (primary key)
• Column Qualifier
  • Dynamic column name
• Column Family
  • Column groups - logical and physical (similar access pattern)
• Cell
  • The actual element that contains the data for a row-column intersection
• Version
  • Every cell can have multiple versions
Introducing TwitBase
• Twitter clone built on HBase
• Designed to scale
• It’s an example, not a production app
• Start with users and twits for now
• Users
  • Have profiles
  • Post twits
• Twits
  • Have text posted by users
• Get sample code at: https://github.com/hbaseinaction/twitbase
Connecting to HBase
• Using default confs:
  HTableInterface usersTable = new HTable("users");
• Passing a custom conf:
  Configuration myConf = HBaseConfiguration.create();
  myConf.set("parameter_name", "parameter_value");
  HTableInterface usersTable = new HTable(myConf, "users");
• Connection pooling:
  HTablePool pool = new HTablePool();
  HTableInterface usersTable = pool.getTable("users");
  // work with the table
  usersTable.close();
Inserting data
• Put call on HTable:
  Put p = new Put(Bytes.toBytes("TheRealMT"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mark Twain"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("[email protected]"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes("Langhorne"));
  usersTable.put(p);
Data coordinates
• Row is addressed using rowkey• Cell is addressed using [rowkey + family + qualifier]
Updating data
• Update == Write
  Put p = new Put(Bytes.toBytes("TheRealMT"));
  p.add(Bytes.toBytes("info"),
        Bytes.toBytes("password"),
        Bytes.toBytes("abc123"));
  usersTable.put(p);
Write path
[Figure: write path - a write goes to the write-ahead log (WAL) as well as the in-memory write buffer called the Memstore. Clients don't interact directly with the underlying HFiles during writes. When filled up with writes, the Memstore gets flushed to disk to form a new HFile.]
Reading data
• Get the entire row:
  Get g = new Get(Bytes.toBytes("TheRealMT"));
  Result r = usersTable.get(g);
• Get selected columns:
  Get g = new Get(Bytes.toBytes("TheRealMT"));
  g.addColumn(Bytes.toBytes("info"),
              Bytes.toBytes("password"));
  Result r = usersTable.get(g);
• Extract the value out of the Result object:
  byte[] b = r.getValue(Bytes.toBytes("info"),
                        Bytes.toBytes("email"));
  String email = Bytes.toString(b);
Reading data (continued)
• Scan:
  Scan s = new Scan(startRow, stopRow);
  ResultScanner rs = twits.getScanner(s);
  for (Result r : rs) {
    // iterate over scanner results
  }
• Single threaded iteration over a defined set of rows. Could be the entire table too.
What happens when you read data?
[Figure: read path - the client is served from the Memstore, the BlockCache, and the HFiles on disk.]
Deleting data
• Another simple API call
• Delete entire row:
  Delete d = new Delete(Bytes.toBytes("TheRealMT"));
  usersTable.delete(d);
• Delete selected columns:
  Delete d = new Delete(Bytes.toBytes("TheRealMT"));
  d.deleteColumns(Bytes.toBytes("info"),
                  Bytes.toBytes("email"));
  usersTable.delete(d);
Delete internals
• Tombstone markers
• Deletion only happens during major compactions
• Compactions
  • Minor
  • Major
Compactions
[Figure: compaction - two or more HFiles get combined to form a single HFile during a compaction.]
Hands-on: Java client to HBase
• Create a table from the HBase shell
  • create ‘users’, ‘info’
• Write a program to do the following
  • Add 10 users to the ‘users’ table
    • userid as rowkey
    • user’s name in the column ‘info:name’
    • userid could just be user + serial number, e.g. user1, user2, user3
  • Get a user from the ‘users’ table and display on screen
  • Scan the entire ‘users’ table and display on screen
DAO
• Like any other application
• Makes data access management easy
  • Cleaner code
  • Table info at a single place
• Optimizations like connection pooling etc.
• Easier management of database configs
Admin
• HBaseAdmin - administrative API to the cluster
• Provides an interface to manage HBase table metadata and general administrative functions
Table Operations
• createTable()
• deleteTable()
• addColumn()
• modifyColumn()
• deleteColumn()
• listTables()
It’s not a relational database system
Data Model
HBase is ...
• Column family oriented database
  • Tables consisting of rows and columns
• Persisted Map
  • Sparse
  • Multi dimensional
  • Sorted
  • Indexed by rowkey, column and timestamp
• Key Value store
  • [rowkey, col family, col qualifier, timestamp] -> cell value
HBase is not ...
• A relational database
  • No SQL query language
  • No joins
  • No secondary indexing
  • No transactions
Tabular representation
Column Family: info

Rowkey       | name                   | email                 | password
GrandpaD     | Fyodor Dostoyevsky     |                       | abc123
HMS_Surprise | Patrick O'Brien        |                       | abc123
SirDoyle     | Sir Arthur Conan Doyle | [email protected] | abc123
TheRealMT    | Mark Twain             |                       | Langhorne (ts1), abc123 (ts2)

The table is lexicographically sorted on the rowkeys. Each cell can have multiple versions, typically represented by the timestamp of when they were inserted into the table (ts2 > ts1), e.g. ts1=1329088321289, ts2=1329088818321. The coordinates used to identify data in an HBase table are: (1) rowkey, (2) column family, (3) column qualifier, (4) version.
Key-Value store
Keys -> Values (each line is a single KeyValue instance):
[TheRealMT, info, password, 1329088818321] -> abc123
[TheRealMT, info, password, 1329088321289] -> Langhorne
Key-Value store
1. Start with coordinates of full precision:
[TheRealMT, info, password, 1329088818321] -> "abc123"

2. Drop the version and you're left with a map of versions to values:
[TheRealMT, info, password] -> {
  1329088818321 : "abc123",
  1329088321289 : "Langhorne"
}

3. Omit the qualifier and you have a map of qualifiers to the previous maps:
[TheRealMT, info] -> {
  "name" : { 1329088321289 : "Mark Twain" },
  "email" : { 1329088321289 : "[email protected]" },
  "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" }
}

4. Finally, drop the column family and you have a row, a map of maps:
[TheRealMT] -> {
  "info" : {
    "name" : { 1329088321289 : "Mark Twain" },
    "email" : { 1329088321289 : "[email protected]" },
    "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" }
  }
}
Sorted map of maps
{
  "TheRealMT" : {
    "info" : {
      "name" : { 1329088321289 : "Mark Twain" },
      "email" : { 1329088321289 : "[email protected]" },
      "password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" }
    }
  }
}
(rowkey -> column family -> column qualifiers -> versions -> values)
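As a rough sketch (not how HBase is actually implemented), the sorted map of maps can be modeled with nested TreeMaps - TreeMap keeps keys sorted, mirroring how HBase sorts rowkeys and qualifiers lexicographically and versions newest-first:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class MapOfMaps {
    // rowkey -> family -> qualifier -> timestamp -> value.
    // Timestamps are sorted descending so the newest version comes first.
    static TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> table =
        new TreeMap<>();

    static void put(String row, String fam, String qual, long ts, String val) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(fam, f -> new TreeMap<>())
             .computeIfAbsent(qual, q -> new TreeMap<>(Comparator.reverseOrder()))
             .put(ts, val);
    }

    static String get(String row, String fam, String qual) {
        // The newest version (largest timestamp) wins.
        return table.get(row).get(fam).get(qual).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("TheRealMT", "info", "password", 1329088321289L, "Langhorne");
        put("TheRealMT", "info", "password", 1329088818321L, "abc123");
        put("TheRealMT", "info", "name", 1329088321289L, "Mark Twain");
        System.out.println(get("TheRealMT", "info", "password")); // newest password version
    }
}
```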
HFiles and physical data model
• HFiles are
  • Immutable
  • Sorted on rowkey + qualifier + timestamp
  • In the context of a column family per region
HFile for the info column family in the users table:
"TheRealMT", "info", "email",    1329088321289, "[email protected]"
"TheRealMT", "info", "name",     1329088321289, "Mark Twain"
"TheRealMT", "info", "password", 1329088818321, "abc123"
"TheRealMT", "info", "password", 1329088321289, "Langhorne"
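To illustrate that ordering, here is a simplified sketch of HBase's KeyValue comparison (column family and key type omitted): rowkey and qualifier sort ascending, timestamps sort descending so newer versions come first.

```java
import java.util.Arrays;
import java.util.Comparator;

public class HFileOrder {
    // A simplified key: rowkey, qualifier, timestamp (column family omitted).
    static class Key {
        final String row, qualifier;
        final long ts;
        Key(String row, String qualifier, long ts) {
            this.row = row; this.qualifier = qualifier; this.ts = ts;
        }
        public String toString() { return row + "/" + qualifier + "/" + ts; }
    }

    // Rowkey and qualifier ascending, timestamp descending.
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts); // newer timestamps first
    };

    public static void main(String[] args) {
        Key[] keys = {
            new Key("TheRealMT", "password", 1329088321289L),
            new Key("TheRealMT", "name", 1329088321289L),
            new Key("TheRealMT", "password", 1329088818321L),
            new Key("TheRealMT", "email", 1329088321289L),
        };
        Arrays.sort(keys, ORDER);
        // email, name, then password with the newer timestamp first
        for (Key k : keys) System.out.println(k);
    }
}
```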
... what really goes on under the hood
HBase in distributed mode
Distributed mode
• Table is split into Regions
[Figure: a full table T1 (rows 00001-00012) split into 3 regions - R1 (00001-00004), R2 (00005-00008), R3 (00009-00012)]
Distributed mode (continued)
• Regions are served by RegionServers
• RegionServers are Java processes, typically collocated with the DataNodes
[Figure: regions T1R1, T1R2 and T1R3 served by RegionServers collocated with DataNodes on Host1, Host2 and Host3]
Finding your region
• Two special tables: -ROOT- and .META.
[Figure: the -ROOT- table, consisting of only 1 region hosted on RegionServer RS1; the .META. table, consisting of 3 regions (M1, M2, M3) hosted on RS1, RS2 and RS3; user tables T1 and T2, consisting of 4 and 3 regions respectively, distributed amongst RS1, RS2 and RS3]
Regions distributed across the cluster
[Figure: region assignment - RegionServer 1 (RS1) hosts region R1 of the user table T1 and region M2 of .META.; RegionServer 2 (RS2) hosts R2 and R3 of the user table and M1 of .META.; RegionServer 3 (RS3) hosts just -ROOT-. -ROOT- maps .META. regions to servers (M:T100001-M:T100009 -> M1 on RS2, M:T100010-M:T100012 -> M2 on RS1), and .META. maps table regions to servers (T1:00001-T1:00004 -> R1 on RS1, T1:00005-T1:00008 -> R2 on RS2, T1:00009-T1:00012 -> R3 on RS2).]
What the client does

[Figure: region lookup - to read rowkey TheRealMT from MyTable, the client asks ZooKeeper for the location of -ROOT-, reads -ROOT- (a row per .META. region) to find the right .META. region, reads .META. (a row per table region) to find the region of MyTable holding that rowkey, and then talks to the RegionServer hosting that region]
... they play well together
HBase and Hadoop
HBase and Hadoop
• HBase has tight integration with Hadoop
• Two separate concerns
  • Access via MapReduce
  • Storage on HDFS
Parallel processing
MapReduce
HBase and MapReduce
• HBase is tightly integrated with the Hadoop stack
  • MapReduce
  • HDFS
• Source for MapReduce jobs
  • Read from an HBase table with a MapReduce job (Export, RowCount, ...)
• Sink for MapReduce jobs
  • Writing to an HBase table from a MapReduce job
• Lookups during MapReduce jobs
  • Joins between HBase and/or another source
HBase as MR source
• One mapper per region
[Figure: one map task per region - the tasks take the key range of the region as their input split and scan over it; the regions are served by the RegionServers]
HBase as MR source (continued)
• Mapper:
  protected void map(ImmutableBytesWritable rowkey, Result result, Context context) {
    // mapper logic goes here
  }
• Job setup:
  Scan scan = new Scan();
  scan.addColumn(Bytes.toBytes("twits"), Bytes.toBytes("twit"));
  TableMapReduceUtil.initTableMapperJob("twits", scan, Map.class,
      ImmutableBytesWritable.class, Result.class, job);
HBase as MR sink
• Write to regions via MR job
[Figure: reduce tasks writing to HBase regions - reduce tasks don't necessarily write to a region on the same physical host; they write to whichever region contains the key range they are writing into, which could potentially mean that all reduce tasks talk to all regions on the cluster]
HBase as MR sink (continued)
• Reducer:
  protected void reduce(ImmutableBytesWritable rowkey, Iterable<Put> values, Context context) {
    // reduce logic goes here
  }
• Job setup:
  TableMapReduceUtil.initTableReducerJob("users", Reducer.class, job);
HBase as a shared resource
• Lookups from MR jobs
• Map-side joins
• Simple Get/Scan calls
[Figure: map tasks reading files from HDFS as their source and doing random Gets or short Scans from HBase; the regions are served by the RegionServers]
Integration with HDFS
• HDFS is a filesystem
• Underlying storage fabric for HBase
• HBase is tightly integrated with HDFS and leverages its semantics
  • LSM trees fit well into a write-once filesystem
  • Availability (fault tolerance) and reliability (durability) at scale
Availability and fault tolerance
[Figure: fault tolerance - RegionServers collocated with DataNodes on physical hosts, with HFile replicas spread across HDFS. When disaster strikes the host serving region R1, another server picks up the region and starts serving it from the persisted HFile on HDFS; the lost copy of the HFile gets replicated to another DataNode.]
Durability
• The WAL keeps writes intact even if an RS fails
• HDFS is reliable and doesn’t lose data if individual nodes fail
Summary
• Big data = new products, applications
• Hadoop ecosystem - poster child
• HBase serves many use cases
• Allows you to solve aspects of big data problems
• Simple Java API
• Tight Hadoop integration
• Low latency access to large amounts of data
• Large scale computation using MR
... it’s a database after all
Schema design
But isn’t HBase schema-less?
• Number of tables
• Rowkey design
• Number of column families per table; what goes into what column family
• Column qualifier names
• What goes into the cells
• Number of versions
Rowkeys
• Rowkey design is the single most important aspect of HBase table design
• The rowkey is the only way to address rows in HBase
TwitBase relationships
• Users follow users
• Relationships need to be persisted for usage later on
• Model tables for the expected access patterns
• Read pattern
  • Who does A follow?
  • Who follows A?
  • Does A follow B?
• Write pattern
  • A follows B
  • A unfollows B
Start simple
• Adjacency list
Column Family: follows (rowkey: userid, column qualifier: followed user number, cell value: followed userid)

TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers
TheRealMT | 1:HRogers, 2:Olivia
Optimizing the adjacency list
• We need a count
• Where does a new followed user go?
follows
TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers, count:4
TheRealMT | 1:HRogers, 2:Olivia, count:2
Adding a new user
follows (the TheFakeMT row needs to be updated)
TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers, count:4
TheRealMT | 1:HRogers, 2:Olivia, count:2

Client code:
Step 1: Get current count -> TheFakeMT : follows : {count -> 4}
Step 2: Increment count -> TheFakeMT : follows : {count -> 5}
Step 3: Add new entry -> TheFakeMT : follows : {5 -> MTFanBoy2, count -> 5}
Step 4: Write the new data to HBase:
TheFakeMT | 1:TheRealMT, 2:MTFanBoy, 3:Olivia, 4:HRogers, 5:MTFanBoy2, count:5
TheRealMT | 1:HRogers, 2:Olivia, count:2
Transactions == not good
• HBase doesn’t have native support (think scale)
• Don’t want to complicate client side logic
• Only solution -> simplify the schema
follows (column qualifier: followed userid, cell value: 1)
TheFakeMT | TheRealMT:1, MTFanBoy:1, Olivia:1, HRogers:1
TheRealMT | HRogers:1, Olivia:1
Revisit the questions
• Read pattern
  • Who all does A follow?
  • Who all follows A?
  • Does A follow B?
• Write pattern
  • A follows B
  • A unfollows B
Denormalization
• Second table for the reverse relationship
• Otherwise you scan across the entire table and affect read performance
[Figure: read performance vs. write performance - normalization sits in the write-optimized corner, denormalization in the read-optimized corner; good at both is dreamland, bad at both is poor design]
More optimizations
• Convert into a tall-narrow table
• Leverage rowkey indexing better
• Gets -> short Scans
CF: f
rowkey: follower + followed; column qualifier: followed user's name; cell value: 1
The + in the rowkey refers to concatenating the two values. You could delimit using any character you like, e.g. A-B or A,B. Keeping the column family and column qualifier names short reduces the data transferred over the network back to the client; the KeyValue objects become smaller.
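A minimal sketch of building such composite rowkeys (plain Java with hypothetical helper names; HBase's Bytes utility offers similar conversions to byte[]):

```java
public class FollowsKey {
    // Tall-narrow rowkey: follower + delimiter + followed.
    // The "+" delimiter is arbitrary - any character that can't
    // appear inside a userid works.
    static String rowkey(String follower, String followed) {
        return follower + "+" + followed;
    }

    // Scan prefix for "who does this user follow?"
    static String scanPrefix(String follower) {
        return follower + "+";
    }

    public static void main(String[] args) {
        System.out.println(rowkey("TheFakeMT", "TheRealMT"));
        // "Does A follow B?" becomes a Get on the exact key;
        // "who does A follow?" becomes a short Scan over scanPrefix(A).
    }
}
```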
Tall-narrow table example
• Denormalization is the way to go
f
TheFakeMT+TheRealMT | Mark Twain:1
TheFakeMT+MTFanBoy | Amandeep Khurana:1
TheFakeMT+Olivia | Olivia Clemens:1
TheFakeMT+HRogers | Henry Rogers:1
TheRealMT+HRogers | Henry Rogers:1
TheRealMT+Olivia | Olivia Clemens:1

Putting the user name in the column qualifier saves you from looking up the users table for the name of the user given an id. You can simply list out names or ids while looking at relationships just from this table. The downside is that you need to update the name in all the cells if the user updates their name in their profile. This is classic denormalization.
Uniform rowkey length
• MD5 the userids -> 16 byte + 16 byte rowkeys
• Better distribution of load
CF: f
rowkey: md5(follower) + md5(followed); column qualifier: followed userid; cell value: followed user's name
Using MD5 of the userids gives you fixed lengths instead of variable length userids. You don't need concatenation logic anymore.
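A sketch of the fixed-length rowkey construction using the JDK's MessageDigest (HBase itself isn't needed for this part; the class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Rowkey {
    // md5(follower) + md5(followed): always 16 + 16 = 32 bytes,
    // regardless of how long the userids are.
    static byte[] rowkey(String follower, String followed) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] a = md5.digest(follower.getBytes(StandardCharsets.UTF_8));
            byte[] b = md5.digest(followed.getBytes(StandardCharsets.UTF_8));
            byte[] key = new byte[32];
            System.arraycopy(a, 0, key, 0, 16);
            System.arraycopy(b, 0, key, 16, 16);
            return key;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(rowkey("TheFakeMT", "TheRealMT").length); // always 32
    }
}
```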
Uniform rowkey length (continued)
f
MD5(TheFakeMT) MD5(TheRealMT) | TheRealMT:Mark Twain
MD5(TheFakeMT) MD5(MTFanBoy) | MTFanBoy:Amandeep Khurana
MD5(TheFakeMT) MD5(Olivia) | Olivia:Olivia Clemens
MD5(TheFakeMT) MD5(HRogers) | HRogers:Henry Rogers
MD5(TheRealMT) MD5(HRogers) | HRogers:Henry Rogers
MD5(TheRealMT) MD5(Olivia) | Olivia:Olivia Clemens
Tall vs. wide tables: storage footprint
Logical representation of an HBase table (we'll look at what it means to Get() row r5):

   | CF1          | CF2
r1 | c1:v1        | c1:v9, c6:v2
r2 | c1:v2, c3:v6 |
r3 | c2:v3        | c5:v6
r4 | c2:v4        |
r5 | c1:v1, c3:v5 | c7:v8

Actual physical storage of the table:
HFile for CF1: r1:CF1:c1:t1:v1, r2:CF1:c1:t2:v2, r2:CF1:c3:t3:v6, r3:CF1:c2:t1:v3, r4:CF1:c2:t1:v4, r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5
HFile for CF2: r1:CF2:c1:t1:v9, r1:CF2:c6:t4:v2, r3:CF2:c5:t4:v6, r5:CF2:c7:t3:v8

Result object returned for a Get() on row r5 (a list of KeyValue objects):
r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5, r5:CF2:c7:t3:v8

Structure of a KeyValue object: Key = [Row Key, Col Fam, Col Qual, TimeStamp], Value = Cell Value
Rowkey design
• Single most important aspect of designing tables
• Depends on expected access patterns
• HFiles are sorted on the Key part of KeyValue objects
HFile for the info column family in the users table:
"TheRealMT", "info", "email",    1329088321289, "[email protected]"
"TheRealMT", "info", "name",     1329088321289, "Mark Twain"
"TheRealMT", "info", "password", 1329088818321, "abc123"
"TheRealMT", "info", "password", 1329088321289, "Langhorne"
Write optimized
• Distribute writes across the cluster
• Issue most pronounced with time series data
• Hashing:
  hash("TheRealMT") -> random byte[]
• Salting:
  int salt = new Integer(new Long(timestamp).hashCode()).shortValue() % <number of region servers>;
  byte[] rowkey = Bytes.add(Bytes.toBytes(salt), Bytes.toBytes("|"), Bytes.toBytes(timestamp));
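The same salting idea as a runnable sketch in plain Java (no HBase Bytes utility; the 5-server count is an illustrative assumption):

```java
import java.nio.ByteBuffer;

public class SaltedKey {
    static final int REGION_SERVERS = 5; // illustrative value

    // Prefix the timestamp with a salt derived from its hash so that
    // consecutive timestamps spread across REGION_SERVERS buckets
    // instead of all landing on one region.
    static byte[] rowkey(long timestamp) {
        short salt = (short) Math.floorMod(Long.hashCode(timestamp), REGION_SERVERS);
        return ByteBuffer.allocate(2 + 1 + 8) // salt | delimiter | timestamp
                         .putShort(salt)
                         .put((byte) '|')
                         .putLong(timestamp)
                         .array();
    }

    public static void main(String[] args) {
        for (long ts = 1329088818321L; ts < 1329088818326L; ts++) {
            // Low byte of the salt varies across consecutive timestamps.
            System.out.println(rowkey(ts)[1]);
        }
    }
}
```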
Read optimized
• Data to be accessed together should be stored together
• e.g. twit streams - last 10 twits by the users I follow
Rowkeys sorted by user (user + twit number): Olivia1, Olivia2, Olivia5, Olivia7, Olivia9, TheFakeMT2, TheFakeMT3, TheFakeMT4, TheFakeMT5, TheFakeMT6, TheRealMT1, TheRealMT2, TheRealMT5, TheRealMT8
vs. keyed for time-ordered access (twit number + user): 1Olivia, 1TheRealMT, 2Olivia, 2TheFakeMT, 2TheRealMT, 3TheFakeMT, 4TheFakeMT, 5Olivia, 5TheFakeMT, 5TheRealMT, 6TheFakeMT, 7Olivia, 8TheRealMT, 9Olivia
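A common trick for "latest first" access - a standard extension of this idea, not something shown on the slide - is to embed a reverse timestamp in the rowkey so the newest twits sort to the front of a scan:

```java
import java.util.TreeMap;

public class ReverseTimestamp {
    // Rowkey sketch: user + (Long.MAX_VALUE - timestamp), zero-padded so
    // lexicographic order matches numeric order. Newer entries get smaller
    // keys, so "last 10 twits" becomes a short scan from the user prefix.
    static String rowkey(String user, long timestamp) {
        return String.format("%s%019d", user, Long.MAX_VALUE - timestamp);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>(); // stands in for a sorted HBase table
        table.put(rowkey("TheRealMT", 1329088321289L), "older twit");
        table.put(rowkey("TheRealMT", 1329088818321L), "newer twit");
        System.out.println(table.firstEntry().getValue()); // newer twit sorts first
    }
}
```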
Relational to non-relational
• Relational concepts
  • Entities
  • Attributes
  • Relationships
• Entities
  • Table is a table. Not much going on there
  • Users table contains... users. Those are entities
• Good place to start. Denormalization bends that rule a bit
Relational to non-relational (continued)
• Attributes
  • Identifying
    • Primary keys. Compound keys
    • Map to rowkeys
  • Non-identifying
    • Other columns
    • Map to column qualifiers and cells
• Relationships
Nested Entities
• Column qualifiers can contain data instead of just a column name
[Figure: nested entities - an HBase table row: rowkey -> column family -> repeating entity, with fixed qualifier -> timestamp -> value and variable qualifier -> timestamp -> value]
Schema design summary
• Schema can make or break the performance you get
• Rowkey is the single most important thing
  • Use tricks like hashing and salting
• Denormalize to your advantage
  • There are no joins
• Isolate access patterns
  • Separate CFs or even separate tables
• Shorter names -> lower storage footprint
• Column qualifiers can be used to store data, not just column names
  • Big difference as compared to an RDBMS
... for some real stuff, beyond your laptop
Deployments and Operations
Components
• HDFS (DataNodes)
  • Storage
• ZooKeeper
  • Membership management
• RegionServers
  • Serve the regions
• HBase Masters
  • Janitorial work
Hardware
• DataNodes and RegionServers collocated
• Plenty of RAM and disk
  • 8-12 cores
  • 48 GB RAM
  • 12 x 1 TB disks
Hardware (continued)
• HBase Master and ZooKeeper collocated
  • Don’t necessarily need to be though
• Keep an odd number of ZKs
  • ~3 is typically good enough for up to a 50-70 node cluster
• Give ZK an independent spindle for log writing
  • I/O latency sensitive
• 4-6 cores
• 24 GB RAM
Types of deployments
• Standalone for dev purposes
• Prototype
  • < 10 nodes
  • No real SLAs
  • 1 ZK + Master, rest are RS
• Small production cluster
  • 10-20 nodes
  • 1 ZK + Master, maybe NN HA, rest are RS
• Medium production cluster
  • 20-50 nodes
  • 3 ZK + Master, NN HA, rest are RS
• Large production cluster
  • 50+ nodes
  • 3 or 5 ZK + Master, NN HA, rest are RS
• Hardware choices remain mostly the same
Deploying software and configuration
• Automated deployment of software bits highly recommended
  • Chef
  • Puppet
  • Cloudera Manager
• Centralized configuration management highly recommended
  • Consistency
  • Ease of management
Configurations
• $HBASE_HOME/conf
  • hbase-env.sh
    • Environment configurations
  • hbase-site.xml
    • HBase system configurations
• Linux configurations
  • Number of open files
  • swappiness
• HDFS configurations
  • $HADOOP_HOME/hdfs-site.xml
Monitoring
• Metrics contexts from Hadoop
  • Ganglia
  • File context
  • JMX
• Cacti
• Several metrics available
  • Write path specific
  • Read path specific
Monitoring (continued)
Performance tuning
• Performance testing
  • Use real (or synthetic) workloads
  • YCSB
• Tuning
  • Write optimized
    • Memstore
  • Random-read optimized
    • Block cache
    • Smaller block size
  • Sequential-read optimized
    • Block cache
    • Larger block size
  • GC tuning