
Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance


Description

MapR architecture presentation given at Strata + Hadoop World 2013 by MapR CTO and co-founder M.C. Srivas. Prior to co-founding MapR, Srivas ran one of the major search infrastructure teams at Google, where GFS, BigTable, and MapReduce were used extensively. He wanted to provide that powerful capability to everyone, and started MapR with a vision to build the next-generation platform for Big Data. His strategy was to evolve Hadoop: bring simplicity of use, extreme speed, and complete reliability to Hadoop users everywhere, and make it seamlessly easy for enterprises to use this powerful new way to get deep insights. Srivas brings to MapR his experience at Google, Spinnaker Networks, and Transarc building game-changing products that advance the state of the art.


Page 1: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

1

Shift into High Gear: Dramatically Improve Hadoop and NoSQL Performance

M. C. Srivas, CTO/Founder

Page 2: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

2

MapR History

[Timeline graphic, ’06 to ’13:]

• Google publishes the MapReduce paper
• (Struggling with Nutch,) another team copies the MR paper and calls it Hadoop
• One company after another starts using Hadoop
• DOW drops from 14,300 to 6,800
• ’09: MapR founded
• ’11: MapR launches its ultra-reliable distribution
• ’12: MapR partners with major cloud providers (see slide 5)
• ’13: Breaks all world records!
• 2,500 nodes: largest non-Web installation
• MapR M7: fastest NoSQL on the planet

Page 3: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

3

MapR Distribution for Apache Hadoop

[Platform diagram:]

• Applications: Web 2.0 tools, enterprise apps, vertical apps
• Workloads: analytics, operations, OLTP (batch, interactive, real time)
• Data platform: files and NoSQL, with management across the stack
• Hardening, testing, integrating, innovating, and contributing as part of the Apache open-source community
• Enterprise grade: 99.999% HA, data protection, disaster recovery, enterprise integration, scalability & performance, multi-tenancy

Page 4: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

4

MapR Distribution for Apache Hadoop

• 100% Apache Hadoop, with significant enterprise-grade enhancements
• Comprehensive management
• Industry-standard interfaces
• Higher performance

[Component diagram:] Pig, Hive, HBase, Mahout, Drill, Impala, YARN, Cascading, MapReduce, Flume, Sqoop, HCatalog, ZooKeeper, Hue, Oozie, Avro, Nagios, and Ganglia, running on the MapR Control System and the MapR Data Platform

Page 5: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

5

The Cloud Leaders Pick MapR

• Google chose MapR to provide Hadoop on Google Compute Engine

• Amazon EMR is the largest Hadoop provider by revenue and number of clusters

• MapR's partnership with Canonical and Mirantis provides advantages for OpenStack deployments

Page 6: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

6

What Makes MapR so Reliable?

Page 7: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

7

How to Make a Cluster Reliable?

1. Make the storage reliable
– Recover from disk and node failures

2. Make the services reliable
– Services need to checkpoint their state rapidly
– Restart a failed service, possibly on another node
– Move the check-pointed state to the restarted service, using (1) above

3. Do it fast
– Instant-on: (1) and (2) must happen very, very fast
– Without maintenance windows
• No compactions (e.g., Cassandra, Apache HBase)
• No “anti-entropy” that periodically wipes out the cluster (e.g., Cassandra)

Page 8: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

8

Reliability with Commodity Hardware

• No NVRAM
• Cannot assume special connectivity
– No separate data paths for "online" vs. replica traffic
• Cannot even assume more than 1 drive per node
– No RAID possible
• Use replication, but …
– Cannot assume peers have equal drive sizes
– Drive on the first machine is 10x larger than the drive on the other?
• No choice but to replicate for reliability

Page 9: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

9

Reliability via Replication

Replication is easy, right? All we have to do is send the same bits to the master and the replica.

[Diagram: normal replication, where clients write to the primary server and the primary forwards to the replica, vs. Cassandra-style replication, where clients write to the primary and the replica directly]

Page 10: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

10

But Crashes Occur…

• When the replica comes back, it is stale
– It must be brought up to date
– Until then, the data is exposed to further failure

[Diagram: the replica remains stale until an "anti-entropy" process is kicked off by the administrator; the primary re-syncs the replica. Who re-syncs?]

Page 11: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

11

Unless it’s HDFS…

• HDFS solves the problem a third way: it makes everything read-only
– Static data is trivial to re-sync
• Single writer, no reads allowed while writing
– File close is the transaction that allows readers to see the data
– Unclosed files are lost
– Cannot write any further to a closed file
• Real-time is not possible with HDFS
– To make data visible, the file must be closed immediately after writing
– Too many files is a serious problem with HDFS (a well-documented limitation)
• HDFS therefore cannot do NFS, ever
– There is no “close” in NFS … data can be lost at any time
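For concreteness, here is a minimal sketch of the write-visibility rule described here, using the standard Hadoop FileSystem API (the path is a placeholder; this illustrates the semantics, it is not MapR code): data written to an HDFS file only becomes reliably visible to readers once the file is closed, and an unclosed file can be lost.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCloseSemantics {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());   // uses fs.defaultFS, e.g. hdfs://...
        Path p = new Path("/tmp/events.log");                  // placeholder path

        FSDataOutputStream out = fs.create(p, true);           // single writer; readers see nothing yet
        out.writeBytes("event-1\n");
        // Until close(), the data is not reliably visible and an unclosed file can be lost,
        // which is why "real time" means closing a file immediately after every write.
        out.close();                                            // the "commit": readers can now see the data

        // The closed file cannot be written further -- HDFS data is effectively read-only,
        // which makes re-sync trivial but rules out NFS-style read/write semantics.
    }
}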

Page 12: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

12

This is the 21st Century…

• To support normal apps, we need full read/write support
• So let's return to the issue: re-syncing the replica when it comes back

[Same diagram as before: the replica remains stale until an "anti-entropy" process is kicked off by the administrator; the primary re-syncs the replica. Who re-syncs?]

Page 13: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

13

How Long to Re-sync?

• 24 TB / server
– At 1000 MB/s: about 7 hours
– In practical terms, at 200 MB/s: about 35 hours
• Did you say you want to do this online?
– Throttle the re-sync rate to 1/10th
– 350 hours to re-sync (≈ 15 days)
• What is your Mean Time To Data Loss (MTTDL)?
– How long before a double disk failure?
– A triple disk failure?
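As a quick sanity check of these figures, a small sketch of the arithmetic (the 24 TB per server and the 1/10th throttle come straight from the slide):

public class ResyncTime {
    public static void main(String[] args) {
        double bytes = 24e12;                               // 24 TB per server, from the slide
        double fullSpeedHours = bytes / 1000e6 / 3600;      // ~6.7 h at 1000 MB/s
        double practicalHours = bytes / 200e6  / 3600;      // ~33 h at a practical 200 MB/s
        double onlineHours    = practicalHours * 10;        // throttled to 1/10th for online re-sync
        System.out.printf("full: %.1f h, practical: %.1f h, online: %.0f h (~%.0f days)%n",
                fullSpeedHours, practicalHours, onlineHours, onlineHours / 24);
        // -> roughly the slide's 7 h / 35 h / 350 h (about two weeks)
    }
}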

Page 14: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

14

Traditional Solutions

• Use a dual-ported disk array to side-step this problem

[Diagram: clients, with the primary server and replica attached to a dual-ported disk array; RAID-6 with idle spares; the servers use NVRAM. Labels: COMMODITY HARDWARE, LARGE-SCALE CLUSTERING]

• Large purchase contracts, 5-year spare-parts plan

Page 15: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

15

Forget Performance?

[Architecture diagram. Traditional architecture: the data sits in a SAN/NAS while the function (RDBMS, apps) runs elsewhere, so data must cross the network to reach the function. Hadoop: data and function are co-located on every node. Geographically dispersed also?]

Page 16: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

16

What MapR Does

• Chop the data on each node into 1000s of pieces
– Not millions of pieces, only 1000s
– The pieces are called containers
• Spread replicas of each container across the cluster

Page 17: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

17

Why Does It Improve Things?

Page 18: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

18

MapR Replication Example

• 100-node cluster
• Each node holds 1/100th of every node's data
• When a server dies, the entire cluster re-syncs the dead node's data

Page 19: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

19

MapR Re-sync Speed

• 99 nodes re-sync'ing in parallel
– 99x the number of drives
– 99x the number of Ethernet ports
– 99x the CPUs
• Each is re-sync'ing 1/100th of the data
• Net speed-up is about 100x
– 350 hours vs. 3.5
• MTTDL is 100x better
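The same arithmetic for the parallel case (a sketch; 100 nodes and the 350-hour figure carry over from the earlier slides):

public class ParallelResync {
    public static void main(String[] args) {
        int nodes = 100;                                    // cluster size from the example
        double singleNodeHours = 350;                       // throttled single-source re-sync time

        // 99 surviving peers each re-sync ~1/100th of the dead node's data in parallel,
        // so wall-clock time (and hence the exposure window for MTTDL) drops by roughly the cluster size.
        double parallelHours = singleNodeHours / (nodes - 1);
        System.out.printf("~%.1f h instead of %.0f h (about %dx faster)%n",
                parallelHours, singleNodeHours, nodes - 1); // ~3.5 h, ~100x
    }
}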

Page 20: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

20

MapR Reliability

• Mean Time To Data Loss (MTTDL) is far better
– Improves as the cluster size increases
• Does not require idle spare drives
– The rest of the cluster has sufficient spare capacity to absorb one node's data
– On a 100-node cluster, one node's data == 1% of cluster capacity
• Utilizes all resources
– No wasted “master-slave” nodes
– No wasted idle spare drives … all spindles are put to use
• Better reliability with fewer resources
– On commodity hardware!

Page 21: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

21

Why Is This So Difficult?

Page 22: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

22

MapR's Read-write Replication

• Writes are synchronous
– Visible immediately
• Data is replicated in a "chain" fashion
– Utilizes the full-duplex network
• Metadata is replicated in a "star" manner
– Better response time

[Diagram: clients 1…N writing through a replication chain, and clients 1…N writing to a star of replicas]
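A toy sketch of the two topologies (the classes and names are mine, purely illustrative, not MapR internals): in the chain, the primary applies a write and forwards it to the next replica, so each full-duplex link carries the data once; in the star, the primary pushes a small metadata update to every replica directly, keeping response time low.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of chain vs. star replication (not MapR's actual code).
class Node {
    final String name;
    final List<byte[]> log = new ArrayList<>();
    Node next;                                        // used only in chain mode

    Node(String name) { this.name = name; }

    // Chain: apply locally, then forward down the chain toward the tail.
    void chainWrite(byte[] data) {
        log.add(data);
        if (next != null) next.chainWrite(data);
    }
}

class StarPrimary {
    final Node primary;
    final List<Node> replicas;

    StarPrimary(Node primary, List<Node> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    // Star: primary applies the update and sends it to every replica directly;
    // one hop each way, so better response time for small metadata operations.
    void starWrite(byte[] data) {
        primary.log.add(data);
        for (Node r : replicas) r.log.add(data);
    }
}

public class ReplicationTopologies {
    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.next = b; b.next = c;                       // chain A -> B -> C for bulk data
        a.chainWrite("block-1".getBytes());

        StarPrimary meta = new StarPrimary(a, List.of(b, c));   // star for metadata
        meta.starWrite("meta-1".getBytes());

        System.out.println(c.log.size() + " entries on the tail replica");
    }
}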

Page 23: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

23

Container Balancing

• Servers keep a bunch of containers "ready to go"
• Writes get distributed around the cluster
• As the data size increases, writes spread out more (like dropping a pebble in a pond)
– Larger pebbles spread the ripples farther
• Space is balanced by moving idle containers

Page 24: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

24

MapR Container Re-sync

• MapR is 100% random-write
– Uses distributed transactions
• On a complete crash, all replicas diverge from each other
• On recovery, which one should be the master?

[Diagram: replicas diverging after a complete crash]

Page 25: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

25

MapR Container Re-sync

• MapR can detect exactly where the replicas diverged
– Even at a 2000 MB/s update rate
• Re-sync means
– Roll back each replica to the divergence point
– Roll forward to converge with the chosen master
• Done while online
– With very little impact on normal operations

[Diagram: a new master is chosen after the crash]
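A toy sketch of the re-sync idea described here (my illustration, not MapR's implementation): each replica keeps an ordered operation log; on recovery, a stale replica rolls back to the last entry it shares with the chosen master, then rolls forward by replaying the master's later entries.

import java.util.ArrayList;
import java.util.List;

// Toy illustration of roll-back-to-divergence-point, then roll-forward.
public class ReplicaResync {

    // Truncate 'replica' to its last entry that matches 'master', then append
    // everything the master has beyond that point.
    static void resync(List<String> master, List<String> replica) {
        int common = 0;
        while (common < master.size() && common < replica.size()
                && master.get(common).equals(replica.get(common))) {
            common++;                                        // logs agree up to this index
        }
        replica.subList(common, replica.size()).clear();     // roll back the divergent tail
        replica.addAll(master.subList(common, master.size())); // roll forward from the master
    }

    public static void main(String[] args) {
        List<String> master  = new ArrayList<>(List.of("op1", "op2", "op3", "op4"));
        List<String> replica = new ArrayList<>(List.of("op1", "op2", "opX")); // diverged after op2
        resync(master, replica);
        System.out.println(replica);   // [op1, op2, op3, op4]
    }
}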

Page 26: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

26

MapR Does Automatic Re-sync Throttling

• Re-sync traffic is “secondary”
• Each node continuously measures the RTT to all its peers
• More throttle is applied to slower peers
– An idle system runs at full speed
• All automatic
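One way to picture the throttling rule (a hedged sketch under my own assumptions, not MapR's actual algorithm): give each peer a re-sync bandwidth budget that shrinks as its measured round-trip time grows, so busy or distant peers are throttled harder while an idle cluster runs at full speed.

// Sketch: scale per-peer re-sync bandwidth by measured RTT (illustrative policy only).
public class ResyncThrottle {

    // Full speed when the peer responds at (or below) the idle baseline RTT;
    // proportionally less as the peer's RTT grows under load.
    static double allowedMBps(double fullMBps, double idleRttMs, double observedRttMs) {
        double factor = Math.min(1.0, idleRttMs / Math.max(observedRttMs, idleRttMs));
        return fullMBps * factor;
    }

    public static void main(String[] args) {
        System.out.println(allowedMBps(200, 0.5, 0.5));   // idle peer   -> 200 MB/s
        System.out.println(allowedMBps(200, 0.5, 5.0));   // loaded peer -> 20 MB/s
    }
}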

Page 27: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

27

But Where Do Containers Fit In?

Page 28: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

28

MapR's Containers

• Files and directories are sharded and placed into containers
• Containers are 16-32 GB segments of disk, placed on nodes
• Each container holds
– Directories & files
– Data blocks
• Replicated on servers
• Automatically managed

Page 29: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

29

MapR’s Architectural Params

• Unit of I/O: 4K/8K (8K in MapR)
• Unit of chunking (a map-reduce split): 10s-100s of megabytes (the HDFS 'block')
• Unit of re-sync (a replica): 10s-100s of gigabytes
– A container in MapR, automatically managed
• Unit of administration (snapshots, replication, mirrors, quotas, backup): 1 gigabyte to 1000s of terabytes
– A volume in MapR
– "What data is affected by my missing blocks?"

[Scale diagram: i/o at KB, map-reduce split at ~10^3 KB, re-sync at ~10^6 KB, admin at ~10^9 KB]

Page 30: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

30

Where & How Does MapR Exploit This Unique Advantage?

Page 31: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

31

MapR's No-NameNode HA™ Architecture

HDFS Federation (NameNodes backed by a NAS appliance, with blocks A-F spread across DataNodes):
• Multiple single points of failure
• Limited to 50-200 million files
• Performance bottleneck
• Commercial NAS required

MapR (distributed metadata, with containers A-F spread across all nodes):
• HA with automatic failover
• Instant cluster restart
• Up to 1T files (> 5000x advantage)
• 10-20x higher performance
• 100% commodity hardware

Page 32: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

32

Relative Performance and Scale

Benchmark: 100-byte file creates
Hardware: 10 nodes, each with 2 x 4 cores, 24 GB RAM, 12 x 1 TB 7200 RPM drives, 1 GigE NIC

[Charts: file creates/s vs. number of files (millions), MapR distribution vs. other distribution]

                     MapR       Other      Advantage
Rate (creates/s)     14-16K     335-360    40x
Scale (files)        6B         1.3M       4615x

Page 33: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

33

Where & How Does MapR Exploit This Unique Advantage?

Page 34: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

34

MapR’s NFS Allows Direct Deposit

• Connectors not needed
• No extra scripts or clusters to deploy and maintain
• Random read/write
• Compression
• Distributed HA

[Diagram: web servers, database servers, and application servers writing directly into the cluster over NFS]
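Because the cluster is NFS-mounted, an ordinary application writes into it with plain file I/O; a minimal sketch (the /mapr/my.cluster/... mount path is a placeholder for wherever the cluster is mounted):

import java.io.FileWriter;
import java.io.IOException;

// Sketch: a web/app/database server appending a log record directly into the
// NFS-mounted cluster -- no connector, agent, or staging script needed.
public class DirectDeposit {
    public static void main(String[] args) throws IOException {
        String clusterPath = "/mapr/my.cluster/logs/web/access.log";  // placeholder mount path
        try (FileWriter w = new FileWriter(clusterPath, true)) {      // plain POSIX-style append
            w.write("2013-10-28T12:00:00Z GET /index.html 200\n");
        }
    }
}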

Page 35: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

35

Where & How Does MapR Exploit This Unique Advantage?

Page 36: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

36

MapR Volumes & Snapshots

• Volumes dramatically simplify the management of Big Data
– Replication factor
– Scheduled mirroring
– Scheduled snapshots
– Data placement control
– User access and tracking
– Administrative permissions
• 100K volumes are OK, create as many as desired!

[Example volume tree: /projects/tahoe, /projects/yosemite, /user/msmith, /user/bjohnson]

Page 37: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

37

Where & How Does MapR Exploit This Unique Advantage?

Page 38: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

38

MapR M7 Tables

• Binary compatible with Apache HBase
– No recompilation needed to access M7 tables
– Just set the CLASSPATH
• M7 tables are accessed via pathname
– openTable("hello") … uses HBase
– openTable("/hello") … uses M7
– openTable("/user/srivas/hello") … uses M7
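A hedged sketch of what that looks like in client code, using the HBase 0.94-era API that M7 is binary-compatible with (the table path, column family, and values are examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class M7TableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Same client code either way; only the table name decides where it goes:
        //   "hello"              -> Apache HBase table
        //   "/user/srivas/hello" -> M7 table, addressed by pathname
        HTable table = new HTable(conf, "/user/srivas/hello");

        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        table.put(put);

        Result r = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));

        table.close();
    }
}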

Page 39: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

39

M7 Tables

• M7 tables are integrated into the storage layer
– Always available on every node, zero administration
• Unlimited number of tables
– Apache HBase is typically run with 10-20 tables (100 max)
• No compactions
– Updates happen in place
• Instant-on
– Zero recovery time
• 5-10x better performance
• Consistent low latency
– At the 95th & 99th percentiles

Page 40: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

40

YCSB 50-50 Mix (throughput)

[Charts: throughput in ops/sec and latency in µs, MapR M7 vs. Apache HBase 0.94.9]

Page 41: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

41

YCSB 50-50 mix (latency)

[Chart: latency in µs, MapR M7 vs. the other distribution's HBase]

Page 42: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

43

Where & How Does MapR Exploit This Unique Advantage?

Page 43: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

44

MapR Makes Hadoop Truly HA

• ALL Hadoop components have high availability, e.g. YARN
• The ApplicationMaster (the old JobTracker) and the TaskTracker record their state in MapR
• On node failure, the AM recovers its state from MapR
– Works even if the entire cluster is restarted
– All jobs resume from where they were
– Only from MapR
• Allows preemption
– MapR can preempt any job without losing its progress
– The ExpressLane™ feature in MapR exploits this
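As a rough sketch of what "record their state in MapR" amounts to (the path and the checkpoint format here are my own placeholders): the ApplicationMaster periodically writes its progress to a file on the cluster, and after a restart, possibly on a different node, it reads the checkpoint back and resumes instead of starting over.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: checkpoint job progress to the cluster file system so a restarted
// ApplicationMaster (possibly on another node) can resume rather than start over.
public class AmCheckpoint {
    static final Path CKPT = new Path("/apps/myjob/am.checkpoint");   // placeholder path

    static void save(FileSystem fs, int tasksDone) throws Exception {
        try (FSDataOutputStream out = fs.create(CKPT, true)) {         // overwrite previous state
            out.writeInt(tasksDone);
        }
    }

    static int restore(FileSystem fs) throws Exception {
        if (!fs.exists(CKPT)) return 0;                                 // fresh start
        try (FSDataInputStream in = fs.open(CKPT)) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        int done = restore(fs);              // after a crash/restart, pick up where we left off
        save(fs, done + 1);                  // ...and keep checkpointing as tasks complete
    }
}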

Page 44: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

45

MapR Does MapReduce (Fast!)

• TeraSort record: 1 TB in 54 seconds on 1003 nodes
• MinuteSort record: 1.5 TB in 59 seconds on 2103 nodes

Page 45: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

46

MapR Does MapReduce (Faster!)

• TeraSort record: 1 TB in 54 seconds on 1003 nodes
• MinuteSort record: 1.5 TB in 59 seconds on 2103 nodes
• (Additional chart figures on this slide: 1.65, 300)

Page 46: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

47

YOUR CODE Can Exploit MapR's Unique Advantages

• ALL of your code can easily be made scale-out HA
– Save the service state in MapR
– Save the data in MapR
• Use ZooKeeper to notice a service failure
• Restart anywhere; the data and state will move there automatically
• That's what we did!
– Only from MapR: HA for Impala, Hive, Oozie, Storm, MySQL, SOLR/Lucene, HCatalog, Kafka, …
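A minimal sketch of that recipe (service name, znode path, quorum address, and timeouts are illustrative, not MapR's actual implementation): the live instance holds an ephemeral ZooKeeper node; a standby watches it, and when the node disappears, the standby takes over and recovers the service's saved state from its path in MapR.

import java.io.IOException;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: make any service HA by (1) saving its state under a MapR path and
// (2) using a ZooKeeper ephemeral node to notice failure and fail over.
public class HaService implements Watcher {
    static final String LOCK = "/services/mysvc/leader";    // illustrative znode
    final ZooKeeper zk;

    HaService(String zkQuorum) throws IOException {
        zk = new ZooKeeper(zkQuorum, 30000, this);
    }

    void run() throws Exception {
        try {
            // Ephemeral node: it vanishes automatically if this instance dies.
            zk.create(LOCK, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            // We are the live instance: load state from a cluster path such as
            // /mapr/my.cluster/services/mysvc/state (placeholder) and start serving.
        } catch (Exception alreadyHeld) {
            zk.exists(LOCK, true);           // standby: watch the current leader's node
        }
    }

    @Override
    public void process(WatchedEvent e) {
        if (e.getType() == Event.EventType.NodeDeleted && LOCK.equals(e.getPath())) {
            // Leader died: take over -- recover the state from its MapR path and resume serving.
        }
    }

    public static void main(String[] args) throws Exception {
        new HaService("zk1:5181").run();     // placeholder quorum address
    }
}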

Page 47: Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance

48

MapR: Unlimited Hadoop

Reliable Compute. Dependable Storage.

• Build the cluster brick by brick, one node at a time
• Use commodity hardware at rock-bottom prices
• Get enterprise-class reliability: instant restart, snapshots, mirrors, no single point of failure, …
• Export via NFS, ODBC, Hadoop, and other standard protocols