pg_shard: Shard and Scale Out PostgreSQL

Jason Petersen, Sumedh Pathak, Ozgun Erdogan

What is CitusDB?

• CitusDB is a scalable analytics database that extends PostgreSQL. pg_shard targets the short read/write use case
  – Citus shards and replicates your data in the same way pg_shard does
  – Citus parallelizes your queries for analytics
  – Citus isn’t a fork of Postgres. Rather, it hooks onto the planner and executor for distributed query execution.

Talk Outline

1. Why use pg_shard?
2. Data: Scaling and failure handling
3. Computation: Query routing and failure handling
4. Random PostgreSQL coolness

#1 Requested Feature from Citus

• Real-time analytics calls for real-time data ingest.
• We had one customer who built real-time inserts on their own. Then two other customers did the same thing.
• We also talked to PostgreSQL users. Some considered application level sharding, or migrating to NoSQL solutions.

Customer Interviews

• Dynamically scale a cluster as new machines are added or old ones are retired. Magically handle failures
• Simple to set up and use. Works natively on PostgreSQL

Architectural Decisions

Dynamic Scaling: Use logical shards to facilitate easy rebalancing as cluster membership changes

Simple to Use: Works natively with PostgreSQL by augmenting its planner and executor logic for real-time data ingest and querying

Traditional Partitioning

[Figures: three yearly tables, click_events_2012, click_events_2013, and click_events_2014 (4 TB each), sit on Node #1 (PostgreSQL), Node #2, and Node #3, one table per node. A Node #4 is then added, annotated "1 TB (each)". The last two figures show six nodes, with Node #4, Node #5, and Node #6 holding click_events_2012, click_events_2013, and click_events_2014 again.]

Logical Sharding

[Figure: numbered shards, 512 MB (each), on three nodes, with Node #4 newly added]

Node #1 (PostgreSQL): 1 3 4 6 7 9 …
Node #2: 1 2 4 5 7 8 …
Node #3: 2 3 5 6 8 9 …
Node #4: (no shards shown)

[Figures: the same shards spread over six nodes]

Node #1 (PostgreSQL): 1 6 7 …
Node #2: 1 2 7 …
Node #3: 2 3 8 …
Node #4: 3 4 8 …
Node #5: 4 5 9 …
Node #6: 5 6 9 …
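The two layouts in the figures can be compared mechanically. The following is a minimal sketch, not pg_shard code: it takes only the shard numbers visible in the figures, and the helper names `placements_to_copy` and `replica_counts` are made up for illustration. It counts which shard placements would have to be copied to go from the three-node layout to the six-node one.

```python
# Shard numbers as they appear in the two figures (visible shards only).
THREE_NODES = {
    1: {1, 3, 4, 6, 7, 9},
    2: {1, 2, 4, 5, 7, 8},
    3: {2, 3, 5, 6, 8, 9},
}
SIX_NODES = {
    1: {1, 6, 7}, 2: {1, 2, 7}, 3: {2, 3, 8},
    4: {3, 4, 8}, 5: {4, 5, 9}, 6: {5, 6, 9},
}
SHARD_SIZE_MB = 512  # "512 MB (each)" in the figure


def placements_to_copy(before, after):
    """Return (node, shard) pairs present in `after` but not in `before`."""
    return sorted(
        (node, shard)
        for node, shards in after.items()
        for shard in shards - before.get(node, set())
    )


def replica_counts(layout):
    """Count how many nodes hold each shard."""
    counts = {}
    for shards in layout.values():
        for shard in shards:
            counts[shard] = counts.get(shard, 0) + 1
    return counts


moves = placements_to_copy(THREE_NODES, SIX_NODES)
print(len(moves), "placements copied,", len(moves) * SHARD_SIZE_MB, "MB in total")
```

In this sketch the original three nodes only give shards away, and every copy lands on one of the new nodes. Both layouts keep two placements of each visible shard.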

Example Data Distribution in pg_shard Cluster

Metadata Server (PostgreSQL + pg_shard): shard and shard placement metadata
Worker Node #1: 1 3 4 6 7 9 …
Worker Node #2: 1 2 4 5 7 8 …
Worker Node #3: 2 3 5 6 8 9 …

Metadata Server (Master Node)

• Metadata server holds authoritative state on shards and their placements in the cluster
• This state is minimal: 1 row per distributed table, 1 row per shard, and 1 row per shard placement. Kept in 3 Postgres tables
• To handle metadata node failures, you can create a streaming replica, or reconstruct metadata from workers on the fly
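To give a feel for how small this state is, here is a rough sketch that models the three kinds of rows as plain Python records and counts them. The class names, the fields, and the round-robin placement rule are assumptions made for illustration; they are not the actual pg_shard schema or placement policy.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    id: int
    relation_id: int


@dataclass
class ShardPlacement:
    shard_id: int
    node_name: str


def build_metadata(relation_id, shard_count, replication_factor, workers):
    """Create shard rows and placement rows (round-robin over workers, an assumption)."""
    shards = [Shard(id=i + 1, relation_id=relation_id) for i in range(shard_count)]
    placements = [
        ShardPlacement(shard.id, workers[(index + replica) % len(workers)])
        for index, shard in enumerate(shards)
        for replica in range(replication_factor)
    ]
    return shards, placements


# Hypothetical cluster: one distributed table, 16 shards, 2 placements each.
workers = ["worker-1", "worker-2", "worker-3"]
shards, placements = build_metadata(177880, 16, 2, workers)
total_rows = 1 + len(shards) + len(placements)  # table row + shard rows + placement rows
print(total_rows)
```

With these numbers the whole cluster state is 1 + 16 + 32 rows, which is why a streaming replica of the metadata server is cheap to keep.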

Metadata and Hash Partitioning

postgres=# SELECT * FROM pgs_distribution_metadata.shard;
  id   | relation_id | storage |  min_value  |  max_value
-------+-------------+---------+-------------+-------------
 10004 |      177880 | t       | -2147483648 | -1879048194
 10005 |      177880 | t       | -1879048193 | -1610612739
 10006 |      177880 | t       | -1610612738 | -1342177284
 10007 |      177880 | t       | -1342177283 | -1073741829
 10008 |      177880 | t       | -1073741828 |  -805306374
 10009 |      177880 | t       |  -805306373 |  -536870919
 ...   | ...         | ...     | ...         | ...
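The boundaries in this listing are consistent with cutting the signed 32-bit hash space into equal-width ranges. The sketch below is an illustration under assumptions rather than the pg_shard source: it assumes 16 shards (the listing is truncated, so the count is a guess) and assumes the last shard is stretched to the largest 32-bit integer. The function name `shard_ranges` is hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def shard_ranges(shard_count):
    """Split the signed 32-bit hash space into `shard_count` contiguous ranges."""
    increment = (2**32 - 1) // shard_count  # width of each range
    ranges = []
    for index in range(shard_count):
        low = INT32_MIN + index * increment
        high = low + increment - 1
        if index == shard_count - 1:
            high = INT32_MAX  # assumption: last shard absorbs the remainder
        ranges.append((low, high))
    return ranges


for low, high in shard_ranges(16)[:6]:
    print(low, high)
```

With 16 shards the first six ranges match the six rows shown in the listing.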

Worker Nodes

• 1 shard placement = 1 PostgreSQL table
• Names of tables, indexes, and constraints are extended by their shard identifier, e.g. click_events_1001 holds data for shard 1001
• Indexes and constraint definitions are propagated when shards are first created

What about the logic?

• Drop-in PostgreSQL extension
• Supports subset of SQL
• Doesn’t use any special functions
• It’s still just PostgreSQL

Users: Making Scaling SQL Easy

CREATE EXTENSION pg_shard;

-- create a regular PostgreSQL table:
CREATE TABLE customer_reviews (customer_id TEXT NOT NULL, review_date DATE, ...);

-- distribute the table on the given partition key:
SELECT master_create_distributed_table('customer_reviews', 'customer_id');

-- create 16 logical shards with 2 placements on workers:
SELECT master_create_worker_shards('customer_reviews', 16, 2);

PostgreSQL Secret Sauce: Hooks

• Full control over command lifetime
• Specific to needs:
  – Planning
  – Execution (Start, Run, Finish, End)
  – Utility

Planning Phase

• Determine if distributed
• Fall through to PostgreSQL if not
• Using partition key, find involved shards
• Deparse shard-specific SQL
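The fall-through behaviour can be pictured with a small Python analogy. PostgreSQL's real hooks are C function pointers such as `planner_hook`, with `standard_planner` as the default; everything else here (the dictionary-based query, the `DISTRIBUTED_TABLES` registry, the `install` and `plan` helpers) is a made-up stand-in and not how pg_shard is written.

```python
DISTRIBUTED_TABLES = {"customer_reviews"}  # hypothetical registry of distributed tables

hooks = {"planner_hook": None}  # stands in for PostgreSQL's global hook pointer


def standard_planner(query):
    """Stand-in for PostgreSQL's regular planner."""
    return {"plan": "local", "table": query["table"]}


def install(hook_holder):
    """Install a planner hook, remembering whichever hook was there before."""
    previous = hook_holder["planner_hook"]

    def shard_planner(query):
        if query["table"] in DISTRIBUTED_TABLES:
            return {"plan": "distributed", "table": query["table"]}
        # Not distributed: fall through to the previous hook or the default planner.
        if previous is not None:
            return previous(query)
        return standard_planner(query)

    hook_holder["planner_hook"] = shard_planner


def plan(query):
    """Dispatch through the hook if one is installed."""
    hook = hooks["planner_hook"]
    return hook(query) if hook is not None else standard_planner(query)


install(hooks)
print(plan({"table": "customer_reviews"})["plan"])
print(plan({"table": "pg_class"})["plan"])
```

Saving the previous hook before overwriting it is what lets several extensions share the same hook point.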

Planning Example

INSERT INTO customer_reviews (customer_id, rating) VALUES ('HN892', 5);

• Determine partition key clauses:
  customer_id = 'HN892'
• Find shards from metadata tables:
  hashtext('HN892') BETWEEN min_value AND max_value
• Produce shard-specific SQL:
  INSERT INTO customer_reviews_16 (customer_id, rating) VALUES ('HN892', 5);
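The three steps can be sketched in a few lines of Python. This is an illustration under loud assumptions: `stand_in_hash` uses CRC32 and does not produce the same values as PostgreSQL's `hashtext`, the shard ids 1 to 16 and their ranges are invented, and the table name is rewritten with a naive string replace rather than by deparsing a query tree.

```python
import zlib

INT32_MIN = -2**31
SHARD_COUNT = 16
WIDTH = 2**32 // SHARD_COUNT

# Hypothetical metadata rows: (shard_id, min_value, max_value).
SHARDS = [
    (i + 1, INT32_MIN + i * WIDTH, INT32_MIN + (i + 1) * WIDTH - 1)
    for i in range(SHARD_COUNT)
]


def stand_in_hash(key):
    """Map a key to a signed 32-bit integer (NOT the same values as hashtext)."""
    return zlib.crc32(key.encode("utf-8")) + INT32_MIN


def find_shard(key):
    """Return the id of the shard whose range contains the key's hash."""
    hashed = stand_in_hash(key)
    for shard_id, min_value, max_value in SHARDS:
        if min_value <= hashed <= max_value:
            return shard_id
    raise LookupError(f"no shard covers hash {hashed}")


def shard_sql(sql, table, shard_id):
    """Naively point the statement at the shard's table, e.g. customer_reviews_16."""
    return sql.replace(table, f"{table}_{shard_id}", 1)


statement = "INSERT INTO customer_reviews (customer_id, rating) VALUES ('HN892', 5);"
shard_id = find_shard("HN892")
print(shard_sql(statement, "customer_reviews", shard_id))
```

Because the hash is a stand-in, the shard chosen here will generally differ from the customer_reviews_16 shown on the slide.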

Execute Distributed Modify

• Locks enforce safe commutation
• Replicas visited in predictable order
• Per-session connection pool uses libpq
• If replica errors out, mark as inactive

Single-shard INSERT

[Figure sequence over three workers. Worker Node #1: 1 3 4 6 7 9 …; Worker Node #2: 1 2 4 5 7 8 …; Worker Node #3: 2 3 5 6 8 9 …. Master: INSERT INTO customer_reviews ...]

1. Replication factor: 2
2. One replica fails
3. Master marks inactive: sets shard 6, node 3 to inactive status
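The sequence in these figures can be mimicked with a small in-memory model. This is a sketch and not pg_shard's executor: the `Placement` class and `distributed_insert` function are hypothetical, locking and libpq connections are left out, and the rule that nothing is marked inactive when every placement fails is an assumption.

```python
class Placement:
    """One copy of a shard on one worker node (in-memory stand-in)."""

    def __init__(self, node, healthy=True):
        self.node = node
        self.healthy = healthy
        self.active = True
        self.rows = []

    def execute(self, row):
        if not self.healthy:
            raise ConnectionError(f"node {self.node} unreachable")
        self.rows.append(row)


def distributed_insert(placements, row):
    """Apply the row to every active placement, visiting nodes in a fixed order."""
    targets = sorted((p for p in placements if p.active), key=lambda p: p.node)
    failed = []
    succeeded = 0
    for placement in targets:
        try:
            placement.execute(row)
            succeeded += 1
        except ConnectionError:
            failed.append(placement)
    if succeeded == 0:
        # Assumption: if no copy took the write, fail the statement and keep metadata as is.
        raise RuntimeError("could not modify any active placement")
    for placement in failed:
        placement.active = False  # "Sets shard 6, node 3 to inactive status"
    return succeeded


# Shard 6 lives on nodes 1 and 3 in the figure; node 3 fails here.
shard_6 = [Placement(node=1), Placement(node=3, healthy=False)]
written = distributed_insert(shard_6, ("HN892", 5))
print(written, [(p.node, p.active) for p in shard_6])
```

Later writes skip the inactive placement, so the healthy copy stays authoritative until the failed one is repaired.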

Single Shard SELECT

• Fetch entire result from single shard
• Failover to another replica on error
• Do not modify state if failure happens
• Common key-value access pattern

Single-shard SELECT

[Figure sequence over three workers. Worker Node #1: 1 3 4 6 7 9 …; Worker Node #2: 1 2 4 5 7 8 …; Worker Node #3: 2 3 5 6 8 9 …. Master: SELECT * FROM customer_reviews WHERE customer_id = 'HN892';]

1. Try first placement
2. Encounter error
3. Try next placement
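Read failover differs from the write path in that a failure changes no metadata. A minimal sketch of that behaviour follows; the placement dictionaries, `run_query`, and `single_shard_select` are hypothetical names for illustration, not pg_shard functions.

```python
def single_shard_select(placements, run_query):
    """Return the first successful result, trying placements in order."""
    last_error = None
    for placement in placements:
        try:
            return run_query(placement)
        except ConnectionError as error:
            last_error = error  # read failure: move on, leave metadata alone
    raise RuntimeError("no placement could answer the query") from last_error


# Hypothetical placements of the shard that holds customer_id = 'HN892'.
placements = [
    {"node": 1, "healthy": False, "active": True},
    {"node": 3, "healthy": True, "active": True},
]


def run_query(placement):
    """Pretend to run the SELECT on one worker."""
    if not placement["healthy"]:
        raise ConnectionError(f"node {placement['node']} unreachable")
    return [("HN892", 5)]


rows = single_shard_select(placements, run_query)
print(rows)
```

The first placement errors out, the second answers, and both placements are still marked active afterwards.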

Multi-Shard SELECT

pg_shard: Engaged when no partition key constraint is present. Pulls data to master and performs final pass locally (aggregates, etc.)

CitusDB: Pushes computations to worker nodes themselves for fully distributed query logic
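The difference between the two strategies can be shown with an average over a few shards. The sketch below uses invented per-shard rating values and hypothetical function names; it only illustrates where the work happens and is not either product's executor.

```python
# Hypothetical rating values stored in three shards.
SHARD_ROWS = {1: [5, 3], 2: [4], 3: [1, 2, 5]}


def pull_to_master(shard_rows):
    """pg_shard style: ship every row to the master, then aggregate locally."""
    rows = [value for values in shard_rows.values() for value in values]
    return sum(rows) / len(rows)


def push_to_workers(shard_rows):
    """CitusDB style: each worker returns a partial (sum, count); master combines."""
    partials = [(sum(values), len(values)) for values in shard_rows.values()]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


print(pull_to_master(SHARD_ROWS), push_to_workers(SHARD_ROWS))
```

Both functions return the same average, but the first moves six rows to the master while the second moves three small partial results.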

What’s Next?

• pg_shard v1.1 release this week
• Includes shard repair UDF, easier integration with CitusDB, COPY support through triggers, SELECT projection push-downs, faster INSERTs, and bug fixes
• pg_shard v1.2 will have more real-time analytics functionality
• Anything else?

Summary

• Sharding extension for PostgreSQL: https://github.com/citusdata/pg_shard
• Many small logical shards
• Implemented with standard PostgreSQL hooks
• Load extension… create table… distribute
• JSONB + pg_shard as alternative to NoSQL

PostgreSQL Renaissance

PostgreSQL Usage Trends

• What database does your company use?

[Charts: HackerNews 2010 and HackerNews 2014]

Why is PostgreSQL popular

1. Oraclization of MySQL
2. Reliable and robust database
3. New extension framework: Is the monolithic SQL database dying? If so, long live Postgres

PostgreSQL Extensions #1

• HyperLogLog extension for real-time unique counts over varying time intervals

• JSONB and GIN indexes for semi-structured data

PostgreSQL Extensions #2

• Real-time funnel and cohort queries
• Dynamic funnels to look at people who visited A, then B, and finally J

• Heatmap video of vehicles on a map

Contact

Jason: [email protected]
Sumedh: [email protected]
Ozgun: [email protected]
General: groups.google.com/forum/#!forum/pg_shard-users

Questions