1
Scaling PostgreSQL on Postgres-XL
February, 2015 Mason Sharp PGConf.ru
2
Agenda
• Intro
• Postgres-XL Background
• Architecture Overview
• Distributing Tables
• Configuring a Local Cluster
• Differences to PostgreSQL
• Community
3
whoami
• Co-organizer of NYC PUG
• I like database clusters
– GridSQL
– Postgres-XC
– Postgres-XL
• Work:
– TransLattice
– StormDB
– EnterpriseDB
4
PGConf.US
• March 25–27, 2015
• New York City
• 44 Talks
• 4 Tracks!
7
Postgres-XL
• Open Source
• Scale-out RDBMS
– MPP for OLAP
– OLTP write and read scalability
– Cluster-wide ACID properties
– Multi-tenant security
– Can also act as distributed key-value store
8
Postgres-XL
• Hybrid Transactional and Analytical Processing (HTAP)
• Can replace multiple systems
9
Postgres-XL
• Rich SQL support
– Correlated Subqueries
10
13
Postgres-XL Features
• Large subset of PostgreSQL SQL
• Distributed Multi-version Concurrency Control
• Contrib-friendly
• Full Text Search
• JSON and XML Support
• Distributed Key-Value Store
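As a rough illustration of the key-value use case, here is a minimal sketch; the kv table, its columns, and the hash key are illustrative assumptions rather than anything from the slides. Sharding on the key means point lookups and writes touch exactly one datanode.

CREATE TABLE kv (
    k TEXT PRIMARY KEY,   -- lookup key, also the distribution column
    v JSON                -- arbitrary payload (JSON is supported)
) DISTRIBUTE BY HASH(k);

INSERT INTO kv VALUES ('user:42', '{"name": "Ada"}');
SELECT v FROM kv WHERE k = 'user:42';   -- routed to a single datanode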
14
Postgres-XL Missing Features
• Built-in High Availability
– Use external tools like Corosync/Pacemaker
– Area for future improvement
• “Easy” to re-shard / add nodes
– Some pieces are there
– Tables are locked during redistribution
– Can however “pre-shard” multiple datanodes on the same server or VM
• Some FK and UNIQUE constraints
• Recursive CTEs (WITH)
15
Performance
Transactional workloads
[Chart: Dell DVD Store benchmark – throughput at 10 to 400 users, StormDB vs. Amazon RDS]
16
OLTP Characteristics
• The Coordinator layer adds about 30% overhead to a DBT-1 (TPC-W) workload
– A single node (4 cores) gets about 70% of the performance of vanilla PostgreSQL
• With linear scaling, 10 nodes should achieve 7x the performance of PostgreSQL
• Actual result: 6–6.4x
• More nodes and more open transactions mean larger snapshots and more visibility checking
17
MPP Performance – DBT-3 (TPC-H)
[Chart: DBT-3 (TPC-H) query times – Postgres-XL vs. PostgreSQL]
18
OLAP Characteristics
• Depends on the schema
• Star schema? (see the DDL sketch below)
– Distribute fact tables across nodes
– Replicate dimensions
• Linear/near-linear query performance for straightforward queries
• Other schemas give different results
– Sometimes “better than linear” due to more caching
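A minimal DDL sketch of that star-schema advice; the sales and date_dim tables and their columns are illustrative assumptions. The small dimension is replicated so every datanode can join against it locally, while the large fact table is hash-distributed.

CREATE TABLE date_dim (
    date_id INT,
    calendar_date DATE
) DISTRIBUTE BY REPLICATION;   -- small dimension: full copy on every datanode

CREATE TABLE sales (
    sale_id BIGINT,
    date_id INT,
    amount NUMERIC
) DISTRIBUTE BY HASH(sale_id); -- large fact table: rows spread across datanodes

-- The join against the replicated dimension can run on each datanode in parallel
SELECT d.calendar_date, sum(s.amount)
FROM sales s JOIN date_dim d ON d.date_id = s.date_id
GROUP BY d.calendar_date;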
19
Key-value Store Comparison
[Chart: key-value workload performance – Postgres-XL vs. MongoDB]
21
Multiversion Concurrency Control (MVCC)
22
MVCC
• Readers do not block writers
• Writers do not block readers
• Transaction IDs (XIDs)
– Every transaction gets an ID
• Snapshots contain a list of running XIDs
23
MVCC – Regular PostgreSQL
Example:
T1 Begin...
T2 Begin; INSERT...; Commit;
T3 Begin...
T4 Begin; SELECT
• T4's snapshot contains T1 and T3
• T2 already committed
• T4 can see T2's commits, but not T1's nor T3's
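The same visibility rule in plain SQL, as a sketch with two hypothetical sessions (the accounts table is illustrative): a snapshot taken by the SELECT excludes transactions that are still open.

-- Session A (like T1/T3: begun, not yet committed)
BEGIN;
INSERT INTO accounts VALUES (1, 100);

-- Session B (like T4: starts afterwards)
BEGIN;
SELECT * FROM accounts;   -- does not see Session A's uncommitted row
COMMIT;

-- Only after Session A commits will new snapshots include its row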
24
MVCC – Multiple Node Issues

Example:
T1 Begin...
T2 Begin; INSERT...; Commit;
T3 Begin...
T4 Begin; SELECT
• Node 1: T2 Commit, T4 SELECT
• Node 2: T4 SELECT, T2 Commit
• T4's SELECT statement returns inconsistent data
– Includes data from Node 1, but not Node 2
[Diagram: Coordinator connected to two datanodes, N1 and N2]
25
MVCC – Multiple Node Issues
Postgres-XL Solution:
• Make XIDs and Snapshots cluster-wide
26
Postgres-XL Technology
• Users connect to any Coordinator
27
Postgres-XL Technology
• Data is automatically spread across cluster
28
Postgres-XL Technology
• Queries return back data from all nodes in parallel
29
Postgres-XL Technology
• GTM maintains a consistent view of the data
30
Standard PostgreSQL
[Diagram: the standard PostgreSQL stack – Parser, Planner, Executor, MVCC and Transaction Handling, Storage]
31
Standard PostgreSQL
[Diagram: the same stack, with MVCC and Transaction Handling pulled out into the GTM]
GTM: MVCC and Transaction Handling
• Transaction ID (XID) Generation
• Snapshot Handling
32
Standard PostgreSQL
[Diagram: the PostgreSQL stack split across Postgres-XL components]
XL Coordinator: Parser, Planner, Executor, “Metadata” Storage
XL Datanode: Executor, Storage
33
Postgres-XL Architecture
• Based on the Postgres-XC project
• Coordinators
– Connection point for applications
– Parsing and planning of queries
• Data Nodes
– Actual data storage
– Local execution of queries
• Global Transaction Manager (GTM)
– Provides a consistent view to transactions
– GTM Proxy to increase performance
34
Coordinators
• Handles network connectivity to the client
• Parse and plan statements
• Perform final query processing of intermediate result sets
• Manages two-phase commit
• Stores global catalog information
35
Data Nodes
• Stores tables and indexes
• Only coordinators connect to data nodes
• Executes queries from the coordinators
• Communicates with peer data nodes for distributed joins
36
Global Transaction Manager (GTM)
• Handles necessary MVCC tasks
– Transaction IDs
– Snapshots
• Manages cluster-wide values
– Timestamps
– Sequences (see the sketch below)
• GTM Standby can be configured
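Sequences are a good example of a cluster-wide value: nextval() results are handed out by the GTM, so they stay unique regardless of which coordinator you connect to. A small sketch, with an illustrative sequence name:

CREATE SEQUENCE order_seq;

-- Connected to coordinator 1:
SELECT nextval('order_seq');   -- e.g. 1

-- Connected to coordinator 2:
SELECT nextval('order_seq');   -- a different value, never a duplicate,
                               -- because the GTM allocates it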
37
Global Transaction Manager (GTM)
• Can this be a single point of failure?
38
Global Transaction Manager (GTM)
• Can this be a single point of failure?
Yes.
39
Global Transaction Manager (GTM)
• Can this be a single point of failure?
Yes.
Solution: GTM Standby
40
Global Transaction Manager (GTM)
• Can GTM be a bottleneck?
41
Global Transaction Manager (GTM)
• Can GTM be a bottleneck?
Yes.
42
Global Transaction Manager (GTM)
• Can GTM be a bottleneck?
Yes.
Solution: GTM Proxy
[Diagram: Coordinator and Datanode backend processes connect to a local GTM Proxy, which communicates with the GTM]
43
GTM Proxy
• Runs alongside Coordinator or Datanode
• Backends use it instead of GTM
• Groups requests together
• Obtain range of transaction ids (XIDs)
• Obtain snapshot
44
GTM Proxy
• Example: 10 processes request a transaction ID
• They each connect to local GTM Proxy
• GTM Proxy sends one request to GTM for 10 XIDs
• GTM locks procarray, allocates 10 XIDs
• GTM returns range
• GTM Proxy demuxes to processes
46
Data Distribution – Replicated Tables
• Good for read only and read mainly tables
• Sometimes good for dimension tables in data warehousing
• If coordinator and datanode are colocated, local read occurs
• Bad for write-heavy tables
• Each row is replicated to all data nodes
• Statement based replication
• “Primary” avoids deadlocks
47
Data Distribution – Distributed Tables
• Good for write-heavy tables
• Good for fact tables in data warehousing
• Each row is stored on one data node
• Available strategies
– Hash
– Round Robin
– Modulo
49
Availability
• No Single Point of Failure
• Global Transaction Manager Standby
• Coordinators are load balanced
• Streaming replication for data nodes
• But, currently manual…
50
DDL
51
Defining Tables
CREATE TABLE my_table (…)
DISTRIBUTE BY
HASH(col) | MODULO(col) | ROUNDROBIN | REPLICATION
[ TO NODE (nodename[,nodename…])]
52
Defining Tables
CREATE TABLE state_code (
state_code CHAR(2),
state_name VARCHAR,
:
) DISTRIBUTE BY REPLICATION
An exact replica on each node
53
Defining Tables
CREATE TABLE invoice (
invoice_id SERIAL,
cust_id INT,
:
) DISTRIBUTE BY HASH(invoice_id)
Distributed evenly amongst sharded nodes
54
Defining Tables
CREATE TABLE invoice (
invoice_id SERIAL,
cust_id INT ….
) DISTRIBUTE BY HASH(invoice_id);
CREATE TABLE lineitem (
lineitem_id SERIAL,
invoice_id INT …
) DISTRIBUTE BY HASH(invoice_id);
Joins on invoice_id can be pushed down to the datanodes
55
test2=# create table t1 (col1 int, col2 int) distribute by hash(col1);
test2=# create table t2 (col3 int, col4 int) distribute by hash(col3);
56
explain select * from t1 inner join t2 on col1 = col3;
Remote Subquery Scan on all (datanode_1,datanode_2)
  ->  Hash Join
        Hash Cond:
        ->  Seq Scan on t1
        ->  Hash
              ->  Seq Scan on t2
Join push-down. Looks much like regular PostgreSQL
57
test2=# explain select * from t1 inner join t2 on col2 = col3;
Remote Subquery Scan on all (datanode_1,datanode_2)
  ->  Hash Join
        Hash Cond:
        ->  Remote Subquery Scan on all (datanode_1,datanode_2)
              Distribute results by H: col2
              ->  Seq Scan on t1
        ->  Hash
              ->  Seq Scan on t2
Reads t1 once and puts its rows, redistributed by col2, into shared queues for the other datanodes to consume for the join. Datanode-to-datanode communication and parallelism.
60
By the way
• PostgreSQL does not have query parallelism – yet
61
By the way
• PostgreSQL does not have query parallelism – yet
• Example: 16 core box, 1 user, 1 query
– Just 1 core will be used
– 15 (or so) idle cores
62
By the way
• PostgreSQL does not have query parallelism – yet
• Example: 16 core box, 1 user, 1 query
– Just 1 core will be used
– 15 (or so) idle cores
• You can use Postgres-XL even on a single server and configure multiple datanodes to achieve better performance thanks to parallelism
63
64
Configuration
65
66
Use pgxc_ctl!
(demo in a few minutes)
(please get from git HEAD for latest bug fixes)
67
Otherwise, PostgreSQL-like steps apply:
initgtm
initdb
68
postgresql.conf
• Very similar to regular PostgreSQL
• But there are extra parameters
– max_pool_size
– min_pool_size
– pool_maintenance_timeout
– remote_query_cost
– network_byte_cost
– sequence_range
– pooler_port
– gtm_host, gtm_port
– shared_queues
– shared_queue_size
69
Install
• Download RPMs
– http://sourceforge.net/projects/postgres-xl/files/Releases/Version_9.2rc/
• Or: configure; make; make install
70
Setup Environment
• .bashrc
• ssh is used, so add the installation bin dir to PATH
71
Avoid password with pgxc_ctl
• ssh-keygen -t rsa (in ~/.ssh)
• For a local test, no need to copy the key, it is already there:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
72
pgxc_ctl
• Initialize cluster
• Start/stop
• Deploy to node
• Add coordinator
• Add data node / slave
• Add GTM slave
73
pgxc_ctl.conf
• Let’s create a local test cluster!
– One GTM
– One Coordinator
– Two Datanodes
• Make sure all are using different ports
– including pooler_port
74
pgxc_ctl.conf
pgxcOwner=pgxl
pgxcUser=$pgxcOwner
pgxcInstallDir=/usr/postgres-xl-9.2
75
pgxc_ctl.conf
gtmMasterDir=/data/gtm_master
gtmMasterServer=localhost
gtmSlave=n
gtmProxy=n
76
pgxc_ctl.conf
coordMasterDir=/data/coord_master
coordNames=(coord1)
coordPorts=(5432)
poolerPorts=(20010)
coordMasterServers=(localhost)
coordMasterDirs=($coordMasterDir/coord1)
77
pgxc_ctl.conf
coordMaxWALSenders=($coordMaxWALsender)
coordSlave=n
coordSpecificExtraConfig=(none)
coordSpecificExtraPgHba=(none)
78
pgxc_ctl.conf
datanodeNames=(dn1 dn2)
datanodeMasterDir=/data/dn_master
datanodePorts=(5433 5434)
datanodePoolerPorts=(20011 20012)
datanodeMasterServers=(localhost localhost)
datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
79
pgxc_ctl.conf
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
datanodeSlave=n
primaryDatanode=dn1
80
pgxc_ctl
pgxc_ctl init all
Now have a working cluster
82
PostgreSQL Differences
83
pg_catalog
• pgxc_node
– Coordinator and Datanode definition
– Currently not GTM
• pgxc_class
– Table replication or distribution info
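A quick way to poke at these catalogs from psql; the column names below (node_name, node_type, pcrelid, pclocatortype) reflect my understanding of the XC/XL catalogs, so verify with \d pgxc_node and \d pgxc_class on your build:

-- Nodes this coordinator knows about
SELECT node_name, node_type, node_host, node_port FROM pgxc_node;

-- How each user table is distributed (locator type per relation)
SELECT pcrelid::regclass AS table_name, pclocatortype FROM pgxc_class;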
84
Additional Commands
• PAUSE CLUSTER / UNPAUSE CLUSTER
– Wait for current transactions to complete
– Prevent new commands until UNPAUSE
• EXECUTE DIRECT ON (nodename) ‘command’ (see the sketch below)
• CLEAN CONNECTION
– Cleans connection pool
• CREATE NODE / ALTER NODE
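A minimal sketch of EXECUTE DIRECT, reusing datanode dn1 from the pgxc_ctl.conf example above; it runs the quoted statement on that one node only, which is handy for checking how evenly a hash-distributed table is spread:

EXECUTE DIRECT ON (dn1) 'SELECT count(*) FROM invoice';
-- Returns only the rows physically stored on dn1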
85
Noteworthy
• SELECT pgxc_pool_reload() (example below)
• pgxc_clean for 2PC
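Where pgxc_pool_reload() typically comes up: after node definitions change, the coordinator's pooled connections need to pick up the new metadata. A sketch, with an illustrative port change:

-- e.g. datanode dn2 moved to a new port
ALTER NODE dn2 WITH (PORT = 5435);
SELECT pgxc_pool_reload();   -- refresh the connection pool with the new node info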
86
Looking at Postgres-XL Code
• XC: #ifdef PGXC
• XL: #ifdef XCP, #ifndef XCP
87
Community
88
Website and Code
• postgres-xl.org
• sourceforge.net/projects/postgres-xl
• Will add a mirror on github.com
• Docs
– files.postgres-xl.org/documentation/index.html
90
Philosophy
• Stability and bug fixes over features
• Performance-focused
• Open and inclusive
– Welcome input for roadmap, priorities and design
– Welcome others to become core members and contributors
91
Roadmap
• New release
• Improve Cluster Management
• Catch up to the PostgreSQL community -> 9.3, 9.4
• Make XL a more robust analytical platform
– Distributed Foreign Data Wrappers
• GTM optional
• Native High Availability