Mike Fowler, PGConf.EU 2017-10-26, Warsaw, Poland
Elephants in the Cloud
● Senior Site Reliability Engineer in the Public Cloud Practice of claranet
● Background in Software Engineering, Systems Engineering, System & Database Administration
● Been using PostgreSQL since 7.4
● Contributed to YAWL, PostgreSQL & Terraform open source projects
– XMLEXISTS/xpath_exists()
– xml_is_well_formed()
● Regular speaker at PGDay UK
About Me
● Hosted PostgreSQL
● Overview of public cloud hosting options
● Database migration strategies
Overview
● Your database somewhere else
● A managed service
– Some providers offer full DBA support
– Cloud providers give only the infrastructure
● Typically provisioned through an API or GUI
– i.e. a self-service environment
What is hosted PostgreSQL?
● Reduces adoption costs
● Installation & configuration is already done
– Generally sane defaults, some tuning often required
● Needn’t worry about physical servers
● Opex instead of Capex
● Most routine DBA tasks are done for you
● Easier to grow
Benefits of Hosted PostgreSQL
● Less control
● Latency
● Some features are disabled
● Migrating existing databases is hard
● Potential for vendor lock-in
● Resource limits
Drawbacks of Hosted PostgreSQL
● Automatic scaling
● It will fix your slow queries/reports/application
● You don’t need a DBA
– They need to be planning for scale
– They need to help address slowness
Fallacies of Hosted PostgreSQL
● We’ll look only at Public Cloud offerings
● Current major offerings
– Amazon Relational Database Service (RDS)
– Amazon Aurora
– Heroku
● Future major offerings
– Google Cloud SQL
– Microsoft Azure
Hosting Options
● PostgreSQL 9.3.12 – 9.6.3 supported
● Numerous instance types
– Costs range from $0.018 to $7.96 per hour
– Select from 1 vCPU up to 64 vCPUs, all 64-bit
– Memory ranges from 1GB to 488GB
● Flexible storage options
– Choose between SSD or Provisioned IOPS
– Up to 6TB with up to 30,000 IOPS
Amazon RDS
● High availability via the Multi-AZ (multi-availability-zone) option
– Synchronous replica
– Automatic failover (~2 minutes)
● Up to 5 read-only replicas (asynchronous replication)
● Configurable automatic backups with PITR
● Monthly uptime SLA of 99.95% per instance
– Allows for approximately 22 minutes of downtime per month
Amazon RDS
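That 99.95% figure translates into a concrete downtime budget; a quick back-of-the-envelope check (assuming a 30-day month):

```shell
# Minutes of downtime allowed by a 99.95% monthly uptime SLA,
# assuming a 30-day (43,200-minute) month
awk 'BEGIN { printf "%.1f\n", 30 * 24 * 60 * (1 - 0.9995) }'
# prints 21.6 (minutes per month)
```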
● Became generally available yesterday (2017-10-25)
● Currently available in 3 US regions and Ireland
● 6 instance types costing from $0.29 to $9.28 per hour
● Compatible with PostgreSQL 9.6
● Up to 2x throughput of conventional PostgreSQL
● Up to 16 read replicas with sub-10ms replica lag
● Auto-growing filesystem up to 64TB
– Filesystem is shared between 3 availability zones
Amazon Aurora
Amazon Aurora
https://aws.amazon.com/blogs/aws/amazon-aurora-update-postgresql-compatibility/
● Supports PostgreSQL 9.3, 9.4, 9.5 & 9.6 (10 in beta)
● Simpler pricing based on choice of tier ($0-8.5k pcm)
● Tier dictates resource limits
– Maximum number of rows (Hobby only)
– Cache size (1GB - 240GB)
– Storage limit (64GB - 1TB)
– Connection limit (120 - 500)
– Rollback (4 days – 1 week)
Heroku
● Fork & Follow
● Some of your data may end up in the US
– Logs (can be blocked at creation time)
– Snapshots & Dataclips
● Not possible to replicate out
– No permission for Bucardo, Londiste & Slony
– Remote slave is prohibited
– Only way is dump & restore
Heroku
● Currently in Beta (no SLA)
● Only supports PostgreSQL 9.6
● No replication support
● Poised to be a serious rival to RDS
– Billing per minute
– Automatic scaling of filesystem
– Similar variety of instance types
Google Cloud SQL
● Currently in preview (no SLA)
● Supports PostgreSQL 9.5 & 9.6
● Replication is seamless
– Automated failover
– PITR
● Selectable compute units
– Roughly equates to maximum connections
Microsoft Azure
● At Rest (AES-256)
– Enabled by default for Google & Azure
– User enabled at create for Amazon (RDS/Aurora)
– Only available in premium plans for Heroku
● In transit (SSL)
– Enforced on Heroku & Azure
– Can be enforced on Google & Amazon
● pgcrypto is available on all platforms
Encryption
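Because pgcrypto is available on all of these platforms, column-level encryption works the same way everywhere. A minimal psql sketch (the values and key are illustrative):

```
postgres=> CREATE EXTENSION IF NOT EXISTS pgcrypto;
postgres=> -- hash a password with a blowfish salt
postgres=> SELECT crypt('s3cret', gen_salt('bf'));
postgres=> -- symmetric encryption of a sensitive value
postgres=> SELECT pgp_sym_encrypt('sensitive data', 'mykey');
```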
● Providers dictate the availability of extensions
– Well documented except for Aurora
● The only step required is to run CREATE EXTENSION
● Many extensions are widely available
– pgcrypto
– PostGIS
– pg_stat_statements
– hstore
– plpgsql
Extensions
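Enabling one of the widely available extensions above is a single statement per database (run as the provider's admin user; hstore shown as an example):

```
postgres=> CREATE EXTENSION IF NOT EXISTS hstore;
postgres=> -- list what is installed in this database
postgres=> SELECT extname, extversion FROM pg_extension;
```

Note that a few extensions (pg_stat_statements, for instance) may also need a server parameter change on some platforms before CREATE EXTENSION succeeds.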
● Dump & Restore
● Replication failover
● Amazon’s Database Migration Service
● PITR + Logical decoding
Migration Strategies
● Simplest strategy
– Perceived as low risk for data loss
– Fewer “moving parts”
● Just a pg_dump & pg_restore
● Downtime is function of database size
Dump & Restore
● Move historic data ahead of time
– Opportunity to clear out unused data
– Consider introducing partitions
● Consider moving the dump closer to the target
– e.g. Upload to EC2 instance in the same region as the RDS instance and run pg_restore from there
● Over provision resources
– Gives higher throughput during data load
– Downscale once operational
Strategies to Minimise Downtime
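Put together, a dump and restore into a cloud instance might look like this (the hostname, database name and -j parallelism are illustrative; run from a VM in the target region):

```
$ pg_dump -Fc -f mydb.dump mydb
$ pg_restore -h myinstance.xxxx.rds.amazonaws.com -U master \
    -d mydb -j 4 mydb.dump
```

The custom format (-Fc) is what allows pg_restore to load tables in parallel with -j, which pairs well with temporarily over-provisioned resources.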
● No one supports external masters!
● Trigger based replication failover
– Slony, Londiste & Bucardo
● Can be used on almost any version of PostgreSQL
● Some restrictions apply
– DDL is not supported
– Rows must be uniquely identifiable
Replication Failover
● Presents some risk to production environment
– Initial overhead of replicating each table
● Gradually add tables to the configuration to spread the load
– Per-transaction overhead
● Write latency to remote slave
● Heavy write workload could lead to high replication lag
● This also works to replicate out of RDS but not Heroku
Replication Failover
● Web service supporting migration to/from AWS
– Can be RDS or an EC2 instance
– External DB can be anywhere
● Perform a one time load or continual load
– Change Data Capture
● “Supports” heterogeneous migrations
– Oracle, MySQL, SQL Server, MongoDB
– Schema Conversion Tool exists to assist
AWS Database Migration Service
● PostgreSQL as a source database
– User only needs read access to all the data
● PostgreSQL as a streaming source database
– Tables must have a primary key
– Requires 9.4 with wal_level = logical
– Must have room for a new replication slot
– User must be a SUPERUSER
AWS Database Migration Service
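The streaming-source prerequisites above map to a handful of postgresql.conf settings on the source (the values shown are illustrative minimums, not DMS requirements):

```
# postgresql.conf on the source database (9.4+)
wal_level = logical          # emit logical change data
max_replication_slots = 5    # room for the new DMS slot
max_wal_senders = 5          # streaming connections
```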
● PostgreSQL as a target database
– Import your schema using pg_dump
– session_replication_role needs to be set to replica
– Set target table preparation mode to DO_NOTHING
● There are drawbacks of using DMS
– Replication lag
– Process is fiddly
AWS Database Migration Service
● Most involved approach, least downtime
● Combines point-in-time recovery with the changes captured by logical decoding to create a replica
● Need to be running at least PostgreSQL 9.4 with WAL level logical and have WAL archiving configured
● DDL not supported, still need unique rows
● Recommend barman for managing WAL (http://www.pgbarman.org/)
● Recommend decoder_raw as logical decoding plugin (github.com/michaelpq/pg_plugins/tree/master/decoder_raw)
PITR & Logical decoding
1. Create a logical replication slot
SELECT * FROM pg_create_logical_replication_slot('logical_slot', 'decoder_raw');
2. Note the transaction ID (catalog_xmin)
SELECT catalog_xmin FROM pg_replication_slots WHERE slot_name = 'logical_slot';
PITR & Logical decoding
3. Perform a barman backup
$ barman backup master
4. Perform a barman PITR
$ barman recover --target-xid (catalog_xmin - 1) master latest
5. Start database and verify correct recovery
PITR & Logical decoding
6. Perform pg_dump on the read-only barman node
7. Restore to public cloud
8. Read output of logical decoding and write to cloud
PITR & Logical decoding
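End to end, the procedure above can be sketched as follows (server names and the final replay step are illustrative; how you apply decoder_raw's output to the cloud instance depends on your tooling):

```
-- on the master
SELECT * FROM pg_create_logical_replication_slot('logical_slot', 'decoder_raw');
SELECT catalog_xmin FROM pg_replication_slots WHERE slot_name = 'logical_slot';

$ barman backup master
$ barman recover --target-xid <catalog_xmin - 1> master latest
$ pg_dump -Fc -f base.dump mydb            # on the recovered node
$ pg_restore -h cloud-host -d mydb base.dump
$ psql -c "SELECT data FROM pg_logical_slot_get_changes('logical_slot', NULL, NULL);"
```

decoder_raw emits plain SQL statements in the `data` column, which is what makes replaying the captured changes against the cloud instance practical.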
● Hosted PostgreSQL gives you high performance PostgreSQL without the hassle of hardware, maintenance and configuration
● Opex instead of Capex
● Consider the limitations of your intended platform
● There are multiple options for migration
Summary
Questions?
Please remember to leave feedback about my talk, other talks and the conference in general at
https://2017.pgconf.eu/f
Mike Fowler
gh-mlfowler
mlfowler
mike dot fowler at claranet dot uk