Introduction to Modern PostgreSQL. Part 2

DENIS PIRSHTUK, INDATA LABS

SLONIK


INDEX TYPES

postgres=# select amname from pg_catalog.pg_am;

• btree: balanced tree (the default)

• hash

• gist: generalized search tree

• gin: generalized inverted index

• spgist: space-partitioned GiST

• brin: block range index

http://www.postgresql.org/docs/9.1/static/textsearch-indexes.html

GITHUB_EVENTS TABLE SCHEMA

    Column    |            Type             | Modifiers | Storage  | Stats target | Description
--------------+-----------------------------+-----------+----------+--------------+-------------
 event_id     | bigint                      |           | plain    |              |
 event_type   | text                        |           | extended |              |
 event_public | boolean                     |           | plain    |              |
 repo_id      | bigint                      |           | plain    |              |
 payload      | jsonb                       |           | extended |              |
 repo         | jsonb                       |           | extended |              |
 actor        | jsonb                       |           | extended |              |
 org          | jsonb                       |           | extended |              |
 created_at   | timestamp without time zone |           | plain    |              |

CREATING AN INDEX

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

QUERY WITHOUT AN INDEX

meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events WHERE 3488850707 < event_id AND event_id < 3488880707;
------------------------------------------------------------------
 Seq Scan on github_events (cost=0.00..265213.33 rows=13185 width=8) (actual time=0.008..495.324 rows=12982 loops=1)
   Filter: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
   Rows Removed by Filter: 2040200
 Planning time: 0.189 ms
 Execution time: 504.053 ms

SIMPLE INDEX

CREATE UNIQUE INDEX event_id_idx ON github_events(event_id);

meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events WHERE 3488850707 < event_id AND event_id < 3488880707;
------------------------------------------------------------------
 Index Scan using event_id_idx on github_events (cost=0.43..1921.28 rows=13187 width=8) (actual time=0.024..12.544 rows=12982 loops=1)
   Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
 Planning time: 0.190 ms
 Execution time: 21.130 ms
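The two plans differ mainly in how many rows they touch. Seq Scan evaluates the filter on every row, and 2,040,200 of them were discarded. Index Scan descends to the first key above the lower bound and walks forward until it passes the upper bound. The Python sketch below only models that idea. It assumes a list of (event_id, repo_id) tuples stands in for the heap and a sorted list of (key, position) pairs stands in for the B-tree leaf level. `seq_scan`, `build_index` and `index_range_scan` are hypothetical names, not PostgreSQL internals.

```python
import bisect


def seq_scan(rows, lo, hi):
    """Check every row against the filter, like Seq Scan + Filter."""
    matched, examined = [], 0
    for event_id, repo_id in rows:
        examined += 1
        if lo < event_id < hi:
            matched.append(repo_id)
    return matched, examined


def build_index(rows):
    """Sorted (key, heap position) pairs: a stand-in for B-tree leaf entries."""
    return sorted((event_id, pos) for pos, (event_id, _) in enumerate(rows))


def index_range_scan(index, rows, lo, hi):
    """Binary-search to the first key > lo, then walk forward while key < hi."""
    start = bisect.bisect_right(index, (lo, float("inf")))
    matched, examined = [], 0
    for key, pos in index[start:]:
        examined += 1
        if key >= hi:
            break  # past the upper bound: stop, no need to look further
        matched.append(rows[pos][1])  # "heap fetch" to read repo_id
    return matched, examined
```

For example, take 1,000 rows whose keys are a permutation of 0..999 and the range 100 < event_id < 110. The sequential scan examines all 1,000 rows. The range scan examines only 10 index entries: the 9 matches plus the first key past the upper bound.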

REGULAR INDEX

CREATE UNIQUE INDEX event_id_idx ON github_events(event_id);

--------------------------------
 Index Scan using event_id_idx on github_events (cost=0.43..1921.28 rows=13187 width=8) (actual time=0.037..12.485 rows=12982 loops=1)
   Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
 Planning time: 0.186 ms
 Execution time: 21.222 ms

COMPOSITE INDEX

CREATE UNIQUE INDEX event_id_idx

ON github_events(event_id, repo_id);
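A composite B-tree orders its entries by (event_id, repo_id) lexicographically. A range condition on the leading column therefore selects a contiguous run of entries, and repo_id can be read straight from those entries without visiting the table. That is what makes an index-only scan possible for `SELECT repo_id ... WHERE event_id ...`. There is a side effect: UNIQUE on (event_id, repo_id) only forbids duplicate pairs, a weaker constraint than UNIQUE on event_id alone, and this is one motivation for covering indexes. The Python model below is a sketch under those assumptions, and the function names are hypothetical.

```python
import bisect


def build_composite(rows):
    """Entries sorted lexicographically: first by event_id, then by repo_id."""
    return sorted(rows)


def index_only_range(entries, lo, hi):
    """Return repo_id for lo < event_id < hi using only the index entries."""
    start = bisect.bisect_right(entries, (lo, float("inf")))
    out = []
    for event_id, repo_id in entries[start:]:
        if event_id >= hi:
            break
        out.append(repo_id)  # no table access: repo_id is stored in the entry
    return out


def is_unique(entries):
    """UNIQUE on (event_id, repo_id) only forbids identical (event_id, repo_id) pairs."""
    return all(a != b for a, b in zip(entries, entries[1:]))
```

With rows (5, 1), (3, 9), (5, 2) and (7, 4), the range 3 < event_id < 7 yields repo_ids [1, 2]. The pair index still counts as unique even though event_id 5 appears twice.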

COVERING INDEX

Compared with a composite index:

• Smaller index size

• Lower update overhead

• Faster planning and search

• Included columns do not need an opclass

• Filters on included columns

-- INCLUDING is the syntax of the 2016 patch; PostgreSQL 11 shipped this feature as INCLUDE
CREATE UNIQUE INDEX event_id_idx2 ON github_events(event_id) INCLUDE (repo_id);

https://pgconf.ru/media/2016/02/19/4_Lubennikova_B-tree_pgconf.ru_3.0%20(1).pdf

COVERING INDEX

meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events WHERE 3488850707 < event_id AND event_id < 3488880707;
---------------------------------------
 Index Only Scan using event_id_idx2 on github_events (cost=0.43..23764.29 rows=13187 width=8) (actual time=0.032..12.533 rows=12982 loops=1)
   Index Cond: ((event_id > '3488850707'::bigint) AND (event_id < '3488880707'::bigint))
   Heap Fetches: 12982
 Planning time: 0.178 ms
 Execution time: 21.147 ms
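In this plan Heap Fetches: 12982 equals the number of returned rows, so the index-only scan still visited the table for every row. An index-only scan can skip the heap only for pages that the visibility map marks all-visible, and VACUUM is what sets those bits. A plausible but unverified explanation is that the table had not been vacuumed after loading. The Python toy model below assumes `all_visible` is a set of page numbers standing in for the visibility map; the function name is hypothetical.

```python
def index_only_scan(entries, all_visible, lo, hi):
    """entries: (event_id, repo_id, page) tuples.
    Returns (repo_ids, heap_fetches) for lo < event_id < hi."""
    result, heap_fetches = [], 0
    for event_id, repo_id, page in entries:
        if not lo < event_id < hi:
            continue
        if page not in all_visible:
            heap_fetches += 1  # page not all-visible: must check tuple visibility in the heap
        result.append(repo_id)  # the value itself always comes from the index entry
    return result, heap_fetches
```

With no pages marked all-visible, every match costs a heap fetch. Once the relevant pages are marked, as VACUUM would do, the count drops to zero.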

BRIN INDEX

CREATE INDEX event_id_brin_idx ON github_events USING brin (event_id);

--------------------------------
 Bitmap Heap Scan on github_events (cost=175.16..42679.52 rows=13187 width=8) (actual time=0.824..15.489 rows=12982 loops=1)
   Recheck Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
   Rows Removed by Index Recheck: 13995
   Heap Blocks: lossy=3072
   ->  Bitmap Index Scan on event_id_brin_idx (cost=0.00..171.87 rows=13187 width=0) (actual time=0.698..0.698 rows=30720 loops=1)
         Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
 Planning time: 0.094 ms
 Execution time: 24.421 ms
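A BRIN index keeps one small summary per range of consecutive heap pages; for a bigint column with the default opclass, that summary is the minimum and maximum value. By default a range is 128 pages (pages_per_range). A scan returns whole page ranges, so the bitmap is lossy and every row on those pages must be rechecked. That is where Heap Blocks: lossy=3072 and Rows Removed by Index Recheck: 13995 come from. 3072 is exactly 24 ranges of 128 pages, which is consistent with the default, although the slide does not show the setting. The Python model below assumes each heap page is a list of event_id values; `build_brin` and `brin_scan` are hypothetical names.

```python
def build_brin(pages, pages_per_range):
    """Return one (first_page, last_page, min, max) summary per block range."""
    summaries = []
    for first in range(0, len(pages), pages_per_range):
        chunk = pages[first:first + pages_per_range]
        values = [v for page in chunk for v in page]
        if values:
            summaries.append((first, first + len(chunk) - 1, min(values), max(values)))
    return summaries


def brin_scan(pages, summaries, lo, hi):
    """Lossy bitmap scan for lo < v < hi: keep ranges whose [min, max] may
    overlap the condition, then recheck every row on those pages."""
    matched, lossy_blocks, removed_by_recheck = [], 0, 0
    for first, last, vmin, vmax in summaries:
        if vmax <= lo or vmin >= hi:
            continue  # the summary rules the whole range out; pages are never read
        for page_no in range(first, last + 1):
            lossy_blocks += 1
            for v in pages[page_no]:
                if lo < v < hi:
                    matched.append(v)
                else:
                    removed_by_recheck += 1
    return matched, lossy_blocks, removed_by_recheck
```

BRIN works well here because event_id grows with insertion order, so each range covers a narrow, non-overlapping band of values. On randomly ordered data most ranges would overlap any query, and the index would skip little.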

WHAT'S THE DIFFERENCE?

Size:

Regular (B-tree): 44 MB

BRIN: 80 kB

UPDATE COST???

CSTORE_FDW

• Inspired by the Optimized Row Columnar (ORC) format developed by Hortonworks.

• Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.

• Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.

• Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.
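Column projection and skip indexes can be illustrated together. The Python sketch below stores rows column by column in fixed-size groups and keeps a min/max per column for each group. A scan reads only the requested columns and skips any group whose statistics rule out the filter. The group size and the function names are hypothetical and do not come from cstore_fdw's actual storage layout. The filter is inclusive, matching the BETWEEN in the query on the next slides.

```python
def build_row_groups(rows, group_size):
    """Store rows column by column in fixed-size groups, with min/max per column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        columns = {name: [r[name] for r in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups


def scan(groups, wanted, filter_col, lo, hi):
    """Read only the `wanted` columns (projection) and skip groups whose
    min/max for `filter_col` cannot satisfy lo <= value <= hi."""
    out, groups_read = [], 0
    for g in groups:
        gmin, gmax = g["stats"][filter_col]
        if gmax < lo or gmin > hi:
            continue  # skip index: the group's statistics exclude it entirely
        groups_read += 1
        cols = g["columns"]
        for i, v in enumerate(cols[filter_col]):
            if lo <= v <= hi:
                out.append(tuple(cols[c][i] for c in wanted))
    return out, groups_read
```

As with BRIN, skipping pays off only when the filter column is correlated with load order, as created_at usually is for an event log.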


CSTORE_FDW

-- cstore_fdw must be listed in shared_preload_libraries
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE cstored_github_events (
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    actor jsonb,
    org jsonb,
    created_at timestamp
)
SERVER cstore_server
OPTIONS (compression 'pglz');

INSERT INTO cstored_github_events (SELECT * FROM github_events);

ANALYZE cstored_github_events;

A TYPICAL QUERY

meetup_demo=# EXPLAIN ANALYZE SELECT repo_id, count(*) FROM cstored_github_events WHERE created_at BETWEEN timestamp '2016-01-02 01:00:00' AND timestamp '2016-01-02 23:00:00' GROUP BY repo_id ORDER BY 2 DESC;
                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=75153.59..75221.43 rows=27137 width=8) (actual time=950.085..1030.283 rows=106145 loops=1)
   Sort Key: (count(*)) DESC
   Sort Method: quicksort  Memory: 8048kB
   ->  HashAggregate (cost=72883.86..73155.23 rows=27137 width=8) (actual time=772.445..861.162 rows=106145 loops=1)
         Group Key: repo_id
         ->  Foreign Scan on cstored_github_events (cost=0.00..70810.84 rows=414603 width=8) (actual time=4.762..382.302 rows=413081 loops=1)
               Filter: ((created_at >= '2016-01-02 01:00:00'::timestamp without time zone) AND (created_at <= '2016-01-02 23:00:00'::timestamp without time zone))
               Rows Removed by Filter: 46919
               CStore File: /var/lib/pgsql/9.5/data/cstore_fdw/18963/1236161
               CStore File Size: 1475036725
 Planning time: 0.126 ms
 Execution time: 1109.248 ms

NOT ALWAYS AS ADVERTISED

SELECT pg_size_pretty(cstore_table_size('cstored_github_events'));

1407 MB

SELECT pg_size_pretty(pg_table_size('github_events'));

2668 MB

That is roughly 1.9x compression (2668 / 1407), just under the advertised 2-4x.

POSTGRESQL 9.5: FOREIGN TABLE INHERITANCE

• Fast INSERTs and look-ups in the current table.

• Periodically move data to an archive table for compression.

• Query both via the main table.

• Combined row-based and columnar storage.
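PostgreSQL 9.5 allows a foreign table, such as a cstore_fdw table, to be an inheritance child of a regular table, so a SELECT on the parent also scans that child. The Python class below is only a toy model of the hot/archive pattern. It assumes a plain list plays the row-store "current" child and zlib-compressed JSON batches play the columnar "archive" child. The class and method names are hypothetical and are not cstore_fdw or PostgreSQL APIs.

```python
import json
import zlib


class PartitionedEvents:
    """Toy parent table with two children: a row-store 'current' table
    for fast inserts and a compressed 'archive' for old data."""

    def __init__(self):
        self.current = []   # recent rows: cheap to insert and look up
        self.archive = []   # list of compressed batches of older rows

    def insert(self, row):
        self.current.append(row)

    def archive_older_than(self, cutoff):
        """Periodic job: move rows with created_at < cutoff into the archive."""
        old = [r for r in self.current if r["created_at"] < cutoff]
        self.current = [r for r in self.current if r["created_at"] >= cutoff]
        if old:
            self.archive.append(zlib.compress(json.dumps(old).encode()))

    def select_all(self):
        """Query 'via the parent': rows from both children combined."""
        rows = list(self.current)
        for batch in self.archive:
            rows.extend(json.loads(zlib.decompress(batch)))
        return rows
```

Inserts stay cheap because they only touch the current child. Compression is paid once per batch when the periodic job runs.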

CLUSTERING

SELECT retweet_count FROM contest WHERE "user.id" = 13201312;

Time: 120.743 ms

CREATE INDEX user_id_post_id ON contest("user.id" ASC, "id" DESC);

CLUSTER contest USING user_id_post_id;

VACUUM contest;

-- the same SELECT, re-run afterwards:
Time: 4.128 ms
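CLUSTER physically rewrites the table in index order, so one user's rows end up on a few adjacent pages instead of being scattered across the table. The Python sketch below counts distinct pages holding one user's rows before and after sorting. It assumes a fixed number of rows per page and rows represented as (user_id, post_id) pairs; the function names are hypothetical.

```python
def pages_touched(rows, user_id, rows_per_page):
    """Number of distinct heap pages that hold rows of user_id."""
    return len({pos // rows_per_page
                for pos, (uid, _) in enumerate(rows) if uid == user_id})


def cluster(rows):
    """Like CLUSTER ... USING (user_id ASC, id DESC): physically reorder the rows."""
    return sorted(rows, key=lambda r: (r[0], -r[1]))
```

With 1,000 rows from 100 interleaved users at 10 rows per page, one user's rows are spread over 10 pages before clustering and fit on 1 page afterwards. CLUSTER is a one-time operation: rows inserted later are not kept in order. It also locks the table while rewriting it, which is why online tools such as pg_repack exist.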

https://github.com/reorg/pg_repack

There is no CLUSTER statement in the SQL standard.

bloating

WHAT ELSE?

• UPSERT: INSERT … ON CONFLICT DO NOTHING/UPDATE (9.5)

• Partial indexes (supported long before 9.2)

• Materialized views (9.3)
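INSERT … ON CONFLICT either inserts a new row or, if a row with the same unique key already exists, does nothing or updates the existing row. In the UPDATE branch, PostgreSQL exposes the proposed row as `EXCLUDED`. The Python sketch below models only those semantics, assuming a dict keyed by the unique column. Unlike the real statement, it does not capture atomicity under concurrent inserts. `upsert` is a hypothetical name.

```python
def upsert(table, row, key, update=None):
    """Model of INSERT ... ON CONFLICT (key) DO NOTHING / DO UPDATE.
    table: dict mapping key value -> row dict.
    update: None for DO NOTHING, else a function (existing, excluded) -> new row."""
    k = row[key]
    if k not in table:
        table[k] = dict(row)                 # no conflict: plain insert
    elif update is not None:
        table[k] = update(table[k], row)     # conflict: DO UPDATE using EXCLUDED
    # conflict and update is None: DO NOTHING
    return table
```

A typical use is a counter: `upsert(t, row, "repo_id", lambda old, new: {**old, "events": old["events"] + new["events"]})`. This roughly corresponds to `ON CONFLICT (repo_id) DO UPDATE SET events = t.events + EXCLUDED.events`.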

PROFILING AND DBA

• pg_stat_statements, pg_stat_activity, pg_buffercache

• https://github.com/PostgreSQL-Consulting/pg-utils

• https://github.com/ankane/pghero

• Many useful queries on the PostgreSQL wiki

• https://wiki.postgresql.org/wiki/Show_database_bloat
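pg_stat_statements groups executions that differ only in constant values and accumulates counters per group, such as call count and total execution time. The Python sketch below only approximates that idea. It assumes a regex that replaces string and numeric literals with `?` is good enough for illustration, whereas the real extension fingerprints the parsed query. `normalize` and `aggregate` are hypothetical names.

```python
import re
from collections import defaultdict


def normalize(sql):
    """Replace string and numeric literals with '?' (a rough stand-in for
    the parse-tree fingerprinting that pg_stat_statements actually does)."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)
    return " ".join(sql.split())


def aggregate(log):
    """log: iterable of (sql, elapsed_ms). Returns {normalized query: [calls, total_ms]}."""
    stats = defaultdict(lambda: [0, 0.0])
    for sql, ms in log:
        entry = stats[normalize(sql)]
        entry[0] += 1
        entry[1] += ms
    return dict(stats)
```

Sorting the result by total time, or by total time divided by calls, gives the kind of "top queries" report that pg-utils and PgHero build on top of the extension.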


PG-UTILS

• query_stat_cpu_time.sql, query_stat_io_time.sql, query_stat_rows.sql, query_stat_time.sql

• low_used_indexes

• seq_scan_tables

• table_candidates_from_ssd.sql / table_candidates_to_ssd.sql

• index_disk_activity.sql

• table_disk_activity

• table_index_write_activity.sql / table_write_activity.sql


JSONB

CREATE INDEX login_idx ON github_events USING btree ((org->>'login'));

-- jsonb_value_path_ops is provided by the jsquery extension (CREATE EXTENSION jsquery;)
CREATE INDEX login_idx2 ON github_events USING gin (org jsonb_value_path_ops);

jsonb_path_value_ops: (hash(path_item_1.path_item_2. ... .path_item_n); value)

jsonb_value_path_ops: (value; bloom(path_item_1) | bloom(path_item_2) | ... | bloom(path_item_n))
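The two layouts above can be made concrete by generating GIN entries for one JSON document. jsonb_path_value_ops stores one entry per leaf, combining a hash of the full path with the value. jsonb_value_path_ops stores the value together with a Bloom-style bitmask built by OR-ing one bit per path item. As we understand it, the first suits lookups by an exact path, while the value-first layout can still help when the path contains wildcards. The Python sketch below assumes CRC32 as the hash and a 32-bit signature. Both choices are illustrative and are not jsquery's actual parameters, and all function names are hypothetical.

```python
import zlib


def leaves(doc, path=()):
    """Yield (path, value) for every scalar in a JSON-like structure."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            yield from leaves(v, path + (k,))
    elif isinstance(doc, list):
        for v in doc:
            yield from leaves(v, path + ("#",))  # "#" marks "any array element"
    else:
        yield path, doc


def h(s):
    """Deterministic 32-bit hash (CRC32), a stand-in for the real hash function."""
    return zlib.crc32(s.encode())


def path_value_entries(doc):
    """Path-first layout: (hash of the whole path; value)."""
    return {(h(".".join(p)), v) for p, v in leaves(doc)}


def value_path_entries(doc, bits=32):
    """Value-first layout: (value; OR of one-bit Bloom signatures of the path items)."""
    entries = set()
    for p, v in leaves(doc):
        sig = 0
        for item in p:
            sig |= 1 << (h(item) % bits)
        entries.add((v, sig))
    return entries
```

To search for the value "github" at path login in the value-first layout, look up entries with that value and keep those whose signature contains all the query's bits. False positives are possible, so matches must be rechecked against the document.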


JSQUERY

CREATE TABLE js (
    id serial,
    data jsonb,
    CHECK (data @@ '
        name IS STRING AND
        similar_ids.#: IS NUMERIC AND
        points.#:(x IS NUMERIC AND y IS NUMERIC)'::jsquery));
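The jsquery CHECK constraint above reads as a schema. `name` must be a string, every element of `similar_ids` (`#:` means "every array element") must be numeric, and every element of `points` must have numeric `x` and `y`. The Python function below is a hedged mirror of that rule, useful for checking documents before they reach the database. It assumes that a missing key or a non-array value fails the check, and jsquery's exact behaviour in those cases may differ. `valid` and `is_numeric` are hypothetical names.

```python
def is_numeric(v):
    # bool is a subclass of int in Python, but JSON true/false are not numbers
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def valid(data):
    """Mirror of the CHECK constraint (assumption: a missing key fails)."""
    if not isinstance(data.get("name"), str):
        return False
    ids = data.get("similar_ids")
    if not isinstance(ids, list) or not all(is_numeric(x) for x in ids):
        return False
    points = data.get("points")
    if not isinstance(points, list):
        return False
    return all(isinstance(p, dict) and is_numeric(p.get("x")) and is_numeric(p.get("y"))
               for p in points)
```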

POSTGRESQL SCALABILITY

VERTICAL (POSTGRESPRO, POWER 8)

YOU HAVE TO CHOOSE …

OPTIONS

POSTGRES-XL

https://habrahabr.ru/post/253017/

http://www.postgres-xl.org/overview/

16 MACHINES VS 1 MACHINE

THANK YOU!