
gbroccolo - Use of Indexes on geospatial databases with PostgreSQL - FOSS4G.EU 2015


2ndQuadrant Italia Giuseppe Broccolo – [email protected]

FOSS4G.EU 2015, Como, Politecnico di Milano

July 14th-17th 2015

Use of indexes on geospatial databases with the PostgreSQL DBMS

Giuseppe Broccolo

www.2ndquadrant.it


$~# whoami

• PostgreSQL and PostGIS consultant
  – Development, Replication, Disaster Recovery, pre-production Benchmark, Remote DBA, 24/7 Support, Training


Outline

• Indexes on geospatial DBs

• What does PostgreSQL offer?

• Examples of usage:
  – Points in PostgreSQL
  – Points in the PostGIS extension
  – (LiDAR) points in the PointCloud extension


Indexes on geospatial databases

• Binary structure used to speed up access to data:
  – In the case of trees: balanced or unbalanced structure of nodes
  – Theoretical performance: R/W ~O(log N), size ~O(N)
  – Algorithms are defined not by ordering/comparison operators but by placement operators
  – Index nodes are defined starting from the MBR (Minimum Bounding Rectangle) containing the whole dataset
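To make the last two points concrete, here is a minimal Python sketch (not PostgreSQL code) of an MBR and of a few placement tests. The class and function names are illustrative assumptions; the predicates only mimic what the box operators &&, @> and << compute, and none of them needs an ordering of the data.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle, like the MBR kept in an index node (sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Box") -> bool:
        # in the spirit of &&: the two rectangles share at least one point
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other: "Box") -> bool:
        # in the spirit of @>: other lies completely inside self
        return (self.xmin <= other.xmin and other.xmax <= self.xmax
                and self.ymin <= other.ymin and other.ymax <= self.ymax)

    def strictly_left_of(self, other: "Box") -> bool:
        # in the spirit of <<: self ends before other begins along x
        return self.xmax < other.xmin


def mbr(points: Iterable[Point]) -> Box:
    """Minimum Bounding Rectangle enclosing every point (non-empty input)."""
    xs, ys = zip(*points)
    return Box(min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    root = mbr([(0.1, 0.2), (0.9, 0.4), (0.5, 0.8)])
    print(root)
    print(root.overlaps(Box(0.4, 0.4, 0.6, 0.6)))
```

The root MBR is the starting rectangle that an index then subdivides; searches descend only into nodes whose rectangle passes the placement test.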


[Figure: MBR]



[Figure: MBR]

Balanced:
● R-tree, etc.
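A balanced R-tree can be sketched in a few lines of Python. This sketch assumes Sort-Tile-Recursive (STR) bulk loading, which is only one way to build an R-tree and not what GiST does (GiST inserts entries one at a time and splits pages); names such as build_rtree are hypothetical. Every leaf ends up at the same depth, and a range search only descends into nodes whose MBR overlaps the query box.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    """R-tree node: a leaf holds points, an inner node holds child nodes."""

    def __init__(self, entries, leaf: bool):
        self.entries = entries
        self.leaf = leaf
        rects = [(p[0], p[1], p[0], p[1]) for p in entries] if leaf else [c.rect for c in entries]
        # the node's MBR encloses all of its entries
        self.rect = (min(r[0] for r in rects), min(r[1] for r in rects),
                     max(r[2] for r in rects), max(r[3] for r in rects))


def _center_x(node: Node) -> float:
    return (node.rect[0] + node.rect[2]) / 2


def _center_y(node: Node) -> float:
    return (node.rect[1] + node.rect[3]) / 2


def _str_groups(items, key_x, key_y, m: int):
    """Sort-Tile-Recursive: cut x-sorted items into vertical slices, then y-sorted runs of m."""
    n_slices = math.ceil(math.sqrt(math.ceil(len(items) / m)))
    slice_len = n_slices * m
    items = sorted(items, key=key_x)
    groups = []
    for start in range(0, len(items), slice_len):
        vslice = sorted(items[start:start + slice_len], key=key_y)
        groups.extend(vslice[i:i + m] for i in range(0, len(vslice), m))
    return groups


def build_rtree(points: List[Point], m: int = 8) -> Node:
    """Bulk-load a non-empty list of points bottom-up (requires m >= 2)."""
    level = [Node(g, leaf=True) for g in _str_groups(points, lambda p: p[0], lambda p: p[1], m)]
    while len(level) > 1:
        level = [Node(g, leaf=False) for g in _str_groups(level, _center_x, _center_y, m)]
    return level[0]


def _overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(node: Node, box: Rect) -> List[Point]:
    """Points inside box; subtrees whose MBR misses the box are skipped."""
    if not _overlaps(node.rect, box):
        return []
    if node.leaf:
        return [p for p in node.entries if _overlaps((p[0], p[1], p[0], p[1]), box)]
    return [p for child in node.entries for p in range_query(child, box)]


def leaf_depths(node: Node, depth: int = 0) -> set:
    """Depths at which leaves occur: a single value means the tree is balanced."""
    if node.leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.entries))
```

Because each level is built only from nodes of the level below, all leaves sit at the same depth, which is the defining property of the balanced family.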


[Figure: MBR]

Unbalanced:
● Kd-tree, Quad-tree, etc.
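By contrast, in an unbalanced structure such as a point quad-tree the data decide the shape of the tree. In this minimal Python sketch (hypothetical names, not the SP-GiST implementation) each stored point splits its region into four quadrants, so the height depends on the insertion order: points inserted along a diagonal degenerate into a chain.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class QuadNode:
    """Point quad-tree node: its point splits the plane into four quadrants."""

    def __init__(self, point: Point):
        self.point = point
        self.children: List[Optional["QuadNode"]] = [None, None, None, None]

    def quadrant(self, p: Point) -> int:
        # 0 = SW, 1 = SE, 2 = NW, 3 = NE; ties go to the east/north side
        return int(p[0] >= self.point[0]) + 2 * int(p[1] >= self.point[1])


def insert(root: Optional[QuadNode], p: Point) -> QuadNode:
    """Descend to the empty quadrant where p belongs and attach a new node there."""
    if root is None:
        return QuadNode(p)
    node = root
    while True:
        q = node.quadrant(p)
        if node.children[q] is None:
            node.children[q] = QuadNode(p)
            return root
        node = node.children[q]


def range_query(node: Optional[QuadNode], box, out: List[Point]) -> List[Point]:
    """Collect points in box = (xmin, ymin, xmax, ymax), skipping quadrants that cannot intersect it."""
    if node is None:
        return out
    xmin, ymin, xmax, ymax = box
    x, y = node.point
    if xmin <= x <= xmax and ymin <= y <= ymax:
        out.append(node.point)
    west, east = xmin < x, xmax >= x
    south, north = ymin < y, ymax >= y
    for q, needed in enumerate((west and south, east and south, west and north, east and north)):
        if needed:
            range_query(node.children[q], box, out)
    return out


def height(node: Optional[QuadNode]) -> int:
    return 0 if node is None else 1 + max(height(c) for c in node.children)
```

With points inserted in a scattered order the height typically stays small, while sorted input produces a chain as long as the input; in both cases the range search prunes quadrants that cannot intersect the box.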


What PostgreSQL offers

• “In core” 2D geometric (not geographic) datatypes
  – Fixed resolution: double precision
  – point, circle, box
  – @-@, @@, <->, &&, <<, >>, <<|, |>>, ...


What PostgreSQL offers

• PostGIS extension:
  – geometry, geography
  – <@, @>, &&, <<, >>, <<|, |>>, ...
  – ST_Length(), ST_Distance(), ...


Tree indexes in PostgreSQL

• Balanced indexes
  – B-Tree
  – GIN (Generalized Inverted Index): fast access to data
  – GiST (Generalized Search Tree): good concurrency, “lossy”
    • kNN searches
• Unbalanced index
  – SP-GiST (Space-Partitioned GiST): low I/O
    • Introduced in PostgreSQL 9.2
    • Usable in PostGIS >2.1
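GiST answers kNN queries by visiting index entries in order of distance. The following Python sketch shows the general best-first technique under simplifying assumptions: a small median-split tree of bounding boxes stands in for the index, and a heap keyed on the minimum possible distance to each box decides what to visit next. It illustrates the idea only; it is not PostgreSQL's code, and all names are hypothetical.

```python
import heapq
import math
from itertools import count
from typing import List, Tuple

Point = Tuple[float, float]


class Node:
    """Tree node with a bounding box; leaves keep up to leaf_size points (leaf_size >= 1)."""

    def __init__(self, points: List[Point], leaf_size: int = 8):
        self.box = (min(p[0] for p in points), min(p[1] for p in points),
                    max(p[0] for p in points), max(p[1] for p in points))
        self.points: List[Point] = []
        self.children: List["Node"] = []
        if len(points) <= leaf_size:
            self.points = list(points)
        else:
            # split at the median of the wider side, so both halves are non-empty
            axis = 0 if self.box[2] - self.box[0] >= self.box[3] - self.box[1] else 1
            ordered = sorted(points, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.children = [Node(ordered[:mid], leaf_size), Node(ordered[mid:], leaf_size)]


def min_dist(q: Point, box) -> float:
    """Lower bound of the distance from q to anything stored under box."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)


def knn(root: Node, q: Point, k: int) -> List[Point]:
    """Best-first search: always expand the closest pending entry."""
    tie = count()  # tie-breaker so the heap never compares Node/point payloads
    heap = [(min_dist(q, root.box), next(tie), False, root)]
    result: List[Point] = []
    while heap and len(result) < k:
        _, _, is_point, item = heapq.heappop(heap)
        if is_point:
            # no pending entry can be closer than this point, so it is final
            result.append(item)
            continue
        for p in item.points:
            heapq.heappush(heap, (math.dist(q, p), next(tie), True, p))
        for child in item.children:
            heapq.heappush(heap, (min_dist(q, child.box), next(tie), False, child))
    return result
```

The search stops as soon as k points have been popped, so only the part of the tree near the query point is ever read; this is why an index scan can satisfy ORDER BY <-> LIMIT without sorting the whole table.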



Working with 2D point sets

• The test environment: Vagrant VM (Ubuntu 14.04)
  – Single virtual core 2.26GHz, 512MB RAM, 7.2k rpm disk
• PostgreSQL 9.4 + PostGIS 2.1
  – postgresql.conf: default
• ~10M points
  – Nearest Neighbours search
  – Bounding Box inclusion



Index creation on the 2D sample

– point datatype supports both GiST and SP-GiST indexing

=# CREATE INDEX idx_gist_point ON many_point USING gist(point);

=# CREATE INDEX idx_spgist_point ON many_point USING spgist(point);

– geometry(point,0) datatype supports only GiST indexing

=# CREATE INDEX idx_gist_geom ON many_geom USING gist(point);

=# CREATE INDEX idx_spgist ON many_geom USING spgist(point);

ERROR: data type geometry has no default operator class for access method "spgist"

HINT: You must specify an operator class for the index or define a default operator class for the data type.


Index creation on the 2D sample

index               index size   table size   build time
idx_gist_point      715MB        653MB        214s
idx_spgist_point    437MB        653MB        137s
idx_gist_geom       523MB        501MB        290s


Nearest Neighbours search (2D)

– geometry(point,0)

SELECT *
FROM many_geom
ORDER BY ST_MakePoint(0.5, 0.5) <-> geom LIMIT 10;

– point

SELECT *
FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;
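Without an index, both queries must compute the distance of every row and keep the 10 smallest. A minimal Python equivalent of that "Seq. Scan + Sort" plan (the function name is hypothetical) is a bounded heap, similar in spirit to the top-N heapsort that PostgreSQL uses for ORDER BY ... LIMIT:

```python
import heapq
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def seq_scan_knn(rows: Iterable[Point], q: Point, k: int = 10) -> List[Point]:
    """Every row is read and its distance to q computed; only the k best are kept."""
    return heapq.nsmallest(k, rows, key=lambda p: math.dist(q, p))
```

The cost grows linearly with the table size no matter how small k is, which is what the timings below show against the index scans.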


Nearest Neighbours search (2D)
• Query timing (without & with indexes):

– point

planner strategy               exec. time
Seq. Scan + Sort               7.3s
Index Scan (idx_gist_point)    52ms

– geometry(point,0)

planner strategy               exec. time
Seq. Scan + Sort               17.2s
Index Scan (idx_gist_geom)     18ms


Bounding Box inclusion (2D)

– geometry(point,0)

SELECT *
FROM many_geom
WHERE point && ST_MakeBox2D(ST_MakePoint(0.4, 0.4), ST_MakePoint(0.6, 0.6));

– point

SELECT *
FROM many_point
WHERE point <@ box(point(0.4, 0.4), point(0.6, 0.6));


Bounding Box inclusion (2D)
• Query timing (without & with indexes):

– point

planner strategy                 exec. time
Seq. Scan + <@                   5.7s
Index Scan (idx_spgist_point)    0.4s

– geometry(point,0)

planner strategy                 exec. time
Seq. Scan + &&                   2.0s
Index Scan (idx_gist_geom)       0.7s


Unbalanced indexes intrinsically partition the sample into boxes in their nodes:

exactly what BB inclusion needs!


Working with (many) 3D points in PostgreSQL

• The OpenGeo Suite (Boundless – P. Ramsey)
  – Includes the postgis and pointcloud extensions
  – No packages available to work with PostgreSQL 9.4
  – Can import LiDAR data from .LAS files
• Casting between the two point datatypes is allowed
• pointcloud allows grouping points into patches to reduce the overall data size

http://suite.opengeo.org/4.1/whatsnew.html

http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/loadingdata.html#loading-with-pdal


An example of usage: a 1G point cloud

• The test environment:
  – 16GB RAM, 1TB RAID1 storage, 8 CPUs @3.3GHz, PostgreSQL 9.3
• Use the pointcloud extension
  – one point → one record
• Searches: points inside a BB, and nearest neighbours (NN)

[Figure: point schema with dimensions of 4B, 4B, 4B and 2B]

http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/schemas.html
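The 4B/4B/4B/2B layout is easy to reproduce. This Python sketch assumes the dimensions are X, Y, Z stored as scaled int32 and Intensity as uint16, with a scale of 0.01, as in the example schema of the pointcloud documentation; the names and the scale are assumptions, and the real serialization used by the extension adds its own header.

```python
import struct

# Assumed layout for the 4B/4B/4B/2B schema: X, Y, Z as scaled int32, Intensity as uint16.
# "<" means little-endian with no padding, so the payload is exactly 4 + 4 + 4 + 2 bytes.
POINT_FORMAT = struct.Struct("<iiiH")
SCALE = 0.01  # assumed scale factor for X, Y, Z


def pack_point(x: float, y: float, z: float, intensity: int) -> bytes:
    # coordinates are stored as integers: value / scale, rounded
    return POINT_FORMAT.pack(round(x / SCALE), round(y / SCALE), round(z / SCALE), intensity)


def unpack_point(raw: bytes):
    ix, iy, iz, intensity = POINT_FORMAT.unpack(raw)
    return ix * SCALE, iy * SCALE, iz * SCALE, intensity


if __name__ == "__main__":
    print(POINT_FORMAT.size)  # payload bytes per point
    print(POINT_FORMAT.size * 1_000_000_000 / 1e9)  # GB of raw payload for 1G points
```

14 bytes per point gives about 14GB of payload for 1G points, roughly a quarter of the 56GB table reported on the next slide; the gap is likely per-record overhead (tuple and serialization headers), which is what grouping points into patches reduces.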


Build the index

table size   GiST index size   build time
56GB         59GB              6h

CREATE INDEX pc_gist_idx ON pcpoints USING gist(Geometry(pt));

You have to cast to the PostGIS geometry datatype to use a GiST index


BB inclusion with a 1G point cloud

included points   execution time (no index)   execution time (with index)
1M                798s                        208ms
10M               -                           9.27s
100M              -                           99.7s
300M              -                           682s

SELECT * FROM pcpoints
WHERE Geometry(pt) &&
      ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
                              ST_MakePoint(100, 100, 500)), 4326);

Index is always used!


BB inclusion with a 1G point cloud using patches

WITH sel AS (
    SELECT PC_Explode(pa) AS pc FROM pcpatch
    WHERE ST_SetSRID(ST_GeomFromEWKB(PC_Envelope(pa)), 4326) &&
          ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
                                  ST_MakePoint(100, 100, 500)), 4326)
)
SELECT pc FROM sel
WHERE ST_Within(Geometry(pc),
                ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
                                        ST_MakePoint(100, 100, 500)), 4326));

100k patches, 10k points/patch (2h, 9.4GB)

http://suite.opengeo.org/4.1/dataadmin/pointcloud/objects.html#pcpatch
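The patch query works in two steps. A minimal Python sketch of the same idea, under stated assumptions: patches come from a hypothetical make_patches that sorts points and cuts them into fixed-size chunks (real patches are produced by the loader), step 1 compares only the x/y extent of each patch envelope with the box, as PostGIS's 2D && operator does, and step 2 "explodes" the surviving patches and checks every point against the full 3D box.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Box3 = Tuple[Point3, Point3]  # (min corner, max corner)


def make_patches(points: List[Point3], size: int) -> List[List[Point3]]:
    """Hypothetical patching: sort by x (then y, z) and cut into fixed-size chunks."""
    pts = sorted(points)
    return [pts[i:i + size] for i in range(0, len(pts), size)]


def envelope_xy(patch: List[Point3]) -> Tuple[float, float, float, float]:
    """2D extent of a patch as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in patch]
    ys = [p[1] for p in patch]
    return min(xs), min(ys), max(xs), max(ys)


def inside(p: Point3, box: Box3) -> bool:
    lo, hi = box
    return all(lo[d] <= p[d] <= hi[d] for d in range(3))


def query(patches: List[List[Point3]], box: Box3) -> List[Point3]:
    (bx0, by0, _), (bx1, by1, _) = box
    # Step 1: cheap filter on patch envelopes, comparing only x/y extents.
    candidates = []
    for patch in patches:
        x0, y0, x1, y1 = envelope_xy(patch)
        if x0 <= bx1 and bx0 <= x1 and y0 <= by1 and by0 <= y1:
            candidates.append(patch)
    # Step 2: "explode" the candidate patches and test every single point.
    return [p for patch in candidates for p in patch if inside(p, box)]
```

Step 1 touches one envelope per patch instead of one record per point, while step 2 pays for unpacking every point of every candidate patch; that trade-off is what the timings below measure.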


BB inclusion with a 1G point cloud using patches

included points   execution time (search of patches)   execution time (patch explosion)
1M                520ms                                3s
10M               3.8s                                 16.5s
100M              33.8s                                150s

So... indexed searches are faster!


Nearest neighbours search with a 1G point cloud

searched points   execution time (no index)   execution time (with index)
1M                2000s                       1.41s
10M               -                           13.8s

SELECT *
FROM pcpoints
ORDER BY ST_SetSRID(ST_MakePoint(0, 0, 0), 4326) <-> Geometry(pt)
LIMIT <searched points>;


Index blocks in memory are used, then Seq Scans:

searched points   execution time
100M              2100s


Conclusions

• PostgreSQL includes many features to work with geospatial entities
  – 2D in-core geometries, PostGIS, PointCloud, ...
• Indexes can be used successfully
  – Improved performance for the geospatial entities introduced with PostGIS
• Waiting for SP-GiST indexes (PostGIS >2.1)
• The performance achieved with large numbers of entries shows that the geospatial features of the PostgreSQL DBMS can be suitable for the 100M-1G range


Questions?

[email protected]

• @giubro

• gemini__81

• gbroccolo7


Creative Commons License

Copyright 2012-2015,

2ndQuadrant Italia - http://www.2ndquadrant.it

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License