2ndQuadrant Italia Giuseppe Broccolo – [email protected]
FOSS4G.EU 2015, Como, Politecnico di Milano
July 14th-17th 2015
Use of indexes on geospatial databases with the PostgreSQL DBMS
Giuseppe Broccolo
www.2ndquadrant.it
$~# whoami
• PostgreSQL and PostGIS consultant
– Development, Replication, Disaster Recovery, pre-production Benchmark, Remote DBA, 24/7 Support, Training
Outline
• Indexes on geospatial DBs
• What does PostgreSQL offer?
• Examples of usage:
– Points in PostgreSQL
– Points in PostGIS extension
– (LiDAR) points in PointCloud extension
Indexes on geospatial databases
• Binary structures used to speed up access to data:
– In the case of trees: a balanced or unbalanced structure of nodes
– Theoretical performance:
  • Read/write: ~O(log N)
  • Size: ~O(N)
– Algorithms are defined not by ordering/comparison operators but by placement operators
– Index nodes are built starting from the MBR (minimum bounding rectangle) containing the whole dataset
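As a rough illustration of the last point, the rectangle enclosing a whole point set can be computed with in-core types alone. This is only a sketch: the five sample points are made up, and the query says nothing about how an index implementation actually derives its node boundaries.

```sql
-- Hypothetical sample of five points, generated inline (no table needed).
WITH sample(p) AS (
    VALUES (point(0.1, 0.9)),
           (point(0.4, 0.2)),
           (point(0.8, 0.5)),
           (point(0.3, 0.7)),
           (point(0.6, 0.1))
)
-- p[0] and p[1] are the x and y components of a point.
-- The enclosing box runs from (min x, min y) to (max x, max y).
SELECT box(point(min(p[0]), min(p[1])),
           point(max(p[0]), max(p[1]))) AS mbr
FROM sample;
```

A tree index then subdivides this starting rectangle, either by grouping entries into nested rectangles (balanced trees) or by recursively splitting the space (unbalanced trees).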
MBR
• Balanced: R-tree, etc.
• Unbalanced: Kd-tree, Quad-tree, etc.
What PostgreSQL offers
• “In core” 2D geometric (not geographic) datatypes
– Fixed resolution: double precision
– point, circle, box
– @-@, @@, <->, &&, <<, >>, <<|, |>>, ...
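A few of these operators can be tried directly in psql, without creating any table. The literal values below are arbitrary and chosen only to make each operator's meaning visible.

```sql
SELECT point(0, 0) <-> point(3, 4)                    AS distance,        -- <-> : distance between
       @-@ path '[(0,0),(3,4)]'                       AS path_length,     -- @-@ : length of an open path
       @@ circle '((1,1),2)'                          AS center,          -- @@  : center point
       box '((0,0),(1,1))' && box '((0.5,0.5),(2,2))' AS overlaps,        -- &&  : do the boxes overlap?
       circle '((0,0),1)' << circle '((5,0),1)'       AS strictly_left,   -- <<  : strictly left of?
       box '((0,0),(1,1))' <<| box '((0,2),(1,3))'    AS strictly_below;  -- <<| : strictly below?
```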
What PostgreSQL offers
• PostGIS extension:
– geometry, geography
– <@, @>, &&, <<, >>, <<|, |>>, ...
– ST_Length(), ST_Distance(), ...
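The difference between the two PostGIS datatypes can be seen with the functions named above. The sketch assumes the extension is already installed in the current database; the coordinates are arbitrary.

```sql
-- Assumes: CREATE EXTENSION postgis;
SELECT
    -- geometry: planar computation, result in the units of the coordinates
    ST_Length(ST_GeomFromText('LINESTRING(0 0, 3 4)'))   AS planar_length,
    ST_Distance(ST_MakePoint(0, 0), ST_MakePoint(3, 4))  AS planar_distance,
    -- geography: lon/lat input, distance returned in metres
    ST_Distance('POINT(9.0 45.8)'::geography,
                'POINT(9.2 45.5)'::geography)            AS distance_in_metres;
```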
Tree indexes in PostgreSQL
• Balanced indexes
– B-Tree
– GIN (Generalized Inverted Index) – fast accesses to data
– GiST (Generalized Search Tree) – good concurrency, “lossy”
  • kNN searches
• Unbalanced index
– SP-GiST (Space Partitioned GiST) – low I/O
  • Introduced in PostgreSQL 9.2
  • Usable in PostGIS >2.1
Work with 2D points sets
• The test environment: Vagrant VM (Ubuntu 14.04)
– Single virtual core 2.26GHz, RAM 512MB, 7.2k rpm disk
• PostgreSQL 9.4 + PostGIS 2.1
– postgresql.conf: default
• ~10M points
– Nearest Neighbours search
– Bounding Box inclusion
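The slides do not say how the sample was generated. One way to build comparable tables, assuming uniformly random points in the unit square (which matches the 0.4-0.6 box and the (0.5, 0.5) reference point used in the later queries), might be:

```sql
-- Assumption: uniform random points in [0,1) x [0,1); the original data may differ.
-- Table and column names follow the ones used in the CREATE INDEX statements.

-- In-core point datatype
CREATE TABLE many_point AS
SELECT point(random(), random()) AS point
FROM generate_series(1, 10000000);

-- PostGIS geometry(point,0) datatype (requires CREATE EXTENSION postgis)
CREATE TABLE many_geom AS
SELECT ST_MakePoint(random(), random())::geometry(point, 0) AS point
FROM generate_series(1, 10000000);

-- Refresh planner statistics before timing any query.
ANALYZE many_point;
ANALYZE many_geom;
```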
Index creation on the 2D sample
– point datatype supports both GiST and SP-GiST indexing
=# CREATE INDEX idx_gist_point ON many_point USING gist(point);
=# CREATE INDEX idx_spgist_point ON many_point USING spgist(point);
– geometry(point,0) datatype supports only GiST indexing
=# CREATE INDEX idx_gist_geom ON many_geom USING gist(point);
=# CREATE INDEX idx_spgist ON many_geom USING spgist(point);
ERROR: data type geometry has no default operator class for access method "spgist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
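The error means that no default operator class is registered for geometry under the spgist access method. Which datatypes do have one can be checked in the system catalogs; the query below is a sketch, and its output depends on the PostgreSQL and PostGIS versions installed.

```sql
-- List the default operator classes available for GiST and SP-GiST.
SELECT am.amname              AS access_method,
       opc.opcintype::regtype AS indexed_type,
       opc.opcname            AS operator_class
FROM pg_opclass AS opc
JOIN pg_am      AS am ON am.oid = opc.opcmethod
WHERE am.amname IN ('gist', 'spgist')
  AND opc.opcdefault
ORDER BY 1, 2;
```

On the setup described in these slides, geometry would be expected to appear only under gist, while point appears under both access methods.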
Index creation on the 2D sample
index              index size   table size   creation time
idx_gist_point     715MB        653MB        214s
idx_spgist_point   437MB        653MB        137s
idx_gist_geom      523MB        501MB        290s
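Sizes like these can be read back from the catalogs once the indexes exist. This is a sketch using the index names from the previous slide; creation times are not stored anywhere and have to be measured while the index is built, for example with psql's \timing.

```sql
-- Report the size of each index next to the size of the table it belongs to.
SELECT c.relname                                     AS index_name,
       pg_size_pretty(pg_relation_size(c.oid))       AS index_size,
       pg_size_pretty(pg_relation_size(i.indrelid))  AS table_size
FROM pg_class AS c
JOIN pg_index AS i ON i.indexrelid = c.oid
WHERE c.relname IN ('idx_gist_point', 'idx_spgist_point', 'idx_gist_geom')
ORDER BY c.relname;
```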
Nearest Neighbours search (2D)
– point
SELECT *
FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;
– geometry(point,0)
SELECT *
FROM many_geom
ORDER BY ST_MakePoint(0.5, 0.5) <-> point LIMIT 10;
Nearest Neighbours search (2D)
• Query timing (without & with indexes):
datatype            planner strategy              exec. time
point               Seq. Scan + Sort              7.3s
point               Index Scan (idx_gist_point)   52ms
geometry(point,0)   Seq. Scan + Sort              17.2s
geometry(point,0)   Index Scan (idx_gist_geom)    18ms
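The planner strategy behind each timing can be inspected with EXPLAIN. The sketch below uses the in-core point table; discouraging index scans through enable_indexscan is just one way to reproduce the "without index" case without dropping the index, and may not be how the original measurements were taken.

```sql
-- With the GiST index in place, the plan is expected to show an Index Scan
-- ordered by distance.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;

-- Discourage index scans for this session to compare against Seq. Scan + Sort.
SET enable_indexscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;

-- Restore the default planner behaviour.
RESET enable_indexscan;
```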
Bounding Box inclusion (2D)
– point
SELECT *
FROM many_point
WHERE point <@ box(point(0.4, 0.4), point(0.6, 0.6));
– geometry(point,0)
SELECT *
FROM many_geom
WHERE point && ST_MakeBox2D(ST_MakePoint(0.4, 0.4), ST_MakePoint(0.6, 0.6));
Bounding Box inclusion (2D)
• Query timing (without & with indexes):
datatype            planner strategy                exec. time
point               Seq. Scan + <@                  5.7s
point               Index Scan (idx_spgist_point)   0.4s
geometry(point,0)   Seq. Scan + &&                  2.0s
geometry(point,0)   Index Scan (idx_gist_geom)      0.7s
Unbalanced indexes intrinsically partition the sample into boxes at their nodes.
This is exploited in bounding-box inclusion searches!
Work with (many) 3D points in PostgreSQL
• The OpenGeo suite (Boundless – P. Ramsey)
– Includes the postgis and pointcloud extensions
  • Casting between the two point datatypes is allowed
  • pointcloud allows the use of patches to reduce the overall data size
– No packages available to work with PostgreSQL 9.4
– Can import LiDAR data from .LAS files
http://suite.opengeo.org/4.1/whatsnew.html
http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/loadingdata.html#loading-with-pdal
An example of usage: 1G points cloud
• The test environment:
– 16GB RAM, 1TB RAID1 storage, 8 CPU @3.3GHz, PostgreSQL 9.3
• Use the pointcloud extension
– one point → one record
• Search for points inside a bounding box (BB) and for nearest neighbours (NN)
Point layout (bytes per dimension): 4B 4B 4B 2B
http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/schemas.html
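A "one point → one record" table could be set up roughly as follows. This is a sketch under several assumptions: a schema with pcid = 1 and four dimensions (taken here to be X, Y, Z and Intensity, which would fit the 4B/4B/4B/2B layout) is already registered in pointcloud_formats, and the inserted values are synthetic rather than LiDAR data.

```sql
-- Assumes: CREATE EXTENSION pointcloud; CREATE EXTENSION pointcloud_postgis;
-- Assumes: a four-dimension schema with pcid = 1 exists in pointcloud_formats.
CREATE TABLE pcpoints (
    id serial PRIMARY KEY,
    pt pcpoint(1)          -- one point per record
);

-- Synthetic sample values, one array entry per schema dimension.
INSERT INTO pcpoints (pt)
SELECT PC_MakePoint(1, ARRAY[x, y, z, intensity]::float8[])
FROM (
    SELECT a / 100.0 AS x,
           a / 100.0 AS y,
           1.0 * a   AS z,
           a / 10    AS intensity
    FROM generate_series(1, 100) AS a
) AS vals;

-- Inspect a few stored points in readable form.
SELECT PC_AsText(pt) FROM pcpoints LIMIT 3;
```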
Build the index
table size   GiST index size   building time
56GB         59GB              6h
CREATE INDEX pc_gist_idx ON pcpoints USING gist(Geometry(pt));
You have to cast to PostGIS point datatype to use GiST index
BB inclusion with 1G points cloud
included points   execution time (no index)   execution time (with index)
1M                798s                        208ms
10M               -                           9.27s
100M              -                           99.7s
300M              -                           682s
SELECT * FROM pcpoints
WHERE Geometry(pt) &&
ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
ST_MakePoint(100, 100, 500)), 4326);
Index is always used!
BB inclusion with 1G points cloud using patches
WITH sel AS (
SELECT PC_Explode(pa) AS pc FROM pcpatch
WHERE ST_SetSRID(ST_GeomFromEWKB(PC_Envelope(pa)), 4326) &&
ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
ST_MakePoint(100, 100, 500)), 4326)
)
SELECT pc FROM sel
WHERE ST_Within(Geometry(pc),
ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
ST_MakePoint(100, 100, 500)), 4326));
100k patches, 10k points/patch (2h, 9.4GB)
http://suite.opengeo.org/4.1/dataadmin/pointcloud/objects.html#pcpatch
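Patches of about 10k points each could be built from a one-point-per-record table with the PC_Patch aggregate. The sketch below is not the procedure used for the slides: the table name pcpatches is hypothetical, pcid = 1 is assumed, and grouping by id only yields compact patches if the points were loaded in a spatially coherent order.

```sql
-- Hypothetical target table: one patch per record.
CREATE TABLE pcpatches (
    id serial PRIMARY KEY,
    pa pcpatch(1)
);

-- Collect consecutive runs of 10000 points into one patch each.
-- Assumption: consecutive ids are close in space; otherwise the patch
-- envelopes overlap heavily and the bounding-box filter prunes very little.
INSERT INTO pcpatches (pa)
SELECT PC_Patch(pt)
FROM pcpoints
GROUP BY id / 10000;

-- Check how many points ended up in the first few patches.
SELECT id, PC_NumPoints(pa) AS points_in_patch
FROM pcpatches
ORDER BY id
LIMIT 5;
```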
BB inclusion with 1G points cloud using patches
included points   execution time (search of patches)   execution time (patch explosion)
1M                520ms                                3s
10M               3.8s                                 16.5s
100M              33.8s                                150s
So... indexed searches are faster!
Nearest neighbours search with 1G points cloud
searched points   execution time (no index)   execution time (with index)
1M                2000s                       1.41s
10M               -                           13.8s
SELECT *
FROM pcpoints
ORDER BY ST_SetSRID(ST_MakePoint(0, 0, 0), 4326) <-> Geometry(pt)
LIMIT <searched points>;
Index blocks in memory are used, then Seq. Scans:
searched points   execution time
100M              2100s
Conclusions
• PostgreSQL includes many features to work with geospatial entities
– 2D in-core geometries, PostGIS, PointCloud (, ...)
• Indexes can be used successfully
– Improved performance for the geospatial entities introduced with PostGIS
  • Waiting for SP-GiST indexes (PostGIS >2.1)
• The performance achieved with larger numbers of entries shows that the geospatial features of the PostgreSQL DBMS can be suitable for the 100M-1G range
Questions?
• @giubro
• gemini__81
• gbroccolo7
Creative Commons License
Copyright 2012-2015,
2ndQuadrant Italia - http://www.2ndquadrant.it
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License