How can I optimize a Point in Polygon query for millions of points when most of the points lie within the...
I have 150 million points in a point table and would like to find the few points lying outside a given polygon geometry. I know that 99.9% of the points are within the polygon geometry. I am interested in finding the few points which lie outside the polygon.
My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?
SELECT COUNT(*)
FROM italy_points pt
JOIN borders poly
ON ST_WITHIN (pt.the_geom, poly.geom)
WHERE poly.iso3 = 'ITA';
The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)
postgis postgresql
|
show 1 more comment
I have 150 million points in a point table and would like to find the few points lying outside a given polygon geometry. I know that 99.9% of the points are within the polygon geometry. I am interested in finding the few points which lie outside the polygon.
My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?
SELECT COUNT(*)
FROM italy_points pt
JOIN borders poly
ON ST_WITHIN (pt.the_geom, poly.geom)
WHERE poly.iso3 = 'ITA';
The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)
postgis postgresql
2
Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 '18 at 17:02
The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 '18 at 19:56
@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need onlyGROUP BY
the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in theSELECT
clause that come from a table where the primary key is included in theGROUP BY
clause.)
– jpmc26
Dec 17 '18 at 21:02
@WhiteboxDevST_Within
already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 '18 at 21:05
@jpmc26 Certainly, but the SQL query would also need to be modified to useST_Intersects
, sinceST_Within
would not reliably match internal boundary conditions.
– Vince
Dec 17 '18 at 21:52
|
show 1 more comment
I have 150 million points in a point table and would like to find the few points lying outside a given polygon geometry. I know that 99.9% of the points are within the polygon geometry. I am interested in finding the few points which lie outside the polygon.
My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?
SELECT COUNT(*)
FROM italy_points pt
JOIN borders poly
ON ST_WITHIN (pt.the_geom, poly.geom)
WHERE poly.iso3 = 'ITA';
The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)
postgis postgresql
I have 150 million points in a point table and would like to find the few points lying outside a given polygon geometry. I know that 99.9% of the points are within the polygon geometry. I am interested in finding the few points which lie outside the polygon.
My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?
SELECT COUNT(*)
FROM italy_points pt
JOIN borders poly
ON ST_WITHIN (pt.the_geom, poly.geom)
WHERE poly.iso3 = 'ITA';
The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)
postgis postgresql
postgis postgresql
edited Dec 17 '18 at 22:25
Prithvi
asked Dec 17 '18 at 16:49
PrithviPrithvi
433
433
2
Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 '18 at 17:02
The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 '18 at 19:56
@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need onlyGROUP BY
the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in theSELECT
clause that come from a table where the primary key is included in theGROUP BY
clause.)
– jpmc26
Dec 17 '18 at 21:02
@WhiteboxDevST_Within
already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 '18 at 21:05
@jpmc26 Certainly, but the SQL query would also need to be modified to useST_Intersects
, sinceST_Within
would not reliably match internal boundary conditions.
– Vince
Dec 17 '18 at 21:52
|
show 1 more comment
2
Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 '18 at 17:02
The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 '18 at 19:56
@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need onlyGROUP BY
the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in theSELECT
clause that come from a table where the primary key is included in theGROUP BY
clause.)
– jpmc26
Dec 17 '18 at 21:02
@WhiteboxDevST_Within
already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 '18 at 21:05
@jpmc26 Certainly, but the SQL query would also need to be modified to useST_Intersects
, sinceST_Within
would not reliably match internal boundary conditions.
– Vince
Dec 17 '18 at 21:52
2
2
Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 '18 at 17:02
Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 '18 at 17:02
The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 '18 at 19:56
The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 '18 at 19:56
@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only
GROUP BY
the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT
clause that come from a table where the primary key is included in the GROUP BY
clause.)– jpmc26
Dec 17 '18 at 21:02
@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only
GROUP BY
the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT
clause that come from a table where the primary key is included in the GROUP BY
clause.)– jpmc26
Dec 17 '18 at 21:02
@WhiteboxDev
ST_Within
already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.– jpmc26
Dec 17 '18 at 21:05
@WhiteboxDev
ST_Within
already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.– jpmc26
Dec 17 '18 at 21:05
@jpmc26 Certainly, but the SQL query would also need to be modified to use
ST_Intersects
, since ST_Within
would not reliably match internal boundary conditions.– Vince
Dec 17 '18 at 21:52
@jpmc26 Certainly, but the SQL query would also need to be modified to use
ST_Intersects
, since ST_Within
would not reliably match internal boundary conditions.– Vince
Dec 17 '18 at 21:52
|
show 1 more comment
1 Answer
1
active
oldest
votes
Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.
Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).
4
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "79"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f306295%2fhow-can-i-optimize-a-point-in-polygon-query-for-millions-of-points-when-most-of%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.
Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).
4
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
add a comment |
Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.
Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).
4
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
add a comment |
Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.
Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).
Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.
Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).
answered Dec 17 '18 at 17:00
MesaMesa
1,066615
1,066615
4
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
add a comment |
4
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
4
4
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 '18 at 17:27
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 '18 at 22:28
add a comment |
Thanks for contributing an answer to Geographic Information Systems Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f306295%2fhow-can-i-optimize-a-point-in-polygon-query-for-millions-of-points-when-most-of%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 '18 at 17:02
The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 '18 at 19:56
@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only
GROUP BY
the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in theSELECT
clause that come from a table where the primary key is included in theGROUP BY
clause.)– jpmc26
Dec 17 '18 at 21:02
@WhiteboxDev
ST_Within
already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.– jpmc26
Dec 17 '18 at 21:05
@jpmc26 Certainly, but the SQL query would also need to be modified to use
ST_Intersects
, sinceST_Within
would not reliably match internal boundary conditions.– Vince
Dec 17 '18 at 21:52