Given a set of polygons, I'm trying to get all pairs of polygons whose intersection area is at least some (non-zero) threshold.
My first query yields an IE_JOIN, which is good and indeed quite fast, but the second query (the same as the first, just adding where ST_Area(ST_Intersection(p1.geom, p2.geom)) > 10) yields a BLOCKWISE_NL_JOIN, which is very slow. I've tried to trick DuckDB using a WITH construct, but to no avail.
These plans are from a test db (see below), but I've spotted the issue on a 200 GB database on my laptop, so I can confirm it's not DuckDB doing something smart because the data is small. Also, I can confirm that saving the result of the first query in a table and then performing the area filtering on that table is much faster than queries 2 or 3.
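The queries themselves aren't quoted above; they are roughly of the following shape (a sketch only: the table and column names match the generator script below, the use of ST_Intersects as the join predicate is an assumption, and only the area filter is verbatim):

-- Query 1: single spatial join predicate, planned as an IE_JOIN
SELECT p1.id, p2.id
FROM polygon_data p1
JOIN polygon_data p2 ON ST_Intersects(p1.geom, p2.geom);

-- Query 2: same join plus the area filter, planned as a BLOCKWISE_NL_JOIN
SELECT p1.id, p2.id
FROM polygon_data p1
JOIN polygon_data p2 ON ST_Intersects(p1.geom, p2.geom)
WHERE ST_Area(ST_Intersection(p1.geom, p2.geom)) > 10;

-- Query 3: the WITH variant, also planned as a BLOCKWISE_NL_JOIN
WITH pairs AS (
    SELECT p1.id AS id1, p2.id AS id2, p1.geom AS g1, p2.geom AS g2
    FROM polygon_data p1
    JOIN polygon_data p2 ON ST_Intersects(p1.geom, p2.geom)
)
SELECT id1, id2
FROM pairs
WHERE ST_Area(ST_Intersection(g1, g2)) > 10;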
Actual results: queries 2 and 3 use BLOCKWISE_NL_JOIN
Expected results: queries 2 and 3 use IE_JOIN
You can generate dummy data with this script:
import duckdb
import random
import math


def generate_valid_polygon(num_sides, min_x, max_x, min_y, max_y):
    # Generate points in polar coordinates centered within or around the bounding box
    center_x = (min_x + max_x) / 2
    center_y = (min_y + max_y) / 2
    radius = (max_x - min_x) / 2
    # Sorting the angles keeps the vertices in angular order, so the polygon is simple
    angles = sorted([random.uniform(0, 2 * math.pi) for _ in range(num_sides)])
    vertices = [
        (
            center_x + random.uniform(radius * 0.5, radius * 1.5) * math.cos(angle),
            center_y + random.uniform(radius * 0.5, radius * 1.5) * math.sin(angle)
        )
        for angle in angles
    ]
    # Ensure the polygon is closed by repeating the first point at the end
    vertices.append(vertices[0])
    return vertices


def main(num_polygons, num_sides, min_x, max_x, min_y, max_y):
    # Connect to a file-backed DuckDB database and load the spatial extension
    con = duckdb.connect('polygons.db')
    con.execute("INSTALL 'spatial';")
    con.execute("LOAD 'spatial';")

    # Create a table for storing polygon data as text initially
    con.execute("CREATE TABLE IF NOT EXISTS polygon_data (id INTEGER, polygon TEXT)")

    # Generate N valid polygons and insert them into the database as WKT
    for i in range(1, num_polygons + 1):
        polygon_vertices = generate_valid_polygon(num_sides, min_x, max_x, min_y, max_y)
        wkt = 'POLYGON((' + ', '.join(f"{x:.2f} {y:.2f}" for x, y in polygon_vertices) + '))'
        con.execute("INSERT INTO polygon_data (id, polygon) VALUES (?, ?)", (i, wkt))

    # Alter the table to add a geometry column and populate it from the WKT
    con.execute("ALTER TABLE polygon_data ADD COLUMN geom GEOMETRY;")
    con.execute("UPDATE polygon_data SET geom = ST_MakeValid(ST_GeomFromText(polygon));")

    # Verify the conversion by selecting and printing some geometric data
    result = con.execute("SELECT id, ST_AsText(geom) FROM polygon_data LIMIT 10").fetchall()
    for row in result:
        print(row)

    # Close the connection
    con.close()


# Parameters
num_polygons = 100_000  # Number of polygons to generate
num_sides = 6           # Number of sides for each polygon
min_x, max_x = 0, 100   # Define bounding box
min_y, max_y = 0, 100

# Run the script
main(num_polygons, num_sides, min_x, max_x, min_y, max_y)
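The join operator each query ends up with can be checked against this generated data with EXPLAIN, for example (using the sketched query 2 from above):

EXPLAIN
SELECT p1.id, p2.id
FROM polygon_data p1
JOIN polygon_data p2 ON ST_Intersects(p1.geom, p2.geom)
WHERE ST_Area(ST_Intersection(p1.geom, p2.geom)) > 10;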
Hello!
This is currently expected; we only turn the join into an IE join if there is a single spatial predicate as the join condition.
However, it would theoretically be possible to extract other simple predicates from the join and apply them afterwards as filters, but that would require more advanced query planning and optimizer logic to weigh the different alternatives (either do a normal predicate join and apply the spatial filter afterwards, or do a spatial IE join and apply the remaining filter afterwards). I'm not sure we have the capabilities to do that intelligently just yet, given that we don't have any spatial statistics for the geometry type to estimate selectivity with (and DuckDB doesn't support custom statistics for custom types yet either). But we can leave this issue open as a tracking issue to implement this functionality in the future.
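In the meantime, the workaround described in the report (materialize the result of the pure spatial join, then filter) can be written out explicitly. A sketch, assuming the polygon_data layout from the script above and hypothetical names for the intermediate table and columns:

-- Let the IE join run on the spatial predicate alone and materialize the result
CREATE TABLE intersecting_pairs AS
SELECT p1.id AS id1, p2.id AS id2, ST_Intersection(p1.geom, p2.geom) AS inter
FROM polygon_data p1
JOIN polygon_data p2 ON ST_Intersects(p1.geom, p2.geom);

-- The area filter then runs as a plain scan over the materialized pairs
SELECT id1, id2
FROM intersecting_pairs
WHERE ST_Area(inter) > 10;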