[SEDONA-630] Improve ST_Union_Aggr performance #1526
Switch to JTS `OverlayNGRobust.union` function to perform geometry union and add geometry cache capability.
@jiayuasu I noticed that after switching from `geo.buffer` to `OverlayNGRobust.union`, the representation of complex geometries returned by ST_Union_Aggr may change because the polygon/polyline vertices are reordered. For example:
New: `POLYGON ((1 0, 0 0, 0 1, 1 1, 2 1, 2 0, 1 0))`
Old: `POLYGON ((0 0, 0 1, 1 1, 2 1, 2 0, 1 0, 0 0))`
They represent the same polygon, but the vertex order has changed.
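The two outputs quoted in the comment above differ only in where the ring starts. A test that compares WKT strings will therefore break, while a topological comparison will not. The following is a minimal Scala sketch with JTS, assuming JTS 1.18+ on the classpath, where `OverlayNGRobust` lives in `org.locationtech.jts.operation.overlayng`. The object name `VertexOrderCheck` and the two unit squares are illustrative and not taken from the PR.

```scala
import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.io.WKTReader
import org.locationtech.jts.operation.overlayng.OverlayNGRobust

object VertexOrderCheck {
  private val reader = new WKTReader()

  // The "New" and "Old" outputs quoted in the review comment.
  val newWkt = "POLYGON ((1 0, 0 0, 0 1, 1 1, 2 1, 2 0, 1 0))"
  val oldWkt = "POLYGON ((0 0, 0 1, 1 1, 2 1, 2 0, 1 0, 0 0))"

  def read(wkt: String): Geometry = reader.read(wkt)

  /** Unions two adjacent unit squares (illustrative input) with the collection overload. */
  def unionOfSquares(): Geometry = {
    val left  = read("POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))")
    val right = read("POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0))")
    OverlayNGRobust.union(java.util.Arrays.asList(left, right))
  }

  def main(args: Array[String]): Unit = {
    val n = read(newWkt)
    val o = read(oldWkt)
    // Coordinate-by-coordinate comparison: fails because the rings start at different vertices.
    println(s"equalsExact:      ${n.equalsExact(o)}")
    // Point-set comparison: succeeds because both describe the same polygon.
    println(s"equalsTopo:       ${n.equalsTopo(o)}")
    // norm() returns a normalized copy (canonical start vertex and ring orientation).
    println(s"normalized equal: ${n.norm().equalsExact(o.norm())}")
    println(s"union equalsTopo: ${unionOfSquares().equalsTopo(o)}")
  }
}
```

Comparing normalized geometries, or using `equalsTopo`, keeps tests stable regardless of which union implementation produced the result.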
I think this is fine. In addition, we want to make sure the behavior of ST_Union_Aggr is similar to PostGIS ST_Union (array variant): https://postgis.net/docs/ST_Union.html
@zhangfengcdt Did you see a performance improvement with this implementation compared to the previous one?
Yes, I am adding some tests that report performance measurements, so we can see the improvements for different cases there.
@jiayuasu I used the newly added test to measure both the old and new runtimes for different numbers of geometries. Here are the results:
I think this clearly shows that the new method is much more efficient and scalable.
|SELECT explode(array($polygonArrayStr)) AS geom
""".stripMargin

sparkSession.sql(sqlQuery).createOrReplaceTempView("geometry_table")
Can you return a reference to the DataFrame from the function instead of creating a new temp view? Otherwise, this might pollute the global namespace and lead to bugs that are hard to find.
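Here is a hedged Scala sketch of what the reviewer suggests. Temp views created with `createOrReplaceTempView` are scoped to the `SparkSession`. In a suite that shares one session, a leftover view stays visible to later tests, and returning the DataFrame avoids that.

The sketch makes some assumptions:
- The helper takes the `SparkSession` as a parameter.
- It builds the polygons as WKT strings rather than through the `$polygonArrayStr` SQL in the excerpt.
- Conversion with Sedona's `ST_GeomFromWKT` is left out, so the sketch runs on plain Spark.

`PolygonTestData` and its WKT generator are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PolygonTestData {
  /**
   * Hypothetical version of the test helper: builds `numPolygons` adjacent unit
   * squares as WKT strings and returns the DataFrame instead of registering a
   * temp view, so nothing is left behind in the session catalog.
   */
  def createPolygonDataFrame(spark: SparkSession, numPolygons: Int): DataFrame = {
    import spark.implicits._
    (0 until numPolygons)
      .map(i => s"POLYGON (($i 0, $i 1, ${i + 1} 1, ${i + 1} 0, $i 0))")
      .toDF("wkt")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("polygon-df").getOrCreate()
    try {
      val df = createPolygonDataFrame(spark, 3)
      df.show(truncate = false)
      // No "geometry_table" view was registered as a side effect.
      println(s"geometry_table exists: ${spark.catalog.tableExists("geometry_table")}")
    } finally {
      spark.stop()
    }
  }
}
```

The caller keeps the returned DataFrame in a local `val`, so its lifetime is tied to the test case rather than to the shared session.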
createPolygonDataFrame(numPolygons)

// cache the table so that table-scan time is excluded from the measurement
sparkSession.sql("cache table geometry_table")
Can you also unpersist this table at the end of the test case? Otherwise, this will lead to a memory leak.
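Here is a hedged sketch of the cleanup the reviewer asks for. It wraps the cached DataFrame in `try`/`finally`, so `unpersist` runs even when an assertion in the test body fails.

`CacheCleanup.withCached` is a hypothetical helper. `Dataset.cache`, `Dataset.unpersist(blocking)`, and `Dataset.storageLevel` are standard Spark APIs. If the test keeps the SQL form `cache table geometry_table`, the matching cleanup would be `UNCACHE TABLE geometry_table` (or `spark.catalog.uncacheTable("geometry_table")`) in the `finally` block.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CacheCleanup {
  /**
   * Hypothetical test utility: caches `df`, runs `body` against it, and always
   * unpersists afterwards, even if `body` throws (e.g. a failed assertion).
   */
  def withCached[T](df: DataFrame)(body: DataFrame => T): T = {
    val cached = df.cache()
    try {
      cached.count() // force materialization so the timed queries read from the cache
      body(cached)
    } finally {
      cached.unpersist(blocking = true) // release cached blocks before the next test runs
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cache-cleanup").getOrCreate()
    try {
      val df = spark.range(1000).toDF("id")
      val evenRows = withCached(df)(_.filter("id % 2 = 0").count())
      println(s"even rows: $evenRows, storage level afterwards: ${df.storageLevel}")
    } finally {
      spark.stop()
    }
  }
}
```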
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
What changes were proposed in this PR?
Switch to JTS `OverlayNGRobust.union` function to perform geometry union and add geometry cache capability: https://locationtech.github.io/jts/javadoc/org/locationtech/jts/operation/overlayng/OverlayNGRobust.html
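The PR description does not explain how the geometry cache works. The sketch below is purely an illustration of batching with the collection overload `OverlayNGRobust.union(Collection<Geometry>)`. It assumes an accumulator that buffers incoming geometries and unions them once per batch rather than once per row. `BufferedUnion` and `batchSize` are hypothetical names, and this is not the code in the PR.

```scala
import java.util.{ArrayList => JArrayList}

import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.io.WKTReader
import org.locationtech.jts.operation.overlayng.OverlayNGRobust

/**
 * Hypothetical buffered union accumulator (NOT the PR's implementation):
 * geometries are cached in a list and collapsed with a single
 * OverlayNGRobust.union call when the list reaches `batchSize`
 * or when the result is requested.
 */
final class BufferedUnion(batchSize: Int = 1000) {
  require(batchSize > 0, "batchSize must be positive")
  private val pending = new JArrayList[Geometry]()

  def add(g: Geometry): Unit = {
    pending.add(g)
    if (pending.size() >= batchSize) flush()
  }

  /** Merge another accumulator, e.g. a partial result from another partition. */
  def merge(other: BufferedUnion): Unit = {
    other.flush()
    pending.addAll(other.pending)
    if (pending.size() >= batchSize) flush()
  }

  /** Replace all pending geometries with their union (no-op for 0 or 1 geometry). */
  private def flush(): Unit = {
    if (pending.size() > 1) {
      val merged = OverlayNGRobust.union(pending)
      pending.clear()
      pending.add(merged)
    }
  }

  def result(): Option[Geometry] = {
    flush()
    if (pending.isEmpty) None else Some(pending.get(0))
  }
}

object BufferedUnionDemo {
  def main(args: Array[String]): Unit = {
    val reader = new WKTReader()
    val acc = new BufferedUnion(batchSize = 2)
    for (i <- 0 until 5)
      acc.add(reader.read(s"POLYGON (($i 0, $i 1, ${i + 1} 1, ${i + 1} 0, $i 0))"))
    acc.result().foreach(g => println(s"${g.getGeometryType} area=${g.getArea}"))
  }
}
```

Unioning many geometries in one call lets the overlay work on the whole set at once. Repeated pairwise unions instead re-process the growing intermediate result for every row. The batch size trades memory held in the buffer against the number of union calls.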
How was this patch tested?
All existing unit tests should pass.
Did this PR include necessary documentation updates?