Remove GEOS dependency in favor of Shapely #2083
Good idea! That reduced it by ~20%. But we are still creating a lot of Points and LineStrings to do the covers test further on, so we need to think about creating the array of all geometries as early as possible if we can. This would also greatly help the pyproj transforms...
I'd be willing to pay a significant speed penalty (with the hopes of getting it back later) in order to remove the headache that is depending on Shapely while simultaneously linking to GEOS.
lib/cartopy/trace.pyx
# TODO: Avoid create-destroy
coords = GEOSCoordSeq_create_r(handle, 1, 2)
GEOSCoordSeq_setX_r(handle, coords, 0, point.x)
GEOSCoordSeq_setY_r(handle, coords, 0, point.y)
g_point = GEOSGeom_createPoint_r(handle, coords)
state = (POINT_IN
if GEOSPreparedCovers_r(handle, gp_domain, g_point)
else POINT_OUT)
GEOSGeom_destroy_r(handle, g_point)
g_point = sgeom.Point((point.x, point.y))
state = POINT_IN if gp_domain.covers(g_point) else POINT_OUT
del g_point
To avoid this create-destroy Point overhead, shapely could have some predicates specialized for taking x, y coordinates instead of a Point.
Coincidentally, GEOS recently added this to their C API as well, so I experimented with this a bit in shapely: shapely/shapely#1548 (and this improvement wouldn't be tied to the GEOS version).
With that, if the user has shapely 2.0 installed, the above could be replaced with:
state = POINT_IN if shapely.intersects_xy(gp_domain.context, point.x, point.y) else POINT_OUT
(intersects and covers should be the same in case of polygon/point predicate)
Testing this, it seems to give around a 15-20% speedup on the time_project_linear benchmark (on my laptop, going from around 6s to 5s).
That'd be nice to have.
So this actually made it into shapely 2.0, and thus could be used (conditionally on the shapely version, if you are OK with adding some different code branches depending on the shapely version)
If it means I don't have to build GEOS, I would take a 2x slowdown any day :)
098421f
to
c169e3a
Compare
After a long delay, I finally got around to pushing up some commits I've had around for a while. I added a new commit that transforms all of the source points in one call immediately to remove some of the repetitive If we want to be more like basemap and transform everything immediately and not interpolate between the transformed points, we can get a factor of 2x improvement. See this quick commit I was using for testing: greglucas@c052f63 My On the final one where we've removed all interpolation/bisection, the slowest functions are all within Shapely, so we could focus effort in refactoring the shapely geometry handling then. |
c169e3a
to
428d90b
Compare
I think the bisection/interpolation was an intentional feature of Cartopy, so we shouldn't just blindly remove it. How hard would it be to add an option (somewhere) to disable it? The 20% gain sounds great, and honestly the win in terms of making this project more sustainable, simplifying packaging, and eliminating a solid 50% of the support questions is worth the current performance impact.
💯 agree on all fronts. I'm hopeful that this can be a short-term hit in performance, but lead to a more sustainable project and getting people to be able to contribute.
I agree we should not remove it (I have some ideas for how to do it better still), that commit is just a demonstration of how it could work and the speed we can gain. Thinking more about it, it is analogous to the cartopy/lib/cartopy/mpl/geoaxes.py Lines 322 to 334 in 8c31a15
So, I can see adding that decorator to more functions, or even bringing that up a level and adding a property to Projection that would signal us to transform things immediately for the user.
That seems reasonable.
Any other thoughts from people on this PR and the trade-off of the speed decrease vs ease of installation? @QuLogic this was mostly your work, so did you have a preference one way or the other on it?
Just a ping to any interested parties for comment (@QuLogic) since the release of Shapely 2.0 is imminent.
As another speed-up idea, I am wondering: how many of the projections / CRS classes have a |
I like all of these thoughts for speed-ups! My question is do we want to keep implementing the speed-ups before accepting a PR, or do we merge something a bit slower for now and hope that people open PRs with improvements in the future? My take is that it would be good to get this in and then start whittling away on the speed-ups later (don't let perfect be the enemy of good). There are quite a few speed-ups to be had and I think they can be done in parallel by people with expertise in different areas too. (I've convinced myself that our bisection routines are bad performance-wise and not true bisection and that is the root of the problem. We go back and forth bisecting between points until a straight-enough segment is found, then set that to the new start or end point and restart bisecting. This means if we've already overshot in a previous iteration, we could try that same point again in the next iteration. I think we should really do recursive bisection adding points as we go down the recursive path avoiding any duplicate project_point + overlaps calls.) |
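The recursive bisection described in the parenthetical above could be sketched roughly as follows. This is an illustration only, not Cartopy's implementation: `project` is a hypothetical callable that maps a parameter `t` in [0, 1] along the source segment to a projected `(x, y)` point, and the straightness test (distance from the projected midpoint to the chord midpoint) is an assumed stand-in for whatever criterion the real code uses. Projected endpoints are passed down the recursion, so no `t` is ever projected twice.

```python
import math


def _bisect(project, t0, p0, t1, p1, tol, depth, out):
    """Append refined points for (t0, t1] to `out`, projecting each t once."""
    tm = 0.5 * (t0 + t1)
    pm = project(tm)  # the only new projection at this level
    chord_mid = (0.5 * (p0[0] + p1[0]), 0.5 * (p0[1] + p1[1]))
    # Accept the segment if the projected midpoint lies close to the chord.
    if depth == 0 or math.dist(pm, chord_mid) <= tol:
        out.append(p1)
        return
    # Otherwise split, reusing the already-projected endpoints and midpoint.
    _bisect(project, t0, p0, tm, pm, tol, depth - 1, out)
    _bisect(project, tm, pm, t1, p1, tol, depth - 1, out)


def refine(project, tol=1e-3, max_depth=12):
    """Return projected points along a segment, denser where it curves."""
    p0, p1 = project(0.0), project(1.0)
    out = [p0]
    _bisect(project, 0.0, p0, 1.0, p1, tol, max_depth, out)
    return out
```

For example, `refine(lambda t: (t, t * t))` adds points until each piece of the parabola is within the tolerance of its chord, while a straight line comes back as just its two endpoints.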
If someone has plans to immediately implement some speed-ups, I'd entertain waiting. Otherwise, while I'm sure we'll catch some flak for being even slower, I think finally being able to upload wheels and eliminating whole classes of installation issues outweighs that. Maybe it being released slow will motivate some contributions. 😉
One can dream 😄 (that is my hope too)
We conda users underestimate the value of this. Cartopy really needs wheels ASAP!
Latest commit adds the version gate to avoid the create/destroy Point geometry as @jorisvandenbossche suggested. I wasn't able to see much of a speed-up locally with that though. There is another LineString create/destroy for the covers()/disjoint() calls that is likely still the slow path. |
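A version gate of the kind mentioned here might look something like the sketch below. The helper name `has_xy_predicates` is hypothetical and not taken from the PR; it only inspects the major component of a version string, which is enough to separate Shapely 1.x from 2.x.

```python
def has_xy_predicates(version: str) -> bool:
    """Return True if `version` looks like Shapely 2.0 or newer.

    Only the leading major number is inspected, so pre-release strings
    such as "2.0b1" still count as 2.x.
    """
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 2
```

In practice this would be evaluated once at import time, e.g. `has_xy_predicates(shapely.__version__)`, and the result used to choose between `shapely.intersects_xy(...)` and the create-a-Point-then-`covers()` path shown in the review thread above.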
My suggestions were certainly just ideas from looking at the code that I wanted to note, not things that have to be done in this PR (and anyway it's not up to me to decide on that ;)). I would personally also say that being able to drop the build dependency on GEOS will be a huge benefit for installation, and seems worth some (temporary) speed degradation.
bed3576
to
c626695
Compare
I've taken a stab at upstreaming some performance updates to pyproj, which would be able to more than offset this (i.e. faster than current main). We are definitely taking a performance hit here, but I still think the benefits are worth it and then we can start whittling away on other performance improvements. Anyone willing to review/approve and hit the merge button ;) ?
I'm willing to hit it as soon as I carve out some cycles to actually review the code.
c626695
to
c119f34
Compare
# Each line resulting from the split should start and end with a
# segment that crosses the boundary when extended to double length.
def assert_close_to_boundary(xy):
The latest commit changes this. As far as I can tell we don't extend a line segment by 2x the distance to attach to a boundary anywhere, rather we just project along the boundary to find the closest point. So really we just want to be "close" and not be in the middle of the domain. This is relevant because some segments may be very close to the boundary, but when extended they don't go very far because the points are too closely spaced.
@bjlittle does IRIS have any issues with removing GEOS? Does this slow things down too much, and are there any additional failures that this introduces for all of you?
@greglucas We also have @trexfeathers Would it be possible to benchmark against this PR pre-merge? That might be pretty informative. Fancy making that happen and reporting the outcome, if you have capacity? |
INSTALL
written in C++.
**Matplotlib** 3.2 or later (https://matplotlib.org/)
Python package for 2D plotting. Python package required for any
graphical capabilities.
matplotlib is duplicated. This changeset can be removed.
Yep, thanks for catching that!
2ed9841
to
26a6094
Compare
Does anyone have other thoughts on this PR? What is the maximum performance penalty we would take for easier installations, or what else would we need to consider to keep this moving forward?
As a user I would say a < 2x performance hit is more than acceptable given the ease of installation. Also, once it's out in the field it'll be easier to notice where the performance hits are really mattering, right?
In all honesty, given the resources we have for maintaining this project, I think the only requirement I have is that it still works. This is a huge win for packaging and maintainability.
26a6094
to
8b0b7eb
Compare
OK, some updated numbers for everyone then. Running the benchmark from the initial comment (project_linear), I get 1.85s on Running the test suite locally I get within a second of each other. On CI it looks like there is ~50% difference between this branch and the most recent merge to main. (Look at the testing time explicitly because the installation time takes forever) |
Sounds more than acceptable to me!
It looks like I do have the ability to merge without any approvals. With the lack of people and time to review this PR I think we should push forward and merge this since we have new numpy, scipy, and matplotlib releases recently that it would be nice to push an update out for. I'll give this 2 more weeks and someone can throw a request changes to block me from self-merging if they have strong opinions against that, otherwise since no one has spoken up adamantly against this I think it is time to move forward with getting this in. |
This avoids __getitem__ calls from the shapely geometry and converts to numpy array and memory views within cython.
This transforms all points in the source geometry right away, rather than one-by-one later on. This saves some calls to pyproj's transform() method, which added some overhead.
With Shapely 2.0 we can avoid the create/destroy point for checking whether a point is within a geometry.
If we are going to a rectangular domain and all initial destination points are within the domain, then we can avoid checks to see if each individual line segment is within the domain.
We don't "project" by a factor of two to find the boundary intersection; rather, we project along the boundary to the nearest point. So really we just want to make sure that our cuts are "close" to the boundary, but we don't care about the segments being extended by a given fraction. This is relevant when the final two coordinates are close to each other and thus that segment is not projected very far, yet it is quite close to the boundary.
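The rectangular-domain shortcut from the commit messages above can be illustrated with a small sketch. It is not the code from the PR: the function name `all_inside_box` is hypothetical, and `bounds` is assumed to follow the `(minx, miny, maxx, maxy)` order of Shapely's `geometry.bounds`. The shortcut is valid because a rectangle is convex, so a straight segment between two points inside it cannot leave it.

```python
def all_inside_box(xs, ys, bounds):
    """Return True if every (x, y) lies inside the closed rectangle `bounds`.

    `bounds` is (minx, miny, maxx, maxy). NaN coordinates compare as False,
    so any NaN sends the caller to the slower per-segment checks.
    """
    minx, miny, maxx, maxy = bounds
    return all(
        minx <= x <= maxx and miny <= y <= maxy
        for x, y in zip(xs, ys)
    )
```

When this returns True, the per-segment covers()/disjoint() tests against the domain can be skipped for the whole line; when it returns False, the regular path still has to run.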
8b0b7eb
to
2dcd21d
Compare
This removes the GEOS dependency and pushes the geometry handling into Shapely. It currently takes ~2x longer, so opening this for discussion purposes.
ping: @QuLogic, @jorisvandenbossche
For simple benchmarking I've been making the project_linear benchmark executable for quick runs by adding this to the bottom:
Then profiling