Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge geo-types and geo crate #1186

Open
urschrei opened this issue May 19, 2024 · 5 comments
Open

Merge geo-types and geo crate #1186

urschrei opened this issue May 19, 2024 · 5 comments

Comments

@urschrei
Copy link
Member

Splitting the types and algorithms into separate crates made sense at one point, but is now an ongoing source of annoyance, bad DX, technical debt, and frustrating papercuts for crate users and geo's developers because implementing methods from geo on geo-types structs in third-party crates such as rstar is not possible due to circular dependencies. This has led to the proliferation of things like secret modules and the complete impossibility of implementing full spatial indexing on e.g. Polygons: #984.

I've thought about it, and I've concluded that we have no evidence that there are users of geo-types who don't also use geo, and that the proliferating need for spatial indexing in geo (for prepared geometries, unary unions to name but a few) is important to the crate and ecosystem's ongoing growth.

I propose putting a plan and schedule in place to merge the crates as a priority.

@michaelkirk
Copy link
Member

michaelkirk commented May 20, 2024

I've thought about this too. It might be a good idea, but it would be a pretty big change, so I wanted to flesh out the pros and cons a bit more.

The axiom is that it's useful to provide a way for geospatial crates like https://github.com/tmontaigu/shapefile-rs/blob/master/Cargo.toml and https://github.com/georust/geojson to interop with eachother and the algorithms in geo.

So I'm assuming we still see value in providing interop, but it would now live in geo instead of geo_types.

Cons against merging geo/geo-types

I see two primary benefits to maintaining the current separation of geo-types and geo (cons against merging).

Con 1. currently minimal deps are inherited by third party crates

Keeping geo-types minimal encourages crate developers to integrate with it, without greatly increasing their own dependency chain (and thus their user's build times and binary sizes).

Mitigation proposal: Put all the algorithms behind a new --features="algorithms" and have --no-default-features build only the types. Encourage third party crate integrators to use --no-default-features when possible.

Con 2. currently semver is relatively stable

Keeping geo-types relatively sem-ver stable keeps crate authors from having to spend too much time on the upgrade treadmill. If we merged the two, geo-types would break as often as geo.

For context, the last breaking release of geo-types was Jan 2021. We've had 11 breaking release of geo in that timeframe. That metric is a bit exaggerated. I think because we have geo-types separated, we are a bit fearless about breaking geo semver. We could be a bit more conservative if we saw value in it, but I think the point stands. (Personally, this is my biggest concern with merging the two.)

I've concluded that we have no evidence that there are users of geo-types who don't also use geo,

There are definitely plenty of libraries that use geo-types without using geo (geozero, geojson, wkt, shapefile-rs), but I think what you mean is, at the end of the day, users of those libraries are all (or mostly all) using geo somewhere else in their codebase, so there's no net reduction in dependencies for the end user. Is that right?

I'd wager that's true for most users. I'm not sure how to measure the significance of that remainder though.

Pros of merging geo/geo-types

  1. third party integration's that rely on algorithms get easier. This means we can build better features like rstar integration that just works for folks.
  2. I guess release mechanics would be simpler, but honestly this isn't a big burden for me as a maintainer. And I don't think it affects end users much.

Alternatives Considered

new type wrappers in geo

Keep the separate crates, and add new type wrappers for geo_type::geometries into geo. Then we could easily implement the traits (e.g. rstar) on the new types in geo with access to the algorithms. The problem though is we now have geo::Point and geo_types::Point which are not actually the same thing and will likely lead to hard to understand compilation errors.

"secret" modules

We can continue to move shared algorithm code needed for third party trait integration from geo into geo_types::private_utils, but it starts to sabotage the purported benefits of the separation.

@michaelkirk
Copy link
Member

michaelkirk commented May 20, 2024

Wild idea that would maintain the current low number of breaking releases for crates that only need the types for interop and don't care about the algorithms:

What if we:

  1. moved the geo_types geometry definitions into geo as the issue proposes.
  2. Put algorithms behind a new algorithms feature (enabled by default), such that cargo build geo --no-default-features builds only the types (no algorithms).
  3. Made the geo_types crate a simple re-export of those types. It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.

Then I think we could continue to freely break the semver for geo, and even update geo-types's geo dependency to the latest geo release, but we'd only need to actually break geo-types semver if we actually change the geometry format, since that's the only thing it re-exports.

I haven't thought about it too hard... but it seems like it'd address my concern about keeping a relatively stable interop format while allowing third party integrations to take advantage of algorithms.

@urschrei
Copy link
Member Author

urschrei commented May 20, 2024

Mitigation proposal: Put all the algorithms behind a new --features="algorithms" and have --no-default-features build only the types. Encourage third party crate integrators to use --no-default-features when possible.

This is a great idea!

For context, the last breaking release of geo-types was Jan 2021. We've had 11 breaking release of geo in that timeframe. That metric is a bit exaggerated. I think because we have geo-types separated, we are a bit fearless about breaking geo semver. We could be a bit more conservative if we saw value in it, but I think the point stands. (Personally, this is my biggest concern with merging the two.)

These are semver-breaking, but in practice only require a small amount of work (if any) to upgrade (I realise that's more than "none", but I think it's important to note)

There are definitely plenty of libraries that use geo-types without using geo (geozero, geojson, wkt, shapefile-rs), but I think what you mean is, at the end of the day, users of those libraries are all (or mostly all) using geo somewhere else in their codebase, so there's no net reduction in dependencies for the end user. Is that right?

Yep, I meant that there are very few libraries that only use -types.

Keep the separate crates, and add new type wrappers for geo_type::geometries into geo. Then we could easily implement the traits (e.g. rstar) on the new types in geo with access to the algorithms. The problem though is we now have geo::Point and geo_types::Point which are not actually the same thing and will likely lead to hard to understand comilation errors.

I think this would be a big source of annoyance for crate consumers – in my / our experience this kind of incompatibility trips people up often, and the compiler error messages don't help you to fix it. Of course it's easy for us to point people in the right direction, but that means they have to tell us, someone has to look at the code etc.

We can continue to move shared algorithm code needed for third party trait integration from geo into geo_types::private_utils, but it starts to sabotage the purported benefits of the separation.

One of the reasons I opened this issue is that I don't think we can, particularly in the case of the various euclidean distance measures. There's way too much scaffolding code that makes it work; I think we've hit the limit of what can be practically duplicated, and the duplication we have now is already too extensive in my opinion – you have to know the geo codebase well to even know that it exists, which is a barrier to people who want to help with any features that are implemented on geo-types geometries but require geo code.

@urschrei
Copy link
Member Author

Made the geo_types crate a simple re-export of those types. It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.

This would be amazing if we could make it work!

@michaelkirk
Copy link
Member

These are semver-breaking, but in practice only require a small amount of work (if any) to upgrade (I realise that's more than "none", but I think it's important to note)

Agreed, the changes required are typically trival. I think for actively maintained libraries, it's a rounding error.

My bigger concern is for the long tail of less actively developed libraries that we'll be fracturing off from the ecosystem every time we do a geo-types release. Multiple times a year is a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants