Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shared_coords=True vs shared_coords=False #187

Closed
theroggy opened this issue Sep 4, 2022 · 5 comments
Closed

shared_coords=True vs shared_coords=False #187

theroggy opened this issue Sep 4, 2022 · 5 comments

Comments

@theroggy
Copy link
Contributor

theroggy commented Sep 4, 2022

As already briefly touched here: #179 (comment)

I was also wondering about the shared_coords=True code path... is there a realworld use case for it or was it only introduced for performance reasons?

In my opinion it's only there for performance reasons. So if the 'path-connected' approach is reasonably fast it can be the only one.

I'm not sure what the best way forward is for this?

@theroggy
Copy link
Contributor Author

theroggy commented Sep 4, 2022

Regarding the section about coverages in geos you mentioned:

 * A polygon is coverage-valid if:
 *
 *   * The polygon interior does not intersect the interior of other polygons.
 *   * If the polygon boundary intersects another polygon boundary, the vertices
 *     and line segments of the intersection match exactly.

As you probably know, gaps and slivers are a big issue in GIS data in general. As long as not all borders between polygons/features are explicitly matched to each other including by having matching vertices everywhere the features touch having a perfect topology is impossible due to rounding errors,... Just doing a conversion from one file format to another can create gaps because of a conversion from double to discrete numbers,... so data that seemed properly matched/snapped will change to gappy/slivery data.

So, even though the current implementations using intersections doesn't need all vertices to be there, once you get into more complex data the places where a point was snapped to the middle of a line without adding the snap-vertex to the neighbour, some of those cases won't be properly "topologized". The only structural way to get perfect results all the time and with all operations (this is not limited to creating a topology) is that data is perfectly matched...
So, if I understand correctly they (geos developers) are in first focussing on tools to cleanup the data... (which is something I've been hoping for to appear for a long time) and which is useful for many more cases than topology. Once the data is cleaned, it will be a lot easier to create topologies from it.

The data I'm working with at the moment, and that I've been using to test, is "happy day scenario" data. It is the result of a polygonize of raster data, so all intersections between data are perfectly matched: every segment is either perfectly horizontal or perfectly vertical, so no gaps and slivers in the data. Most data out there isn't like that though :-(...

@theroggy
Copy link
Contributor Author

theroggy commented Sep 4, 2022

So, as long as the shared_coords=True, is faster, an alternative approach could be to change the default from shared_coords=True to shared_coords=False, as this will give the best results for most datasets, but if the user is sure the data is already 100% cleaned/prepared (~coverage-valid), he can use shared_coords=False to get the bit of extra performance.

Mind: when I was adding tests, I first started by running them both using shared_coords=False and shared_coords=True. But, in the first 2 cases where I did this this resulted in what seemed to be a bug at first sight in the shared_coords=True path. So, I might be wrong, I was focused on shared_coords=False, but at first sight there are still some bugs there... that are best fixed if the option is kept alive.

@mattijn
Copy link
Owner

mattijn commented Sep 4, 2022

I'm fine with changing the default from shared_coords=True to shared_coords=False (edit: done by #190).

Did you play with the prequantize parameter? Its applying a type of normalisation on range, see docs: https://mattijn.github.io/topojson/example/settings-tuning.html#prequantize. It will improve the quality of the topology if the input geometry is messy.
Current default is 1e6, but I like to change the default to 1e5 (edit: done by #189). But one can play with this value (any integer number is valid) to see what is best for each situation.

@mattijn
Copy link
Owner

mattijn commented Sep 4, 2022

this resulted in what seemed to be a bug at first sight in the shared_coords=True path

Also observed a, what seems like, bug with shared_coords=True (new default in master is shared_coords=False):

import geopandas
from topojson import Topology
nybb_path = geopandas.datasets.get_path("nybb")
data = geopandas.read_file(nybb_path)
topo = Topology(
    data=data, prequantize=200, shared_coords=True
)
topo.to_alt()

image

There should be a line string near the orange arrow.

@theroggy
Copy link
Contributor Author

theroggy commented Sep 4, 2022

Did you play with the prequantize parameter? Its applying a type of normalisation on range, see docs: https://mattijn.github.io/topojson/example/settings-tuning.html#prequantize. It will improve the quality of the topology if the input geometry is messy. Current default is 1e6, but I like to change the default to 1e5 (edit: done by #189). But one can play with this value (any integer number is valid) to see what is best for each situation.

No I haven't. I turned it off because the data I've used till now didn't need any cleaning. So I don't have any opionion on what a good value would be...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants