Default Precision Issues #135
@youngpm thanks for your input. I will say the default precision feature was added intentionally. https://tools.ietf.org/html/rfc7946#section-11.2 makes a strong argument for defaulting to 6 decimal places in GeoJSON, and as the new project lead I decided to go along with it. I would prefer to keep it this way rather than change it to opt-in. As far as the version number goes: yes, this should have been a major version bump. I did not at first consider "identical to the old precision" program outputs to fall under the auspices of backwards compatibility, but I see your point. I will be more careful with semantic versioning going forward. I foresee v3 being released soon; I hope that can mitigate the inconvenience. |
@sgillies I would be interested in your opinion, esp as you led the charge on the RFC! Would such a change make sense in fiona? |
@youngpm I don't normally consider a change of precision in string output of a floating point number to be a breaking change. Fiona provides a method to round output but doesn't round by default. It's not strict on RFC 7946. |
I guess what bothers me about this change is that, by default, you can find yourself in a situation where outputting a geometry makes it invalid (e.g. introduces a crossing or some such). With the opt-in style, you're at least aware that you're fiddling with the values. |
@youngpm this is a good point. I wish I had a good, cheap, all-purpose solution for this. If we always used a fixed precision model we could write out coordinates with no extra precision and no less precision. But if we're using floating point precision, the best we can do is limit our precision to the smallest distance between nodes, right? And that's sorta expensive to compute. |
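The "smallest distance between nodes" idea above can be sketched in a few lines. This is a hypothetical helper for illustration, not anything in geojson or Fiona; the O(n²) pairwise scan is the "sorta expensive" part being alluded to:

```python
import math
from itertools import combinations

def safe_decimals(coords):
    """Estimate how many decimal places must be kept so that no two
    distinct vertices risk collapsing onto each other after rounding.
    Brute-force O(n^2) pairwise scan over all vertex pairs."""
    dmin = min(math.dist(a, b) for a, b in combinations(coords, 2))
    # Rounding to d decimals moves a point by at most ~0.5 * 10**-d per axis,
    # so keep one decimal place beyond the magnitude of the smallest gap.
    return max(0, math.ceil(-math.log10(dmin)) + 1)

ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.001, 0.0005)]
print(safe_decimals(ring))  # the ~1.1e-3 gap between the first and last vertex demands 4 decimals
```

Note this assumes all vertices are distinct; a closed ring (first vertex repeated) would need the duplicate dropped first, or `dmin` becomes zero.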
@sgillies Reminds me of this blog on GEOS 3.8 and their new "overlay" engine to handle tricky edge cases. |
@youngpm I want to make sure you understand the motivation. You may already know all this, but here we go. A 6 decimal place precision was chosen intentionally because of potential space savings. Not just the incremental space savings in a JSON string gained by reducing the size of numbers with greater than 6 decimal places. Rather, there's something very special about this threshold. Any lat/lon number with up to 6 decimal places of precision will fit in 32 bits. With any more, you need 64. These are known as "single" and "double" precision floats respectively. So, a number with 7 or more decimal places needs twice as much storage space as a number with 6 or fewer. This is all explained in depth in IEEE 754. Python tends to use doubles for storage more often than not, but speaking in terms of interoperability, this matters to any other system that implements single/double floats separately. I hope that helps a bit; let us know. |
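The single/double storage difference described above can be seen directly with Python's `struct` module. The longitude value here is an arbitrary example, not from the thread:

```python
import struct

lon = -122.4194155  # an arbitrary 7-decimal-place longitude

# Stored as a 64-bit ("double") float, the value round-trips exactly.
as_double = struct.unpack('<d', struct.pack('<d', lon))[0]

# Stored as a 32-bit ("single") float, only ~7 significant digits survive,
# so a high-precision coordinate comes back slightly changed.
as_single = struct.unpack('<f', struct.pack('<f', lon))[0]

print(as_single)  # close to, but not exactly, -122.4194155
print(struct.calcsize('<f'), struct.calcsize('<d'))  # 4 bytes vs 8 bytes per value
```

Worth noting that single precision holds about 7 significant digits in total, so whether a given lat/lon with 6 decimal places survives a float32 round trip also depends on the digits before the decimal point.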
@rayrrr I get it, and I think it is great that you can now control the precision when you dump geometries. If you're just schlepping geometries across the network so you can view them, it makes a lot of sense to halve the packet size. Let me give an example of why I don't like the new default behavior. Take a web mercator tile, say (x, y, z) = (0, 0, 1). If you calculate the bounds of the tile, here's what you'll see:
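The output that should follow didn't survive this transcript, but the bounds in question can be recomputed from the standard slippy-map formulas without mercantile. This helper is a sketch for reference, not mercantile's implementation:

```python
import math

def tile_bounds(x, y, z):
    """Lng/lat bounds of a web mercator (slippy-map) tile: the same
    quantities mercantile.bounds() reports."""
    n = 2 ** z
    def lon(col):
        return col / n * 360.0 - 180.0
    def lat(row):
        # Inverse web mercator latitude for a tile row boundary.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))
    return lon(x), lat(y + 1), lon(x + 1), lat(y)  # west, south, east, north

print(tile_bounds(0, 0, 1))
# The north edge is ~85.05112877980659 degrees; rounded to 6 decimal
# places it becomes 85.051129, which is no longer the exact tile edge.
```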
Now imagine I'm doing some geometry operations where I'm intersecting geometries against these web mercator tile boundaries. If I dump the outputs of my operations to disk with the geojson package and use the new truncating behavior, the serialized coordinates now have half the precision of what was in memory, by default. If I were to read the geometry back in and do subsequent geometry operations (say I use mercantile again to get double precision coordinates of web mercator tiles but am working with these truncated coordinates from what I read back in), I'll potentially get all sorts of weird slivers and whatnot. Computational geometry packages like GEOS (what shapely wraps) work in double precision; if one is using this package and values are unwittingly truncated as they pass through it, you'll begin to encounter strange geometry issues (invalid geometries, geometry unions with sliver artifacts, etc.). Python and JS don't support single precision data types (numpy aside), so the savings are only in disk storage and data movement; you'll still consume the same amount of memory. My point is that there's more to precision than saying 6 digits after the decimal is mm-level accuracy; it will still cause problems for folks if they're actually computing against these geometries. This is how I came across the change you made; a test stopped passing because the serialized geometry became invalid with v2.5. |
@youngpm you make a compelling case. Thank you for the real-world example and for your persistence. Since interoperability was my initial motivation...I agree that interoperability with web mercator tiles is right up there with interoperability with single-precision float situations. And we live in an increasingly 64-bit world. Reopening this issue; planning to remove the 6 decimal place default in the next incremental release. |
One other potential consideration here is the ability to set precision at a higher level than the geometry object. When we ran into this issue here today, we were considering how to deal with it, but the problem is that our codebase creates geojson geometry objects on probably a couple of hundred different lines, and if we needed to go through and update every single one of those to include the increased precision needed for our application, it would be quite a lot of work. Perhaps there's a way to have a "default precision" that can be set at a module (file?) level or something, rather than (or in addition to?) each geometry object? Or maybe it makes sense to only enforce precision when actually encoding to a file, so that it could be an argument to |
Precision is not exposed at the `Feature` level:

```python
from geojson import Feature
from shapely.geometry import Point

Feature(geometry=Point(0.43242423423, 0.3443443))
# {"geometry": {"coordinates": [0.432424, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}
```

Therefore, regardless of whether there is a default of 6 or not, there is no control over specifying a precision at this level. Would you consider exposing the precision at a Feature level, or is it expected that geometry objects should be built by type, such as:

```python
from shapely.geometry import Point
from geojson.geometry import Point as geojsonPoint

point = Point(0.43242423423, 0.3443443)
geojsonPoint(point.coords, precision=10)
# {"coordinates": [[0.4324242342, 0.3443443]], "type": "Point"}
```

Happy to open a new request if this is not the correct place. |
@aj-hitchins Thank you for pointing that out. I think we would all like to specify |
@rayrrr Thanks, I'll give it a go when I have some time. |
But how can I set precision when I'm reading a provided geojson? I mean, there is no "precision" argument on: |
@Rumpelstinsk My understanding is that in the current version this is not possible, as this operation is also handled by the class method here:

```python
geojson.loads('{"geometry": {"precision": 10, "coordinates": [0.432424343243, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}')
```

which maintains the point precision below:
Whereas loading without the precision property will truncate the coordinates:

```python
geojson.loads('{"geometry": {"coordinates": [0.432424343243, 0.344344], "type": "Point"}, "properties": {}, "type": "Feature"}')
```

which leaves you with the default precision of 6:
Annoyingly, the precision can't be set at any other level at the moment (please correct me if I'm wrong!), therefore each feature in your collection would need to have this property injected into the geometry:

```python
import json
import geojson

string = '{"type": "FeatureCollection", "features": [{"geometry": {"coordinates": [0.432424343243, 0.34434444444], "type": "Point"}, "properties": {}, "type": "Feature"}]}'
data = json.loads(string)
for feature in data["features"]:
    feature["geometry"]["precision"] = 10

encoder = geojson.GeoJSONEncoder()
geojson.loads(encoder.encode(data))
```

Which would give you:
Note that this returned data no longer has a precision property in the geometry. It depends on what you're doing with it afterwards, as you would already have it loaded into memory. With regards to performance, I haven't tested this, and I'm not sure how many times you would be performing this operation, so I suggest you take that into account as well. |
@aj-hitchins Thanks for the response. I will test it later; it seems a valid workaround for the moment. It would be preferable, as @rayrrr said, if I could set precision on loads, the same way we do on |
@aj-hitchins @Rumpelstinsk An alternative workaround is to monkey patch the default precision argument in the Geometry class:

```python
import geojson

geojson.geometry.Geometry.__init__.__defaults__ = (None, False, 10)
```

This affects all subsequent instantiations of Geometry, including its children like Point, and classes that use it like Feature and FeatureCollection. |
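How the `__defaults__` patch works can be seen with a stand-in class. This is a toy mirror of the idea, not the real geojson `Geometry` class:

```python
class Geometry:
    # Toy stand-in: precision is the last keyword parameter with a
    # default, mirroring the shape of geojson's Geometry.__init__.
    def __init__(self, coordinates=None, validate=False, precision=6):
        self.coordinates = [round(c, precision) for c in coordinates]

print(Geometry([0.123456789]).coordinates)  # [0.123457] with the stock default

# __defaults__ is the tuple of defaults for the trailing parameters, in
# order, so replacing it swaps the default precision for every later call.
Geometry.__init__.__defaults__ = (None, False, 10)
print(Geometry([0.123456789]).coordinates)  # [0.123456789]
```

Because `__defaults__` is purely positional, the patch is brittle: it silently breaks if the upstream signature ever gains or reorders parameters.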
Thanks for the workaround @meseta. I think I prefer that one. I will try it. |
@meseta thank you for the monkey patch! It works for me, and now I am no longer losing precision when exporting geometry from PostGIS. In my case we are generating high-resolution orthomosaics against cm-level GCPs, and 6 digits of precision wasn't cutting it. |
I wanted the ability to reduce the precision of geojson files for a web interface. I added a DEFAULT_PRECISION variable in geometry that gets assigned when the precision is not specified. It seems to do the trick of allowing precision to be set more broadly. Not sure if this would be helpful to pull, but you are welcome to it. |
The behaviour of automatically rounding coordinates caused a lot of trouble. I did not expect it at all, and I don't like this default behaviour. Here is an example of a polygon that becomes invalid after parsing it with
which becomes this with default precision 6:
BigQuery threw an error on the rounded one:
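The polygon and error snippets above were lost in this transcript, but the failure mode is easy to reproduce with made-up coordinates: a sliver whose vertices differ only in the 7th decimal place collapses to a zero-area, self-overlapping ring once rounded to 6 places (which validity checkers such as GEOS/shapely or BigQuery would reject). The shoelace formula is used here to stay dependency-free:

```python
# A long, very thin quadrilateral: valid, with a tiny but nonzero area.
sliver = [(0.0, 0.0), (1.0, 2e-07), (2.0, 0.0), (1.0, 1e-07)]

def shoelace_area(ring):
    """Absolute area of a ring given as a list of (x, y) vertices."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0

rounded = [(round(x, 6), round(y, 6)) for x, y in sliver]

print(shoelace_area(sliver))   # 1e-07: small, but a real polygon
print(rounded)                 # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
print(shoelace_area(rounded))  # 0.0: the ring now doubles back over itself
```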
|
Nice idea. I was also surprised that you cannot configure the default precision. And I am really surprised that there is not one word about precision in the docs! |
I've suggested a fix for this, using Roger Lew's patch mentioned above, at PR #177. It's currently stuck since I'm a new contributor; this is just a bump in case someone can take a look. Thank you. |
I understand the rationale for the default precision in v2.5 for those who want smaller geojson files, but I'm curious why the default isn't to behave as in previous versions. This change isn't backwards compatible in the sense that data read/written with 2.4 will come out different in 2.5, so at a minimum I suppose the version should have been bumped to v3.
Is there opposition to changing the default precision to be backwards compatible and those that want it truncated can opt in?