Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialize MultiPolygon Array of Empty Coordinates Array #140

Conversation

dickinsonm
Copy link
Contributor

@dickinsonm dickinsonm commented Mar 25, 2024

gdal-bin ogr2ogr outputs multipolygons with empty geometries in this format:

{ "type" : "MultiPolygon", "coordinates": [ [ ] ] }

NetTopologySuite throws an exception when encountering this data:

System.InvalidOperationException : Cannot get the value of a token type 'EndArray' as a string.

If the first two tokens are start array, then the next token may be an end array. If that token is end array, then the next token after it must also be an end array. Those tokens should result in a multi polygon with an empty polygon.

@CLAassistant
Copy link

CLAassistant commented Mar 25, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@airbreather airbreather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've identified a valid bug, but this is not a valid fix for it.

{ "type" : "MultiPolygon", "coordinates": [ [ ] ] } does not encode an empty MultiPolygon, but rather a MultiPolygon that contains a single Polygon which, itself, is empty.

This will need more work to fix.

@dickinsonm dickinsonm requested a review from airbreather April 30, 2024 06:17
Copy link
Member

@airbreather airbreather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The middle part of this comment is going to sound really grumpy, so I want to get ahead of that by saying yes I will merge this very shortly, and thanks for bringing it to our attention; I just want to do some bookkeeping first so that the underlying issue can be completely fixed.


I've been carefully reading the spec to try to make sense of this. According to my reading of it:

  • [[]] is a legal "coordinates" array for a MultiPolygon because [] is a legal "Polygon coordinate array", because it's an empty array of "linear ring coordinate arrays" which, per §3.1.6, have no defined lower bound on their lengths (a mistake, IMO, but even if most of the world agrees, it's not a mistake that can realistically be fixed at this point). Therefore, [[], [], []] should also be legal, as well as [[], [], [[[0, 0], [1, 0], [0, 1], [0, 0]]], []] and [[[[0, 0], [1, 0], [0, 1], [0, 0]]], []].
  • I am comfortable with the implicit assertion that [[]] continues not to be a legal "coordinates" array for any other type of object: in all other cases, the nested arrays have defined lower-bound lengths greater than zero, and the explicit allowance for empty arrays listed in §3.1 only seems to be written in a way where it would apply to the top-level member.

Given that, this PR still does not completely address the issue that it brought to our attention: it only exactly accepts [[]], but not the other forms that are legal according to the same reading of the spec.


With that out of the way, I want to repeat, thanks for bringing this to our attention! I'm going to open an issue and associate it with this PR so that it can be merged and linked to the specific problem that it partially solves, but then I'll leave that other issue open so that I can continue to fix the rest of it after this gets merged. I don't want to leave this hanging, nor do I want to ask you to untangle this spaghetti that I shoved in here.

@airbreather airbreather merged commit 76cdb7d into NetTopologySuite:develop May 1, 2024
6 checks passed
@dickinsonm
Copy link
Contributor Author

Thank you for merging this minimal fix for my use case. You also did an excellent job writing out some of the things I was thinking about related to parsing these. I wanted to assume we could know the type before we parsed the coordinates. While every example in RFC7946 has the type field appearing first, I did not find an explicit requirement that type come before coordinates. I agree with your reading of the spec regarding both your first recommended change and also the need to parse the other test cases you mentioned. I hope I can find some time to help further and greatly appreciate your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants