Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support GeoJSON lines format #651

Merged
merged 6 commits into from
Jan 23, 2024
Merged

support GeoJSON lines format #651

merged 6 commits into from
Jan 23, 2024

Conversation

cldellow
Copy link
Contributor

Described at https://stevage.github.io/ndgeojson/; each feature is given its own line, rather than being wrapped in a FeatureCollection.

This format is slightly easier to generate and plays a bit nicer with source control diffs. It's also possible to parallelize the parsing of such a file, although I haven't tried to do that in this PR.

The Daylight Map Distribution uses this format for their lake centrelines: https://daylightmap.org/2023/03/20/water-labels.html

I've also got a not-ready-for-primetime branch of tilemaker that exposes a GeoJSON() function to Lua, which I'm using to generate .geojsonl files for further processing (e.g., to use on subsequent runs as inputs to tilemaker's Intersects functions, or to use as input when deriving an SQLite autosuggest database)

Described at https://stevage.github.io/ndgeojson/; each feature is given
its own line, rather than being wrapped in a FeatureCollection.
@systemed
Copy link
Owner

Neat idea!

It's also possible to parallelize the parsing of such a file, although I haven't tried to do that in this PR.

I guess the major win would be moving the boost::asio::post into the file reading loop. That way we wouldn't need to construct the deque of JSON docs, which will be pretty trivial in the lake centreline case (small file) but could be useful in cases of very large files.

Maybe worth detecting .(geo)jsonseq extensions too? That appears to be what's being used by the Daylight file and is in https://datatracker.ietf.org/doc/html/rfc7464.

For a ~300MB geojson file of mine, this decreases wall clock time from
2.5s to 1s.
@cldellow
Copy link
Contributor Author

Good feedback! Added support for files ending in jsonseq and parallelized the reading.

A possible area for future improvement: be able to prune prior to building the MultiPolygon and correcting it. I think it'd be safe to create a naive Box using the largest/smallest lat/lons to do a test to definitively rule out (but not rule in) whether a geometry should be loaded.

@cldellow
Copy link
Contributor Author

Reading RFC 7464 and RFC 8142 a bit more closely, they also contemplate some behaviours that this PR does not currently support:

  1. use ASCII record separator (RS, 0x1e) as a mandatory separator
  2. permit internal newlines (e.g. when the JSON has been pretty-printed)
  3. support invalid JSON entries by ignoring them (vs failing)

I am skeptical that (2) and (3) are desirable in practice. (2) implies a tripling of file size and (3) implies you're fine ignoring corrupt data. (1) seems reasonable, but I wonder if it's actually used in practice, or if the path of least resistance tends to lead folks to newlines anyway.

I don't have a lot of context about how the geo world uses this file format. Looking at the Daylight GeoJSON file, it does not use the RS character.

tl;dr - this PR isn't in full compliance with the RFC, but that's probably OK? Can always revisit later if it turns out people actually use those features.

@systemed systemed merged commit 3af62f6 into systemed:master Jan 23, 2024
5 checks passed
@systemed
Copy link
Owner

tl;dr - this PR isn't in full compliance with the RFC, but that's probably OK? Can always revisit later if it turns out people actually use those features.

I'd be surprised if there's more than half a dozen GeoJSON files in existence using RS as a record separator!

That's perfect - thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants