-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support GeoJSON lines format #651
Conversation
Described at https://stevage.github.io/ndgeojson/; each feature is given its own line, rather than being wrapped in a FeatureCollection.
Neat idea!
I guess the major win would be moving the Maybe worth detecting .(geo)jsonseq extensions too? That appears to be what's being used by the Daylight file and is in https://datatracker.ietf.org/doc/html/rfc7464. |
For a ~300MB geojson file of mine, this decreases wall clock time from 2.5s to 1s.
Good feedback! Added support for files ending in jsonseq and parallelized the reading. A possible area for future improvement: be able to prune prior to building the MultiPolygon and correcting it. I think it'd be safe to create a naive |
Reading RFC 7464 and RFC 8142 a bit more closely, they also contemplate some behaviours that this PR does not currently support:
I am skeptical that (2) and (3) are desirable in practice. (2) implies a tripling of file size and (3) implies you're fine ignoring corrupt data. (1) seems reasonable, but I wonder if it's actually used in practice, or if the path of least resistance tends to lead folks to newlines anyway. I don't have a lot of context about how the geo world uses this file format. Looking at the Daylight GeoJSON file, it does not use the RS character. tl;dr - this PR isn't in full compliance with the RFC, but that's probably OK? Can always revisit later if it turns out people actually use those features. |
I'd be surprised if there's more than half a dozen GeoJSON files in existence using RS as a record separator! That's perfect - thank you. |
Described at https://stevage.github.io/ndgeojson/; each feature is given its own line, rather than being wrapped in a FeatureCollection.
This format is slightly easier to generate and plays a bit nicer with source control diffs. It's also possible to parallelize the parsing of such a file, although I haven't tried to do that in this PR.
The Daylight Map Distribution uses this format for their lake centrelines: https://daylightmap.org/2023/03/20/water-labels.html
I've also got a not-ready-for-primetime branch of tilemaker that exposes a
GeoJSON()
function to Lua, which I'm using to generate.geojsonl
files for further processing (e.g., to use on subsequent runs as inputs to tilemaker'sIntersects
functions, or to use as input when deriving an SQLite autosuggest database)