I want to convert an FGB to a CSV. This already works for a typical FGB, but I'd like to take advantage of the FGB format to save some space by skipping a feature's empty properties.
I think solving this problem might have some more general purpose use in geozero.
Because an FGB's properties are prefixed with their column index, when a particular feature has no value for a column, you could choose to omit the column altogether, rather than spending 6 bytes just to say "no value for this column". I've made this change in a demo FGB feature branch here: https://github.com/michaelkirk/flatgeobuf/tree/mkirk/empty-fields.
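To make the omission idea concrete, here is a minimal sketch of a sparse property writer. It assumes a simplified encoding (little-endian `u16` column index, then a `u32` length, then the value bytes, which matches the 6-byte cost of an empty string value); `write_properties` is an illustrative name, not FGB's actual writer API.

```rust
/// Serialize string properties sparsely: columns whose value is `None`
/// are omitted entirely instead of being written as empty.
/// Illustrative sketch only; assumes a u16-index + u32-length encoding.
fn write_properties(values: &[Option<String>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for (col_idx, value) in values.iter().enumerate() {
        // Skip absent columns entirely, saving the 6 bytes of
        // index + length that an empty value would otherwise cost.
        let Some(v) = value else { continue };
        buf.extend_from_slice(&(col_idx as u16).to_le_bytes());
        buf.extend_from_slice(&(v.len() as u32).to_le_bytes());
        buf.extend_from_slice(v.as_bytes());
    }
    buf
}
```

Because each present value carries its column index, a reader can still attribute values to the right columns even when earlier columns are missing.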
In theory there's no problem writing this back out to another FGB or to a flexible format like geojson, but some other output formats need to know the schema up front, like csv (but maybe also gpx and shapefile, arrow?).
I think it can be broken down into a few cases:

- It's irrelevant for geometry-only formats such as wkt and geo-types, so we don't need to worry about them.
- Formats that support writing sparse properties, such as fgb and geojson, could be serialized more succinctly by omitting empty values. Probably this should be a configurable option on the writer.
- Formats that support constant-time access to their schema, such as csv, fgb (arrow? gpkg?), can be deserialized in one pass. Other formats, like geojson, do not support constant-time access to their schema. That means it's not currently possible to convert sparse geojson to something rigid like csv, because "new" columns might appear after some CSV rows have already been written. An additional pass before writing to ascertain the schema could address this, but that has some drawbacks, and in any case doesn't currently exist. (There are no guarantees about any geojson in the wild having regular columns anyway, so we're already facing that problem to a degree.)
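The "additional pass" workaround could look like the following sketch. It models features as plain property maps rather than real geojson (parsing would need a JSON reader), so `to_csv` and its shape are assumptions for illustration: pass one unions every feature's keys to ascertain the schema, pass two writes rows with blanks for missing columns.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Two-pass CSV conversion for features with sparse properties.
/// Illustrative sketch: features are modeled as property maps.
fn to_csv(features: &[BTreeMap<String, String>]) -> String {
    // Pass 1: ascertain the schema by unioning every feature's keys.
    let columns: BTreeSet<&str> = features
        .iter()
        .flat_map(|f| f.keys().map(String::as_str))
        .collect();
    let mut out = columns.iter().cloned().collect::<Vec<_>>().join(",");
    out.push('\n');
    // Pass 2: emit rows, filling in blanks for absent properties.
    for f in features {
        let row: Vec<&str> = columns
            .iter()
            .map(|c| f.get(*c).map(String::as_str).unwrap_or(""))
            .collect();
        out.push_str(&row.join(","));
        out.push('\n');
    }
    out
}
```

The drawback mentioned above is visible here: the input has to be fully traversed (or buffered) once before a single row can be written, which rules out streaming.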
As for a potential step forward: reading from an fgb would call `dataset_begin(Some(name_from_header), Some(feature_schema_from_header))`, whereas reading from geojson would call `dataset_begin(None, None)`.
Note that this would mean introducing something like FGB's `ColumnArgs` and `ColumnType` to geozero.
Formats that require a rigid schema, like csv, could utilize that data in order to correctly "fill in the blanks" when reading features with sparse properties.
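The proposal above could be sketched like this. The names mirror FGB's column metadata, but `DatasetProcessor`, `ColumnSchema`, and the `dataset_begin` signature are hypothetical, not the current geozero API.

```rust
// Illustrative column metadata, loosely modeled on FGB's ColumnType.
#[derive(Clone, Debug)]
enum ColumnType { Bool, Int, Double, Str }

#[derive(Clone, Debug)]
struct ColumnSchema {
    name: String,
    ty: ColumnType,
}

trait DatasetProcessor {
    /// Called once before any features. Readers that know their schema up
    /// front (fgb, csv) pass `Some(..)`; geojson passes `None`.
    fn dataset_begin(&mut self, name: Option<&str>, schema: Option<&[ColumnSchema]>);
}

/// A rigid writer like csv can pre-build its header from the schema and
/// later emit blanks for columns a sparse feature omits.
struct CsvHeader(Vec<String>);

impl DatasetProcessor for CsvHeader {
    fn dataset_begin(&mut self, _name: Option<&str>, schema: Option<&[ColumnSchema]>) {
        if let Some(cols) = schema {
            self.0 = cols.iter().map(|c| c.name.clone()).collect();
        }
    }
}
```

With the schema known before the first feature, the CSV writer never encounters a "new" column mid-stream, so single-pass conversion from a sparse fgb stays possible.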
This definitely introduces some complexity into the library. Overall, I'm not sure if it's worth it. What do people think?