ArrowStream interface: support for timezones #8460
Comments
Probably not directly related, but I also noticed that the equivalent GPKG file (
|
Datetime in GeoPackage must be in UTC by the standard, I guess that's the reason for the warning.
Have a look at the DATETIME_FORMAT in https://gdal.org/drivers/vector/gpkg.html#dataset-creation-options. Could there be something re-usable? |
OK, that's interesting. But in this case, it's GDAL itself that created the file, so if the spec requires UTC, shouldn't GDAL write that? (by converting any other offset to UTC) Or is it expected that the user knows to convert first to UTC before writing to GPKG? |
GDAL kind of uses an informal extension to write non-UTC date times, if the user provides them as such. If a user wants to produce a fully compliant file, he can pass DATETIME_FORMAT=UTC as a creation option and GDAL will do the conversion. Anyway that's not directly related to that issue, which I'm working on |
@jratike80 I see your added second paragraph now. And indeed, the EDIT: and I was too slow with typing; in the meantime Even answered more or less the same ;) |
…cessary conversions (fixes OSGeo#8460)
with the fix in #8461, I now get:
GeoJSON is the only driver that does a full scan of the file to establish the set of fields on opening, hence the timezone can be known. |
I would rather say that it defaults to writing datetimes as they appear in the source data. I think that if the default were automatic conversion into UTC, it would make many users confused and unhappy, because if the time is around midnight, the date can change along with it. |
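To illustrate the point about midnight, here is a minimal sketch using only Python's standard library (not GDAL). The timestamp value is made up for the example.

```python
from datetime import datetime, timezone

# A hypothetical local timestamp shortly after midnight, with a +02:00 offset.
local = datetime.fromisoformat("2023-09-20T00:30:00+02:00")

# Converting to UTC keeps the same instant but changes the wall time.
utc = local.astimezone(timezone.utc)

print(local.date())  # calendar date as written in the source data
print(utc.date())    # calendar date after conversion to UTC (one day earlier)
print(local == utc)  # aware datetimes compare by instant, so this is True
```

So a user who reads back only the date part after an automatic UTC conversion would see the previous day.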
@rouault thanks for that! The PR looks good.
In general, is additional timezone support in the ArrowStream output (as you did now for the mentioned formats) something that would need to be added per driver? (Although I don't know if there are many others that support timezones in the format; Shapefiles don't even support datetimes AFAIK.)
Although I assume (from reading the code, I didn't try it out, so I might be wrong) that a timestamp field in such a file that would e.g. have a timezone of "Europe/Brussels" will be set to OGR_TZFLAG_UNKNOWN in the internal data model? And thus come out as the original value as stored. |
I think that I now understand what @jorisvandenbossche means with the timezones. Obviously they are the entries in the tz database https://en.wikipedia.org/wiki/List_of_tz_database_time_zones. I had been thinking that offsets like +02:00 mean the same as timezones, but that would have been too easy. If GDAL supported the tz timezones, wouldn't it have to support them properly, so that daylight saving time is also handled correctly? The offset to UTC for Europe/Brussels can be either +01:00 or +02:00 depending on the day of the year. I do not know for sure, but I fear that it also depends on the year, because the rules for DST may have changed. It may be too late to change anything, but isn't it so that GDAL does not know anything about the tz timezones, and the best that GDAL can do is to deal with UTC offsets? So maybe it would be more exact to use that term in the documentation as well: |
Yes, drivers can either set OGRFieldDefn::SetTZFlag(), or manually override the TIMESTAMP GetArrowStream() option. I've also declared it as a public option.
I've just done that in a new commit
Not more than (fixed) offsets to UTC. It has no knowledge of things like Europe/Brussels and DST. |
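As a rough illustration of what "fixed offset" means here, the sketch below uses Python's standard library rather than GDAL. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone

# A fixed offset such as "+02:00". It carries no DST rules at all.
plus_two = timezone(timedelta(hours=2))

winter = datetime(2023, 1, 15, 12, 0, tzinfo=plus_two)
summer = datetime(2023, 7, 15, 12, 0, tzinfo=plus_two)

# The offset is the same on both dates. A named zone such as
# Europe/Brussels would instead resolve to +01:00 or +02:00
# depending on the date, which a fixed offset cannot express.
print(winter.isoformat())
print(summer.isoformat())
print(winter.utcoffset() == summer.utcoffset())
```

A per-value offset is therefore enough to recover the exact instant, but not enough to recover which named zone produced it.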
Expected behavior and actual behavior.
Assume a GeoJSON file test_datetime_tz.geojson with content:
and reading this with the ArrowStream interface:
We get tz-naive timestamp values in the "local" time (wall time), not the UTC-equivalent values (after applying the offset). While the "traditional" field based access with OGR_F_GetFieldAsDateTimeEx will give you timezone information.
It's not necessarily clear what would be the best behaviour here. GDAL doesn't actually store or know the time "zone" (like Europe/Brussels), only an offset per value (which is not necessarily constant for the full column). And thus it cannot return an Arrow timestamp type with a zone as tz.
It could return a timestamp with a fixed offset (but that's only possible if all offsets are the same, which you only know when parsing the last value, and so this would require a first pass to check this), or return timestamp[ms, tz=UTC] type (this doesn't preserve the wall time, but it does preserve the actual point in time). Although I can imagine that it depends on the use case whether you would prefer getting wall-time tz-naive timestamps or tz-aware UTC timestamps.
GDAL version and provenance
GDAL 3.7.2 installed with conda forge
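The options weighed in the description above can be sketched in plain Python (standard library only, no GDAL or pyarrow). The two input strings are hypothetical values, not the content of the original test file.

```python
from datetime import datetime, timezone

# Hypothetical column values, each carrying its own UTC offset.
raw = ["2023-01-01T10:00:00+01:00", "2023-07-01T10:00:00+02:00"]
parsed = [datetime.fromisoformat(s) for s in raw]

# Option A: tz-naive wall time. The offset is dropped, the clock reading is kept.
wall = [dt.replace(tzinfo=None) for dt in parsed]

# Option B: tz-aware UTC. The instant is kept, the clock reading changes.
utc = [dt.astimezone(timezone.utc) for dt in parsed]

# A fixed-offset column type is only valid if every value shares one offset,
# which can only be known after looking at all values (a first pass).
offsets = {dt.utcoffset() for dt in parsed}
has_single_offset = len(offsets) == 1

print([dt.isoformat() for dt in wall])
print([dt.isoformat() for dt in utc])
print(has_single_offset)
```

With these two values the wall times are identical while the UTC instants differ by an hour, and the mixed offsets rule out a single fixed-offset type.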