fromdate should support timezone offsets #1053

Open
CFoltin opened this issue Dec 24, 2015 · 23 comments

Comments

@CFoltin

CFoltin commented Dec 24, 2015

Hi,

parsing the following command:
"2015-12-24T09:29:30+01:00" | fromdate
gives the error
jq: error (at :49): date "2015-12-24T09:29:30+01:00" does not match format "%Y-%m-%dT%H:%M:%SZ"
although the date looks valid to me. Only the timezone "Z" seems to work.

BR, Chris from FreeMind

@nicowilliams changed the title from "fromdate doesn't work with timezones" to "fromdate should support timezone offsets" on Dec 28, 2015
@nicowilliams
Contributor

Supporting timezones in general is problematic. Currently we use the C library for dealing with parsing and formatting datetime strings. The more features of the C library we use, the harder it will be to stop using it later. Also, the C library's datetime capabilities vary quite a bit from one version/platform to another, so the more features we use, the more issues we can expect users to file that are ultimately caused by C library issues. Limiting datetime strings to UTC ISO8601 was an explicit choice to limit this pain. Adding timezone offset support when parsing is doable though, and probably worth the effort. I'm much less interested in adding timezone offsets when formatting datetime strings.

@CFoltin
Author

CFoltin commented Jan 2, 2016

Hi,

thanks for the reply. AFAIK, the format I passed in is a valid ISO8601 date. I have the impression that only the "timezone" 'Z' works in jq. Could you give me an example of a different zone being parsed?

BR, Chris

@nicowilliams
Contributor

nicowilliams commented Jan 3, 2016 via email

@nicowilliams
Contributor

One of the nice things about dates being in UTC is that then dates can be compared and ordered lexicographically. So I'm inclined to only support timezone offsets in fromdate.

Incidentally, you can use strptime() and the %z format specifier to parse ISO8601 timezones. Is that good enough for you?
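
To make the ordering point concrete, here is a minimal Python sketch (not jq). It assumes Python 3.7 or later, where strptime's %z accepts both Z and +01:00. The sample timestamps and the helper name `instant` are invented for illustration. It checks that plain string sorting agrees with chronological order for UTC-only strings, and that the two can disagree once numeric offsets are mixed in.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S%z"

def instant(text):
    """Parse a timestamp ending in Z or a numeric offset into an aware datetime."""
    return datetime.strptime(text, FMT)

# Illustrative inputs, not taken from any real data set.
utc_only = ["2015-12-24T09:29:30Z", "2015-03-06T04:21:47Z", "2015-12-24T08:00:00Z"]
mixed = ["2015-12-24T09:29:30+01:00", "2015-12-24T09:00:00Z"]

# Compare lexicographic order with chronological order for each list.
utc_matches = sorted(utc_only) == sorted(utc_only, key=instant)
mixed_matches = sorted(mixed) == sorted(mixed, key=instant)

print(utc_matches)
print(mixed_matches)
```

In the mixed list, 09:29:30+01:00 is 08:29:30 UTC and therefore earlier than 09:00:00Z, yet it sorts later as a string.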

@CFoltin
Author

CFoltin commented Jan 12, 2016

Hi,
yes, an alternative would be ok.
I tried
"2015-12-24T09:29:30+01:00" | strptime("%Y-%m-%dT%H:%M:%SZ"), but same error.
Do you have an example at hand?
BR, Chris

@nicowilliams
Contributor

@CFoltin You should use "2015-12-24T09:29:30+01:00" | strptime("%Y-%m-%dT%H:%M:%S%z"), but note that glibc doesn't seem to support timezone offsets very well :(

(And, of course, %z is a glibc extension to the POSIX standard, so we can't rely on it.)

I see that glibc doesn't like a : in the tz offset, and it doesn't seem to adjust the result by the tz offset. I'm not inclined to work around this glibc bug, nor to implement timezone offsets in jq proper at this time, but I'll leave this open, and if someone submits a PR, we'll consider it!
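
The difference between parsing the offset and actually applying it can be shown with a short Python sketch. This is not jq or glibc code. It assumes Python 3.7 or later, whose strptime accepts a colon in the %z offset, and the variable names are illustrative only.

```python
from datetime import datetime, timezone

text = "2015-12-24T09:29:30+01:00"
aware = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")

# Offset parsed but then ignored: the wall-clock fields are relabelled as UTC.
ignored = aware.replace(tzinfo=timezone.utc)

# Offset applied: the same instant expressed in UTC.
applied = aware.astimezone(timezone.utc)

print(ignored.isoformat())  # 2015-12-24T09:29:30+00:00
print(applied.isoformat())  # 2015-12-24T08:29:30+00:00
```

The first result is what a parser produces when it accepts %z without adjusting the time. The second is the UTC instant the input string actually names.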

@aliekens

aliekens commented Mar 1, 2016

+1 for having timezone support in fromdate, even if that means working with UTC (or the computer's locale) internally and as todate output timezone.

Although dates aren't in the JSON standard, I see a bunch of use cases for jq processing. My current use case is to use group_by to organize objects in day or hour bins.

All my incoming JSON dates include the %z timezone format (Ruby's datetime) and I cannot change these inputs. I'm pondering whether dates can be scanned and timezone-converted using sed or awk before piping the JSON objects to jq, but that would be an awful hack to work with.

@ghost

ghost commented Mar 1, 2016

@aliekens About your specific use case, since jq 1.5 has support for regular expressions and string substitutions and replacements, you can use this to rearrange the string so that the timezone appears where it can be parsed.

@aliekens

aliekens commented Mar 2, 2016

Yay, I have figured out a way to correctly parse datetimes with timezones in jq, but it requires a bit of hacking. Here's how I can now parse my (Ruby's) datetimes:

$ TZ=/usr/share/zoneinfo/UTC jq -n '"2015-12-24T09:29:30+00:00" | sub("(?<before>.*):"; .before ) | strptime("%Y-%m-%dT%H:%M:%S%z") | todate'
"2015-12-24T09:29:30Z"
$ TZ=/usr/share/zoneinfo/UTC jq -n '"2015-12-24T09:29:30+01:00" | sub("(?<before>.*):"; .before ) | strptime("%Y-%m-%dT%H:%M:%S%z") | todate'
"2015-12-24T08:29:30Z"

Some notes:

  • The example above is on a Mac. Behavior may be different in other environments because of C library differences. For example, jqplay (which runs on what platform?) always returns the same datetime, independent of the timezone.
  • It is important to set jq's environment's timezone to UTC with TZ=/usr/share/zoneinfo/UTC or strptime will assume your computer locale's timezone is UTC (not good if your environment's locale is not UTC)
  • strptime's %z format does not support "+01:00" timezones, it needs to be formatted as "+0100" (some implementations of strptime have a %: flag to support timezones with a colon). The last colon in the string is therefore subbed using a regex. (BTW, the docs need info or an example on how to use named captures)
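
As a sketch of the colon-removal step in the last note, here is a Python version (not jq) that anchors the substitution on a trailing ±hh:mm offset. The helper name `strip_offset_colon` is hypothetical. One difference from the jq one-liner above: that regex removes the last colon wherever it is, so a Z-suffixed input would lose the colon before the seconds, whereas anchoring on the offset leaves such input alone.

```python
import re

def strip_offset_colon(text):
    """Turn a trailing ±hh:mm offset into ±hhmm; leave anything else untouched."""
    return re.sub(r"([+-]\d{2}):(\d{2})$", r"\1\2", text)

print(strip_offset_colon("2015-12-24T09:29:30+01:00"))  # 2015-12-24T09:29:30+0100
print(strip_offset_colon("2015-12-24T09:29:30Z"))       # 2015-12-24T09:29:30Z
```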

@pkoppstein
Contributor

> (BTW, the docs need info or an example on how to use named captures)

In the meantime, the FAQ has a question: Q: How are named capture variables used?

@mauricioprado00

I have tested aliekens's approach in jq-1.5-1-a5b5cbe, and I get the same output every time:
"2015-12-24T09:29:30Z"

@lleeoo

lleeoo commented Oct 28, 2019

Another workaround, in case your gmt offset is already a float (remember to reverse the offset to get UTC):

echo '{"date": "2015-03-06T04:21:47Z", "offset": 6.5}' \
| jq '(.date | fromdate) - 3600 * .offset | todate'
"2015-03-05T21:51:47Z"

@erhhung

erhhung commented Jan 30, 2020

Ugly manual parsing, but, hey, it works:

echo '{"date":"2020-01-30T02:35:20-08:00"}' | \
  jq 'def parseDate(date):
        date
        | capture("(?<no_tz>.*)(?<tz_sgn>[-+])(?<tz_hr>\\d{2}):(?<tz_min>\\d{2})$")
        | (.no_tz + "Z" | fromdateiso8601)
          - (.tz_sgn + "60" | tonumber) * ((.tz_hr | tonumber) * 60 + (.tz_min | tonumber));
      parseDate(.date)'

@adam-azarchs

adam-azarchs commented Aug 18, 2022

My main use case for this is consuming timestamps in JSON produced by Go, which will (without going to some extra work) serialize times as e.g. 2022-08-17T22:59:45.157237491-07:00. That is RFC 3339 format, which amounts to a subset of ISO-8601 (with a couple of exceptions which need to be supported in practice). I suspect there are lots of similar use cases.

It's entirely reasonable to only support UTC for the "broken down" datetime representation, but like it or not there's a lot of json data out there which is stored with a zone offset. There is also no need to support parsing named time zones (which are not permitted by either RFC3339 or ISO-8601). It's misleading to claim to support iso8601 date format if you don't support data in formats which conform to the spec and are produced by the standard libraries of common programming languages.

@erhhung's solution is a reasonable one. I'd extend it a little to

  1. Handle all time zone formats in the spec, including Z, ±hh:mm, ±hhmm, and ±hh timezone styles.
  2. Handle fractional seconds (which are not handled by at least most glibc implementations)
capture("(?<no_tz>[^.]*)(?<frac_sec>\\.\\d+)?(?:(?:(?<tz_sgn>[-+])(?<tz_hr>\\d{2}):?(?<tz_min>\\d{2})?)|Z)$") |
(.no_tz + "Z" | fromdateiso8601)
+ ("0"+.frac_sec | tonumber)
- (.tz_sgn + "60" | tonumber)
* ((.tz_hr // "0" | tonumber) * 60 + (.tz_min // "0" | tonumber))

That is,

$ echo '"2020-01-30T02:35:20.001-08:00"
"2020-01-30T02:35:20Z"
"2020-01-30T02:35:20+0330"
"2020-01-30T02:35:20+03"
' | jq 'capture("(?<no_tz>[^.]*)(?<frac_sec>\\.\\d+)?(?:(?:(?<tz_sgn>[-+])(?<tz_hr>\\d{2}):?(?<tz_min>\\d{2})?)|Z)$") | (.no_tz + "Z" | fromdateiso8601) + ("0"+.frac_sec |tonumber) - (.tz_sgn + "60" | tonumber) * ((.tz_hr // "0" | tonumber) * 60 + (.tz_min // "0" | tonumber))|todate'
"2020-01-30T10:35:20Z"
"2020-01-30T02:35:20Z"
"2020-01-29T23:05:20Z"
"2020-01-29T23:35:20Z"

Note that the above specifically does not handle "implicitly local time" dates (those lacking either Z or a time zone suffix). The spec allows them, but supporting them would probably lead to incorrect results in most use cases. It will permit -00:00 (and interpret it as equivalent to UTC), which is illegal in ISO-8601 but is used in RFC3339 to indicate that the time is in UTC and the local offset is unknown.

Building that data massaging into fromdateiso8601 so that it can support all RFC3339 dates (or at least all of those which are also valid in ISO-8601) would make the function considerably more useful in many real-world use cases.
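
For anyone wanting to cross-check the expected values independently of jq and the C library, here is a minimal Python sketch of the same idea: a regex splits the string into base time, optional fraction, and Z or a numeric offset, and the offset is then subtracted. It is an illustration, not a proposed implementation. The names `PATTERN` and `to_utc` are hypothetical, and unlike the jq regex it anchors the whole string and truncates fractions to microseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Base time, optional fractional seconds, then either Z or a numeric offset
# written as ±hh:mm, ±hhmm, or ±hh. A string with no suffix is rejected on purpose.
PATTERN = re.compile(
    r"^(?P<base>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?P<frac>\.\d+)?"
    r"(?:Z|(?P<sign>[-+])(?P<hh>\d{2}):?(?P<mm>\d{2})?)$"
)

def to_utc(text):
    """Return the instant named by `text` as a UTC datetime (hypothetical helper)."""
    m = PATTERN.match(text)
    if m is None:
        raise ValueError(f"no Z or numeric offset in {text!r}")
    wall = datetime.strptime(m["base"], "%Y-%m-%dT%H:%M:%S")
    # Keep at most microsecond precision: ".001" becomes 1000 microseconds.
    micros = int((m["frac"] or ".")[1:7].ljust(6, "0"))
    sign = -1 if m["sign"] == "-" else 1
    offset = sign * timedelta(hours=int(m["hh"] or 0), minutes=int(m["mm"] or 0))
    # Local wall-clock time minus its UTC offset gives the UTC time.
    utc = wall + timedelta(microseconds=micros) - offset
    return utc.replace(tzinfo=timezone.utc)

for sample in [
    "2020-01-30T02:35:20.001-08:00",
    "2020-01-30T02:35:20Z",
    "2020-01-30T02:35:20+0330",
    "2020-01-30T02:35:20+03",
]:
    print(to_utc(sample).isoformat())
```

The four samples are the ones used in the jq run above. This sketch keeps the fractional part in its result, whereas todate in the jq pipeline prints whole seconds.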

@bartekus

bartekus commented Sep 9, 2022

For a workaround, please check #1117

@fatso83

fatso83 commented Aug 29, 2023

There are small external libraries in C that specifically do ISO8601 parsing. https://github.com/chansen/c-dt has a great test suite and no dependencies AFAIK.

Would a PR building on that be accepted? Currently JQ hardly supports ISO8601 at all, as it does not support a single line in the following list of valid ISO8601 strings:

20121224
2012-12-24 23:59:59
2012-12-24T00:00:00+00:00
2012359
2012359T235959+0130
2012-359
2012W521
2012-W52-1
2012Q485
2012-Q4-85
0001-Q1-01

Building on @chansen's lib would change that.

@nicowilliams
Contributor

: ; jq -cnr '"2015-12-24T09:29:30+01:00" | strptime("%Y-%m-%dT%H:%M:%S%z")|todate'
2015-12-24T09:29:30Z

On platforms where %z is supported by strptime()/strftime() this works today.

But not all platforms have equally good C time function support, so, yes, @fatso83, I think we'd consider a replacement of the C library's time functions with something like @chansen's c-dt. However, we should make sure that that library handles the tzdata database on Unix, Linux, and Windows first.

@nicowilliams
Contributor

Are people asking for fromdate to be flexible in the formats it parses?

@fatso83

fatso83 commented Aug 29, 2023

I don't know about others and the general function for date parsing, but I assumed a function that indicated parsing iso8601 would do just that, so support for tz in strings - without involved workarounds - is much wanted 😃

@adam-azarchs

That would be nice, but I think people mainly just want fromdateiso8601 to be able to handle at least the subset of ISO-8601 date strings which are also compliant with RFC3339.

I don't really see how tzdata enters into this, given that ISO-8601 doesn't allow named time zones, and RFC3339 does not allow unqualified local time. It only allows numeric offsets, which don't require a database lookup. And in general most use cases (e.g. sorting or filtering relative to some timestamp) don't even require that it keeps track of the offset after parsing.

@nicowilliams
Contributor

nicowilliams commented Aug 29, 2023

@fatso83 @adam-azarchs try this:

def fromdateiso8601: first(strptime("%Y-%m-%dT%H:%M:%S%z")?,strptime("%Y-%m-%dT%H:%M:%SZ"))|timegm;

The Z in UTC times can be lower case, but on Linux at least strptime() doesn't handle that. And for %z the : between hours and minutes in the offset is optional on Linux.

@danielhoherd

@nicowilliams

Linux

$ uname -a
Linux litten 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
$ jq --version
jq-1.6
$ jq '. | first(strptime("%Y-%m-%dT%H:%M:%S%z")?,strptime("%Y-%m-%dT%H:%M:%SZ"))' <<< '"2023-03-31T17:15:00+00:00"'
[
  2023,
  2,
  31,
  17,
  15,
  0,
  5,
  89
]
jq: error (at <stdin>:1): date "2023-03-31T17:15:00+00:00" does not match format "%Y-%m-%dT%H:%M:%SZ"
$ jq-1.7.1 --version
jq-1.7.1
$ jq-1.7.1 '. | first(strptime("%Y-%m-%dT%H:%M:%S%z")?,strptime("%Y-%m-%dT%H:%M:%SZ"))' <<< '"2023-03-31T17:15:00+00:00"'
[
  2023,
  2,
  31,
  17,
  15,
  0,
  5,
  89
]

macOS

$ sw_vers
ProductName:		macOS
ProductVersion:		14.2.1
BuildVersion:		23C71
$ jq --version
jq-1.7.1
$ jq '. | first(strptime("%Y-%m-%dT%H:%M:%S%z")?,strptime("%Y-%m-%dT%H:%M:%SZ"))' <<< '"2023-03-31T17:15:00+00:00"'
jq: error (at <stdin>:1): date "2023-03-31T17:15:00+00:00" does not match format "%Y-%m-%dT%H:%M:%SZ"

@tst2005

tst2005 commented Feb 8, 2024

Hello,

I ran into an unsupported date format.
It has milliseconds and a timezone offset, which fromdateiso8601 does not support.

The format is "YYYY-MM-DDTHH:MM:SS.xxx[+-]HHMM". See example below.

I made a fromdateiso8601gmt function to support it:

def fromdateiso8601gmt:
        scan("^(....)-(..)-(..)T(..):(..):(..)(\\.?[0-9]*)([Z+-])(.?.?)(.?.?)$")|
        (.[6]) as $ms|
        if .[7] == "Z" then
                "\(.[0])-\(.[1])-\(.[2])T\(.[3]):\(.[4]):\(.[5])Z"
                |fromdateiso8601
        else 
                ((.[7]+.[8]|tonumber*3600)+(.[7]+.[9]|tonumber*60)) as $offset
                |"\(.[0])-\(.[1])-\(.[2])T\(.[3]):\(.[4]):\(.[5])Z"
                |fromdateiso8601-$offset
        end| .+("0\($ms)"|tonumber)
;

That is far from optimal, but it is better than nothing.

$ echo '
"2024-02-06T16:53:19.1234Z"
"2024-02-06T17:53:19.1234+0100"
"2024-02-06T14:53:19.1234-0200"
"2024-02-05T23:53:19.123Z"
"2024-02-06T00:53:19.123+0100"
"2024-02-06T01:53:19.123+0200"
' | jq "$def_fromdateiso8601gmt"'fromdateiso8601gmt'
1707238399.1234
1707238399.1234
1707238399.1234
1707177199.123
1707177199.123
1707177199.123
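
The epoch values above can be cross-checked outside jq. The following Python sketch is an independent check, not part of the jq function. It assumes Python 3.7 or later, where strptime's %z accepts Z, and the helper name `epoch_seconds` is made up for illustration.

```python
from datetime import datetime

# Fractional seconds (%f takes 1 to 6 digits) followed by Z or a ±hhmm offset.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

samples = [
    "2024-02-06T16:53:19.1234Z",
    "2024-02-06T17:53:19.1234+0100",
    "2024-02-06T14:53:19.1234-0200",
    "2024-02-05T23:53:19.123Z",
    "2024-02-06T00:53:19.123+0100",
    "2024-02-06T01:53:19.123+0200",
]

def epoch_seconds(text):
    """Seconds since 1970-01-01T00:00:00Z, including the fractional part."""
    return datetime.strptime(text, FMT).timestamp()

for sample in samples:
    print(epoch_seconds(sample))
```

The first three samples name one instant and the last three name another, so each group should print a single repeated value.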
