

sql: STRING to DATE conversion uses bogus logic #27500

Closed
knz opened this issue Jul 13, 2018 · 1 comment · Fixed by #31758
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@knz
Contributor

knz commented Jul 13, 2018

The code currently parses dates by piggybacking on the conversion from STRING to TIMESTAMP.

This is (1) inefficient and (2) incorrect: PostgreSQL requires us to support more formats; see

https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-DATE-TABLE

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-pgcompat Semantic compatibility with PostgreSQL labels Jul 13, 2018
@bobvawter bobvawter self-assigned this Oct 10, 2018
@bobvawter
Member

craig bot pushed a commit that referenced this issue Oct 30, 2018
31758: sql: Generalize date/time parsing r=bobvawter a=bobvawter

sql: Generalize date/time parsing

The current date/time parsing code relies on `time.ParseInLocation()`. It does
not support all of the date/time formats accepted by PostgreSQL, and it must be
invoked multiple times to try each of the date/time formats that we do accept.

This change updates the date/time parsing code with a new implementation that
does not delegate to `time.ParseInLocation()` and is able to parse all
supported formats in a single pass.

In order to support parsing named timezones like `America/New_York`, we
delegate to `time.LoadLocation()` as we did previously. `LoadLocation()` is
rather expensive, since it looks for tzinfo files on disk every time it is
invoked. A per-node, in-memory cache has been added to amortize this overhead.
Note that, per #31978, the tzinfo used on each node could already be
inconsistent, depending on the tzinfo files present in the underlying OS.
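A minimal sketch of such a cache, assuming a mutex-guarded map is acceptable. `locationCache` and its methods are hypothetical names; the PR's actual cache may be structured differently. The blank import of `time/tzdata` (Go 1.15+) only makes the example independent of the host's tzinfo files.

```go
package main

import (
	"fmt"
	"sync"
	"time"
	_ "time/tzdata" // embedded tz database as a fallback for the example
)

// locationCache memoizes time.LoadLocation results by zone name.
type locationCache struct {
	mu   sync.Mutex
	locs map[string]*time.Location
}

func newLocationCache() *locationCache {
	return &locationCache{locs: make(map[string]*time.Location)}
}

// Load returns the cached *time.Location for name, loading it on first use.
// Failed lookups are not cached, so an invalid name is retried each time.
func (c *locationCache) Load(name string) (*time.Location, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if loc, ok := c.locs[name]; ok {
		return loc, nil
	}
	loc, err := time.LoadLocation(name)
	if err != nil {
		return nil, err
	}
	c.locs[name] = loc
	return loc, nil
}

func main() {
	cache := newLocationCache()
	first, err := cache.Load("America/New_York")
	if err != nil {
		panic(err)
	}
	second, _ := cache.Load("America/New_York") // served from the map
	fmt.Println(first, first == second)
}
```

Holding the lock across `time.LoadLocation()` serializes the first load of each zone; a `sync.RWMutex` would reduce contention on the read path at the cost of occasionally loading the same zone twice.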

The following table compares the new `ParseTimestamp()` function to calling
`ParseInLocation()`. While `ParseInLocation()` is generally faster for any
single pattern, the current parsing code must call it repeatedly, trying each
supported date format until one succeeds. The test with the named timezone
also shows the significant overhead of calling `LoadLocation()`.

```
2003-06-12/ParseTimestamp-8                                 10000000     122 ns/op   81.53 MB/s
2003-06-12/ParseInLocation-8                                30000000    35.6 ns/op  281.29 MB/s
2003-06-12_01:02:03/ParseTimestamp-8                        10000000     163 ns/op  116.45 MB/s
2003-06-12_01:02:03/ParseInLocation-8                       30000000    54.4 ns/op  349.16 MB/s
2003-06-12_04:05:06.789-04:00/ParseTimestamp-8              10000000     238 ns/op  121.69 MB/s
2003-06-12_04:05:06.789-04:00/ParseInLocation-8             10000000     161 ns/op  180.05 MB/s
2000-01-01T02:02:02.567+09:30/ParseTimestamp-8               5000000     233 ns/op  124.01 MB/s
2000-01-01T02:02:02.567+09:30/ParseInLocation-8             10000000     158 ns/op  182.41 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseTimestamp-8    3000000     475 ns/op   84.06 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseInLocation-8    200000    7313 ns/op    3.15 MB/s
```

The tests in `parsing_test.go` have an optional mode to cross-check the test
data against a PostgreSQL server. This is useful during development, but is
not part of the automated build.

Parsing of BC dates is now supported; #28099 could then be completed by
changing the date-formatting code to print BC dates.
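Supporting BC input comes down to mapping an era-qualified year onto a signed year. Go's `time` package accepts year 0 and negative years, so a natural mapping is astronomical (ISO 8601) year numbering, where year 0 is 1 BC and year -1 is 2 BC. The sketch below assumes that convention; `astronomicalYear` and `eraYear` are hypothetical names, and the inverse `eraYear` is the kind of helper the formatting side (#28099) would need.

```go
package main

import (
	"fmt"
	"time"
)

// astronomicalYear maps a year written with an era marker onto the signed
// numbering accepted by time.Date: 1 BC -> 0, 2 BC -> -1, N BC -> 1-N.
// AD years are returned unchanged.
func astronomicalYear(year int, bc bool) (int, error) {
	if year < 1 {
		// With an era marker there is no year 0 or below.
		return 0, fmt.Errorf("year %d is out of range", year)
	}
	if bc {
		return 1 - year, nil
	}
	return year, nil
}

// eraYear is the inverse mapping: a signed year y <= 0 prints as (1-y) BC.
func eraYear(y int) (year int, bc bool) {
	if y <= 0 {
		return 1 - y, true
	}
	return y, false
}

func main() {
	y, err := astronomicalYear(44, true)
	if err != nil {
		panic(err)
	}
	t := time.Date(y, time.March, 15, 0, 0, 0, 0, time.UTC)
	year, bc := eraYear(t.Year())
	fmt.Println(t.Year(), year, bc) // -43 44 true
}
```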

This change would allow #30697 (incomplete handling of datestyle) to be
re-evaluated, since the parser can be configured for YMD, DMY, or MDY input
styles.
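The input-style point can be illustrated with a small sketch: once a parser has extracted three numeric fields, the configured order decides which field is the year, month, and day. `dateOrder` and `assignFields` are hypothetical names, not this PR's API.

```go
package main

import "fmt"

// dateOrder is a hypothetical stand-in for the datestyle input-order setting.
type dateOrder int

const (
	orderMDY dateOrder = iota
	orderDMY
	orderYMD
)

// assignFields maps three numeric fields, in the order they appear in the
// input, to year, month and day according to the configured input order.
func assignFields(order dateOrder, f [3]int) (year, month, day int) {
	switch order {
	case orderDMY:
		return f[2], f[1], f[0]
	case orderYMD:
		return f[0], f[1], f[2]
	default: // orderMDY
		return f[2], f[0], f[1]
	}
}

func main() {
	fields := [3]int{1, 2, 2003} // as extracted from "01/02/2003"
	for _, o := range []dateOrder{orderMDY, orderDMY} {
		y, m, d := assignFields(o, fields)
		fmt.Printf("%04d-%02d-%02d\n", y, m, d)
	}
	// Prints 2003-01-02 (MDY), then 2003-02-01 (DMY).
}
```

This is why an input such as `01/02/2003` is ambiguous unless the input style is known.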

Resolves #27500
Resolves #27501
Resolves #31954

Release note (sql change): A wider variety of date, time, and timestamp formats
are now accepted by the SQL frontend.

Release note (bug fix): Prepared statements that bind temporal values now
respect the session's timezone setting. Previously, bound temporal values were
always interpreted as though the session time zone were UTC.

Release note (backward-incompatible change): Timezone abbreviations, such as
EST, are no longer allowed when parsing or converting to a date/time type.
Previously, an abbreviation would be accepted if it were an alias for the
session's timezone.


Co-authored-by: Bob Vawter <bob@cockroachlabs.com>
@craig craig bot closed this as completed in #31758 Oct 30, 2018