Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support dates BC #28099

Closed
jordanlewis opened this issue Jul 31, 2018 · 7 comments
Closed

sql: support dates BC #28099

jordanlewis opened this issue Jul 31, 2018 · 7 comments
Labels
A-sql-datatypes SQL column types usable in table descriptors. A-sql-encoding Relating to the SQL/KV encoding. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases.
Milestone

Comments

@jordanlewis
Copy link
Member

Currently, CockroachDB has no support for dates with negative years, or BC dates. However, it does permit inputting negative dates via the binary pgwire protocol, which can lead to some issues down the line.

Specifically, our date formatting code will format a BC date by prepending a negative sign to the year. This output date is then unreadable by the parser, leading to an un-roundtrippable datum, which is not permitted.

This needs to be corrected, by either teaching the date parser about negative years, or teaching the date parser about BC dates and teaching the date formatter to output BC when appropriate.

@knz for triage.

@jordanlewis jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-encoding Relating to the SQL/KV encoding. A-sql-datatypes SQL column types usable in table descriptors. labels Jul 31, 2018
@jordanlewis jordanlewis added this to the Later milestone Jul 31, 2018
@knz
Copy link
Contributor

knz commented Jul 31, 2018

probably to be bundled with #27500

@solongordon
Copy link
Contributor

In case it makes this more urgent, the binary protocol isn't the only way to get negative dates.


> SELECT (now() - '2019y'::INTERVAL)::DATE;
+----------------------------+
|            date            |
+----------------------------+
| -0001-07-31 00:00:00+00:00 |
+----------------------------+

> SELECT (-999999)::DATE;
+----------------------------+
|            date            |
+----------------------------+
| -0768-02-05 00:00:00+00:00 |
+----------------------------+

@jordanlewis
Copy link
Member Author

Hmm... unfortunate.

@jordanlewis
Copy link
Member Author

In that case, this is already broken. I think running SELECT (-999999)::DATE FROM t where t is some table will throw an error on master.

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Aug 7, 2018
@bobvawter bobvawter self-assigned this Oct 23, 2018
craig bot pushed a commit that referenced this issue Oct 30, 2018
31758: sql: Generalize date/time parsing r=bobvawter a=bobvawter

sql: Generalize date/time parsing

The current date/time parsing code relies on `time.ParseInLocation()`. It does
not support all of the various date/time formats accepted by PostgreSQL and
also requires multiple invocation to try the various date/time formats that we
do accept.

This change updates the date/time parsing code with a new implementation that
does not delegate to `time.ParseInLocation()` and is able to parse all
supported formats in a single pass.

In order to support parsing named timezones like `America/New_York`, we
delegate to `time.LoadLocation()` as we did previously.  `LoadLocation()` is
rather expensive, since it looks for tzinfo files on disk every time it is
invoked. A per-node, in-memory cache has been added to amortize this overhead.
Per #31978, the tzinfo used on each node could already be inconsistent,
depending on the tzinfo files present in the underlying OS.

The following table compares the new `ParseTimestamp()` function to calling
`ParseInLocation()`.  While it is true that `ParseInLocation()` is generally
faster for any given pattern, the current parsing code must call it repeatedly,
trying each supported date format until one succeeds. The test with the named
timezone also shows the significant overhead of calling `LoadLocation()`.

```
2003-06-12/ParseTimestamp-8             10000000               122 ns/op          81.53 MB/s
2003-06-12/ParseInLocation-8            30000000                35.6 ns/op       281.29 MB/s
2003-06-12_01:02:03/ParseTimestamp-8            10000000               163 ns/op         116.45 MB/s
2003-06-12_01:02:03/ParseInLocation-8           30000000                54.4 ns/op       349.16 MB/s
2003-06-12_04:05:06.789-04:00/ParseTimestamp-8          10000000               238 ns/op         121.69 MB/s
2003-06-12_04:05:06.789-04:00/ParseInLocation-8         10000000               161 ns/op         180.05 MB/s
2000-01-01T02:02:02.567+09:30/ParseTimestamp-8           5000000               233 ns/op         124.01 MB/s
2000-01-01T02:02:02.567+09:30/ParseInLocation-8         10000000               158 ns/op         182.41 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseTimestamp-8                3000000               475 ns/op          84.06 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseInLocation-8                200000              7313 ns/op           3.15 MB/s
```

The tests in `parsing_test.go` have an optional mode to cross-check the test
data aginst a PostgreSQL server.  This is useful for developing, but is not
part of the automated build.

Parsing of BC dates is supported, #28099 could then be completed by changing
the date-formatting code to print a BC date.

This change would allow #30697 (incomplete handling of datestyle) to be
re-evaluated, since the parser does allow configuration of YMD, DMY, or MDY
input styles.

Resolves #27500
Resolves #27501
Resolves #31954

Release note (sql change): A wider variety of date, time, and timestamp formats
are now accepted by the SQL frontend.

Release note (bug fix): Prepared statements that bind temporal values now
respect the session's timezone setting. Previously, bound temporal values were
always interpreted as though the session time zone were UTC.

Release note (backward-incompatible change): Timezone abbreviations, such as
EST, are no longer allowed when parsing or converting to a date/time type.
Previously, an abbreviation would be accepted if it were an alias for the
session's timezone.


Co-authored-by: Bob Vawter <bob@cockroachlabs.com>
@knz knz unassigned knz and bobvawter Jan 3, 2019
craig bot pushed a commit that referenced this issue Mar 11, 2019
35442: roachtest: add nightly multi-AZ tpccbench test r=nvanbenschoten a=nvanbenschoten

This test deviates from similar roachtests in a few useful ways:
- it tests across three AZs in the same region
- it tests a 6-node cluster
- it loads 5k tpcc warehouses
- it runs on only about half of them (testing the impact of cold data)

I've run this a number of times over the past few days and settled upon
2500 warehouses as a good estimate.

Release note: None

35602: pgdate: Improve handling of negative years r=bobvawter a=bobvawter

The current parsing code doesn't handle negative year values and returns an
unhelpful error message. This change allows negative years to be specified, so
that negative-year datums can at least be round-tripped through the parser.

Supports #28099
Fixes #35255

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Bob Vawter <bob@cockroachlabs.com>
@jordanlewis
Copy link
Member Author

@mjibson ping

@maddyblue
Copy link
Contributor

Fixed by #36938

@jordanlewis
Copy link
Member Author

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-datatypes SQL column types usable in table descriptors. A-sql-encoding Relating to the SQL/KV encoding. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases.
Projects
None yet
Development

No branches or pull requests

5 participants