-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: refactor dates to be fully PG compatible #36938
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great stuff! The API is very clean!
And I also like the compatibility approach that you and Bram came up with. Just wondering: do we plan to ever remove orig
fields to reduce the space usage? I understand that it'll break the compatibility with already existing indices, but maybe like in 2 or 3 major releases?
Reviewed 29 of 29 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @knz, and @mjibson)
pkg/sql/logictest/testdata/logic_test/datetime, line 143 at r1 (raw file):
query TTTT SELECT 'infinity'::date + 1, 'infinity'::date - 1, '-infinity'::date + 1, '-infinity'::date - 1
[nit]: this query overlaps with the one a few above, possibly combine them.
pkg/sql/logictest/testdata/logic_test/datetime, line 1281 at r1 (raw file):
subtest regression_36146 statement error out of range
Why does it have a different error message?
pkg/sql/logictest/testdata/logic_test/pg_catalog, line 853 at r1 (raw file):
1042 bpchar 1307062959 NULL -1 false b 1043 varchar 1307062959 NULL -1 false b 1082 date 1307062959 NULL 16 true b
[question]: just to confirm my understanding, days
take up 4 bytes, orig
another 8, and hasOrig
yet another 4 bytes, correct?
pkg/util/timeutil/pgdate/pgdate.go, line 18 at r1 (raw file):
// and times in a manner that is compatible with PostgreSQL. // // The implementation here is inspired by the following
[nit]: possibly this comment needs adjusting.
pkg/util/timeutil/pgdate/pgdate.go, line 69 at r1 (raw file):
// instead of days when encoding for on-disk. This is required // so that we can roundtrip to the same on-disk value, which is // necessary deleting old index entries. This value should never
[nit]: necessary for deleting.
pkg/util/timeutil/pgdate/pgdate.go, line 138 at r1 (raw file):
} func makeDateFromYMD(year, month, day int) Date {
I can't seem to find where this method is used. Also, is panic
king here unavoidable? Maybe return an error instead?
pkg/util/timeutil/pgdate/pgdate.go, line 164 at r1 (raw file):
secondsPerHour = 60 * 60 secondsPerDay = 24 * secondsPerHour daysPer400Years = 365*400 + 97
Also I can't seem to find where these daysPer...
are used.
pkg/util/timeutil/pgdate/pgdate.go, line 176 at r1 (raw file):
case math.MaxInt32: buf.WriteString("infinity") default:
Do we need a special case for epoch
here?
pkg/util/timeutil/pgdate/pgdate.go, line 262 at r1 (raw file):
if !ok { return Date{}, pgerror.NewErrorf(pgerror.CodeDatetimeFieldOverflowError, "%s + %d is out of range", d, days)
[nit]: %s - %d
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @knz, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/datetime, line 143 at r1 (raw file):
Previously, yuzefovich wrote…
[nit]: this query overlaps with the one a few above, possibly combine them.
Removed, it was redundant.
pkg/sql/logictest/testdata/logic_test/datetime, line 1281 at r1 (raw file):
Previously, yuzefovich wrote…
Why does it have a different error message?
The original problem happened because of overflow during distsql serialization. But now it's not possible to even create that date.
pkg/sql/logictest/testdata/logic_test/pg_catalog, line 853 at r1 (raw file):
Previously, yuzefovich wrote…
[question]: just to confirm my understanding,
days
take up 4 bytes,orig
another 8, andhasOrig
yet another 4 bytes, correct?
Go aligns things at word boundaries for adjacent properties. So days is 4, hasOrig is 1, and there's 3 unused bytes, but 8 total. Then orig is another 8.
pkg/util/timeutil/pgdate/pgdate.go, line 18 at r1 (raw file):
Previously, yuzefovich wrote…
[nit]: possibly this comment needs adjusting.
Done.
pkg/util/timeutil/pgdate/pgdate.go, line 69 at r1 (raw file):
Previously, yuzefovich wrote…
[nit]: necessary for deleting.
Done.
pkg/util/timeutil/pgdate/pgdate.go, line 138 at r1 (raw file):
Previously, yuzefovich wrote…
I can't seem to find where this method is used. Also, is
panic
king here unavoidable? Maybe return an error instead?
Removed.
pkg/util/timeutil/pgdate/pgdate.go, line 164 at r1 (raw file):
Previously, yuzefovich wrote…
Also I can't seem to find where these
daysPer...
are used.
Removed.
pkg/util/timeutil/pgdate/pgdate.go, line 176 at r1 (raw file):
Previously, yuzefovich wrote…
Do we need a special case for
epoch
here?
No. Epoch is an alias for 1970-01-01, it's not preserved as the word 'epoch'.
pkg/util/timeutil/pgdate/pgdate.go, line 262 at r1 (raw file):
Previously, yuzefovich wrote…
[nit]:
%s - %d
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good work. I love that I was feeling like I was learning something while reading the code of the new date package. Eventually I have a dream we give dedicated packages with proper semantics to all the scalar types.
Only a few comments below.
Reviewed 24 of 29 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @mjibson, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/datetime, line 124 at r2 (raw file):
'0001-12-13 BC'::date
psql gives me " 0001-12-13 BC" on this
your test displays "0000-12-13"
is that intended?
(I suspect it is -- we're merely seeing Go's conversion of a time.Time to string, which can display year 0. However in that case I would invite you to also test/display the result of converting the date back to ::string
. Thank you.)
pkg/sql/sem/tree/eval.go, line 701 at r2 (raw file):
Fn: func(_ *EvalContext, left Datum, right Datum) (Datum, error) { a, b := MustBeDInt(left), MustBeDInt(right) r, ok := arith.SubWithOverflow(int64(a), int64(b))
This change does not belong to this commit. Please extract it to a separate commit in the sequence (or a subsequent PR) for clarity.
pkg/sql/sem/tree/eval.go, line 3111 at r2 (raw file):
res = NewDInt(DInt(v.Unix())) case *DDate: // TODO(mjibson): This cast is unsupported by postgres. Should we remove ours?
Huh, why remove it - the semantics are super well defined.
I agree on the error, but out of curiosity: what is the SQL way to test whether a date is infinite?
pkg/sql/sem/tree/eval.go, line 3158 at r2 (raw file):
case *DDate: // TODO(mjibson): This cast is unsupported by postgres. Should we remove ours? if !v.IsFinite() {
ditto
pkg/sql/sem/tree/eval.go, line 3180 at r2 (raw file):
// TODO(mjibson): This cast is unsupported by postgres. Should we remove ours? if !v.IsFinite() { return nil, errDecOutOfRange
ditto
pkg/sql/sem/tree/eval.go, line 3322 at r2 (raw file):
return d, nil case *DInt: // TODO(mjibson): This cast is unsupported by postgres. Should we remove ours?
Same comment as above - the semantics are well defined. I don't see a reason to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 24 of 29 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @mjibson, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/datetime, line 124 at r2 (raw file):
Previously, knz (kena) wrote…
'0001-12-13 BC'::date
psql gives me " 0001-12-13 BC" on this
your test displays "0000-12-13"is that intended?
(I suspect it is -- we're merely seeing Go's conversion of a time.Time to string, which can display year 0. However in that case I would invite you to also test/display the result of converting the date back to
::string
. Thank you.)
This a weird formatting case where year 0 is what we'd normally write as 1 BC. Year -1 is 2 BC, etc. We should probably avoid using go's stringification as a follow-on patch to make the output consistent with pg.
pkg/util/arith/arith.go, line 32 at r2 (raw file):
// Add32to64WithOverflow returns a+b. If ok is false, a+b overflowed. func Add32to64WithOverflow(a int32, b int64) (r int32, ok bool) { if b > math.MaxInt32 || b < math.MinInt32 {
How are these functions supposed to behave? Are they guaranteeing that the operation can be performed and that the resulting value fits within the return type? Consider the case where a := MaxInt32
and b := MinInt32-1
. I would expect this function to return -1
since the result is in range, even though it may have required performing the math using more bits.
pkg/util/timeutil/pgdate/pgdate.go, line 58 at r2 (raw file):
// Date is a postgres-compatible date implementation. It stores the number // of seconds from the postgres epoch (2000-01-01). Its properties are
If it's storing seconds, why is the field called days?
pkg/util/timeutil/pgdate/pgdate.go, line 210 at r2 (raw file):
// Compare compares two dates. func (d Date) Compare(other Date) int {
Is it valid to just return d.days - other.days
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @knz, @mjibson, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/datetime, line 124 at r2 (raw file):
Previously, bobvawter (Bob Vawter) wrote…
This a weird formatting case where year 0 is what we'd normally write as 1 BC. Year -1 is 2 BC, etc. We should probably avoid using go's stringification as a follow-on patch to make the output consistent with pg.
Added a test for the string conversion.
pkg/sql/sem/tree/eval.go, line 701 at r2 (raw file):
Previously, knz (kena) wrote…
This change does not belong to this commit. Please extract it to a separate commit in the sequence (or a subsequent PR) for clarity.
Moved to second commit.
pkg/sql/sem/tree/eval.go, line 3111 at r2 (raw file):
Previously, knz (kena) wrote…
Huh, why remove it - the semantics are super well defined.
I agree on the error, but out of curiosity: what is the SQL way to test whether a date is infinite?
The semantics used to be well defined: return the integer number of days since the unix epoch. Now that we support infinity dates, they are no longer well defined. What should we do in the infinity case? Should we return infinity for float and decimals since they support it? Should we not check for infinity at all, but instead just return whatever's there (max/min int32 in the inf cases)? Since postgres doesn't do this we can't check for compatibility, but it definitely adds some questions without clear answers. Furthermore, we chose the unix epoch fairly arbitrarily (especially considering that postgres uses a different epoch underneath). A user just has to know that these conversions use the unix epoch number of days, which isn't obvious at all.
I believe postgres made the better design choice here which is to support only the date - date
operator which returns the number of days between those two dates, and errors if either are infinity. This absolves the database of having to choose an arbitrary epoch, is conceptually more obvious than raw number conversion, and the infinity problem is a bit simpler also.
pkg/util/arith/arith.go, line 32 at r2 (raw file):
Previously, bobvawter (Bob Vawter) wrote…
How are these functions supposed to behave? Are they guaranteeing that the operation can be performed and that the resulting value fits within the return type? Consider the case where
a := MaxInt32
andb := MinInt32-1
. I would expect this function to return-1
since the result is in range, even though it may have required performing the math using more bits.
For now they are fine. I've added a comment describing the int32 size requirement of b so it's operation is clearer. But postgres doesn't support date - int8, only date - int4, so we are within the spec for now.
pkg/util/timeutil/pgdate/pgdate.go, line 58 at r2 (raw file):
Previously, bobvawter (Bob Vawter) wrote…
If it's storing seconds, why is the field called days?
seconds was a typo. Fixed the comment to say days.
pkg/util/timeutil/pgdate/pgdate.go, line 210 at r2 (raw file):
Previously, bobvawter (Bob Vawter) wrote…
Is it valid to just
return d.days - other.days
?
No, the API docs for tree.Datum.Compare specify +/- 1, not positive or negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bobvawter, @knz, @mjibson, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I still don't like a "TODO: remove this" for valid and well-defined features :)
Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @bobvawter, @mjibson, and @yuzefovich)
pkg/sql/sem/tree/eval.go, line 3111 at r2 (raw file):
Previously, mjibson (Matt Jibson) wrote…
The semantics used to be well defined: return the integer number of days since the unix epoch. Now that we support infinity dates, they are no longer well defined. What should we do in the infinity case? Should we return infinity for float and decimals since they support it? Should we not check for infinity at all, but instead just return whatever's there (max/min int32 in the inf cases)? Since postgres doesn't do this we can't check for compatibility, but it definitely adds some questions without clear answers. Furthermore, we chose the unix epoch fairly arbitrarily (especially considering that postgres uses a different epoch underneath). A user just has to know that these conversions use the unix epoch number of days, which isn't obvious at all.
I believe postgres made the better design choice here which is to support only the
date - date
operator which returns the number of days between those two dates, and errors if either are infinity. This absolves the database of having to choose an arbitrary epoch, is conceptually more obvious than raw number conversion, and the infinity problem is a bit simpler also.
-
the infinity support is new, so it's just normal we have to ask the question and a good answer. If we don't want to answer this now, you can refuse the conversion from an infinity date with an error, like you did here. If we want an answer, I agree with you that likely conversion to/from floats and decimals should use infinities in that case. I don't have a clear opinion about integers (should remain an error until then).
-
conversions with numbers using the epoch as reference is a common choice for server software. It's also consistent with the other date/time/timestamp conversions. It's a matter of documentation (for people who don't expect epoch-based conversions already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 3 of 0 LGTMs obtained (waiting on @mjibson and @yuzefovich)
Previously dates were an int64 number of days from 1970-01-01. Functions and operators needing to act on a date could manipulate the value of the date without any overflow checks, causing multiple bugs: - The postgres binary wire protocol uses an int32 number of days since 2000-01-01, thus it was possible to have dates in cockroach outside of the expressible range over the wire. We were blindly converting the int64 to an int32, possibly discarding data. - Adding dates to ints was done by normal addition. However this could overflow. Our normal int + int path uses overflow-checked addition, but dates were not subject to such checks. - Converting dates to a text representation (for sending over pgwire or our distsql serialization) converted the number of days to a time.Time using the standard library time.Date method, which (more-or-less) stores time as an int64 number of nanos. A very high year could overflow the nanosecond part causing it to silently by incorrect. - The infinity dates were not treated as infinity, but converted to an actual date. This change adds a new pgdate.Date type. It internally represented as an int32 number of days since 2000-01-01, because that's exactly what is needed by the postgres wire protocol. It doesn't export any internal properties, and instead forces users of it to use its API, which is guaranteed to be safe. The API does all overflow and bounds checking, and correctly handles the infinity dates. This fixes the above bugs since it's now not possible to mistakenly misuse Dates. When reading from any existing on-disk data, dates outside of the postgres bounds are converted to +/- infinity. This presents a problem since datums need to roundtrip to the same on-disk encoding so that old index entries can be deleted. In order to round trip correctly, the original on-disk integer is saved and used during serialization if it exists. Fixes #36557 Release note (sql change): Dates are now fully Postgres-compatible, including support for sentinel values (+/- infinity) and the Postgres date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of this range will be converted to the +/- infinity dates.
This puts all the overflow logic into a single package.
bors r+ |
36938: sql: refactor dates to be fully PG compatible r=mjibson a=mjibson Previously dates were an int64 number of days from 1970-01-01. Functions and operators needing to act on a date could manipulate the value of the date without any overflow checks, causing multiple bugs: - The postgres binary wire protocol uses an int32 number of days since 2000-01-01, thus it was possible to have dates in cockroach outside of the expressible range over the wire. We were blindly converting the int64 to an int32, possibly discarding data. - Adding dates to ints was done by normal addition. However this could overflow. Our normal int + int path uses overflow-checked addition, but dates were not subject to such checks. - Converting dates to a text representation (for sending over pgwire or our distsql serialization) converted the number of days to a time.Time using the standard library time.Date method, which (more-or-less) stores time as an int64 number of nanos. A very high year could overflow the nanosecond part causing it to silently by incorrect. - The infinity dates were not treated as infinity, but converted to an actual date. This change adds a new pgdate.Date type. It internally represented as an int32 number of days since 2000-01-01, because that's exactly what is needed by the postgres wire protocol. It doesn't export any internal properties, and instead forces users of it to use its API, which is guaranteed to be safe. The API does all overflow and bounds checking, and correctly handles the infinity dates. This fixes the above bugs since it's now not possible to mistakenly misuse Dates. When reading from any existing on-disk data, dates outside of the postgres bounds are converted to +/- infinity. This presents a problem since datums need to roundtrip to the same on-disk encoding so that old index entries can be deleted. In order to round trip correctly, the original on-disk integer is saved and used during serialization if it exists. Fixes #36557 Release note (sql change): Dates are now fully Postgres-compatible, including support for sentinel values (+/- infinity) and the Postgres date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of this range will be converted to the +/- infinity dates. Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Build succeeded |
Previously dates were an int64 number of days from 1970-01-01. Functions
and operators needing to act on a date could manipulate the value of
the date without any overflow checks, causing multiple bugs:
2000-01-01, thus it was possible to have dates in cockroach outside
of the expressible range over the wire. We were blindly converting the
int64 to an int32, possibly discarding data.
overflow. Our normal int + int path uses overflow-checked addition,
but dates were not subject to such checks.
our distsql serialization) converted the number of days to a time.Time
using the standard library time.Date method, which (more-or-less) stores
time as an int64 number of nanos. A very high year could overflow the
nanosecond part causing it to silently by incorrect.
actual date.
This change adds a new pgdate.Date type. It internally represented as
an int32 number of days since 2000-01-01, because that's exactly what
is needed by the postgres wire protocol. It doesn't export any internal
properties, and instead forces users of it to use its API, which is
guaranteed to be safe. The API does all overflow and bounds checking,
and correctly handles the infinity dates. This fixes the above bugs
since it's now not possible to mistakenly misuse Dates.
When reading from any existing on-disk data, dates outside of the postgres
bounds are converted to +/- infinity. This presents a problem since
datums need to roundtrip to the same on-disk encoding so that old index
entries can be deleted. In order to round trip correctly, the original
on-disk integer is saved and used during serialization if it exists.
Fixes #36557
Release note (sql change): Dates are now fully Postgres-compatible,
including support for sentinel values (+/- infinity) and the Postgres
date range (4714-11-24 BC to 5874897-12-31). Existing dates outside of
this range will be converted to the +/- infinity dates.