-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: support causality tokens #7945
Comments
Discussed today extensively with @bdarnell and @andreimatei
@andreimatei can you flesh out your thoughts here? |
(For context: we're having this discussion again because there were multiple community questions around this area recently) |
One way we could mitigate the risk of the client causality token pushing the clock too far into the future would be to force the client to wait until the wall time of timestamp to which it wants to push the HLC is within max offset of the wall clock on the gateway machine. If a client tries to push the clock an hour ahead, it's just going to sleep for an hour. We don't expect the causality token to be far in the future. As for different interfaces, there's been some offline discussions. One way would be to provide a sql interface. Without driver support anything we do here is likely to be clunky. Imagine a function In this proposal a client could obtain a causality token as follows: BEGIN;
...
SELECT commit_timestamp(); -- gives back a DECIMAL hlc timestamp
COMMIT As for the client using it, perhaps we add something like: SELECT crdb_wait($1); Which will sleep until the timestamp is at least close to the wall clock and then bump the gateway HLC. Drivers could help make the use of the causality token seamless by injecting queries on behalf of the user. I suspect that the reading of the timestamp and properly passing it to the client driver would be the biggest concern. Perhaps the driver can translate The causality token will need to be ratcheted across all connections in the connection pool. Another way the token could be transferred to a client would be through a Another thing which seems like it could be a problem re: causality tracking is #7526 where causality may not propagate properly across distsl during distributed execution. |
This has also been discussed before. In fact we even have recommended users to do so using existing SQL features. Say your previous sampled value is X, with the same unit (nanos+logical) as SELECT pg_sleep(X::INT::FLOAT / 1e9 - clock_timestamp()::FLOAT + <maxoffset in secs>) (the timestamp -> float/decimal conversion gives fractional seconds; X::INT gets rid of the logical part, and Y::FLOAT/1e9 moves the decimal point to the right place; finally, ISTM we should provide syntactic sugar for this formula but maybe we don't need much more!
Good idea, I like that! Good syntactic sugar for the above.
So here there's some data points missing:
We already provide "interstitial statements" that can be issued by clients outside of the SQL transaction state machine: SHOW SYNTAX. We can envision a Alternatively, we can store the token of the last txn in a session var so that the client can use However Ben pointed out that we should not rely on syntax that must be used outside of a begin..commit block because clients often use a connection pool and may not control which connection is used for the statement just after a COMMIT. Hence a big focus on something in-between the
|
Why do you think this? I feel strongly in the opposite direction: I don't believe we can make causality tokens usable enough to be relevant without custom drivers.
This is one way being in the driver can be important. The driver can do a |
We have marked this issue as stale because it has been inactive for |
Still relevant, and we even discussed it on slack in the past few weeks. (internal link) |
Fixes cockroachdb#79591 Relates to cockroachdb#7945 Release note (<category, see below>): A new adds a new special sentinel builtin `crdb_internal.commit_with_causality_token` has been added which can be used exclusively as the only expression in a `SELECT` statement at the end of an explicit transaction for the purpose of committing the the transaction and retrieving the now committed transaction's commit timestamp. This timestamp can then be used as the input to, say, an AS OF SYSTEM TIME clause or a changefeed in order to observe the exact snapshot at which the transaction committed. Note that after issuing such a statement successfully, the client must still issue a COMMIT to reset the connection state for the next transaction.
Fixes cockroachdb#79591 Relates to cockroachdb#7945 Release note (<category, see below>): A new adds a new special sentinel builtin `crdb_internal.commit_with_causality_token` has been added which can be used exclusively as the only expression in a `SELECT` statement at the end of an explicit transaction for the purpose of committing the the transaction and retrieving the now committed transaction's commit timestamp. This timestamp can then be used as the input to, say, an AS OF SYSTEM TIME clause or a changefeed in order to observe the exact snapshot at which the transaction committed. Note that after issuing such a statement successfully, the client must still issue a COMMIT to reset the connection state for the next transaction.
Fixes cockroachdb#79591 Relates to cockroachdb#7945 Release note (sql change): A new sql statement `SHOW COMMIT TIMESTAMP` has been added. This statement can be used to retrieve the commit timestamp of the current explicit transaction, current multi-statement implicit transaction, or previous transaction. The statement may be used in a variety of settings to maximize its utility in the face of connection pooling. When used as a part of an explicit transaction, the statement implicitly commits the transaction internally before being able to return a causality token. This is similar to the `RELEASE cockroach_restart` behavior; after issuing this statement, commands to the transaction will be rejected until `COMMIT` is issued. When used as part of a multi-statement implicit transaction, the statement must be the final statement. If it occurs in the middle of a multi-statement implicit transaction, it will be rejected with an error. When sent as a stand-alone single-statement implicit transaction, it will return the commit timestamp of the previously committed transaction. If there was no transaction on the connection, or the previous transaction did not commit (either rolled back or encountered an error), the command will fail with an error code 25000 (`InvalidTransactionState`).
Fixes cockroachdb#79591 Relates to cockroachdb#7945 Release note (sql change): A new sql statement `SHOW COMMIT TIMESTAMP` has been added. This statement can be used to retrieve the commit timestamp of the current explicit transaction, current multi-statement implicit transaction, or previous transaction. The statement may be used in a variety of settings to maximize its utility in the face of connection pooling. When used as a part of an explicit transaction, the statement implicitly commits the transaction internally before being able to return a causality token. This is similar to the `RELEASE cockroach_restart` behavior; after issuing this statement, commands to the transaction will be rejected until `COMMIT` is issued. When used as part of a multi-statement implicit transaction, the statement must be the final statement. If it occurs in the middle of a multi-statement implicit transaction, it will be rejected with an error. When sent as a stand-alone single-statement implicit transaction, it will return the commit timestamp of the previously committed transaction. If there was no transaction on the connection, or the previous transaction did not commit (either rolled back or encountered an error), the command will fail with an error code 25000 (`InvalidTransactionState`).
80848: sql: add SHOW COMMIT TIMESTAMP to retrieve a causality token r=ajwerner a=ajwerner Fixes #79591 Relates to #7945 Release note (sql change): A new sql statement `SHOW COMMIT TIMESTAMP` has been added. This statement can be used to retrieve the commit timestamp of the current explicit transaction, current multi-statement implicit transaction, or previous transaction. The statement may be used in a variety of settings to maximize its utility in the face of connection pooling. When used as a part of an explicit transaction, the statement implicitly commits the transaction internally before being able to return a causality token. This is similar to the `RELEASE cockroach_restart` behavior; after issuing this statement, commands to the transaction will be rejected until `COMMIT` is issued. When used as part of a multi-statement implicit transaction, the statement must be the final statement. If it occurs in the middle of a multi-statement implicit transaction, it will be rejected with an error. When sent as a stand-alone single-statement implicit transaction, it will return the commit timestamp of the previously committed transaction. If there was no transaction on the connection, or the previous transaction did not commit (either rolled back or encountered an error), the command will fail with an error code 25000 (`InvalidTransactionState`). The `SHOW COMMIT TIMESTAMP` statement is idempotent; it can be issued multiple times both inside of and after transactions and will return the same result. One rough edge is that the `cockroach sql` cli command will, by default, send statements on behalf of the user which will lead to repeated issuances of `SHOW COMMIT TIMESTAMP` from the CLI returning different values. If one disables syntax checking with `\set check_syntax=false` and one changes their prompt1 to not require a query, perhaps with `\set prompt1=%n@%M>;`, the command will become idempotent from the CLI. 91832: CODEOWNERS: updater importer to sql-exp r=rafiss a=dt Release note: none. Epic: none. 92338: util/stop: finish task span before Stop() returns r=andreimatei a=andreimatei Before this patch, a task's span was finished after the stopper considered the task to be finished. This was a problem for a test who wanted to assume that, once stopper.Stop() returns, all task spans are finished - which is a reasonable contract to expect. This patch reorders the span finish accordingly. Fixes #83886 Release note: None Epic: None Co-authored-by: Andrew Werner <awerner32@gmail.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
We have marked this issue as stale because it has been inactive for |
still relevant |
Any progress or update here? |
My apologies if this exists somewhere - wasn't able to find it.
We already support reading the causality token of a transaction via
cluster_logical_timestamp
, but we provide no way for a causally downstream transaction to ingest that token (i.e. making sure that its assigned timestamp is at least that represented by the token).When implementing this, we must validate the token. This is a tricky problem when clients are not trusted - the token will make it into the hybrid logical clocks, so supplying a bogus high value could easily trigger our clock offset mechanisms (or even worse, not trigger them and lead to stale reads).
The baseline is making sure we're not MaxOffset ahead of the current HLC timestamp, but that's not enough. Perhaps causality tokens which can be passed in that way ought to be certified by the node certs and be totally opaque to the client.
Jira issue: CRDB-6165
The text was updated successfully, but these errors were encountered: