-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] enginepb: strongly type enginepb.TxnTimestamp, split from hlc.Timestamp #58213
Closed
nvanbenschoten
wants to merge
1
commit into
cockroachdb:master
from
nvanbenschoten:nvanbenschoten/txnTimestampAlias
Closed
[DNM] enginepb: strongly type enginepb.TxnTimestamp, split from hlc.Timestamp #58213
nvanbenschoten
wants to merge
1
commit into
cockroachdb:master
from
nvanbenschoten:nvanbenschoten/txnTimestampAlias
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…imestamp This commit splits off a new enginepb.TxnTimestamp type (see `pkg/storage/enginepb/mvcc.go`) from the existing hlc.Timestamp type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. hlc.Timestamp continues to serve in its original role — representing a real timestamp pulled from a clock on one of the nodes in the system. enginepb.TxnTimestamp is introduced to serve in hlc.Timestamp's current secondary role – representing MVCC time and the time that transactions read and write at. In the words of cockroachdb#56373 (review), this splits "clock-domain timestamps" from "transaction-domain timestamps", allowing us to use to type system to help keep them separate and to make their interactions more explicit. This results in boiling most of the ocean, but comes with some benefits in the end. - It makes conversion from the clock-domain to the transaction-domain explicit. This must pass through a call to `enginepb.ToTxnTimestamp`. - It makes conversions from the transaction-domain to the clock-domain impossible (not quite there yet). - Similarly, it makes it impossible for transaction-domain timestamps to be passed into `hlc.Clock.Update`. - It allows us to more clearly state that clock-domain timestamps must always trail present time but transaction-domain timestamps can move into the future. As such, only transaction-domain timestamps will ever have the synthetic bit set. I'm interested to get people's take on this. These benefits are nice, but the size of this diff (it still doesn't come close to compiling) and the newly introduced burden of needing to choose between multiple timestamp types is not.
Closing because I don't think this is the way to go anymore. Just wanted the permalink. |
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this pull request
Dec 29, 2020
This commit splits off a new hlc.ClockTimestamp type from the existing hlc.Timestamp type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. Timestamp serves the role of representing an arbitrary timestamp, one that makes no claim about real time. ClockTimestamp serves the role of representing a real timestamp pulled from one of the HLC clocks in the system. Because of this, it has the added capability of being able to update a peer's HLC clock. As such, a clock timestamp is an promise that some node in the system has a clock with a reading equal to or above its value. The commit also pushes towards a world where the ClockTimestamp specialization is maintained through a combination of static and dynamic typing. While the primary mechanisms that use ClockTimestamps will use static typing, Timestamp will also carry a bit indicating whether it can be downcast to a ClockTimestamp. This bit will replace the current synthetic flag. So instead of an interaction like the one introduced in cockroachdb#57077 checking whether the value has a "synthetic timestamp", it will instead check whether the value has a "clock timestamp". This serves as an alternative to an approach like the one in cockroachdb#58213, where we split the types in the other direction, keeping Timestamp to represent a clock timestamp and introduce a new enginepb.TxnTimestamp to represent an arbitrary MVCC timestamp. That change resulted in a significantly larger diff, as it misjudged the extremely small percent of all Timestamp usages which care about the capability of updating remote HLC clocks. The current approach to the synthetic flag on Timestamps also had issues which are avoided by the inversion of the flag's meaning as from_clock (see later commit). Specifically, while inverting the flag optimizes the encoded size of non-clock (currently synthetic) timestamps at the expense of the encoded size of clock timestamps by 2 bytes, it comes with major benefits that outweigh this cost. By making clock timestamps opt-in instead of opt-out, we more closely match the capability model we're trying to establish, where a clock timestamp can do everything a normal timestamp can, but can also be used to update an HLC clock. The opt-in nature mitigates the risk of bugs that forget to set this flag correctly. Instead of risking a capability escalation where a non-clock timestamp is incorrectly interpreted as a clock timestamp and used to update an HLC clock, we risk a much less harmful capability de-escalation where a clock timestamp loses its ability to update an HLC clock. We can then much more carefully audit the cases where the flag needs to be unset, such as in the Timestamp.Add and Timestamp.Forward methods.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this pull request
Dec 30, 2020
This commit splits off a new hlc.ClockTimestamp type from the existing hlc.Timestamp type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. Timestamp serves the role of representing an arbitrary timestamp, one that makes no claim about real time. ClockTimestamp serves the role of representing a real timestamp pulled from one of the HLC clocks in the system. Because of this, it has the added capability to update a peer's HLC clock. As such, a clock timestamp is an promise that some node in the system has a clock with a reading equal to or above its value. The commit also pushes towards a world where the ClockTimestamp specialization is maintained through a combination of static and dynamic typing. While the primary mechanisms that use ClockTimestamps will use static typing, Timestamp will also carry a bit indicating whether it can be downcast to a ClockTimestamp. This bit will replace the current flag structure. So instead of an interaction like the one introduced in cockroachdb#57077 checking whether the value has a "synthetic" flag set, it will instead check whether the value has a "clock timestamp". This serves as an alternative to an approach like the one in cockroachdb#58213, where we split the types in the other direction, keeping Timestamp to represent a clock timestamp and introduce a new enginepb.TxnTimestamp to represent an arbitrary MVCC timestamp. That change resulted in a significantly larger diff, as it misjudged the extremely small percent of all Timestamp usages which care about the capability of updating remote HLC clocks. It also forced all users (tests, etc.) of timestamps to address this question front-and-center, which had benefits but was also a burden on all uses of timestamps. The original intention of this change was to follow it up by inverting the synthetic flag on Timestamps, replacing an "is_synthetic" bit with a "from_clock" bit. While inverting the flag optimized the encoded size of non-clock (currently synthetic) timestamps at the expense of the encoded size of clock timestamps by 2 bytes, it came with major benefits. By making clock timestamps opt-in instead of opt-out, we more closely match the capability model we're trying to establish here with static typing, where a clock timestamp can do everything a normal timestamp can, but can also be used to update an HLC clock. The opt-in nature mitigated the risk of bugs that forget to set this flag correctly. Instead of risking a capability escalation where a non-clock timestamp is incorrectly interpreted as a clock timestamp and used to update an HLC clock, we risked a much less harmful capability de-escalation where a clock timestamp loses its ability to update an HLC clock. We could then much more carefully audit the cases where the flag needs to be unset, such as in the Timestamp.Add and Timestamp.Forward methods. Unfortunately, this "from_clock" approach (attempted in 4e34e20) came with serious complications as well, which became clear only after making the change. Chiefly, the approach made the mixed-version migration of this field significantly more difficult. With v20.2 nodes unaware of the flag, v21.1 nodes would need to find a way to either auto-assign it for all timestamps coming from v21.1 nodes, or to work around its absence. But this latter idea touches on a second complication – the absence of the field resulted (intentionally) in an MVCC encoding that v20.2 nodes would be unable to decode. So v21.1 nodes would need to be very careful to always set the "from_clock" flag on all timestamps when in a mixed version cluster. But this all made the migration towards not setting the flag once in a v21.1-only cluster even more difficult. In the end, opting into "is_synthetic" is a lot easier to work with than opting into "from_clock", so that's how the rest of this PR will operate.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this pull request
Jan 5, 2021
This commit splits off a new hlc.ClockTimestamp type from the existing hlc.Timestamp type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. Timestamp serves the role of representing an arbitrary timestamp, one that makes no claim about real time. ClockTimestamp serves the role of representing a real timestamp pulled from one of the HLC clocks in the system. Because of this, it has the added capability to update a peer's HLC clock. As such, a clock timestamp is an promise that some node in the system has a clock with a reading equal to or above its value. The commit also pushes towards a world where the ClockTimestamp specialization is maintained through a combination of static and dynamic typing. While the primary mechanisms that use ClockTimestamps will use static typing, Timestamp will also carry a bit indicating whether it can be downcast to a ClockTimestamp. This bit will replace the current flag structure. So instead of an interaction like the one introduced in cockroachdb#57077 checking whether the value has a "synthetic" flag set, it will instead check whether the value has a "clock timestamp". This serves as an alternative to an approach like the one in cockroachdb#58213, where we split the types in the other direction, keeping Timestamp to represent a clock timestamp and introduce a new enginepb.TxnTimestamp to represent an arbitrary MVCC timestamp. That change resulted in a significantly larger diff, as it misjudged the extremely small percent of all Timestamp usages which care about the capability of updating remote HLC clocks. It also forced all users (tests, etc.) of timestamps to address this question front-and-center, which had benefits but was also a burden on all uses of timestamps. The original intention of this change was to follow it up by inverting the synthetic flag on Timestamps, replacing an "is_synthetic" bit with a "from_clock" bit. While inverting the flag optimized the encoded size of non-clock (currently synthetic) timestamps at the expense of the encoded size of clock timestamps by 2 bytes, it came with major benefits. By making clock timestamps opt-in instead of opt-out, we more closely match the capability model we're trying to establish here with static typing, where a clock timestamp can do everything a normal timestamp can, but can also be used to update an HLC clock. The opt-in nature mitigated the risk of bugs that forget to set this flag correctly. Instead of risking a capability escalation where a non-clock timestamp is incorrectly interpreted as a clock timestamp and used to update an HLC clock, we risked a much less harmful capability de-escalation where a clock timestamp loses its ability to update an HLC clock. We could then much more carefully audit the cases where the flag needs to be unset, such as in the Timestamp.Add and Timestamp.Forward methods. Unfortunately, this "from_clock" approach (attempted in 4e34e20) came with serious complications as well, which became clear only after making the change. Chiefly, the approach made the mixed-version migration of this field significantly more difficult. With v20.2 nodes unaware of the flag, v21.1 nodes would need to find a way to either auto-assign it for all timestamps coming from v21.1 nodes, or to work around its absence. But this latter idea touches on a second complication – the absence of the field resulted (intentionally) in an MVCC encoding that v20.2 nodes would be unable to decode. So v21.1 nodes would need to be very careful to always set the "from_clock" flag on all timestamps when in a mixed version cluster. But this all made the migration towards not setting the flag once in a v21.1-only cluster even more difficult. In the end, opting into "is_synthetic" is a lot easier to work with than opting into "from_clock", so that's how the rest of this PR will operate.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this pull request
Jan 5, 2021
This commit splits off a new hlc.ClockTimestamp type from the existing hlc.Timestamp type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. Timestamp serves the role of representing an arbitrary timestamp, one that makes no claim about real time. ClockTimestamp serves the role of representing a real timestamp pulled from one of the HLC clocks in the system. Because of this, it has the added capability to update a peer's HLC clock. As such, a clock timestamp is an promise that some node in the system has a clock with a reading equal to or above its value. The commit also pushes towards a world where the ClockTimestamp specialization is maintained through a combination of static and dynamic typing. While the primary mechanisms that use ClockTimestamps will use static typing, Timestamp will also carry a bit indicating whether it can be downcast to a ClockTimestamp. This bit will replace the current flag structure. So instead of an interaction like the one introduced in cockroachdb#57077 checking whether the value has a "synthetic" flag set, it will instead check whether the value has a "clock timestamp". This serves as an alternative to an approach like the one in cockroachdb#58213, where we split the types in the other direction, keeping Timestamp to represent a clock timestamp and introduce a new enginepb.TxnTimestamp to represent an arbitrary MVCC timestamp. That change resulted in a significantly larger diff, as it misjudged the extremely small percent of all Timestamp usages which care about the capability of updating remote HLC clocks. It also forced all users (tests, etc.) of timestamps to address this question front-and-center, which had benefits but was also a burden on all uses of timestamps. The original intention of this change was to follow it up by inverting the synthetic flag on Timestamps, replacing an "is_synthetic" bit with a "from_clock" bit. While inverting the flag optimized the encoded size of non-clock (currently synthetic) timestamps at the expense of the encoded size of clock timestamps by 2 bytes, it came with major benefits. By making clock timestamps opt-in instead of opt-out, we more closely match the capability model we're trying to establish here with static typing, where a clock timestamp can do everything a normal timestamp can, but can also be used to update an HLC clock. The opt-in nature mitigated the risk of bugs that forget to set this flag correctly. Instead of risking a capability escalation where a non-clock timestamp is incorrectly interpreted as a clock timestamp and used to update an HLC clock, we risked a much less harmful capability de-escalation where a clock timestamp loses its ability to update an HLC clock. We could then much more carefully audit the cases where the flag needs to be unset, such as in the Timestamp.Add and Timestamp.Forward methods. Unfortunately, this "from_clock" approach (attempted in 4e34e20) came with serious complications as well, which became clear only after making the change. Chiefly, the approach made the mixed-version migration of this field significantly more difficult. With v20.2 nodes unaware of the flag, v21.1 nodes would need to find a way to either auto-assign it for all timestamps coming from v21.1 nodes, or to work around its absence. But this latter idea touches on a second complication – the absence of the field resulted (intentionally) in an MVCC encoding that v20.2 nodes would be unable to decode. So v21.1 nodes would need to be very careful to always set the "from_clock" flag on all timestamps when in a mixed version cluster. But this all made the migration towards not setting the flag once in a v21.1-only cluster even more difficult. In the end, opting into "is_synthetic" is a lot easier to work with than opting into "from_clock", so that's how the rest of this PR will operate.
craig bot
pushed a commit
that referenced
this pull request
Jan 5, 2021
58349: hlc: strongly type ClockTimestamp as specialization of Timestamp r=nvanbenschoten a=nvanbenschoten Closes #57684 - no need to panic, type system now protects us. This PR splits off a new `hlc.ClockTimestamp` type from the existing `hlc.Timestamp` type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. `Timestamp` serves the role of representing an arbitrary timestamp, one that makes no claim about real-time. `ClockTimestamp` serves the role of representing a real timestamp pulled from one of the HLC clocks in the system. Because of this, it has the added capability to update a peer's HLC clock. As such, a clock timestamp is a promise that some node in the system has a clock with a reading equal to or above its value. The PR also moves to a world where the `ClockTimestamp` specialization is maintained through a combination of static and dynamic typing. While the primary mechanisms that use `ClockTimestamps` will use static typing, Timestamp will also carry a bit indicating whether it can be downcast to a `ClockTimestamp`. This bit will replace the current flag structure. So instead of an interaction like the one introduced in #57077 checking whether the value has an arbitrary "synthetic" flag set, it will instead check whether the value has a "clock timestamp" (same mechanism, but slightly more consistent meaning). This serves as an alternative to an approach like the one in #58213, where we split the types in the other direction, keeping `Timestamp` to represent a clock timestamp and introduce a new `enginepb.TxnTimestamp` to represent an arbitrary MVCC timestamp. That change resulted in a significantly larger diff, as it misjudged the extremely small percent of all `Timestamp` usages which care about the capability of updating remote HLC clocks. It also forced all users (tests, etc.) of timestamps to address this question front-and-center, which had benefits but was also a burden on all uses of timestamps. The original intention of this change was to follow it up by inverting the synthetic flag on `Timestamp`, replacing an "is_synthetic" bit with a "from_clock" bit. While inverting the flag optimized the encoded size of non-clock (currently synthetic) timestamps at the expense of the encoded size of clock timestamps by 2 bytes, it came with major benefits. By making clock timestamps opt-in instead of opt-out, we more closely match the capability model we're trying to establish here with static typing, where a clock timestamp can do everything a normal timestamp can, but can also be used to update an HLC clock. The opt-in nature mitigated the risk of bugs that forget to set this flag correctly. Instead of risking a capability escalation where a non-clock timestamp is incorrectly interpreted as a clock timestamp and used to update an HLC clock, we risked a much less harmful capability de-escalation where a clock timestamp loses its ability to update an HLC clock. We could then much more carefully audit the cases where the flag needs to be unset, such as in the `Timestamp.Add` and `Timestamp.Forward` methods. Unfortunately, this "from_clock" approach (attempted in 4e34e20) came with serious complications as well, which became clear only after making the change. Chiefly, the approach made the mixed-version migration of this field significantly more difficult. With v20.2 nodes unaware of the flag, v21.1 nodes would need to find a way to either auto-assign it for all timestamps coming from v21.1 nodes, or to work around its absence. But this latter idea touches on a second complication – the absence of the field resulted (intentionally) in an MVCC encoding that v20.2 nodes would be unable to decode. So v21.1 nodes would need to be very careful to always set the "from_clock" flag on all timestamps when in a mixed version cluster. But this all made the migration towards not setting the flag once in a v21.1-only cluster even more difficult. In the end, opting into "is_synthetic" is a lot easier to work with than opting into "from_clock", so that's how the rest of this PR operates. 58456: build: fix Pebble nightly benchmarks r=jbowens a=jbowens Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
nanduu04
pushed a commit
to nanduu04/cockroach
that referenced
this pull request
Jan 20, 2021
This commit splits off a new hlc.ClockTimestamp type from the existing hlc.Timestamp type through a type alias. While the two types share the same memory and proto representation, they have different purposes and properties. Timestamp serves the role of representing an arbitrary timestamp, one that makes no claim about real time. ClockTimestamp serves the role of representing a real timestamp pulled from one of the HLC clocks in the system. Because of this, it has the added capability to update a peer's HLC clock. As such, a clock timestamp is an promise that some node in the system has a clock with a reading equal to or above its value. The commit also pushes towards a world where the ClockTimestamp specialization is maintained through a combination of static and dynamic typing. While the primary mechanisms that use ClockTimestamps will use static typing, Timestamp will also carry a bit indicating whether it can be downcast to a ClockTimestamp. This bit will replace the current flag structure. So instead of an interaction like the one introduced in cockroachdb#57077 checking whether the value has a "synthetic" flag set, it will instead check whether the value has a "clock timestamp". This serves as an alternative to an approach like the one in cockroachdb#58213, where we split the types in the other direction, keeping Timestamp to represent a clock timestamp and introduce a new enginepb.TxnTimestamp to represent an arbitrary MVCC timestamp. That change resulted in a significantly larger diff, as it misjudged the extremely small percent of all Timestamp usages which care about the capability of updating remote HLC clocks. It also forced all users (tests, etc.) of timestamps to address this question front-and-center, which had benefits but was also a burden on all uses of timestamps. The original intention of this change was to follow it up by inverting the synthetic flag on Timestamps, replacing an "is_synthetic" bit with a "from_clock" bit. While inverting the flag optimized the encoded size of non-clock (currently synthetic) timestamps at the expense of the encoded size of clock timestamps by 2 bytes, it came with major benefits. By making clock timestamps opt-in instead of opt-out, we more closely match the capability model we're trying to establish here with static typing, where a clock timestamp can do everything a normal timestamp can, but can also be used to update an HLC clock. The opt-in nature mitigated the risk of bugs that forget to set this flag correctly. Instead of risking a capability escalation where a non-clock timestamp is incorrectly interpreted as a clock timestamp and used to update an HLC clock, we risked a much less harmful capability de-escalation where a clock timestamp loses its ability to update an HLC clock. We could then much more carefully audit the cases where the flag needs to be unset, such as in the Timestamp.Add and Timestamp.Forward methods. Unfortunately, this "from_clock" approach (attempted in 4e34e20) came with serious complications as well, which became clear only after making the change. Chiefly, the approach made the mixed-version migration of this field significantly more difficult. With v20.2 nodes unaware of the flag, v21.1 nodes would need to find a way to either auto-assign it for all timestamps coming from v21.1 nodes, or to work around its absence. But this latter idea touches on a second complication – the absence of the field resulted (intentionally) in an MVCC encoding that v20.2 nodes would be unable to decode. So v21.1 nodes would need to be very careful to always set the "from_clock" flag on all timestamps when in a mixed version cluster. But this all made the migration towards not setting the flag once in a v21.1-only cluster even more difficult. In the end, opting into "is_synthetic" is a lot easier to work with than opting into "from_clock", so that's how the rest of this PR will operate.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This commit splits off a new enginepb.TxnTimestamp type (see
pkg/storage/enginepb/mvcc.go
)from the existing hlc.Timestamp type through a type alias. While the two types
share the same memory and proto representation, they have different purposes and
properties. hlc.Timestamp continues to serve in its original role — representing
a real timestamp pulled from a clock on one of the nodes in the system.
enginepb.TxnTimestamp is introduced to serve in hlc.Timestamp's current
secondary role – representing MVCC time and the time that transactions read and
write at. In the words of #56373 (review),
this splits "clock-domain timestamps" from "transaction-domain timestamps", allowing
us to use to type system to help keep them separate and to make their interactions
more explicit.
This results in boiling most of the ocean, but comes with some benefits in the
end.
It makes conversion from the clock-domain to the transaction-domain explicit.
This must pass through a call to
enginepb.ToTxnTimestamp
.It makes conversions from the transaction-domain to the clock-domain impossible
(not quite there yet).
Similarly, it makes it impossible for transaction-domain timestamps to be
passed into
hlc.Clock.Update
.It allows us to more clearly state that clock-domain timestamps must always trail
present time but transaction-domain timestamps can move into the future. As such,
only transaction-domain timestamps will ever have the synthetic bit set.
I'm interested to get people's take on this. These benefits are nice, but the
size of this diff (it still doesn't come close to compiling) and the newly
introduced burden of needing to choose between multiple timestamp types is not.