Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/base: document new Split requirements #1385

Merged
merged 1 commit into from
Nov 30, 2021

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Nov 23, 2021

Document the new Split requirements necessary for range keys support. See #1341.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion


internal/base/comparer.go, line 102 at r1 (raw file):

//    If Compare(prefix(a), prefix(b)) == 0, then Compare(suffix(a), suffix(b)) == Compare(a, b)
//
// 5) If b begins with a, then prefix(b) = prefix(a).

I wasn't sure what this existing requirement means. @sumeerbhola any idea?

I believe there's no reason why you can't have a key a and a key aaa@t10 such that prefix(a) = 'a' and prefix(aaa@t10) = 'aaa' but aaa@t10 begins with a.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens)


internal/base/comparer.go, line 76 at r1 (raw file):

// the user could potentially store in the database. Typically this means that
// the method must only handle valid MVCC encoded keys and should panic on any
// other input.

this panic comment is a bit odd. If the uses specifies a Split function for a DB, it will call it on any key in the DB -- it has no understanding of what is an MVCC key and what is not.


internal/base/comparer.go, line 91 at r1 (raw file):

//    that prefix:
//
//    Compare(prefix(a), a) < 0 if suffix(a) > 0

should this say len(suffix(a)) > 0?


internal/base/comparer.go, line 102 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I wasn't sure what this existing requirement means. @sumeerbhola any idea?

I believe there's no reason why you can't have a key a and a key aaa@t10 such that prefix(a) = 'a' and prefix(aaa@t10) = 'aaa' but aaa@t10 begins with a.

I agree. This looks incorrect. And one can have two keys that are a@10@10 and a@10, which don't have the same prefix. In the CockroachDB MVCCKey encoding I don't think there is anything that prevents a byte sequence that looks like a timestamp from appearing in the prefix.

Document the new Split requirements necessary for range keys support. See cockroachdb#1341.
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @sumeerbhola)


internal/base/comparer.go, line 76 at r1 (raw file):

Previously, sumeerbhola wrote…

this panic comment is a bit odd. If the uses specifies a Split function for a DB, it will call it on any key in the DB -- it has no understanding of what is an MVCC key and what is not.

yeah, that looks wrong to me too. I updated it.


internal/base/comparer.go, line 91 at r1 (raw file):

Previously, sumeerbhola wrote…

should this say len(suffix(a)) > 0?

Done.


internal/base/comparer.go, line 102 at r1 (raw file):

Previously, sumeerbhola wrote…

I agree. This looks incorrect. And one can have two keys that are a@10@10 and a@10, which don't have the same prefix. In the CockroachDB MVCCKey encoding I don't think there is anything that prevents a byte sequence that looks like a timestamp from appearing in the prefix.

Good point. Removed

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @sumeerbhola)


internal/base/comparer.go, line 102 at r1 (raw file):

In the CockroachDB MVCCKey encoding I don't think there is anything that prevents a byte sequence that looks like a timestamp from appearing in the prefix.

Ah, wait, that would make the NextPrefix prefix-sharing optimization more difficult.

I did a little looking around and found this comment in storage/doc.go:

The binary encoding used on the MVCC keys allows arbitrary keys to be
stored in the map (no restrictions on intermediate nil-bytes, for
example), while still sorting lexicographically and guaranteeing that
all timestamp-suffixed MVCC version keys sort consecutively with the
metadata key. We use an escape-based encoding which transforms all nul
("\x00") characters in the key and is terminated with the sequence
"\x00\x01", which is guaranteed to not occur elsewhere in the encoded
value. See util/encoding/encoding.go for more details.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)


internal/base/comparer.go, line 102 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

In the CockroachDB MVCCKey encoding I don't think there is anything that prevents a byte sequence that looks like a timestamp from appearing in the prefix.

Ah, wait, that would make the NextPrefix prefix-sharing optimization more difficult.

I did a little looking around and found this comment in storage/doc.go:

The binary encoding used on the MVCC keys allows arbitrary keys to be
stored in the map (no restrictions on intermediate nil-bytes, for
example), while still sorting lexicographically and guaranteeing that
all timestamp-suffixed MVCC version keys sort consecutively with the
metadata key. We use an escape-based encoding which transforms all nul
("\x00") characters in the key and is terminated with the sequence
"\x00\x01", which is guaranteed to not occur elsewhere in the encoded
value. See util/encoding/encoding.go for more details.
  • This comment about escape-based encoding is quite peculiar, and seems disconnected from the rest. We don't need any such guarantees to ensure ordering, since EngineKeyCompare has to split and then compare. Do you think we rely on this currently in any place?

  • I am not sure we can do the NextPrefix prefix-sharing optimization. An MVCCKey ends with \x00 (sentinel), followed by the WallTime using binary.BigEndian.PutUint64. Within the roachpb.Key we escape \x00 to \x00\xff https://github.com/cockroachdb/cockroach/blob/978df512d26ae5eda7ed4cf8e0d2da9d241b4858/pkg/util/encoding/encoding.go#L495-L504. If we ever had the WallTime starting with \xff we would be in trouble. Because we are casting an int64 WallTime to uint64, it can never be \xff. So that seems to save us. And if \x00\x01 can only occur at the end of the roachpb.Key it adds another layer of protection. But without fully understanding the sql key encoding, I'd be very wary of doing anything, and even then it seems risky.

  • The SeekGE we use for NextKey does p.iter.SeekGE(append(p.keyBuf, 0, 0)). So \x00\x00 at the end of the roachpb.Key. If the WallTime started with \x00 (which I think is not the case if the system is running at the current time) we would seek to an older version of the same roachpb.Key. I hope I am mistaken in my thinking here.

Copy link
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @sumeerbhola)


internal/base/comparer.go, line 102 at r1 (raw file):

Do you think we rely on this currently in any place?

Not that I know of, but this comment here had me wondering.

I am not sure we can do the NextPrefix prefix-sharing optimization.

It definitely tempers the benefits, but I think it can be adapted: If the shared bytes is greater than the currentPrefixLen, we Split the new key. If the new key's prefix length equals currentPrefixLen, we skip past it.

It's unfortunate that we need to Split the key within the blockIter, but it I suspect it still helps by avoiding key comparisons in higher levels.

we would seek to an older version of the same roachpb.Key

😬

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)


internal/base/comparer.go, line 102 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Do you think we rely on this currently in any place?

Not that I know of, but this comment here had me wondering.

I am not sure we can do the NextPrefix prefix-sharing optimization.

It definitely tempers the benefits, but I think it can be adapted: If the shared bytes is greater than the currentPrefixLen, we Split the new key. If the new key's prefix length equals currentPrefixLen, we skip past it.

It's unfortunate that we need to Split the key within the blockIter, but it I suspect it still helps by avoiding key comparisons in higher levels.

we would seek to an older version of the same roachpb.Key

😬

We use the \0 and SeekGE to advance to the next key in other places too, like https://github.com/cockroachdb/cockroach/blob/d7a362c25f2819077f36807519f39dbdf32dfca0/pkg/storage/mvcc.go#L3663

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)


internal/base/comparer.go, line 102 at r1 (raw file):

The SeekGE we use for NextKey does p.iter.SeekGE(append(p.keyBuf, 0, 0)). So \x00\x00 at the end of the roachpb.Key. If the WallTime started with \x00 (which I think is not the case if the system is running at the current time) we would seek to an older version of the same roachpb.Key. I hope I am mistaken in my thinking here.

I was indeed mistaken. There is no bug here, thankfully. The <key>\x00\x00 is of course subject to the same engine key comparison, which will use <key>\x00 as EngineKey.Key (with no EngineKey.Version).

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)

@jbowens jbowens merged commit 08c5d46 into cockroachdb:master Nov 30, 2021
@jbowens jbowens deleted the comparer-reqs branch November 30, 2021 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants