Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifest: virtual iterator bounds checking virtual sstable metadata #2698

Merged
merged 1 commit into from
Aug 3, 2023

Conversation

raggar
Copy link
Contributor

@raggar raggar commented Jun 29, 2023

Added a check inside of FileMetadata.ValidateVirtual() to check if the keys returned from a virtual sstable are within the bounds of the Smallest and Largest metadata keys.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from 8ef8748 to aa51c74 Compare June 29, 2023 18:35
@raggar raggar changed the title manifest: add virtual iterator bounds checking to `FileMetadata.Valid… manifest: virtual iterator bounds checking virtual sstable metadata Jun 29, 2023
v2.ValidateVirtual(parentFile)
internalIterator, _, err := d.newIters(context.TODO(), v1, nil, internalIterOpts{})
require.NoError(t, err)
v1.ValidateVirtual(parentFile, internalIterator, d.cmp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be best to pass d.newIters into ValidateVirtual, so we only incur the cost of creating the iterators if invariants are enabled.

Copy link
Contributor Author

@raggar raggar Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was trying that as well but there is some issues because some of the arguments such as internalIterOpts is not exported (and used in a lot of places), also packages such as keyspan and pebble cause import issues if used in the manifest package (so im not sure how to type the arguments required). Also, I think some places such as sstable/reader_test.go:317:26 do not have a db object, not sure if I should try creating one or is there an alternative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would solve that by passing in an adapter version of newIters to ValidateVirtual that doesn't take the internalIterOpts field, instead passing in the zero value for it. And that test can probably just pass a nil value for newIters and ValidateVirtual can make a check for whether that function is non-nil before calling it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I tried doing that, but the issue I was facing was how to type the return value of the new function, specifically keyspan.FragmentIterator since I don't think I can import the keyspan module.

@@ -402,6 +404,14 @@ func (m *FileMetadata) ValidateVirtual(createdFrom *FileMetadata) {
if m.Size == 0 {
panic("pebble: virtual sstable size must be set upon creation")
}

if invariants.Enabled {
for key, _ := it.First(); key != nil; key, _ = it.Next() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check we'd wanna do is that there's a point key at exactly the same user key as the lower bound, and another at exactly the upper bound, or a straddling range key/del at those bounds. That would confirm the bounds were generated as tight and there isn't a gap.

This check is really only mimicking the check that we're doing in the iterator stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense yeah i'll check that as well

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 4 times, most recently from 4113e50 to 84eb77a Compare July 10, 2023 13:59
@raggar raggar marked this pull request as ready for review July 10, 2023 13:59
@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 2 times, most recently from 312d59f to 2244422 Compare July 10, 2023 16:07
@raggar raggar requested review from itsbilal and a team July 10, 2023 16:07
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good! Mind throwing in ValidateVirtual() + checkVirtualBounds calls to both leftFile and rightFile in d.excise, as well as at the bottom of ingestLoad1Shared ?

db.go Outdated

it, _, err := d.newIters(context.TODO(), m, opts, iterOpts)
if err != nil {
panic("pebble: error retrieving virtual iterator")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better as it'd surface the error:

panic(fmt.Sprintf("pebble: error creating iterator: %s", err.Error()))

db.go Outdated
panic("pebble: error retrieving virtual iterator")
}

// check that virtual sstable bounds are tight from both ends.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/check/Check/

db.go Outdated
// check that virtual sstable bounds are tight from both ends.
first, _ := it.First()
if d.cmp(first.UserKey, m.Smallest.UserKey) != 0 {
panic("pebble: virtual sstable lower bound is not tight")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would also make sense to print out the two keys in question here, as well as the file number

panic(fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight: %s != %s", m.FileNum, m.Smallest.UserKey, first.UserKey))

db.go Outdated
}
last, _ := it.Last()
if d.cmp(last.UserKey, m.Largest.UserKey) != 0 {
panic("pebble: virtual sstable upper bound is not tight")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

db.go Outdated
@@ -2489,3 +2489,31 @@ func (d *DB) SetCreatorID(creatorID uint64) error {
func (d *DB) ObjProvider() objstorage.Provider {
return d.objProvider
}

func (d *DB) checkVirtualBounds(m *fileMetadata, opts *IterOptions, iterOpts internalIterOpts) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts and iterOpts seem unused by all callers

Copy link
Contributor Author

@raggar raggar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup! for the change in ingestLoad1Shared is there a way to get a *FileMetadataType from SharedSSTMetadata that can be used for ValidateVirtual?

Reviewable status: 0 of 4 files reviewed, 7 unresolved discussions (waiting on @itsbilal)


db.go line 2493 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

opts and iterOpts seem unused by all callers

yeah, is it every worth keeping parameters like these to have this function be more versatile for people to use in the future if they need this function used in other places (or do we not care about that)?

Copy link
Contributor Author

@raggar raggar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 6 unresolved discussions (waiting on @itsbilal)


db.go line 2500 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This would be better as it'd surface the error:

panic(fmt.Sprintf("pebble: error creating iterator: %s", err.Error()))

done


db.go line 2503 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

nit: s/check/Check/

done


db.go line 2506 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

would also make sense to print out the two keys in question here, as well as the file number

panic(fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight: %s != %s", m.FileNum, m.Smallest.UserKey, first.UserKey))

done


db.go line 2510 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Same here

done

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from 2244422 to 7f872b3 Compare July 13, 2023 13:44
@raggar raggar requested a review from itsbilal July 13, 2023 18:17
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ingestLoad1Shared produces a *fileMetadata, you should be able to verify that, right?

Though one complication is that you won't be able to run checkVirtualBounds until after ingestLink has run, as only after that would you be able to create an iterator on the FileMetadata. If you can navigate the ingestion code to run checkVirtualBounds on only the shared files and after ingestLink, that'd be good.

ingest.go Outdated
@@ -1409,6 +1409,9 @@ func (d *DB) excise(
SmallestSeqNum: m.SmallestSeqNum,
LargestSeqNum: m.LargestSeqNum,
}
leftFile.ValidateVirtual(m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be down where we add leftFile to ve.NewFiles, as most of its arguments are not set yet.

ingest.go Outdated
@@ -1511,6 +1514,8 @@ func (d *DB) excise(
SmallestSeqNum: m.SmallestSeqNum,
LargestSeqNum: m.LargestSeqNum,
}
rightFile.ValidateVirtual(m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this will be at the bottom

@raggar raggar requested a review from itsbilal July 18, 2023 17:59
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the old comments still stand?

db.go Outdated
// Check that virtual sstable bounds are tight from both ends.
first, _ := it.First()
if d.cmp(first.UserKey, m.Smallest.UserKey) != 0 {
panic(fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight: %s != %s", m.FileNum, m.Smallest.UserKey, first.UserKey))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use d.opts.Comparer.FormatKey(m.smallest.UserKey) for better-formatted outputs instead of just printing the slice here. And same with first.UserKey and the two keys below.

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from 7f872b3 to 50e84fe Compare July 18, 2023 18:27
Copy link
Contributor Author

@raggar raggar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mb, I forgot to push those

Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @itsbilal)


ingest.go line 1412 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This will be down where we add leftFile to ve.NewFiles, as most of its arguments are not set yet.

done


ingest.go line 1517 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Same here, this will be at the bottom

done

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from 50e84fe to 1f93c44 Compare July 18, 2023 18:35
@raggar raggar requested a review from itsbilal July 18, 2023 18:36
Copy link
Contributor Author

@raggar raggar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @itsbilal)


db.go line 2506 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

you can use d.opts.Comparer.FormatKey(m.smallest.UserKey) for better-formatted outputs instead of just printing the slice here. And same with first.UserKey and the two keys below.

done

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 2 times, most recently from 355e58c to 3df8aab Compare July 18, 2023 20:50
@raggar raggar requested a review from a team July 18, 2023 20:54
@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from 3df8aab to 4959a91 Compare July 19, 2023 13:41
@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 5 times, most recently from 00c9d8e to f8f219a Compare July 21, 2023 18:23
@raggar raggar requested a review from a team July 24, 2023 20:16
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

Reviewed 3 of 4 files at r6, all commit messages.
Reviewable status: 3 of 5 files reviewed, 8 unresolved discussions (waiting on @jbowens, @RaduBerinde, and @RahulAggarwal1016)


db.go line 2493 at r2 (raw file):

Previously, RahulAggarwal1016 (Rahul Aggarwal) wrote…

yeah, is it every worth keeping parameters like these to have this function be more versatile for people to use in the future if they need this function used in other places (or do we not care about that)?

We usually remove it. Easy to add in the future if the need is ever there. I'd just remove it.


db.go line 2512 at r6 (raw file):

	}

	pointIter, rangeDelIter, err := d.newIters(context.TODO(), m, iterOpts, internalIterOpts{})

I'd throw all the code below this behind if m.HasPointKeys { ... }


db.go line 2536 at r6 (raw file):

	if (rangeDel == nil || d.cmp(rangeDel.SmallestKey().UserKey, m.SmallestPointKey.UserKey) != 0) &&
		(pointKey == nil || d.cmp(pointKey.UserKey, m.SmallestPointKey.UserKey) != 0) {
		panic(errors.Wrap(err, fmt.Sprintf("pebble: virtual sstable %s lower bound is not tight.", m.FileNum)))

nit: we usually don't put periods at the end of error messages


db.go line 2546 at r6 (raw file):

	// Check that the upper bound is tight.
	if (rangeDel == nil || d.cmp(rangeDel.LargestKey().UserKey, m.LargestPointKey.UserKey) != 0) &&

We should also do these checks for range keys. It's easier there as you only have one iterator and one set of bounds. That one would be inside an if m.HasRangeKeys {... }


ingest.go line 1606 at r6 (raw file):

		}
		rightFile.ValidateVirtual(m)
		d.checkVirtualBounds(rightFile, nil)

We usually denote zero arguments with a small comment naming them:

d.checkVirtualBounds(rightFile, nil /* iterOptions */)

But seeing as you'll be removing that argument anyway...

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 2 times, most recently from 9fcdc3d to 657436e Compare July 27, 2023 19:55
Copy link
Contributor Author

@raggar raggar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review

Reviewable status: 0 of 5 files reviewed, 8 unresolved discussions (waiting on @itsbilal, @jbowens, and @RaduBerinde)


db.go line 2493 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

We usually remove it. Easy to add in the future if the need is ever there. I'd just remove it.

oh, I realized I actually need it for the sharedFileMetadata (to pass the level)


db.go line 2500 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'd panic with the error value itself to preserve the original error's stack:

panic(errors.Wrap(err, "creating iterator"))

fixed!


db.go line 2512 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: s/rangeIter/rangeDelIter/

Since we have two disparate kinds of ranged keys (range deletions and range keys), we should disambiguate.

fixed!


db.go line 2523 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

The same also applies to range deletions. The smallest key in the file may be a range deletion, which we're not considering here.

fixed!


db.go line 2512 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I'd throw all the code below this behind if m.HasPointKeys { ... }

added!


db.go line 2536 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

nit: we usually don't put periods at the end of error messages

oh yeah forgot about that, fixed!


db.go line 2546 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

We should also do these checks for range keys. It's easier there as you only have one iterator and one set of bounds. That one would be inside an if m.HasRangeKeys {... }

added!


ingest.go line 1606 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

We usually denote zero arguments with a small comment naming them:

d.checkVirtualBounds(rightFile, nil /* iterOptions */)

But seeing as you'll be removing that argument anyway...

oh yeah, forgot about that too (I realized I actually need the second argument)

@raggar raggar requested a review from itsbilal July 27, 2023 20:01
@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 3 times, most recently from 2248e19 to 8d3a4a4 Compare August 1, 2023 14:42
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM: once comments are addressed and CI is green

I think the test failures will be fixed once you thread through the level to the newIter calls?

Reviewed 4 of 4 files at r7, all commit messages.
Reviewable status: 4 of 5 files reviewed, 9 unresolved discussions (waiting on @jbowens, @RaduBerinde, and @RahulAggarwal1016)


db.go line 2524 at r7 (raw file):

		pointKey, _ := pointIter.First()
		var rangeDel *rangekey.Span

this should be *keyspan.Span, which I know is the same type but we only use the exported rangekey version to refer to range keys in practice.


db.go line 2550 at r7 (raw file):

		for key, _ := pointIter.First(); key != nil; key, _ = pointIter.Next() {
			if d.cmp(key.UserKey, m.SmallestPointKey.UserKey) < 0 || d.cmp(key.UserKey, m.LargestPointKey.UserKey) > 0 {
				panic(errors.Wrap(err, fmt.Sprintf("pebble: virtual sstable %s point key %s is not within bounds", m.FileNum, key.UserKey)))

this can be errors.New, not errors.Wrap as err is likely nil. And same for all the cases below except for the panic where err != nil.


db.go line 2567 at r7 (raw file):

	}

	if m.HasRangeKeys {

nit: you can flatten the code a little bit by doing:

if !m.HasRangeKeys {
  return
}
... all the other stuff

@raggar raggar force-pushed the rahul_virtual_sstable_validation branch 6 times, most recently from 266a3eb to 7514a3c Compare August 1, 2023 21:11
itsbilal added a commit to itsbilal/pebble that referenced this pull request Aug 1, 2023
Tests in cockroachdb#2698 combined with `TestIngestShared` caught that we
were not returning range keys in the order expected by other internal
iterators when reading from a foreign sstable. This change
re-sorts range keys in trailer descending order once they've been
coalesced in ForeignSSTTransformer.
@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from 7514a3c to db7f5bd Compare August 1, 2023 22:45
itsbilal added a commit to itsbilal/pebble that referenced this pull request Aug 1, 2023
Tests in cockroachdb#2698 combined with `TestIngestShared` caught that we
were not returning range keys in the order expected by other internal
iterators when reading from a foreign sstable. This change removes
the ForeignSSTTransformer as it's no longer necessary given
all range keys are emitted at the ingestion sequence number.
itsbilal added a commit that referenced this pull request Aug 2, 2023
Tests in #2698 combined with `TestIngestShared` caught that we
were not returning range keys in the order expected by other internal
iterators when reading from a foreign sstable. This change removes
the ForeignSSTTransformer as it's no longer necessary given
all range keys are emitted at the ingestion sequence number.
Added a check inside of `FileMetadata.ValidateVirtual()` to ensure that
the keys returned from a virtual sstable are within the bounds of the
Smallest and Largest metadata keys.

Informs: cockroachdb#2593
@raggar raggar force-pushed the rahul_virtual_sstable_validation branch from db7f5bd to 2ff121d Compare August 3, 2023 13:32
@raggar raggar merged commit fe0ea04 into cockroachdb:master Aug 3, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants