-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: scope backfills to only the tables affected by a schema change #55135
Conversation
The test failures here look like flakes, although I'll look into it, so I think this is actually ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @pbardea)
pkg/ccl/changefeedccl/changefeed_processors.go, line 415 at r2 (raw file):
} func (scb *schemaChangeBoundary) SetForSpan(span roachpb.Span, boundary hlc.Timestamp) {
Do these methods need to be exported?
Ah, this was updated in the following commit - let's move this change to the first commit?
pkg/ccl/changefeedccl/changefeed_processors.go, line 413 at r3 (raw file):
type schemaChangeBoundary struct { timestampsBySpan map[string]hlc.Timestamp
I think a more descriptive name here would help. Would something like resolvedTimestampBySpan
work here?
Also I think that a brief comment about why we want to keep track of the resolved timestamp on a per-span basis would be useful here.
pkg/ccl/changefeedccl/changefeed_test.go, line 846 at r1 (raw file):
// Test schema changes that require a backfill on only some watched tables within a changefeed. func TestChangefeedSchemaChangeBackfillScope(t *testing.T) {
We usually aim for each commit to have all tests passing (this makes it easier to do things like bisect when searching for a regression). Perhaps we could either move this to the last commit, or merge it in with the commit that allows this test to pass?
pkg/ccl/changefeedccl/kvfeed/kv_feed.go, line 226 at r3 (raw file):
spansToBackfill = f.spans } else if len(events) > 0 { // Only backfill for the tables
Nit: i think that this comment needs re-wrapping (we try to wrap to 80 chars) since it was edited.
@ajwerner I also added you since you're much more familiar with this area -- a look would be greatly appreciated! |
084b343
to
bc5d433
Compare
Comments addressed, unrelated test failures have mysteriously vanished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow review. The kvfeed stuff looks good. The scan boundary refactor I have more concerns about. I'd prefer if you separated that into a separate PR so we can get the primary fix here merged and have a more focused discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @HonoreDB, and @pbardea)
pkg/ccl/changefeedccl/changefeed_processors.go, line 412 at r4 (raw file):
) type schemaChangeBoundary struct {
I'd love to understand the motivation behind this change? A reason to have more fine-grained tracking of these boundaries would be if we could avoid tearing down rangefeeds for tables not involved in backfills. That feels like a heavier lift than you are taking on here. I feel like the previous representation of a single boundary as the earliest boundary at least revealed in a pretty opaque way that when we have a multi-table kvfeed, we stop the processing of rangefeed events and checkpoints at the earliest of any boundaries of any tables.
I think that it's worth communicating a bit about why we have a scanBoundary concept all the way up in changefeed_processors to begin with. This was added in the 20.1 release cycle as a way to make the shutdown on schema change work properly such that we ensure that this higher level is able to checkpoint its progress before failing. We want to ensure that the kvfeed stops but that the higher level processing continues to gracefully record its progress before shutting down. See
var scErr schemaChangeDetectedError |
pkg/ccl/changefeedccl/changefeed_processors.go, line 708 at r4 (raw file):
// of which tables changed by decoding the span keys in the schemaChangeBoundary. cf.MoveToDraining(pgerror.Newf(pgcode.SchemaChangeOccurred, "schema change occurred at %v", cf.schemaChangeBoundary.arbitraryBoundary().Next().AsOfSystemTime()))
This should return the earliest boundary as that is the boundary leading to this error. If anything, this is the reason for re-abstracting the boundary, perhaps
pkg/ccl/changefeedccl/changefeed_processors.go, line 773 at r4 (raw file):
boundaryForThisSpan := cf.schemaChangeBoundary.forSpan(resolved.Span) if resolved.BoundaryReached && (boundaryForThisSpan.IsEmpty() || resolved.Timestamp.Less(boundaryForThisSpan)) { cf.schemaChangeBoundary.setForSpan(resolved.Span, resolved.Timestamp)
unfortunately, I don't think this is sound given the logic of what produces these events. The idea with the boundary reached is that we're indicating that all spans underneath the kvfeed have reached the span boundary of one of the tables. It could be that the final checkpoint we propagate here corresponds to a different table than the one to which the scan boundary was due.
pkg/ccl/changefeedccl/changefeed_processors.go, line 777 at r4 (raw file):
// If we've moved past a schemaChangeBoundary, make sure to clear it. if !resolved.BoundaryReached && !boundaryForThisSpan.IsEmpty() && boundaryForThisSpan.Less(resolved.Timestamp) { cf.schemaChangeBoundary.clearForSpan(resolved.Span)
Here I think we'd need to clear all of the scan boundaries. In general I'm just not sure that this refactor properly captures what's going on.
…a change Previously, if a changefeed was defined on multiple targets, backfills (if configured) would be run for every target after a schema change on just some. This is unnecessary work. This patch has the kv_feed identify affected spans after a schema change, and only kick off backfills on those spans. Release note (bug fix): Changefeeds defined on multiple tables will now only backfill affected tables after a schema change.
bc5d433
to
a34edd7
Compare
Thanks @ajwerner! I've pulled the refactor out of this PR; will likely try it again once my understanding improves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB and @pbardea)
bors r=ajwerner |
Build succeeded: |
changefeedccl: scope backfills to only the tables affected by a schema change
Release note (bug fix):
Previously, if a changefeed was defined on multiple targets, backfills (if configured) would be run for every target after a schema change on just some.
This is unnecessary work.
This patch has the kv_feed identify affected spans after a schema change, and only kick off backfills on those spans.
It also starts tracking the distinction in the changeFrontier in order to enable future enhancements (I don't need those changes yet so I can drop them if they're problematic).