ttl: implement TTL scheduled job validation & fixup#77741
craig[bot] merged 6 commits into cockroachdb:master
miretskiy
left a comment
Reviewed 1 of 9 files at r1, 3 of 3 files at r2, 3 of 7 files at r3, 1 of 3 files at r4, 3 of 8 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @rafiss, and @stevendanna)
pkg/sql/check.go, line 590 at r5 (raw file):
db, err := p.Descriptors().GetImmutableDatabaseByName( ctx, p.Txn(), dbName, tree.DatabaseLookupFlags{Required: true}, )
It seems to me that the entirety of changes in this file, in particular Validate and Repair methods
added to planner, could be made into standalone functions (somewhere under ttl directory).
You only appear to access p.Descriptors() and p.Txn() from planner. Why not just pass those as function
arguments?
pkg/sql/sem/builtins/builtins.go, line 6440 at r5 (raw file):
), "crdb_internal.repair_ttl_table_scheduled_job": makeBuiltin(
It would be helpful if you could explain (perhaps in the PR description) under what conditions
you expect to have invalid schedules.
A schedule could have been made undroppable, for example. And if you need to delete the schedule
because of an ALTER statement, you don't have to use a DROP SCHEDULE statement; you could just delete
the schedule.
I suppose you might worry about an admin directly deleting the schedule. But that's no different
from an admin deleting arbitrary things from system tables; in general, we don't really protect against that.
And if you're still concerned, then perhaps starting a one-off task to check sanity (i.e., for each db, validate TTL jobs) might be sufficient?
Does this worry rise to the level of adding new internal functions?
otan
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @rafiss, and @stevendanna)
pkg/sql/check.go, line 590 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
It seems to me that the entirety of changes in this file, in particular Validate and Repair methods
added to planner, could be made into standalone functions (somewhere under ttl directory).
You only appear to access p.Descriptors() and p.Txn() from planner. Why not just pass those as function
arguments?
EvalContext does not have access to Descriptors. that's all we get in builtins, which is why it's here.
Code quote:
RepairTTLScheduledJobForTable
pkg/sql/sem/builtins/builtins.go, line 6440 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
It would be helpful if you could explain (perhaps in the PR description) under what conditions
you expect to have invalid schedules.
A schedule could have been made undroppable, for example. And if you need to delete the schedule
because of an ALTER statement, you don't have to use a DROP SCHEDULE statement; you could just delete
the schedule.
I suppose you might worry about an admin directly deleting the schedule. But that's no different
from an admin deleting arbitrary things from system tables; in general, we don't really protect against that.
And if you're still concerned, then perhaps starting a one-off task to check sanity (i.e., for each db, validate TTL jobs) might be sufficient?
Does this worry rise to the level of adding new internal functions?
This commit is intended for "repairing" jobs that may have been broken
by a broken schema change interaction.
added
miretskiy
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)
pkg/sql/check.go, line 590 at r5 (raw file):
Previously, otan (Oliver Tan) wrote…
EvalContext does not have access to Descriptors. that's all we get in builtins, which is why it's here.
Couldn't descriptors be added to eval ctx? It's just that planner is such a huge interface.
Why add more to it?
pkg/sql/check.go, line 590 at r5 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
I think adding things to EvalContext is just as contentious because it gives more access than it needs to :)
I don't know... EvalContext already has access to EvalPlanner, that object does things like type resolution... it seems that
stevendanna
left a comment
Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 16 of 16 files at r6, 7 of 7 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)
a discussion (no related file):
Left a few non-blocking nitpicks.
pkg/sql/check.go, line 652 at r1 (raw file):
return pgerror.Newf( pgcode.Internal, "scheduled job id %d points to table id %d instead of table id %d",
[mega nit, don't change it if you don't want to] But the error messages from this function use scheduled job id, schedule id, and scheduled job all to describe the same ID.
Code quote (from -- commits):
table points to a valid scheduled job which will action the deletion of
expired rows.
pkg/sql/ttl/ttlschedule/ttlschedule.go, line 115 at r8 (raw file):
return true, nil } // If there is a schedule id mismatch we can drop this table.
Suggestion:
// If there is a schedule id mismatch we can drop this schedule.
stevendanna
left a comment
Reviewed 3 of 3 files at r8, 8 of 8 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, and @rafiss)
otan
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)
pkg/sql/check.go, line 652 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
[mega nit, don't change it if you don't want to] But the error messages from this function use
scheduled job id, schedule id, and scheduled job all to describe the same ID.
Done.
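For illustration only, a check of this kind with one term used throughout might look like the sketch below. The `Schedule` type and `checkScheduleTable` function are hypothetical, and the choice of "schedule id" as the single term is an assumption; the thread does not say which wording the PR settled on.

```go
package main

import "fmt"

// Schedule is a hypothetical, minimal view of a TTL schedule: its ID and
// the table ID recorded in its arguments.
type Schedule struct {
	ID      int64
	TableID int64
}

// checkScheduleTable returns an error when the schedule does not point back
// at the expected table. Every message says "schedule id"; that wording is
// an assumption, not necessarily the term used in the merged code.
func checkScheduleTable(s *Schedule, scheduleID, tableID int64) error {
	if s == nil {
		return fmt.Errorf("schedule id %d not found for table id %d", scheduleID, tableID)
	}
	if s.TableID != tableID {
		return fmt.Errorf(
			"schedule id %d points to table id %d instead of table id %d",
			s.ID, s.TableID, tableID,
		)
	}
	return nil
}
```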
otan
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)
pkg/sql/ttl/ttlschedule/ttlschedule.go, line 115 at r8 (raw file):
return true, nil } // If there is a schedule id mismatch we can drop this table.
Done.
Release justification: high benefit addition to new functionality
Release note (sql change): Added a `crdb_internal.validate_ttl_scheduled_jobs` builtin which verifies that each table points to a valid scheduled job which will action the deletion of expired rows.
This commit introduces the GetTableNameByDesc and GetTableNameByID functions, which fetch a *tree.TableName from the given table objects.
Release justification: low-risk refactor
Release note: None
Release justification: non-prod code change
Release note: None
In preparation for the next commit.
Release justification: low-risk changes for new functionality
Release note: None
Previously, `DROP SCHEDULE` for a TTL job would always fail. Now, we succeed if the scheduled job is invalid.
Release justification: low risk high benefit change to new functionality
Release note: None
This commit is intended for "repairing" jobs that may have been broken by a broken schema change interaction.
Release justification: high priority fix for new functionality
Release note (sql change): Adds a `crdb_internal.repair_ttl_table_scheduled_job` builtin, which repairs the given TTL table's scheduled job by supplanting it with a valid schedule.
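The `DROP SCHEDULE` change in the commit messages above can be pictured with a small sketch. The `TTLTable` type and `canDropSchedule` function are hypothetical. Only the schedule id mismatch case comes from the review thread; the other two conditions are assumptions about what "invalid" might include.

```go
package main

// TTLTable is a hypothetical view of the TTL settings stored on a table.
type TTLTable struct {
	ID         int64
	HasTTL     bool
	ScheduleID int64
}

// canDropSchedule reports whether a TTL schedule may be dropped, which in
// this sketch means no TTL table still relies on it. A nil table models a
// table that no longer exists.
func canDropSchedule(scheduleID int64, tbl *TTLTable) bool {
	// Assumption: if the table is gone, nothing depends on the schedule.
	if tbl == nil {
		return true
	}
	// Assumption: if the table no longer has TTL, the schedule is orphaned.
	if !tbl.HasTTL {
		return true
	}
	// If there is a schedule id mismatch we can drop this schedule.
	return tbl.ScheduleID != scheduleID
}
```

A schedule that a TTL table still points to stays undroppable; anything else is treated as invalid and can be removed.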
thanks!
bors r+
Build failed (retrying...)
Build failed (retrying...)
Build failed (retrying...)
Build succeeded
See individual commits for review.
Resolves #75428
Release justification: high pri addition to new functionality