Skip to content

Comments

ttl: implement TTL scheduled job validation & fixup#77741

Merged
craig[bot] merged 6 commits intocockroachdb:masterfrom
otan-cockroach:validate
Mar 23, 2022
Merged

ttl: implement TTL scheduled job validation & fixup#77741
craig[bot] merged 6 commits intocockroachdb:masterfrom
otan-cockroach:validate

Conversation

@otan
Copy link
Contributor

@otan otan commented Mar 13, 2022

See individual commits for review.

Resolves #75428

Release justification: high pri addition to new functionality

@otan otan requested a review from rafiss March 13, 2022 21:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@otan otan force-pushed the validate branch 4 times, most recently from 43f7fe6 to 1c88bbf Compare March 14, 2022 03:46
@otan otan marked this pull request as ready for review March 17, 2022 23:10
@otan otan requested a review from a team as a code owner March 17, 2022 23:11
@otan otan requested review from a team, ajwerner and stevendanna and removed request for a team March 17, 2022 23:11
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r1, 3 of 3 files at r2, 3 of 7 files at r3, 1 of 3 files at r4, 3 of 8 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @rafiss, and @stevendanna)


pkg/sql/check.go, line 590 at r5 (raw file):

	db, err := p.Descriptors().GetImmutableDatabaseByName(
		ctx, p.Txn(), dbName, tree.DatabaseLookupFlags{Required: true},
	)

It seems to me that the entirety of changes in this file, in particular Validate and Repair methods
added to planner, could be made into standalone functions (somewhere under ttl directory).
You only appear to access p.Descriptors() and p.Txn() from planner. Why not just pass those as function
arguments?


pkg/sql/sem/builtins/builtins.go, line 6440 at r5 (raw file):

	),

	"crdb_internal.repair_ttl_table_scheduled_job": makeBuiltin(

it would be helpful if you could explain (perhaps in the pr description), under what conditions
do you expect to have invalid schedules?

A schedule could have been made undroppable, for example. And if you need to delete the schedule
because of alter statement, you don't have to use drop schedule statement -- you could just delete
the schedule.

I suppose you might worry about admin directly deleting the schedule. But that's no different
than admin deleting arbitrary things from system tables -- in general, we don't really protect against that.
And if you're still afraid, then perhaps starting a one-off task to check the sanity (ie.. for each db, validate ttl jobs) might be sufficient?

Does this worry raise to the level of adding new internal functions?

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @rafiss, and @stevendanna)


pkg/sql/check.go, line 590 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

It seems to me that the entirety of changes in this file, in particular Validate and Repair methods
added to planner, could be made into standalone functions (somewhere under ttl directory).
You only appear to access p.Descriptors() and p.Txn() from planner. Why not just pass those as function
arguments?

EvalContext does not have access to Descriptors. that's all we get in builtins, which is why it's here.

Code quote:

RepairTTLScheduledJobForTable

pkg/sql/sem/builtins/builtins.go, line 6440 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

it would be helpful if you could explain (perhaps in the pr description), under what conditions
do you expect to have invalid schedules?

A schedule could have been made undroppable, for example. And if you need to delete the schedule
because of alter statement, you don't have to use drop schedule statement -- you could just delete
the schedule.

I suppose you might worry about admin directly deleting the schedule. But that's no different
than admin deleting arbitrary things from system tables -- in general, we don't really protect against that.
And if you're still afraid, then perhaps starting a one-off task to check the sanity (ie.. for each db, validate ttl jobs) might be sufficient?

Does this worry raise to the level of adding new internal functions?

This commit is intended for "repairing" jobs that may have been broken
by a broken schema change interaction.

added

@otan otan requested a review from a team as a code owner March 21, 2022 20:38
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)


pkg/sql/check.go, line 590 at r5 (raw file):

Previously, otan (Oliver Tan) wrote…

EvalContext does not have access to Descriptors. that's all we get in builtins, which is why it's here.

Couldn't descriptors be added to eval ctx? It's just planner is such a huge interface.
Why add more to it?

@miretskiy miretskiy self-requested a review March 21, 2022 20:50
@otan
Copy link
Contributor Author

otan commented Mar 21, 2022


pkg/sql/check.go, line 590 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Couldn't descriptors be added to eval ctx? It's just planner is such a huge interface.
Why add more to it?

i think adding things to EvalContext is just as contentious because it gives more access than it needs to :)
this is how all the other builtins "hack" access to the planner, i'm not inclined to change it for just this PR.

@miretskiy
Copy link
Contributor

i think adding things to EvalContext is just as contentious because it gives more access than it needs to :)
this is how all the other builtins "hack" access to the planner, i'm not inclined to change it for just this PR.

I don't know... EvalContext already has access to EvalPlanner, that object does things like type resolution... it seems that
it could very well support descriptor resolution.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 16 of 16 files at r6, 7 of 7 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)

a discussion (no related file):
:lgtm: Left a few non-blocking nitpicks.



pkg/sql/check.go, line 652 at r1 (raw file):

		return pgerror.Newf(
			pgcode.Internal,
			"scheduled job id %d points to table id %d instead of table id %d",

[mega nit, don't change it if you don't want to] But the error messages from this function use scheduled job id, schedule id, and scheduled job all to describe the same ID.

Code quote (from -- commits):

  table points to a valid scheduled job which will action the deletion of
  expired rows.

pkg/sql/ttl/ttlschedule/ttlschedule.go, line 115 at r8 (raw file):

		return true, nil
	}
	// If there is a schedule id mismatch we can drop this table.

Suggestion:

// If there is a schedule id mismatch we can drop this schedule.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r8, 8 of 8 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, and @rafiss)

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)


pkg/sql/check.go, line 652 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

[mega nit, don't change it if you don't want to] But the error messages from this function use scheduled job id, schedule id, and scheduled job all to describe the same ID.

Done.

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @otan, @rafiss, and @stevendanna)


pkg/sql/ttl/ttlschedule/ttlschedule.go, line 115 at r8 (raw file):

		return true, nil
	}
	// If there is a schedule id mismatch we can drop this table.

Done.

Release justification: high benefit addition to new functionality

Release note (sql change): Added a
`crdb_internal.validate_ttl_scheduled_jobs` builtin which verifies each
table points to a valid scheduled job which will action the deletion of
expired rows.
otan added 5 commits March 23, 2022 10:57
This commit introduces the GetTableNameByDesc and GetTableNameByID
functions, which fetches a *tree.TableName from the given table objects.

Release justification: low-risk refactor

Release note: None
Release justification: non-prod code change
Release note: None
In preparation for the next commit.

Release justification: low-risk changes for new functionality

Release note: None
Previously, `DROP SCHEDULE` for a TTL job would fail always. Now, we
succeed if the scheduled job is invalid.

Release justification: low risk high benefit change to new functionality

Release note: None
This commit is intended for "repairing" jobs that may have been broken
by a broken schema change interaction.

Release justification: high priority fix for new functionality

Release note (sql change): Adds a
`crdb_internal.repair_ttl_table_scheduled_job` builtin, which repairs
the given TTL table's scheduled job by supplanting it with a valid
schedule.
@otan
Copy link
Contributor Author

otan commented Mar 23, 2022

thanks!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 23, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 23, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 23, 2022

Build failed (retrying...):

@craig craig bot merged commit 4eca06c into cockroachdb:master Mar 23, 2022
@craig
Copy link
Contributor

craig bot commented Mar 23, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: TTL v1 meta issue

4 participants