Skip to content

changefeedccl: allow users to alter changefeed options #76583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

sherman-grewal
Copy link
Contributor

@sherman-grewal sherman-grewal commented Feb 15, 2022

changefeedccl: allow users to alter changefeed options
with the ALTER CHANGEFEED statement

References #75895

Currently, with the ALTER CHANGEFEED statement users
can only add or drop targets from an existing
changefeed. In this PR, we would like to extend this
functionality so that an user can edit and unset
the options of an existing changefeed as well.
The syntax of this addition is the following:

ALTER CHANGEFEED <job_id> SET <options> UNSET <opt_list>

Note that the must follow the same syntax
that is used when creating a changefeed with options.
In particular, if you would like to set an option
that requires a value, you must write

SET opt = 'value'

On the other hand, if you would like to set an
option that requires no value, you must write

SET opt

Furthermore, this PR allows users to unset options.
This can be achieved by writing

UNSET <opt_list>

Where <opt_list> is a list of options that you
would like to unset. For example, if we would like
to unset the diff and resolved options for
changefeed 123, we would achieve this by writing

ALTER CHANGEFEED 123 UNSET diff, resolved

Release note (enterprise change): Added support to
the ALTER CHANGEFEED statement so that users can edit
and unset the options of an existing changefeed. The
syntax of this addition is the following:

ALTER CHANGEFEED <job_id> SET <options> UNSET <opt_list>

@sherman-grewal sherman-grewal requested a review from a team as a code owner February 15, 2022 15:16
@sherman-grewal sherman-grewal requested review from shermanCRL and removed request for a team February 15, 2022 15:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@shermanCRL shermanCRL requested a review from HonoreDB February 15, 2022 15:27
@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch 4 times, most recently from 3bd57a5 to 36eb164 Compare February 16, 2022 14:45
@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch from 36eb164 to 156f625 Compare February 16, 2022 15:44
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB, @sherman-grewal, and @shermanCRL)


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 88 at r2 (raw file):

		targetsChanged := false

		oldStmt, err := parser.ParseOne(job.Payload().Description)

Ooof... Arguably, I should have picked this earlier (during previous version)... But here we are.

Unfortunately creating changefeed statement node based on payload description is unsafe and
is broken. Basically -- Description contains redacted string representation of the original statement,
and thus cannot be reversed (e.g. you will loose any security related information -- keys, etc). -- changefeedJobDescription function.

What to do:

  1. Please add a test (which I expect to fail) where you create changefeed into e.g. s3, and include some of the redactable fields there (create changefeed ... into 's3://....?AWS_SECRET_ACCESS_KEY=my_secret'). Verify that
    after an alter statement you still see AWS_SECRET_ACCESS_KEY=my_secret and not AWS_SECRET_ACCESS_KEY=redacted. I expect this test to fail.
  2. You can (and probably should) fix this issue as a separate PR on top of existing alter implementation.
  3. The changefeed statement must be synthesized based on ChangefeedDetails fields. Now, you don't actually have
    to regenerate exact statement as it was typed when changefeed was created. In particular, the thing you might have problem with is the target list. You have target id, but the name available in ChangeFeedTarget field might no longer be available -- because the table could have been renamed. So, you can't just assume you can put statement_names in the changefeed statement. It would be nice if we can use tree.AliasedTableExpr in changefeed statement -- but as things are right now, that's not the case (I believe TablePattern is not the same as an alias table expr). Perhaps the changefeed statement can be changed somehow to also allow specification of targets as table expressions? If so, you could generate table expression in the form of [id AS name] (this is something that select supports... you can do select * from [30 as namespace] -- try running this from cockroach demo). Other possible hacks include resolving current names for each descriptor, re-synthesizing changefeed statement using current names, and then, "renaming" them to the original statement_time_name.

pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 123 at r2 (raw file):

		}

		if opts.Options != nil {

So, now that we've fixed how the statement is re-generated (above), I have to wonder... Would we be better off if we just take changefeed statement, and append all options there?

As far as I know, appending the same option multiple times overrides the previous occurrences.

If you refactor changefeed statement plan hook so that you have a separate function to compute the job record basically, then you could simply call into that function, and have it perform all of the validation, and return you the "new" job record which you will handle here.
The only thing is that function would probably need to take in a boolean argument indicating if it should update telemetry data (yes, from changefeed, and now from here).


pkg/sql/parser/testdata/alter_changefeed, line 99 at r2 (raw file):

ALTER CHANGEFEED (123) ADD (foo)  SET bar = ('baz'), qux = ('quux')  DROP (corge) -- fully parenthesized
ALTER CHANGEFEED _ ADD foo  SET bar = '_', qux = '_'  DROP corge -- literals removed
ALTER CHANGEFEED 123 ADD _  SET _ = 'baz', _ = 'quux'  DROP _ -- identifiers removed

❤️ these tests.


pkg/ccl/changefeedccl/show_changefeed_jobs_test.go, line 450 at r2 (raw file):

				t.Fatalf("Expected more rows when querying and none found for query: %s", query)
			}
		}

This block is repeated too many times; pull it into helper function?

Suggestion:

		if !rowResults.Next() {
			err := rowResults.Err()
			if err != nil {
				t.Fatalf("Error encountered while querying the next row: %v", err)
			} else {
				t.Fatalf("Expected more rows when querying and none found for query: %s", query)
			}
		}
		
		

@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch from 156f625 to e50cb0b Compare February 16, 2022 20:00
Copy link
Contributor Author

@sherman-grewal sherman-grewal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, and @shermanCRL)


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 88 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Ooof... Arguably, I should have picked this earlier (during previous version)... But here we are.

Unfortunately creating changefeed statement node based on payload description is unsafe and
is broken. Basically -- Description contains redacted string representation of the original statement,
and thus cannot be reversed (e.g. you will loose any security related information -- keys, etc). -- changefeedJobDescription function.

What to do:

  1. Please add a test (which I expect to fail) where you create changefeed into e.g. s3, and include some of the redactable fields there (create changefeed ... into 's3://....?AWS_SECRET_ACCESS_KEY=my_secret'). Verify that
    after an alter statement you still see AWS_SECRET_ACCESS_KEY=my_secret and not AWS_SECRET_ACCESS_KEY=redacted. I expect this test to fail.
  2. You can (and probably should) fix this issue as a separate PR on top of existing alter implementation.
  3. The changefeed statement must be synthesized based on ChangefeedDetails fields. Now, you don't actually have
    to regenerate exact statement as it was typed when changefeed was created. In particular, the thing you might have problem with is the target list. You have target id, but the name available in ChangeFeedTarget field might no longer be available -- because the table could have been renamed. So, you can't just assume you can put statement_names in the changefeed statement. It would be nice if we can use tree.AliasedTableExpr in changefeed statement -- but as things are right now, that's not the case (I believe TablePattern is not the same as an alias table expr). Perhaps the changefeed statement can be changed somehow to also allow specification of targets as table expressions? If so, you could generate table expression in the form of [id AS name] (this is something that select supports... you can do select * from [30 as namespace] -- try running this from cockroach demo). Other possible hacks include resolving current names for each descriptor, re-synthesizing changefeed statement using current names, and then, "renaming" them to the original statement_time_name.

As discussed offline, there was a misunderstanding when this comment was published. In particular, @miretskiy thought that we were using the CREATE CHANGEFEED AST node (parsed by the old job description) to update the details of the existing changefeed. However, this is not the case. The AST node is only used to help generate the new job description of the updated changefeed. As a result, the SinkURI of the ChangefeedDetails remains unchanged after the alteration (leaving the un-redacted information), but sensitive information in the SinkURI remains redacted in the job description as desired.

That being said, this comment helped us catch the fact that we use the SinkURI from the AST node to perform validations, but it would be preferred to use the actual SinkURI (from the ChangefeedDetails) to perform validations instead. Thank you @miretskiy for this catch! 🙌


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 123 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

So, now that we've fixed how the statement is re-generated (above), I have to wonder... Would we be better off if we just take changefeed statement, and append all options there?

As far as I know, appending the same option multiple times overrides the previous occurrences.

If you refactor changefeed statement plan hook so that you have a separate function to compute the job record basically, then you could simply call into that function, and have it perform all of the validation, and return you the "new" job record which you will handle here.
The only thing is that function would probably need to take in a boolean argument indicating if it should update telemetry data (yes, from changefeed, and now from here).

See the comment above.


pkg/sql/parser/testdata/alter_changefeed, line 99 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

❤️ these tests.

:)


pkg/ccl/changefeedccl/show_changefeed_jobs_test.go, line 450 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

This block is repeated too many times; pull it into helper function?

Awesome suggestion, the code looks a lot more clean now. Thanks!! 💯

@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch 2 times, most recently from 1271e04 to 2e2ae92 Compare February 16, 2022 21:20
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r1, 1 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, @sherman-grewal, and @shermanCRL)


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 123 at r3 (raw file):

		}

		if opts.Options != nil {

Still, I would love to see if there is a way to avoid having to do this options parsing by hand so to speak;
Consider that


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 142 at r3 (raw file):

					if _, ok := changefeedbase.OptionsWithNoValue[key]; ok {
						switch v := changefeedbase.AlterNoValueOptionType(value); v {
						case changefeedbase.OptUnsetValue:

So, is the expectation is that the user will specify foo=false if they want to unset something?
This probably needs to be comment in on (in the changefeed.Base); And probably in the PR description.


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 193 at r3 (raw file):

			if _, shouldProtect := details.Opts[changefeedbase.OptProtectDataFromGCOnPause]; shouldProtect && !p.ExecCfg().Codec.ForSystemTenant() {
				return errorutil.UnsupportedWithMultiTenancy(67271)
			}

I don't know; something about this code where we are re-doing work already done in create changefeed statement just doesn't sit right with me. I do think that re-generating changefeed statement, and letting all of the validation checks happen in 1 place is a way to go. Is there any way we can get to that point?

I honestly think the answer is ... yes. CreateChangefeed is a simple(ish) statement:

type CreateChangefeed struct {
	Targets TargetList
	SinkURI Expr
	Options KVOptions
}

The source of truth during alter changefeed is the ChangeFeedDetails:

message ChangefeedDetails {
  map<uint32, ChangefeedTarget> targets
  string sink_uri = 3 [(gogoproto.customname) = "SinkURI"];
  map<string, string> opts = 4;
}

I believe each field in changefeed details can be converted to tree.CreateChangefeed format.
E.g. if you need to convert a stree (sink_uri) to Expr -- that's simple, tree.NewDString("....") is an expression
that evaluates to string value. Similar for KVOptions. Just add the new options to the list.
TargetList -- that's a bit more involved. Basically, you don't know what the table name is right now.
But what you do know is the descritor ID. So, you could resolve those descriptors to be fully qualified names right now. Now, having created the tree.CreateChangefeed statement you should be able to call into basically the
same code as used in changefeed_stmt. Your goal is to get job.Record out of the function, w/out actually writing new job record. The new job record will have correctly redacted information, etc. The only thing you need to do after that is go over updated changefeedDetails.targets , and replace the names with the original names.

This may sound like a lot of work, but I think it's actually not. YOu're using all of the existing validation function, you're keeping the code in 1 place; and I just think it'll make more sense.

Let me know what you think and if you agree or not.


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 291 at r3 (raw file):

		jobDescription := tree.AsString(createChangefeedStmt)

		newPayload.Description = jobDescription

If I understand correctly, we don't need to call changefeedJobDescription because createChagefeedStmt was generated previous description, which should have had URI redacted.
At the very least, this deserves a comment; Better yet, I think I would prefer the entirety of taking Description, and parsing into the changefeed statement to be moved into a helper function (with comments). something like getUpdatedChangefeedDescription(....).

@miretskiy miretskiy self-requested a review February 16, 2022 22:09
Copy link
Contributor

@HonoreDB HonoreDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @sherman-grewal, and @shermanCRL)


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 193 at r3 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I don't know; something about this code where we are re-doing work already done in create changefeed statement just doesn't sit right with me. I do think that re-generating changefeed statement, and letting all of the validation checks happen in 1 place is a way to go. Is there any way we can get to that point?

I honestly think the answer is ... yes. CreateChangefeed is a simple(ish) statement:

type CreateChangefeed struct {
	Targets TargetList
	SinkURI Expr
	Options KVOptions
}

The source of truth during alter changefeed is the ChangeFeedDetails:

message ChangefeedDetails {
  map<uint32, ChangefeedTarget> targets
  string sink_uri = 3 [(gogoproto.customname) = "SinkURI"];
  map<string, string> opts = 4;
}

I believe each field in changefeed details can be converted to tree.CreateChangefeed format.
E.g. if you need to convert a stree (sink_uri) to Expr -- that's simple, tree.NewDString("....") is an expression
that evaluates to string value. Similar for KVOptions. Just add the new options to the list.
TargetList -- that's a bit more involved. Basically, you don't know what the table name is right now.
But what you do know is the descritor ID. So, you could resolve those descriptors to be fully qualified names right now. Now, having created the tree.CreateChangefeed statement you should be able to call into basically the
same code as used in changefeed_stmt. Your goal is to get job.Record out of the function, w/out actually writing new job record. The new job record will have correctly redacted information, etc. The only thing you need to do after that is go over updated changefeedDetails.targets , and replace the names with the original names.

This may sound like a lot of work, but I think it's actually not. YOu're using all of the existing validation function, you're keeping the code in 1 place; and I just think it'll make more sense.

Let me know what you think and if you agree or not.

Agree, syncing validations in multiple places seems unnecessarily painful. And hard--I think this is missing at least one, validating that you're providing a schema registry while setting the format to avro.

@miretskiy miretskiy self-requested a review February 23, 2022 00:57
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 8 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @sherman-grewal, and @shermanCRL)


-- commits, line 16 at r6:
I think this needs to be updated to also mention UNSET


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 176 at r6 (raw file):

					}
					if _, ok := changefeedbase.AlterChangefeedUnsupportedOptions[key]; ok {
						return errors.Errorf(`cannot alter option %s`, key)

you could probably wrap these errors w/ e.g.

return pgerror.Newf(pgcode.InvalidParameterValue, ..._ 

(see examples in changefeeed_stmt )


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 240 at r6 (raw file):

		)
		if err != nil {
			return err

I think we should wrap this error to make it clear that the failure is due to "alter" ....
errors.Wrap(err, "alter changefeed failed") or some such.


pkg/ccl/changefeedccl/changefeedbase/options.go, line 246 at r6 (raw file):

// AlterChangefeedUnsupportedOptions are changefeed options that we do not allow
// users to alter
var AlterChangefeedUnsupportedOptions = makeStringSet(OptCursor, OptInitialScan, OptNoInitialScan)

Cursor?


pkg/ccl/changefeedccl/alter_changefeed_test.go, line 126 at r6 (raw file):

		waitForJobStatus(sqlDB, t, feed.JobID(), `paused`)

		sqlDB.Exec(t, fmt.Sprintf(`ALTER CHANGEFEED %d SET diff`, feed.JobID()))

probably need to add unset test too.


pkg/ccl/changefeedccl/alter_changefeed_test.go, line 252 at r6 (raw file):

	details, ok := job.Details().(jobspb.ChangefeedDetails)
	require.True(t, ok)

please add a comment on what it is we are testing (i.e. that alter changefeed does not accidently redacts secret keys)

Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job. I have few minor nits. Perhaps @HonoreDB might have more comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @sherman-grewal, and @shermanCRL)

@miretskiy miretskiy self-requested a review February 23, 2022 00:58
Copy link
Contributor

@HonoreDB HonoreDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r1, 6 of 8 files at r5, 2 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @sherman-grewal, and @shermanCRL)

@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch from eb508c3 to bc8be04 Compare February 24, 2022 15:25
Copy link
Contributor Author

@sherman-grewal sherman-grewal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, @sherman-grewal, and @shermanCRL)


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 176 at r6 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

you could probably wrap these errors w/ e.g.

return pgerror.Newf(pgcode.InvalidParameterValue, ..._ 

(see examples in changefeeed_stmt )

Done.


pkg/ccl/changefeedccl/alter_changefeed_stmt.go, line 240 at r6 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I think we should wrap this error to make it clear that the failure is due to "alter" ....
errors.Wrap(err, "alter changefeed failed") or some such.

Done.


pkg/ccl/changefeedccl/changefeedbase/options.go, line 246 at r6 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Cursor?

Yeah, it's in the deny list so it will not be allowed to be altered


pkg/ccl/changefeedccl/alter_changefeed_test.go, line 126 at r6 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

probably need to add unset test too.

Done.


pkg/ccl/changefeedccl/alter_changefeed_test.go, line 252 at r6 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

please add a comment on what it is we are testing (i.e. that alter changefeed does not accidently redacts secret keys)

Done.

@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch 3 times, most recently from d44a73a to ab16ed4 Compare February 24, 2022 18:58
with the ALTER CHANGEFEED statement

Currently, with the ALTER CHANGEFEED statement users
can only add or drop targets from an existing
changefeed. In this PR, we would like to extend this
functionality so that an user can edit and unset
the options of an existing changefeed as well.
The syntax of this addition is the following:

ALTER CHANGEFEED <job_id> SET <options> UNSET <opt_list>

Note that the <options> must follow the same syntax
that is used when creating a changefeed with options.
In particular, if you would like to set an option
that requires a value, you must write

SET opt = 'value'

On the other hand, if you would like to set an
option that requires no value, you must write

SET opt

Furthermore, this PR allows users to unset options.
This can be achieved by writing

UNSET <opt_list>

Where <opt_list> is a list of options that you
would like to unset. For example, if we would like
to unset the diff and resolved options for
changefeed 123, we would achieve this by writing

ALTER CHANGEFEED 123 UNSET diff, resolved

Release note (enterprise change): Added support to
the ALTER CHANGEFEED statement so that users can edit
and unset the options of an existing changefeed. The
syntax of this addition is the following:

ALTER CHANGEFEED <job_id> SET <options> UNSET <opt_list>
@sherman-grewal sherman-grewal force-pushed the cdc/alter-changefeed-set-options branch from ab16ed4 to 75dc9fe Compare February 24, 2022 19:52
@sherman-grewal
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 25, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants