Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl,sql: support importing non-public schemas from pgdump #57183

Merged
merged 1 commit into from
Feb 25, 2021

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Nov 26, 2020

Previously, a PGDUMP import did not support any non-public schema
statements. Now that CRDB has user defined schemas, bundle format
IMPORTs need to be taught how to parse, create and cleanup schema
related PGDUMP operations.

Note this PR only adds support for CREATE SCHEMA and usage of the
schema in CREATE TABLE/SEQUENCE PGDUMP statements. ALTER SCHEMA
statements are ignored and support might be added in a follow up.

Release justification: low risk, high reward. Import PGDUMP does not support user
defined schemas.

Release note (sql change): IMPORT PGDUMP can now import dump files with
non-public schemas.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru
Copy link
Contributor Author

I'm still working on adding some more tests and maybe cleaning up some comments, but I wanted to start getting some review passes.

cc: @mokaixu

@adityamaru adityamaru changed the title [WIP] importccl,sql: support importing non-public schemas from pgdump importccl,sql: support importing non-public schemas from pgdump Dec 8, 2020
@adityamaru
Copy link
Contributor Author

This should be RFAL! I also tested this using one of the PGDUMP files which have been provided to us from a customer demo after commenting out unsupported PGDUMP statements.

Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach generally looks good to me. I'm sending out a few comments to start, but I want to do another pass.

Reviewed 3 of 17 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @miretskiy, and @pbardea)


pkg/ccl/importccl/import_stmt.go, line 1223 at r1 (raw file):

	txn *kv.Txn,
) error {
	if descriptor.GetID() == 0 {

Can we use sqlbase.InvalidID here?


pkg/ccl/importccl/import_stmt.go, line 1227 at r1 (raw file):

	}
	if descriptor.GetID() != id {
		log.Fatalf(ctx, "%v", errors.AssertionFailedf("cannot create descriptor with an unexpected (%v) ID: %v", id, descriptor))

nit: this error messages is a bit confusing to me. Perhaps something like cannot create descriptor with an expected ID %v; expected %v?


pkg/ccl/importccl/read_import_pgdump.go, line 252 at r1 (raw file):

) ([]*schemadesc.Mutable, error) {
	var dbDesc *dbdesc.Immutable
	var err error

Do we need this err declaration?


pkg/ccl/importccl/read_import_pgdump.go, line 387 at r1 (raw file):

var mockDescID = defaultCSVTableID

func getNextMockDescID() descpb.ID {

Can we add a comment here explaining why we need these mock IDs (why we can't allocate the "real" ID initially)? Also as a nit, perhaps we can rename this to something like placeholderDescID? I tend to associate mock with objects that simulate a production variant to be used in tests.

Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @pbardea)


pkg/ccl/importccl/import_stmt.go, line 1223 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Can we use sqlbase.InvalidID here?

Done.


pkg/ccl/importccl/import_stmt.go, line 1227 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

nit: this error messages is a bit confusing to me. Perhaps something like cannot create descriptor with an expected ID %v; expected %v?

modified the message.


pkg/ccl/importccl/read_import_pgdump.go, line 252 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Do we need this err declaration?

we need dbDesc and so I needed to declare err as well. But I've moved the err declaration inside the db.Txn scope to make it clear that it is not used outside the scope of that closure.


pkg/ccl/importccl/read_import_pgdump.go, line 387 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Can we add a comment here explaining why we need these mock IDs (why we can't allocate the "real" ID initially)? Also as a nit, perhaps we can rename this to something like placeholderDescID? I tend to associate mock with objects that simulate a production variant to be used in tests.

done!

@adityamaru
Copy link
Contributor Author

I also remembered having a discussion re: setting the schema descs to Offline, and then publishing them similar to how we deal with tables. I added some logic around that too in this iteration.

@adityamaru adityamaru force-pushed the schema-pgdump branch 5 times, most recently from ab79837 to 68bb5e9 Compare February 19, 2021 04:46
@adityamaru
Copy link
Contributor Author

@miretskiy @pbardea just putting this on your radar, apologies for the slightly bloated PR.

Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @miretskiy, and @pbardea)


pkg/ccl/importccl/import_processor_test.go, line 941 at r2 (raw file):

	if format.Format == roachpb.IOFileFormat_PgDump {
		spec.tables = map[string]*execinfrapb.ReadImportDataSpec_ImportTable{

nit: slight preference on this conditional to only set the table name for readability, e.g.

fullTableName := "simple"
if pgDump {
   fullTableName = "public.simple"
}
spec.tables = ... // using fullTableName here.

pkg/ccl/importccl/import_stmt.go, line 1127 at r2 (raw file):

	// schema ID.
	tableRewrites := schemaRewrites
	if schemaRewrites == nil {

nit: although it's the same, I think checking for tableRewrites being nil makes more sense here since that's what we're setting in the next line.


pkg/ccl/importccl/import_stmt.go, line 1305 at r2 (raw file):

) error {
	if descriptor.GetID() == descpb.InvalidID {
		log.Fatalf(ctx, "%v", errors.AssertionFailedf("cannot create descriptor with an empty ID: %v", descriptor))

Should we be returning an error here? If not, should we have a comment explaining why? Same directly below.


pkg/ccl/importccl/import_stmt.go, line 2039 at r2 (raw file):

					schema.Desc.GetID(), newDesc)
			}
			newSchemaDesc.State = descpb.DescriptorState_PUBLIC

Does newSchemaDesc have a setPublic() method? If so, we should use that.


pkg/ccl/importccl/import_stmt.go, line 2182 at r2 (raw file):

			// TODO(adityamaru): Do we fail the import rollback, think about what
			// implication that has.
			return err

I think it's okay to error here, especially if it's after we've dropped the table. I think that the error here will return an error along the lines of "be careful, we couldn't clean up and manual cleanup may be necessary". Which is exactly what would be necessary here since the user would need to drop these newly created schemas.


pkg/ccl/importccl/import_stmt.go, line 2285 at r2 (raw file):

		Description:   "dropping schemas as part of an import job rollback",
		Username:      p.User(),
		DescriptorIDs: nil,

Should this have the schema descriptor's IDs?

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @miretskiy, and @pbardea)


pkg/ccl/importccl/import_stmt.go, line 1332 at r2 (raw file):

	mutDesc, ok := descriptor.(catalog.MutableDescriptor)
	if !ok {
		log.Fatalf(ctx, "unexpected type %T when creating descriptor", descriptor)

Do we have a rule of thumb on when we use errors.AssertionFailedf as we did above?


pkg/ccl/importccl/import_stmt.go, line 1346 at r2 (raw file):

Quoted 4 lines of code…
	if err := txn.Run(ctx, b); err != nil {
		return err
	}
	return nil

trivial nit: I don't know how wee feel about return txn.Run(ctx, b) in cases like this.


pkg/ccl/importccl/import_stmt.go, line 1852 at r2 (raw file):

					if len(details.Schemas) != 0 {
						schemaPreparedDetails, schemaRewrites, newSchemaIDToName, oldSchemaIDToName,
							queuedSchemaJobs, err = r.prepareSchemasForIngestion(ctx, p, curDetails, txn, descsCol)

💭 I wonder if there is a type/struct that would could extract here to pass between these functions. Perhaps that is overkill in this case though.

Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @pbardea, and @stevendanna)


pkg/ccl/importccl/import_processor_test.go, line 941 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

nit: slight preference on this conditional to only set the table name for readability, e.g.

fullTableName := "simple"
if pgDump {
   fullTableName = "public.simple"
}
spec.tables = ... // using fullTableName here.
</blockquote></details>

Done.
___
*[pkg/ccl/importccl/import_stmt.go, line 1127 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9EFgpCTfhlCCzZS8R:-MUK60BK0T4BlP8pSqei:b-896fix) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L1127)):*
<details><summary><i>Previously, pbardea (Paul Bardea) wrote…</i></summary><blockquote>

nit: although it's the same, I think checking for `tableRewrites` being nil makes more sense here since that's what we're setting in the next line.
</blockquote></details>

Done.
___
*[pkg/ccl/importccl/import_stmt.go, line 1305 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9EYuz86qfnDJKL0OA:-MUK6TEw2SfkPcHP8wWZ:br96x0p) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L1305)):*
<details><summary><i>Previously, pbardea (Paul Bardea) wrote…</i></summary><blockquote>

Should we be returning an error here? If not, should we have a comment explaining why? Same directly below.
</blockquote></details>

I think we should be returning an error, changed.
___
*[pkg/ccl/importccl/import_stmt.go, line 1332 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9C1739XZTwe8JfGsP:-MUKEeymBY22SwCez0k1:b-kimflq) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L1332)):*
<details><summary><i>Previously, stevendanna (Steven Danna) wrote…</i></summary><blockquote>

Do we have a rule of thumb on when we use `errors.AssertionFailedf` as we did above?
</blockquote></details>

I took these statements to mirror what we already do in `descriptor.go`. Import is unfortunate in the sense that it needs to replicate a lot of the SQL code. I think AssertionFailedf are to be used in code paths that we don't think users will/should hit under "normal circumstances" and they should be reported via sentry. I can definitely see an argument that this cast failure fits into that category but I just went off of precedent.

Switched to returning them though, so that we can rollback the import.
___
*[pkg/ccl/importccl/import_stmt.go, line 1346 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9Cdb65am2XICe9hZv:-MUK77KLDKduV9cF472y:beonz4l) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L1346)):*
<details><summary><i>Previously, stevendanna (Steven Danna) wrote…</i></summary><blockquote>

> ```
> 	if err := txn.Run(ctx, b); err != nil {
> 		return err
> 	}
> 	return nil
> ```

trivial nit: I don't know how wee feel about `return txn.Run(ctx, b)` in cases like this.
</blockquote></details>

good point, cleaned up.
___
*[pkg/ccl/importccl/import_stmt.go, line 1852 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9IhuX0kGA43LM-BeT:-MUKRQRx7qitr6AcAPkE:bhxld0c) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L1852)):*
<details><summary><i>Previously, stevendanna (Steven Danna) wrote…</i></summary><blockquote>

:thought_balloon: I wonder if there is a type/struct that would could extract here to pass between these functions.  Perhaps that is overkill in this case though.
</blockquote></details>

Good idea, the return values are getting out of hand. Added a struct, though there is some subtlety of what we can and cannot have nil between the two methods, without special-casing in a bunch of places for PGDUMP.
___
*[pkg/ccl/importccl/import_stmt.go, line 2039 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9Esdo8aYM-foG9IHI:-MUK7H966ui-tMsWPGQY:bjbhv4w) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L2039)):*
<details><summary><i>Previously, pbardea (Paul Bardea) wrote…</i></summary><blockquote>

Does `newSchemaDesc` have a `setPublic()` method? If so, we should use that.
</blockquote></details>

good point, fixed.
___
*[pkg/ccl/importccl/import_stmt.go, line 2182 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9FFoW42FHFRXK06kn:-MUKDw8D6zOwBf2HS_yY:bcsym3n) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L2182)):*
<details><summary><i>Previously, pbardea (Paul Bardea) wrote…</i></summary><blockquote>

I think it's okay to error here, especially if it's after we've dropped the table. I think that the error here will return an error along the lines of "be careful, we couldn't clean up and manual cleanup may be necessary". Which is exactly what would be necessary here since the user would need to drop these newly created schemas.
</blockquote></details>

makes sense, thanks.
___
*[pkg/ccl/importccl/import_stmt.go, line 2285 at r2](https://reviewable.io/reviews/cockroachdb/cockroach/57183#-MU9FYzLDJY3FQvI-Z0k:-MUKEdQu0nLoKsKAaB9B:b-rmsntx) ([raw file](https://github.com/cockroachdb/cockroach/blob/68bb5e9bc56addb355e903144df38d69073927bb/pkg/ccl/importccl/import_stmt.go#L2285)):*
<details><summary><i>Previously, pbardea (Paul Bardea) wrote…</i></summary><blockquote>

Should this have the schema descriptor's IDs?
</blockquote></details>

nice catch, done.


<!-- Sent from Reviewable.io -->

@adityamaru
Copy link
Contributor Author

I have no idea what reviewable did to my responses...you might have to open reviewable to see them, but I think I addressed all the comments.

Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, @pbardea, and @stevendanna)

@adityamaru adityamaru removed the request for review from ajwerner February 24, 2021 21:33
@adityamaru adityamaru requested a review from dt February 24, 2021 21:33
Previously, a PGDUMP import did not support any non-public schema
statements. Now that CRDB has user defined schemas, bundle format
IMPORTs need to be taught how to parse, create and cleanup schema
related PGDUMP operations.

Note this PR only adds support for `CREATE SCHEMA` and usage of the
schema in `CREATE TABLE/SEQUENCE` PGDUMP statements. `ALTER SCHEMA`
statements are ignored and support might be added in a follow up.

Release justification (bug fix): Import PGDUMP does not support user
defined schemas.

Release note (sql change): IMPORT PGDUMP can now import dump files with
non-public schemas.
@adityamaru
Copy link
Contributor Author

TFTR!

bors r+

@craig craig bot merged commit 4d24f8a into cockroachdb:master Feb 25, 2021
@craig
Copy link
Contributor

craig bot commented Feb 25, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants