Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli/dump: support columns less tables #46406

Merged
merged 1 commit into from
Apr 16, 2020
Merged

Conversation

C0rWin
Copy link
Contributor

@C0rWin C0rWin commented Mar 21, 2020

This commit allows to dump and treat columns less tables by explicitly
allowing to explicitly specify hidden/reserved columns.

Fixes #37768.

Signed-off-by: Artem Barger bartem@il.ibm.com

Release note (bug fix): providing with ability to dump columns less tables

@C0rWin C0rWin requested a review from a team as a code owner March 21, 2020 22:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- if it is easy to do (don't recall what the current dump tests look like), I'd also include a test showing that it now works

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. this needs unit tests
  2. there needs to be a release note

@C0rWin
Copy link
Contributor Author

C0rWin commented Mar 26, 2020

1. this needs unit tests

2. there needs to be a release note

Thanks for comments @dt and @knz
Fixed. Added unit test and release note

@C0rWin C0rWin requested a review from knz March 26, 2020 22:05
@C0rWin
Copy link
Contributor Author

C0rWin commented Mar 27, 2020

put into work in progress, there are previous unit tests I need to take care of

@blathers-crl
Copy link

blathers-crl bot commented Apr 5, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan.

@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 5, 2020

@dt. @knz I've updated PR + legacy tests fixed while a new one to cover the functionality is added. Would appreciate your inputs/comments.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I can see how this is moving forward.

Regarding the testing, I think we're not yet half way there. It's important not just to check that dump produces something, but also that that something is actually valid. The behavior is create table -> populate data -> dump -> re-load from dump -> compare data. The data must match.

The other thing I'm not yet 100% about is whether this approach is sufficent. This current approach only works if the only hidden columns are specifically those that are auto-created by crdb. There are multiple ways this can break:

  • if CockroachDB currently makes certain columns non-visible for other reasons, this will not work. (I don't know about this. @rohany can you comment on this? does the ALTER PK story make this happen?)

  • if CockroachDB even in the future makes certain columns non-visible, this will start breaking without anyone noticing.

  • in particular, if CockroachDB in later version keeps the auto-generated rowid column, but gives it a different name (e.g. "timestamp" or "$internal_rowid" or whaterver), then current dumps with your patch will not work anymore.

I think the root of my concern is twofold:

  1. the CREATE TABLE statement that dump produces in version X should always work at version Y>X. If you rely on current implicit behavior, but we never really documented/specified that behavior, the dump may non-work with another version.

    Now, suppose that the pretty-printer for CREATE TABLE was always making the existence of the rowid column (albeit hidden) explicit, in a way that future crdb versions would be guaranteed to honor, then there would be no surprise.

  2. the INSERT statements generated for data rows in the dump should only refer to columns created explicitly by CREATE TABLE. The reason for this is the following. dump is the way that users get confidence they are not locked into cockroachdb -- they can restore their dumps on a different SQL database. But that does not work if the INSERT refers to "magic" columns, where the user has no idea when looking at CREATE TABLE how to create a similar column in their own database. By making the hidden columns visible in the generated CREATE TABLE statement, we'd be informing the user how to "do the same" in their non-crdb database.

Again I encourage you to look at #26644 which provides a possible/candidate approach that addresses both goals.
Another possible approach would be to display the hidden column definitions in a comment; this would help with goal (2) but not with goal (1).

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@rohany
Copy link
Contributor

rohany commented Apr 6, 2020

if CockroachDB currently makes certain columns non-visible for other reasons, this will not work. (I don't know about this. @rohany can you comment on this? does the ALTER PK story make this happen?)

primary key changes make some new and non-public indexes, but don't touch the columns in the table. However if a user wants to switch to a hash sharded primary key it will add a new hidden column.

@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 6, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 6, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 6, 2020
@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 6, 2020

Regarding the testing, I think we're not yet half way there. It's important not just to check that dump produces something, but also that that something is actually valid. The behavior is create table -> populate data -> dump -> re-load from dump -> compare data. The data must match.

Although I was following pattern of tests I've encountered in the dump_test.go I agree with your comment and it's totally make sense to add two missing steps, while not sure whenever it there is path to continue this PR counting all other of your comments.

  • if CockroachDB even in the future makes certain columns non-visible, this will start breaking without anyone noticing.
  • in particular, if CockroachDB in later version keeps the auto-generated rowid column, but gives it a different name (e.g. "timestamp" or "$internal_rowid" or whaterver), then current dumps with your patch will not work anymore.

I think this is already broken, since if there is table w/o explicit primary key then rowid is auto generated and dump produced contains it.

the CREATE TABLE statement that dump produces in version X should always work at version Y>X. If you rely on current implicit behavior, but we never really documented/specified that behavior, the dump may non-work with another version.

Now, suppose that the pretty-printer for CREATE TABLE was always making the existence of the rowid column (albeit hidden) explicit, in a way that future crdb versions would be guaranteed to honor, then there would be no surprise.

This is very fair point, but I would also expect that going forward and changing format or some internal structures the backward campatibility would be preserved.

the INSERT statements generated for data rows in the dump should only refer to columns created explicitly by CREATE TABLE. The reason for this is the following. dump is the way that users get confidence they are not locked into cockroachdb -- they can restore their dumps on a different SQL database. But that does not work if the INSERT refers to "magic" columns, where the user has no idea when looking at CREATE TABLE how to create a similar column in their own database. By making the hidden columns visible in the generated CREATE TABLE statement, we'd be informing the user how to "do the same" in their non-crdb database.

to me this sounds a bit contradictive to #26644, and by the way can you give a hint why it was closed and not merged? Is still pending to be completed?

At any rate I really appreciate your, @knz, feedback and would be glad to understand whenever I'd better to abandon current direction? And in case you think I'd better to close current PR, I think it will beneficial to understand a bit more about current status of #26644.

@knz
Copy link
Contributor

knz commented Apr 7, 2020

The behavior is create table -> populate data -> dump -> re-load from dump -> compare data. The data must match.

Although I was following pattern of tests I've encountered in the dump_test.go I agree with your comment and it's totally make sense to add two missing steps

I think there's precedent for this in TestDumpRandom.

  • if CockroachDB even in the future makes certain columns non-visible, this will start breaking without anyone noticing.
  • in particular, if CockroachDB in later version keeps the auto-generated rowid column, but gives it a different name (e.g. "timestamp" or "$internal_rowid" or whaterver), then current dumps with your patch will not work anymore.

I think this is already broken, since if there is table w/o explicit primary key then rowid is auto generated and dump produced contains it.

Oh haha I had missed that. That's not great, but indeed nothing new for you to care about in this PR.

Now, suppose that the pretty-printer for CREATE TABLE was always making the existence of the rowid column (albeit hidden) explicit, in a way that future crdb versions would be guaranteed to honor, then there would be no surprise.

This is very fair point, but I would also expect that going forward and changing format or some internal structures the backward campatibility would be preserved.

Can you explain more?

the INSERT statements generated for data rows in the dump should only refer to columns created explicitly by CREATE TABLE. [...] By making the hidden columns visible in the generated CREATE TABLE statement, we'd be informing the user how to "do the same" in their non-crdb database.

to me this sounds a bit contradictive to #26644, and by the way can you give a hint why it was closed and not merged? Is still pending to be completed?

It was closed because there as no priority at that time to work on that problem.

At any rate I really appreciate your, @knz, feedback and would be glad to understand whenever I'd better to abandon current direction? And in case you think I'd better to close current PR, I think it will beneficial to understand a bit more about current status of #26644.

Given that you've identified (above) that the dump already generates CREATE TABLE statements that refer to non-existent columns, there's nothing worse introduced by your PR here so I'd encourage you to continue this work.

However I think some additional mileage would be obtained wrt vendor lock-in (or the lack thereof) by revealing to the user what's going on. Is there a way perhaps to generate SQL comments (starting with --) when pretty-printing CREATE statements, to reveal the hidden columns?

@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 8, 2020

This is very fair point, but I would also expect that going forward and changing format or some internal structures the backward compatibility would be preserved.

Can you explain more?

Yes, I was just referring to the fact that since at any rate dump already reveals non visible columns extra work has to be done to maintain backward compatibility.

Given that you've identified (above) that the dump already generates CREATE TABLE statements that refer to non-existent columns, there's nothing worse introduced by your PR here so I'd encourage you to continue this work.

In such a case I will fix unit test and push updated commit.

However I think some additional mileage would be obtained wrt vendor lock-in (or the lack thereof) by revealing to the user what's going on. Is there a way perhaps to generate SQL comments (starting with --) when pretty-printing CREATE statements, to reveal the hidden columns?

I think that this could be done, while might require a bit more intrusive changes.

@blathers-crl
Copy link

blathers-crl bot commented Apr 9, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan.

@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 9, 2020

Given that you've identified (above) that the dump already generates CREATE TABLE statements that refer to non-existent columns, there's nothing worse introduced by your PR here so I'd encourage you to continue this work.

@knz I've pushed revisited unit tests to make sure end-to-end functionality, include applying dumps back, also few more test cases.

This commit allows to dump and treat columnsless tables by explicitly
allowing to explicitly specify hidden/reserved columns.

Signed-off-by: Artem Barger <bartem@il.ibm.com>

Release note (bug fix): providing with ability to dump columns less tables
@C0rWin C0rWin requested a review from knz April 10, 2020 00:04
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

@dt
Copy link
Member

dt commented Apr 15, 2020

LGTM though I'll let @knz take one more look before we tell bors to merge it

@dt
Copy link
Member

dt commented Apr 15, 2020

And thanks again for picking this up and for working though all the review suggestions!

@dt
Copy link
Member

dt commented Apr 16, 2020

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 16, 2020

Build succeeded

@craig craig bot merged commit 1a51a57 into cockroachdb:master Apr 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli/dump: fails with column-less tables
5 participants