-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cliccl/load: update load show
with summary
subcommand to display backup meta information
#61131
Conversation
7f5072d
to
a3d6a63
Compare
load show
args to display subset of backup metadataload show
args to display subset of backup metadata
e5a4de7
to
ffc78bf
Compare
load show
args to display subset of backup metadataload show
args to display subset of backup metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, did a first pass of it. Will go for a more indepth pass later today as well.
Reviewed 1 of 4 files at r1, 1 of 4 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa and @pbardea)
pkg/ccl/cliccl/load.go, line 214 at r1 (raw file):
} func showDescriptor(desc backupccl.BackupManifest) {
Since this command claims to show all descriptors, we chatted about that we'll probably also want to display things like user defined schemas and types. Is the plan to add those in a follow up PR?
pkg/ccl/cliccl/load.go, line 232 at r1 (raw file):
} fmt.Printf("Tables:\n")
I think listing the descriptors would be interesting, more as a debugging tool. But I think users would be particularly interested in the tables included in the backup. I think that offering a specific cockroach load show tables command
that shows the fully qualified name would be nice since the output would be better format to pipe into other command (like potentially reading the data for that table).
This can be added in a follow up.
Listing the tables like:
$ cockroach load show tables <backup_url>
foodb.fooschema.footable
foodb.public.footable2
Should be what the user is really after, especially since the expected use-case of this tool is to see what tables are in the backup, so that they can get access to the data.
pkg/ccl/cliccl/load.go, line 172 at r4 (raw file):
if _, ok := options[descriptors]; ok { err = showDescriptor(desc) if err != nil {
nit: just as an FYI, we frequently do:
if err := showDescriptors(desc); err != nil {
return err
}
With this pattern/
pkg/ccl/cliccl/load.go, line 213 at r4 (raw file):
for _, f := range desc.Files { fmt.Printf("%s%s:\n", tabfmt, f.Path) fmt.Printf(" Span: %s\n", f.Span)
Should we be using tabfmt
here and below?
pkg/ccl/cliccl/load_test.go, line 56 at r4 (raw file):
} // load show with spans option
nit: We usually format full line comments as full sentences: // Test load show with span optiosn
. I think this would be a good candidate for a subtest e.g. t.Run("show-spans", func(t *testing.T) { // test here}
.
pkg/ccl/cliccl/load_test.go, line 59 at r4 (raw file):
out, err = c.RunWithCapture(fmt.Sprintf("load show %s spans --external-io-dir=%s", backupPath, dir)) require.NoError(t, err) expectedSpansOutput :=
I think we can put this output to 1 line.
pkg/ccl/cliccl/load_test.go, line 75 at r4 (raw file):
out, err = c.RunWithCapture(fmt.Sprintf("load show %s descriptors --external-io-dir=%s", backupPath, dir)) require.NoError(t, err) expectedDescOutput :=
nit: can we unindent the first line of this string all the way for readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
pkg/ccl/cliccl/load.go, line 214 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Since this command claims to show all descriptors, we chatted about that we'll probably also want to display things like user defined schemas and types. Is the plan to add those in a follow up PR?
Yes. It is displaying user-defined schemas now (under the Schemas:
in load show descriptors
command). Will include types in the same PR.
pkg/ccl/cliccl/load.go, line 232 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think listing the descriptors would be interesting, more as a debugging tool. But I think users would be particularly interested in the tables included in the backup. I think that offering a specific
cockroach load show tables command
that shows the fully qualified name would be nice since the output would be better format to pipe into other command (like potentially reading the data for that table).This can be added in a follow up.
Listing the tables like:
$ cockroach load show tables <backup_url> foodb.fooschema.footable foodb.public.footable2
Should be what the user is really after, especially since the expected use-case of this tool is to see what tables are in the backup, so that they can get access to the data.
Will add it in a follow up PR.
pkg/ccl/cliccl/load.go, line 213 at r4 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Should we be using
tabfmt
here and below?
Done.
pkg/ccl/cliccl/load_test.go, line 59 at r4 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think we can put this output to 1 line.
Done.
ffc78bf
to
b6f00f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa and @pbardea)
pkg/ccl/cliccl/load.go, line 46 at r6 (raw file):
Short: "show backups summary", Long: "Shows summary of meta information about a SQL backup.", Args: cobra.MinimumNArgs(1),
Should this be cobra.ExactArgs(1)
?
pkg/ccl/cliccl/load.go, line 53 at r6 (raw file):
Use: "show [command]", Short: "show backup information", Long: "Shows subset(s) of information about a SQL backup.",
I think we can keep this as the old description now.
pkg/ccl/cliccl/load.go, line 85 at r6 (raw file):
func newBlobFactory(ctx context.Context, dialing roachpb.NodeID) (blobs.BlobClient, error) { if dialing != 0 { return nil, errors.Errorf(`only support nodelocal (0/self) under cli inspection`)
nit: The phrasing of this error message is a bit confusing to me. Perhaps something like errors.Errorf(
accessing node %d during nodelocal access is unsupported for CLI inspection; only local access is supported with nodelocal://self, dialing)
?
pkg/ccl/cliccl/load.go, line 165 at r6 (raw file):
schemaIDToNameChain := make(map[descpb.ID]string) schemaIDToNameChain[keys.PublicSchemaID] = sessiondata.PublicSchemaName typeIDs := make([]descpb.ID, 0)
It looks like the slices and maps here are storing the same information? Can we just use the map here?
pkg/ccl/cliccl/load.go, line 186 at r6 (raw file):
typeIDToNameChain[id] = parentSchema + "." + typeName typeIDs = append(typeIDs, id) }
We'll still want showDescriptors
to show the tables as well I think. I think that users may be interested in showing just the tables as a separate command that we can add in a follow-up, but I think we'll want to keep tables here like we had before.
pkg/ccl/cliccl/load.go, line 202 at r6 (raw file):
if len(typeIDs) > 0 { fmt.Printf("Types:\n")
Let's put the header above the if
check to explicitly show that there are no types included in the backup.
pkg/ccl/cliccl/load_test.go, line 59 at r6 (raw file):
expectedDescOutput := ` Databases: 52.testdb
I think we should avoid formatting this as id.name
since names are partitioned by .
(e.g. testdb.public.footype
). Let's instead format it as either 52: testdb
or testdb (52)
.
pkg/cli/cli_test.go, line 50 at r6 (raw file):
skip.UnderShort(t) c := NewCliTest(TestCliParams{T: t})
I think I preferred NewCLITest
returning a CLITest
. I think things get a bit more confusing when renaming the structs to TestCli
. The rename is also a separate change from what the commit is achieving, which is exporting the helpers for ccl tests to access.
pkg/cli/doctor_test.go, line 24 at r6 (raw file):
func TestDoctorCluster(t *testing.T) { defer leaktest.AfterTest(t)() c := NewCliTest(TestCliParams{T: t})
nit: I think the old capitalization of CLI is what we typically do. For acronyms, the later letters usually follow the casing of the first one.
pkg/cli/testutils.go, line 114 at r6 (raw file):
} // NewCliTest export for cclcli
nit: we usually format comments that are on their own line as full sentences (ie end in a .
). Here and the couple of other places in this file.
e71331d
to
d5207e2
Compare
d8b2acd
to
f33e921
Compare
Previously, `load show` only supports showing metadata of a single backup manifest. Users are ignorant of the incremental backups and have no ways to inspect the incremental backup manifests unless they know the incremental paths beforehand. This PR updates `load show` with `incremental` subcommand to display incremental backups, i.e. displaying the auto appended incremental backup paths of a full backup. After listing the incremental paths, users are able to inspect an incremental backup manifest with `load show summary`. see cockroachdb#61131 Release note (cli change): Update `load show` with `incremental` subcommand to display incremental backups. User can list incremental backup paths of a full backup by running `cockroach load show incremental <backup_url>.`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a minor comment nit, also tagging @itsbilal to take a look from the CLI side. Note: the load command is currently undocumented and we're cleaning it up. The main thing to review from the CLI side is 5c95fcf.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa and @pbardea)
pkg/cli/testutils.go, line 43 at r9 (raw file):
} // TestCLI represents test for cli.
nit: I think we can say a bit more in this comment, something like: TestCLI wraps a test server and is used by tests to make assertions about the output of CLI commands.
Previously, users can list backups created by `BACKUP INTO` with `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. This PR updates `load show` with `backups` subcommand, which allows users to list backups in a backup collection created by `BACKUP INTO`. With the same purpose as other `load show` subcommands, this update allows users to list backups without running `SHOW BACKUP IN` in a sql session. see cockroachdb#61131 cockroachdb#61829 to checkout other `load show` subcommand. Release note (cli change): Add `load show backups` to display backup collection. Previously, users can list backups created by `BACKUP INTO` via `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. Now, users are able to list backups in a collection with `cockroach load show backups <collection_url>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5c96fcf looks good to me! I've left some comments but they're mostly just about style / struct naming that should be fairly easy to fix. Good work!
Reviewed 16 of 16 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa, @itsbilal, and @pbardea)
pkg/ccl/cliccl/load.go, line 223 at r8 (raw file):
// Note that these descriptors could be from any past version of the cluster, // in case more fields need to be added to the output. dbIDs := make([]descpb.ID, 0)
Nit: You can save some repeat allocations by passing a capacity like dbIDs := make([]descpb.ID, 0, len(desc.Descriptors))
. It's okay if it's larger than what it ends up being.
pkg/ccl/cliccl/load_test.go, line 41 at r8 (raw file):
sqlDB := sqlutils.MakeSQLRunner(db) sqlDB.Exec(t, `CREATE DATABASE testDB`)
Probably a good idea to throw a require.NoError
around these Execs.
pkg/cli/testutils.go, line 44 at r8 (raw file):
// TestCLI represents test for cli type TestCLI struct {
I see Paul alluded to this earlier, but can we go with calling this CLITest
(and the params struct CLITestParams
) instead? That way it's both more consistent with the existing code, and leads to less confusion / naming conflicts with functions named TestCLI*
that are actual test functions. I don't feel very strongly about this so if y'all have agreed to go with this for some reason, feel free to ignore this little point.
f33e921
to
9f6b7a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @pbardea)
pkg/ccl/cliccl/load.go, line 46 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Should this be
cobra.ExactArgs(1)
?
Done.
pkg/ccl/cliccl/load.go, line 53 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think we can keep this as the old description now.
Done.
pkg/ccl/cliccl/load.go, line 165 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
It looks like the slices and maps here are storing the same information? Can we just use the map here?
Note: the slices are used to keep descriptors sorted by IDs. We can remove it once we found sorting descriptors is not necessarily needed.
pkg/ccl/cliccl/load.go, line 186 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
We'll still want
showDescriptors
to show the tables as well I think. I think that users may be interested in showing just the tables as a separate command that we can add in a follow-up, but I think we'll want to keep tables here like we had before.
Done.
pkg/ccl/cliccl/load.go, line 202 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Let's put the header above the
if
check to explicitly show that there are no types included in the backup.
Done.
pkg/ccl/cliccl/load.go, line 223 at r8 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Nit: You can save some repeat allocations by passing a capacity like
dbIDs := make([]descpb.ID, 0, len(desc.Descriptors))
. It's okay if it's larger than what it ends up being.
Done
pkg/ccl/cliccl/load_test.go, line 59 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think we should avoid formatting this as
id.name
since names are partitioned by.
(e.g.testdb.public.footype
). Let's instead format it as either52: testdb
ortestdb (52)
.
Done.
pkg/ccl/cliccl/load_test.go, line 41 at r8 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Probably a good idea to throw a
require.NoError
around these Execs.
SQLRunner exec comment says:'Exec is a wrapper around gosql.Exec that kills the test on error'. If I didn't get it wrong, the error will be catched by running sqlDB.Exec
(I tried, it did). But if I missed something else, please correct me :)
pkg/cli/cli_test.go, line 50 at r6 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think I preferred
NewCLITest
returning aCLITest
. I think things get a bit more confusing when renaming the structs toTestCli
. The rename is also a separate change from what the commit is achieving, which is exporting the helpers for ccl tests to access.
Note: cliTest is renamed to TestCLI to pass linter check.
pkg/cli/testutils.go, line 44 at r8 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I see Paul alluded to this earlier, but can we go with calling this
CLITest
(and the params structCLITestParams
) instead? That way it's both more consistent with the existing code, and leads to less confusion / naming conflicts with functions namedTestCLI*
that are actual test functions. I don't feel very strongly about this so if y'all have agreed to go with this for some reason, feel free to ignore this little point.
I tried CLITest and CLITestParams, but given that those exported structs are in cli pkg, struct names starting with 'CLI..' is failing by the linter. If you got other thoughts about the naming, I will follow up with this :)
Previously, `cockroach load show <backup_path>` is used for debugging purposes, and displays all the metadata of a single backup manifest file. With this PR, we extend `load show` with more options to display subset of data. Also, we add support of "nodelocal://0/" access. Release justification: Non-production code changes Release note (cli change): Add `load show` args to display subset of backup metadata. Users can display subsets of metadata of a single manifest by using `load show files/descriptors/metadata/spans <backup_url>`.
Previously, clitest only has private access within cli package, while cliccl lack sufficient tests for ccl cli tools. With this commit, clitest utils are exported for cliccl and cliccl can reuse them for unit tests. The `cliTest` struct and the `cliTestParams` struct are renamed as `TestCLI` and `TestCLIParams`. Release justification: Low risk, high benefit changes to existing functionality. Release note: None
Previously, `load show` provides different options to display subset of meta information of backup, but the option parsing could be error prone and prefer a better way to provide users with subset options(e.g.with --include). This commit updates `load show` with `summary` subcommand to display all metadata of an individual backup for debugging. Release justification: non-production code changes. Release note (cli change): Update `load show` with `summary` subcommand to display meta information of an individual backup. Users are now able to inspect backup metadata, files, spans and descriptors with cli command `cockroach load show summary <backup_url>` without a running cluster.
9f6b7a0
to
aa7238a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think this is good to merge!
TFTR! |
Build succeeded: |
Previously, `load show` only supports showing metadata of a single backup manifest. Users are ignorant of the incremental backups and have no ways to inspect the incremental backup manifests unless they know the incremental paths beforehand. This PR updates `load show` with `incremental` subcommand to display incremental backups, i.e. displaying the auto appended incremental backup paths of a full backup. After listing the incremental paths, users are able to inspect an incremental backup manifest with `load show summary`. see cockroachdb#61131 Release note (cli change): Update `load show` with `incremental` subcommand to display incremental backups. User can list incremental backup paths of a full backup by running `cockroach load show incremental <backup_url>.`
Previously, users can list backups created by `BACKUP INTO` with `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. This PR updates `load show` with `backups` subcommand, which allows users to list backups in a backup collection created by `BACKUP INTO`. With the same purpose as other `load show` subcommands, this update allows users to list backups without running `SHOW BACKUP IN` in a sql session. see cockroachdb#61131 cockroachdb#61829 to checkout other `load show` subcommand. Release note (cli change): Add `load show backups` to display backup collection. Previously, users can list backups created by `BACKUP INTO` via `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. Now, users are able to list backups in a collection with `cockroach load show backups <collection_url>.
61406: sql: validate against array type usages when dropping enum values r=ajwerner a=arulajmani Previously, when dropping an enum value, we weren't validating if the enum value was used in a column of array type. This patch fixes the bug. Fixes #60004 Release justification: bug fix to new functionality Release note: None 61650: roachtest: exclude `lost+found` directory r=knz a=rickystewart This directory is created by the filesystem and unowned file chunks are put there by `fsck`. The directory and its contents aren't readable by anyone except `root`, so this can cause the `du -c /mnt/data1` that `roachtest` performs to fail -- add an `--exclude` to handle this. We already ignore this directory in other contexts (for example, see `pkg/storage/mvcc.go`). Fixes #53663. Release justification: Non-production code change Release note: None 61788: importccl: unskip userfile benchmark r=pbardea a=adityamaru I've run this ~20 times and it averages ~13s to run. I suspect the fixes to linked issues mentioned in #59126 might have mitigated this. Going to unskip due to lack of reproducibility. Fixes: #59126 Release note: None 61828: contention: store contention events on non-SQL keys r=yuzefovich a=yuzefovich Previously, whenever we tried to add a contention event on a non-SQL key, it would encounter an error during decoding tableID/indexID pair, and the event was dropped. This commit extends the contention registry to additionally store information about contention on non-SQL keys. That information is stored in two levels: - on the top level, all `SingleNonSQLKeyContention` objects are ordered by their keys - on the bottom level, all `SingleTxnContention` objects are ordered by the number of times that transaction was observed to contend with other transactions. `SingleTxnContention` protobuf message is moved out of `SingleKeyContention` and is reused for non-SQL keys. This commit also updates the status server API response. I assume that no changes are needed with regards to backwards compatibility since the original version was merged just a few weeks ago, and we haven't had a beta released since then. Fixes: #60669. Release note (sql change): CockroachDB now also stores the information about contention on non-SQL keys. 61862: cliccl: add `load show backups` to display backup collection r=pbardea a=Elliebababa Previously, users can list backups created by `BACKUP INTO` with `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. This PR updates `load show` with `backups` subcommand, which allows users to list backups in a backup collection created by `BACKUP INTO`. With the same purpose as other `load show` subcommands, this update allows users to list backups without running `SHOW BACKUP IN` in a sql session. see #61131 #61829 to checkout other `load show` subcommand. Release note (cli change): Add `load show backups` to display backup collection. Previously, users can list backups created by `BACKUP INTO` via `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. Now, users are able to list backups in a collection with `cockroach load show backups <collection_url>. 61877: bench/ddl_analysis: fix test for real r=ajwerner a=ajwerner See individual commits. Last is the critical one. Fixes #61856. 61937: colexec: make vectorized stats concurrency safe r=yuzefovich a=yuzefovich **colflow: clean up vectorized stats for rowexec processors** Previously, the wrapped row-execution KV reading processors were implementing `execinfra.KVReader` interface, but they were never used as such, only the ColBatchScans would get used to retrieve the KV stats. This is the case because the row-execution processors report their execution stats themselves, and we don't want to duplicate that info. This commit moves `KVReader` interface into `colexecop` package and now only the ColBatchScans implement it. This allowed for some cleanup around the vectorized stats code, but the main reason for performing this change is that the contract of the interface will be modified by the follow-up commit to mention the safety under concurrent usage, and I didn't want to change the row-execution processors for that since the relevant methods never get called anyway. Additionally, this commit begins emitting of rows-read and bytes-read by the zigzagJoiners and invertedJoiners to complete the metrics picture. Release note: None **colexec: make vectorized stats concurrency safe** Previously, the collection of vectorized stats was not synchronized with the operators themselves. Namely, it was possible to call methods like `GetBytesRead` on the ColBatchScans and Inboxes from a different goroutine (the root materializer or the outbox) than from the main goroutine of the operator. This is now fixed by putting mutexes in place and updating `colexecop.KVReader` interface to require concurrency-safe implementations. Fixes: #61899. Release note: None Co-authored-by: arulajmani <arulajmani@gmail.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Aditya Maru <adityamaru@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: elliebababa <ellie24.huang@gmail.com> Co-authored-by: Andrew Werner <awerner32@gmail.com>
Previously,
cockroach load show <backup_path>
is used for debugging purposes, and displays all the metadata (including Spans, Files, Descriptors) of a single backup manifest file.With this PR, we update
load show
withsummary
subcommand. Users can inspect metadata of a single manifest by usingload show summary <backup_url>
. User-defined types and user-defined schemas are added to the output. Also, we add support of "nodelocal://0/" access.see
#18434
#60790
Release justification: Non-production code changes
Release note (cli change): Update
load show
withsummary
subcommand to display information of backup metadata. Users can inspect metadata information of a single manifest by using.cockroach load show summary <backup_url>
.