-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli/debug: allow the user to refine which files get included #64094
Conversation
cc @thtruo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
a discussion (no related file):
First commit reviewed over at #64071, apologies for not doing it all here.
Thanks for adding the missing tests!
This looks like a useful addition. I think the only blocker is the function name changes I mentioned in #64071 to get this building again.
pkg/cli/zip_per_node.go, line 388 at r3 (raw file):
if err := zc.z.createRaw(nodePrinter.start("writing dump"), name, file.Contents); err != nil { return err }
It feels like there might be some room for a helper function or two in this file that would reduce some of the code duplication. But, that looks true of the existing code as well.
f1abc6c
to
79c68c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna)
a discussion (no related file):
Previously, stevendanna (Steven Danna) wrote…
First commit reviewed over at #64071, apologies for not doing it all here.
Thanks for adding the missing tests!
This looks like a useful addition. I think the only blocker is the function name changes I mentioned in #64071 to get this building again.
Thanks this seems to build now.
pkg/cli/zip_per_node.go, line 388 at r3 (raw file):
Previously, stevendanna (Steven Danna) wrote…
It feels like there might be some room for a helper function or two in this file that would reduce some of the code duplication. But, that looks true of the existing code as well.
This is going to be complicated until go provides us with generics. Every one of these blocks has a different mix of types and function signatures.
348284f
to
085bd52
Compare
085bd52
to
0f55cc6
Compare
64071: cli/debug: new sub-command `list-files` r=stevendanna a=knz Prerequisite to #64094. This is meant to assist the human operator select appropriate flags for the `debug zip` command. Release note (cli change): The new command `debug list-files` show the list of files that can be retrieved via the `debug zip` command. It supports the `--nodes` and `--exclude-nodes` parameters in the same way as `debug zip`. Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
8493677
to
3368cc2
Compare
63767: bazel: tweak logic for staging test artifacts in bazci r=rail a=rickystewart Because Bazel aggressively caches build/test artifacts, if we're not careful, bazci can copy OLD artifacts from previous build/test runs into the artifacts directory. This is particularly an issue because TeamCity watches that directory for `test.xml` reports, and if an old `test.xml` reports a test failure, TeamCity will notice that and report the failure in the UI -- and importantly, even if we replace that `test.xml` with a completely different one that reports that the test succeeed, TC will not amend what's displayed in the UI accordingly. So this can manifest as reported test failures from unrelated PR's showing up in TC in an apparently unpredictable (though uncommmon) manner. We fix this by making bazci a little smarter about when we choose to stage artifacts: 1. The first time the watcher loops over all the test artifacts, never stage anything (the artifacts are probably cached -- not enough time has passed for any legitimate artifacts to appear, probably). 2. Only stage artifacts incrementally if their stats have changed since the initial round of caching. 3. During the final loop, stage ALL artifacts (if they haven't been staged yet), just to make sure we don't miss anything. Also add lots of comments to make these design decisions and their motivations a little clearer. Resolves #63740. Release note: None 64071: cli/debug: new sub-command `list-files` r=stevendanna a=knz Prerequisite to #64094. This is meant to assist the human operator select appropriate flags for the `debug zip` command. Release note (cli change): The new command `debug list-files` show the list of files that can be retrieved via the `debug zip` command. It supports the `--nodes` and `--exclude-nodes` parameters in the same way as `debug zip`. Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
The node selection logic was untested. This commit adds the missing test. Release note: None
We have observed in practice that the majority of the size of generated `debug zip` outputs is attributable to the discrete files retrieved from the server: - log files - goroutine dumps - heap profiles As clusters get larger and older, these files increase in number, and the size of the `debug zip` output grows accordingly. Meanwhile, in practice, there are many support cases where we know ahead of running the `debug zip` command that only certain files are going to be useful. Therefore, we wish to be able to restrict which files get included / excluded from the generated archive. Hence the functional change described below. (Note: there is another expressed wish to limit the size of generated archives based on a *time or date range*, not just file names / types. This will be handled in a separate commit.) Release note (cli change): It is now possible to fine tune which files get retrieved from the server nodes by the `debug zip` command, using the new flag `--include-files` and `--exclude-files`. These flags take glob patterns that are applied to the file names server-side. For example, to include only log files, use `--include-files='*.log'`. The command `cockroach debug list-files` also accepts these flags and can thus be used to experiment with the new flags before running the `debug zip` command.
Release note: None
3368cc2
to
d13fda9
Compare
@tbg reviewed this in #64128 (review) Addressing these comments here: #64128 (review) |
64128: cli/zip: clamp file retrieval by date, default to last 48 hours r=tbg,stevendanna a=knz All commits but the last from #64094 and prior. Fixes #57795 The size of generated `debug zip` outputs is mostly attributable to the discrete files retrieved from the server: - log files - goroutine dumps - heap profiles Even though we have various garbage collection techniques applied to these files, they still yield very large `debug zip` outputs: - the default retained cap for log files is 100MB per group, so with the default of up to 5 log groups per node, we are retaining up to 500MB of log files per node. - the default retained cap for goroutine dumps, heap profiles and memory stats is 128MB for each, so we are retaining up to 384MB of these files per node. For example, for a 10-node cluster we are looking at up to 7GB of data retained in these files in the default configuration. While this amount of data is relatively innocuous while it remains in the data directories server-side, the `debug zip` behavior which was previously to retrieve *all* of it was detrimental to the process of troubleshooting anomalies: this is a lot of data to move around, it cannot be e-mailed effectively, etc. In comparison, the other items retrieved by the `debug zip` command (range descriptors, SQL system tables) are typically just kilobytes large. Even a large cluster with tens of thousands of ranges and system tables typically incurs `zip` payloads no greater than a dozen megabytes. Hence the change described in the release note below. Release note (cli change): The `cockroach debug zip` command now retrieves only the log files, goroutine dumps and heap profiles pertaining to the last 48 hours prior to the command invocation. This behavior is supported entirely client-side, i.e. it is not necessary to upgrade the server nodes to effect these newly configurable limits. This behavior can be customized by the two new flags `--files-from` and `--files-until`. Both are optional. See the command-line help text for details. The other data items retrieved by `debug zip` are not affected by this time limit. The two new flags are also supported by `cockroach debug list-files`. It is advised to experiment with `list-files` prior to issuing a `debug zip` command that may retrieve a large amount of data. 64211: lint: check `make bazel-generate` earlier in the `teamcity-check` script r=rail a=rickystewart We used to do this after `make buildshort` and `make generate`, but those can take a while. Do it before so obvious Bazel linter errors break the build sooner. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
First commit from #64071.
Fixes #54654.
We have observed in practice that the majority of the size of
generated
debug zip
outputs is attributable to the discrete filesretrieved from the server:
As clusters get larger and older, these files increase in number, and
the size of the
debug zip
output grows accordingly.Meanwhile, in practice, there are many support cases where we know
ahead of running the
debug zip
command that only certain files aregoing to be useful. Therefore, we wish to be able to restrict
which files get included / excluded from the generated archive.
Hence the functional change described below.
(Note: there is another expressed wish to limit the size of generated
archives based on a time or date range, not just file names /
types. This will be handled in a separate commit.)
Release note (cli change): It is now possible to fine tune which files
get retrieved from the server nodes by the
debug zip
command, usingthe new flag
--include-files
and--exclude-files
. These flagstake glob patterns that are applied to the file names server-side. For
example, to include only log files, use
--include-files='*.log'
.The command
cockroach debug list-files
also accepts these flags andcan thus be used to experiment with the new flags before running the
debug zip
command.