-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: profilecli query-blocks merge #3618
Conversation
cmd/profilecli/output.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common code for output from query.go
fabe9a7
to
dade773
Compare
"merge" in this context refers to merging multiple profiles and their samples to produce a single result (e.g., flamegraph, a pprof file, etc.). The name is still valid, even if we are operating on one block :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, some files were added mistakenly (lel.txt
and so on).
Also, I propose to revisit the way the CLI interface is extended
cmd/profilecli/query-blocks.go
Outdated
Start: meta.MinTime.Time().UnixMilli(), | ||
End: meta.MaxTime.Time().UnixMilli(), | ||
}, | ||
100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the max_nodes
parameter. I'd say that it should be configurable. In case of pprof (SelectMergePprof
) it should default to 0
queryBlocksSeriesParams := addQueryBlocksSeriesParams(queryBlocksSeriesCmd) | ||
queryBlocksMergeCmd := queryBlocksCmd.Command("merge", "Request merged profile.") | ||
queryBlocksMergeParams := addQueryBlocksMergeParams(queryBlocksMergeCmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's design the CLI interface first. I believe that merge
might be confusing.
I propose the following interface:
profilecli query merge // Already exists. Should be hidden and replaced in docs with "profile".
profilecli query profile // Alias for "merge".
profilecli query series // Existing subcommand.
profilecli query go-pgo // Queries pprof for Go PGO.
etc.
Now, in the command handler, we check whether --block
flag is specified. There's a common practice to use singular form for flags that accept multiple values; the flag should be specified multiple times:
profilecli query series --block=A
profilecli query series \
--block=A \
--block=B \
Next, let's make query
subcommand to support storage backend configuration (this is very easy).
Finally, let's remove query-blocks
subcommand.
Alternatively, we could extend the existing admin blocks
subcommand:
profilecli admin block query profile
profilecli admin block query series
profilecli admin block query go-pgo
etc.
However, I believe query X --block=A
is more intuitive. On the other hand profilecli admin block query
is more correct from the semantics standpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a draft for both solutions:
- refactor: unify query and query-blocks #3623
- refactor: move query-blocks to admin blocks query #3625
I'm not including support for storage backend configuration yet. I'll do it once I've chose one.
I agree extending profilecli query
may be more elegant in terms of usage. But on the other hand, I think it makes doc more difficult to express.
Here we have the --help
. Marking with >
those flags for pyroscope server mode, with <
those flags for single blocks (--block
required), and finally marking with =
those flags that are used for both modes:
profilecli query profile --help
Request merged profile.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
-v, --verbose Enable verbose logging.
> --url="http://localhost:4040"
URL of the profile store.
> --tenant-id="" The tenant ID to be used for the X-Scope-OrgID header.
> --username="" The username to be used for basic auth.
> --password="" The password to be used for basic auth.
> --protocol=connect The protocol to be used for communicating with the server.
> --from="now-1h" Beginning of the query.
> --to="now" End of the query.
= --query="{}" Label selector to query.
< --local-path="./data/anonymous/local"
Path to blocks directory.
< --bucket-name=BUCKET-NAME The name of the object storage bucket.
< --object-store-type="gcs" The type of the object storage (e.g., gcs).
< --block=BLOCK ... Block ids to query on (accepts multiples)
< --block-tenant-id=BLOCK-TENANT-ID
Tenant id of the queried block for remote bucket
= --output="console" How to output the result, examples: console, raw, pprof=./my.pprof
= --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
Profile type to query.
= --stacktrace-selector=STACKTRACE-SELECTOR ...
Only query locations with those symbols. Provide multiple times starting with the root
So at the end you have a set of shared flags, some that will need some values if you use --block
, and others that are only needed in the absence of that flag. But I think it's difficult to explain that you have multiple submodes here, without writing multiple clarifications.
So that makes me lean more towards admin blocks query
solution, which in terms of docs feels more precise:
profilecli admin blocks query --help
Request merged profile on local/remote block.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
-v, --verbose Enable verbose logging.
--path="./data/anonymous/local"
Path to blocks directory
--bucket-name=BUCKET-NAME The name of the object storage bucket.
--object-store-type="gcs" The type of the object storage (e.g., gcs).
--block=BLOCK ... Block ids to query on (accepts multiples)
--tenant-id=TENANT-ID Tenant id of the queried block for remote bucket
--query="{}" Label selector to query.
--output="console" How to output the result, examples: console, raw, pprof=./my.pprof
--profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
Profile type to query.
--stacktrace-selector=STACKTRACE-SELECTOR ...
Only query locations with those symbols. Provide multiple times starting with the root
dade773
to
15a6074
Compare
cmd/profilecli/my.pprof
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file was introduced unintentionally time ago
feat: profilecli query-blocks merge
15a6074
to
d9902ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
In this PR we extend the new profilecli command
query-blocks
with amerge
, analogous toprofilecli query merge
. With this command, you can execute queries directly to a single block hosted in your localhost or a remote bucket.Partially solves #3559
Main trade-off
Note that opposed to
profilecli query-blocks series
, this can only query a single block.Merging data from different blocks is a complex task. The implementation for that is distributed in the codebase. It implies defining query plans, using streams, and following/duplicating read path. That can be handled easily by a pyroscope server, but doing it in
profilecli
means duplicating code and introducing a good amount of boilerplate code for stream handling. For all those, I decided to simplify the capabilities here while still delivering some value, limiting the amount of blocks to query to just 1.Funny enough, we should maybe rename
merge
to another command name here.Capabilities
This feature gives similar capabilities as
profilecli query merge
but for a specified local/remote block:--profile-type
or specify a query with--query
.--stacktrace-selector
--local-path
) or remotely (--bucket-name
,--tenant-id
and--object-store-type
- only gcs supported right now).--block-ids
flag.to
andfrom
) are not needed: it will query the whole blocks instead.doc
Usage example:
Querying profiles on a local block, and filter on service_name
Querying series on a remote block, raw output: