feat: profilecli query-blocks merge #3618

alsoba13 · 2024-10-09T15:36:57Z

This PR extends the new profilecli query-blocks command with a merge subcommand, analogous to profilecli query merge. With it, you can run queries directly against a single block stored on your local machine or in a remote bucket.

Partially solves #3559

Main trade-off

Note that, unlike profilecli query-blocks series, this command can only query a single block.

Merging data from different blocks is a complex task, and the implementation for it is spread across the codebase. It involves defining query plans, using streams, and following (or duplicating) the read path. A pyroscope server handles that easily, but doing it in profilecli would mean duplicating code and adding a good amount of boilerplate for stream handling. For those reasons, I decided to simplify the capabilities here while still delivering some value, limiting the number of blocks to query to just one.
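A minimal sketch of how the single-block restriction could be enforced before any block is opened. The helper name validateSingleBlock and its error messages are hypothetical and are not taken from the PR:

```go
package main

import (
	"errors"
	"fmt"
)

// validateSingleBlock is a hypothetical helper: it accepts the values passed
// via --block-ids and returns the only allowed block ID, or an error when the
// caller passed zero or more than one.
func validateSingleBlock(blockIDs []string) (string, error) {
	switch len(blockIDs) {
	case 0:
		return "", errors.New("no block ID provided")
	case 1:
		return blockIDs[0], nil
	default:
		return "", fmt.Errorf("merge supports exactly one block, got %d", len(blockIDs))
	}
}

func main() {
	id, err := validateSingleBlock([]string{"01J9RQ8QENNY6ZEA84K30GZM1C"})
	fmt.Println(id, err)
}
```

Failing early like this keeps the restriction visible to the user instead of silently ignoring extra block IDs.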

Funny enough, we should maybe rename merge to another command name here.

Capabilities

This feature offers capabilities similar to profilecli query merge, but for a specified local or remote block:

You may choose the profile type with --profile-type or specify a query with --query.
You may choose the output format (console, raw, pprof).
You may set a --stacktrace-selector.
You may use it locally (--local-path) or remotely (--bucket-name, --tenant-id and --object-store-type; only gcs is supported right now).
You specify the queried block with the --block-ids flag.
Time ranges (from and to) are not needed: the command queries the whole block instead.

doc

profilecli query-blocks merge --help
usage: profilecli query-blocks merge [<flags>]

Request merged profile.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
      --local-path="./data/anonymous/local"
                                 Path to blocks directory.
      --bucket-name=BUCKET-NAME  The name of the object storage bucket.
      --object-store-type="gcs"  The type of the object storage (e.g., gcs).
      --block-ids=BLOCK-IDS ...  List of blocks ids to query on
      --tenant-id=TENANT-ID      Tenant id of the queried block for remote bucket
      --query="{}"               Label selector to query.
      --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
      --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
                                 Profile type to query.
      --stacktrace-selector=STACKTRACE-SELECTOR ...
                                 Only query locations with those symbols. Provide multiple times starting with the root
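
The --output value in the help text above combines a format name with an optional file path (pprof=./my.pprof). As a rough illustration of how such a value can be split, here is a sketch; the outputSpec type and parseOutput function are hypothetical and not necessarily how profilecli handles the flag:

```go
package main

import (
	"fmt"
	"strings"
)

// outputSpec is a hypothetical parsed form of the --output flag value.
type outputSpec struct {
	Format string // "console", "raw", or "pprof"
	Path   string // destination file, only set for the pprof format
}

// parseOutput splits values such as "console", "raw", or "pprof=./my.pprof".
func parseOutput(v string) (outputSpec, error) {
	switch {
	case v == "console" || v == "raw":
		return outputSpec{Format: v}, nil
	case strings.HasPrefix(v, "pprof="):
		path := strings.TrimPrefix(v, "pprof=")
		if path == "" {
			return outputSpec{}, fmt.Errorf("pprof output requires a file path")
		}
		return outputSpec{Format: "pprof", Path: path}, nil
	default:
		return outputSpec{}, fmt.Errorf("unknown output %q", v)
	}
}

func main() {
	spec, err := parseOutput("pprof=./my.pprof")
	fmt.Println(spec.Format, spec.Path, err)
}
```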

Usage example:

Querying profiles on a local block, filtering on service_name:

profilecli query-blocks merge --block-ids=01J9RQ8QENNY6ZEA84K30GZM1C --query='{service_name="ride-sharing-app"}' | head
level=info msg="query-block merge" blockIds=[01J9RQ8QENNY6ZEA84K30GZM1C] localPath=./data/anonymous/local bucketName= tenantId= query="{service_name=\"ride-sharing-app\"}" type=process_cpu:cpu:nanoseconds:cpu:nanoseconds
PeriodType:
Period: 0
Samples:
/[dflt]
  580000000: 17 3 4 5 6 19 20 9 10 11 9 12 13 14
  140000000: 2 21 41
 1010000000: 17 104 105 24 4 5 6 19 20 9 10 11 9 12 13 14
   10000000: 138 130 95 86 87
  350000000: 17 3 4 5 6 7 8 9 10 11 9 12 13 14
 28330000000: 1 2 21 22 23 24 4 5 6 19 20 9 10 11 9 12 13 14
...

Querying profiles on a remote block, raw output:

profilecli query-blocks merge --bucket-name=dev-us-central-0-profiles-dev-001-data --tenant-id=1218 --block-ids=01J9RWPHE83FGAQCE0Z9GAXV4K --query='{service_name="profiles-dev-002/ingester", pod="pyroscope-ingester-1", span_name="HTTP POST - grpc_health_v1_health", __type__="cpu"}' --output raw | head
level=info msg="query-block merge" blockIds=[01J9RWPHE83FGAQCE0Z9GAXV4K] localPath=./data/anonymous/local bucketName=dev-us-central-0-profiles-dev-001-data tenantId=1218 query="{service_name=\"profiles-dev-002/ingester\", pod=\"pyroscope-ingester-1\", span_name=\"HTTP POST - grpc_health_v1_health\", __type__=\"cpu\"}" type=process_cpu:cpu:nanoseconds:cpu:nanoseconds
&googlev1.Profile{
  SampleType: []*googlev1.ValueType{
    &googlev1.ValueType{
      Type: 0,
      Unit: 0,
    },
  },
  Sample: []*googlev1.Sample{
    &googlev1.Sample{
      LocationId: []uint64{

alsoba13 · 2024-10-09T16:10:33Z

cmd/profilecli/output.go

Common output code extracted from query.go.

aleks-p · 2024-10-09T17:48:32Z

Funny enough, we should maybe rename merge to another command name here.

"merge" in this context refers to merging multiple profiles and their samples to produce a single result (e.g., flamegraph, a pprof file, etc.). The name is still valid, even if we are operating on one block :)

kolesnikovae

Apparently, some files were added by mistake (lel.txt and so on).

Also, I propose revisiting the way the CLI interface is extended.

kolesnikovae · 2024-10-10T03:53:18Z

cmd/profilecli/query-blocks.go

+			Start:         meta.MinTime.Time().UnixMilli(),
+			End:           meta.MaxTime.Time().UnixMilli(),
+		},
+		100,


This is the max_nodes parameter. I'd say that it should be configurable. For pprof output (SelectMergePprof), it should default to 0.
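
One possible shape for that suggestion, as a sketch only. The function effectiveMaxNodes, the convention that a negative value means "flag not set", and the reading of 0 as "no truncation" are all assumptions for illustration; only the hard-coded 100 comes from the diff above:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMaxNodes mirrors the value hard-coded in the diff.
const defaultMaxNodes int64 = 100

// effectiveMaxNodes is a hypothetical helper. A negative requested value is
// treated as "flag not set"; in that case pprof output defaults to 0 and
// every other output keeps the existing default.
func effectiveMaxNodes(output string, requested int64) int64 {
	if requested >= 0 {
		return requested // an explicit user choice always wins
	}
	if strings.HasPrefix(output, "pprof") {
		return 0
	}
	return defaultMaxNodes
}

func main() {
	fmt.Println(effectiveMaxNodes("console", -1))
	fmt.Println(effectiveMaxNodes("pprof=./my.pprof", -1))
	fmt.Println(effectiveMaxNodes("console", 512))
}
```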

kolesnikovae · 2024-10-10T04:16:12Z

cmd/profilecli/main.go

 	queryBlocksSeriesParams := addQueryBlocksSeriesParams(queryBlocksSeriesCmd)
+	queryBlocksMergeCmd := queryBlocksCmd.Command("merge", "Request merged profile.")
+	queryBlocksMergeParams := addQueryBlocksMergeParams(queryBlocksMergeCmd)


Let's design the CLI interface first. I believe that merge might be confusing.

I propose the following interface:

profilecli query merge   // Already exists. Should be hidden and replaced in docs with "profile".
profilecli query profile // Alias for "merge".
profilecli query series  // Existing subcommand.
profilecli query go-pgo  // Queries pprof for Go PGO.
etc.

Now, in the command handler, we check whether the --block flag is specified. It's common practice to use the singular form for flags that accept multiple values; the flag is then specified multiple times:

profilecli query series --block=A

profilecli query series \
  --block=A \
  --block=B \
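
To illustrate the repeated singular flag, here is a sketch that uses Go's standard flag package rather than the CLI library profilecli is built on. The blockList type and parseBlocks function are hypothetical names:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// blockList implements flag.Value and collects every occurrence of --block.
type blockList []string

func (b *blockList) String() string {
	if b == nil {
		return ""
	}
	return strings.Join(*b, ",")
}

// Set is called once per occurrence of the flag on the command line.
func (b *blockList) Set(v string) error {
	if v == "" {
		return fmt.Errorf("block ID must not be empty")
	}
	*b = append(*b, v)
	return nil
}

// parseBlocks parses the arguments and returns the block IDs in the order given.
func parseBlocks(args []string) ([]string, error) {
	fs := flag.NewFlagSet("query series", flag.ContinueOnError)
	var blocks blockList
	fs.Var(&blocks, "block", "Block ID to query (repeat the flag for multiple blocks)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return blocks, nil
}

func main() {
	blocks, err := parseBlocks([]string{"--block=A", "--block=B"})
	fmt.Println(blocks, err)
}
```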

Next, let's make the query subcommand support storage backend configuration (this is very easy).

Finally, let's remove the query-blocks subcommand.

Alternatively, we could extend the existing admin blocks subcommand:

profilecli admin block query profile
profilecli admin block query series
profilecli admin block query go-pgo
etc.

However, I believe query X --block=A is more intuitive. On the other hand, profilecli admin block query is more correct from a semantic standpoint.

alsoba13 · 2024-10-14

I did a draft for both solutions:

refactor: unify query and query-blocks #3623

refactor: move query-blocks to admin blocks query #3625

I'm not including support for storage backend configuration yet. I'll add it once I've chosen one.

I agree that extending profilecli query may be more elegant in terms of usage. On the other hand, I think it makes the docs harder to write.

Here is the --help output. I've marked with > the flags used only in pyroscope server mode, with < the flags used only for querying blocks (--block required), and with = the flags shared by both modes:

profilecli query profile --help
Request merged profile.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
>     --url="http://localhost:4040"
                                 URL of the profile store.
>     --tenant-id=""             The tenant ID to be used for the X-Scope-OrgID header.
>     --username=""              The username to be used for basic auth.
>     --password=""              The password to be used for basic auth.
>     --protocol=connect         The protocol to be used for communicating with the server.
>     --from="now-1h"            Beginning of the query.
>     --to="now"                 End of the query.
=     --query="{}"               Label selector to query.
<     --local-path="./data/anonymous/local"
                                 Path to blocks directory.
<     --bucket-name=BUCKET-NAME  The name of the object storage bucket.
<     --object-store-type="gcs"  The type of the object storage (e.g., gcs).
<     --block=BLOCK ...          Block ids to query on (accepts multiples)
<     --block-tenant-id=BLOCK-TENANT-ID
                                 Tenant id of the queried block for remote bucket
=     --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
=     --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
                                 Profile type to query.
=     --stacktrace-selector=STACKTRACE-SELECTOR ...
                                 Only query locations with those symbols. Provide multiple times starting with the root

So in the end you have a set of shared flags, some flags that are needed only when you use --block, and others that are needed only in its absence. I think it's difficult to explain these submodes without writing multiple clarifications.

That makes me lean towards the admin blocks query solution, whose docs feel more precise:

profilecli admin blocks query --help
Request merged profile on local/remote block.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
      --path="./data/anonymous/local"
                                 Path to blocks directory
      --bucket-name=BUCKET-NAME  The name of the object storage bucket.
      --object-store-type="gcs"  The type of the object storage (e.g., gcs).
      --block=BLOCK ...          Block ids to query on (accepts multiples)
      --tenant-id=TENANT-ID      Tenant id of the queried block for remote bucket
      --query="{}"               Label selector to query.
      --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
      --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
                                 Profile type to query.
      --stacktrace-selector=STACKTRACE-SELECTOR ...
                                 Only query locations with those symbols. Provide multiple times starting with the root

alsoba13 · 2024-10-10T07:27:53Z

cmd/profilecli/my.pprof

I think this file was introduced unintentionally some time ago.

kolesnikovae

LGTM!

alsoba13 force-pushed the alsoba13/query-blocks-merge branch from fabe9a7 to dade773 Compare October 9, 2024 16:13

alsoba13 marked this pull request as ready for review October 9, 2024 16:25

alsoba13 requested a review from a team as a code owner October 9, 2024 16:25

kolesnikovae requested changes Oct 10, 2024

alsoba13 force-pushed the alsoba13/query-blocks-merge branch from dade773 to 15a6074 Compare October 10, 2024 07:27

feat: profilecli query-blocks merge

d9902ea

alsoba13 force-pushed the alsoba13/query-blocks-merge branch from 15a6074 to d9902ea Compare October 14, 2024 07:29

kolesnikovae approved these changes Oct 14, 2024

alsoba13 merged commit 86427c6 into main Oct 14, 2024
18 checks passed

alsoba13 deleted the alsoba13/query-blocks-merge branch October 14, 2024 14:31

knylander-grafana mentioned this pull request Nov 15, 2024

[DOC] Add v1.10 release notes and fix page weights #3692

Merged
