Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: profilecli query-blocks merge #3618

Merged
merged 1 commit into from
Oct 14, 2024
Merged

Conversation

alsoba13
Copy link
Contributor

@alsoba13 alsoba13 commented Oct 9, 2024

In this PR we extend the new profilecli command query-blocks with a merge, analogous to profilecli query merge. With this command, you can execute queries directly to a single block hosted in your localhost or a remote bucket.

Partially solves #3559

Main trade-off

Note that opposed to profilecli query-blocks series, this can only query a single block.

Merging data from different blocks is a complex task. The implementation for that is distributed in the codebase. It implies defining query plans, using streams, and following/duplicating read path. That can be handled easily by a pyroscope server, but doing it in profilecli means duplicating code and introducing a good amount of boilerplate code for stream handling. For all those, I decided to simplify the capabilities here while still delivering some value, limiting the amount of blocks to query to just 1.

Funny enough, we should maybe rename merge to another command name here.

Capabilities

This feature gives similar capabilities as profilecli query merge but for a specified local/remote block:

  • You may choose the profile type --profile-type or specify a query with --query.
  • You may choose the output format (console, raw, pprof)
  • You can choose a --stacktrace-selector
  • You may choose to use it locally (--local-path) or remotely (--bucket-name, --tenant-id and --object-store-type - only gcs supported right now).
  • You specify queried block with the --block-ids flag.
  • Time ranges (to and from) are not needed: it will query the whole blocks instead.

doc

profilecl query-blocks merge --help
usage: profilecli query-blocks merge [<flags>]

Request merged profile.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
      --local-path="./data/anonymous/local"
                                 Path to blocks directory.
      --bucket-name=BUCKET-NAME  The name of the object storage bucket.
      --object-store-type="gcs"  The type of the object storage (e.g., gcs).
      --block-ids=BLOCK-IDS ...  List of blocks ids to query on
      --tenant-id=TENANT-ID      Tenant id of the queried block for remote bucket
      --query="{}"               Label selector to query.
      --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
      --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
                                 Profile type to query.
      --stacktrace-selector=STACKTRACE-SELECTOR ...
                                 Only query locations with those symbols. Provide multiple times starting with the root

Usage example:

Querying profiles on a local block, and filter on service_name

profilecli query-blocks merge --block-ids=01J9RQ8QENNY6ZEA84K30GZM1C --query='{service_name="ride-sharing-app"}' | head
level=info msg="query-block merge" blockIds=[01J9RQ8QENNY6ZEA84K30GZM1C] localPath=./data/anonymous/local bucketName= tenantId= query="{service_name=\"ride-sharing-app\"}" type=process_cpu:cpu:nanoseconds:cpu:nanoseconds
PeriodType:
Period: 0
Samples:
/[dflt]
  580000000: 17 3 4 5 6 19 20 9 10 11 9 12 13 14
  140000000: 2 21 41
 1010000000: 17 104 105 24 4 5 6 19 20 9 10 11 9 12 13 14
   10000000: 138 130 95 86 87
  350000000: 17 3 4 5 6 7 8 9 10 11 9 12 13 14
 28330000000: 1 2 21 22 23 24 4 5 6 19 20 9 10 11 9 12 13 14
...

Querying series on a remote block, raw output:

profilecli query-blocks merge --bucket-name=dev-us-central-0-profiles-dev-001-data --tenant-id=1218 --block-ids=01J9RWPHE83FGAQCE0Z9GAXV4K --query='{service_name="profiles-dev-002/ingester", pod="pyroscope-ingester-1", span_name="HTTP POST - grpc_health_v1_health", __type__="cpu"}' --output raw | head
level=info msg="query-block merge" blockIds=[01J9RWPHE83FGAQCE0Z9GAXV4K] localPath=./data/anonymous/local bucketName=dev-us-central-0-profiles-dev-001-data tenantId=1218 query="{service_name=\"profiles-dev-002/ingester\", pod=\"pyroscope-ingester-1\", span_name=\"HTTP POST - grpc_health_v1_health\", __type__=\"cpu\"}" type=process_cpu:cpu:nanoseconds:cpu:nanoseconds
&googlev1.Profile{
  SampleType: []*googlev1.ValueType{
    &googlev1.ValueType{
      Type: 0,
      Unit: 0,
    },
  },
  Sample: []*googlev1.Sample{
    &googlev1.Sample{
      LocationId: []uint64{

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common code for output from query.go

@alsoba13 alsoba13 force-pushed the alsoba13/query-blocks-merge branch from fabe9a7 to dade773 Compare October 9, 2024 16:13
@alsoba13 alsoba13 marked this pull request as ready for review October 9, 2024 16:25
@alsoba13 alsoba13 requested a review from a team as a code owner October 9, 2024 16:25
@aleks-p
Copy link
Contributor

aleks-p commented Oct 9, 2024

Funny enough, we should maybe rename merge to another command name here.

"merge" in this context refers to merging multiple profiles and their samples to produce a single result (e.g., flamegraph, a pprof file, etc.). The name is still valid, even if we are operating on one block :)

Copy link
Collaborator

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, some files were added mistakenly (lel.txt and so on).

Also, I propose to revisit the way the CLI interface is extended

Start: meta.MinTime.Time().UnixMilli(),
End: meta.MaxTime.Time().UnixMilli(),
},
100,
Copy link
Collaborator

@kolesnikovae kolesnikovae Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the max_nodes parameter. I'd say that it should be configurable. In case of pprof (SelectMergePprof) it should default to 0

Comment on lines 84 to +86
queryBlocksSeriesParams := addQueryBlocksSeriesParams(queryBlocksSeriesCmd)
queryBlocksMergeCmd := queryBlocksCmd.Command("merge", "Request merged profile.")
queryBlocksMergeParams := addQueryBlocksMergeParams(queryBlocksMergeCmd)
Copy link
Collaborator

@kolesnikovae kolesnikovae Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's design the CLI interface first. I believe that merge might be confusing.

I propose the following interface:

profilecli query merge   // Already exists. Should be hidden and replaced in docs with "profile".
profilecli query profile // Alias for "merge".
profilecli query series  // Existing subcommand.
profilecli query go-pgo  // Queries pprof for Go PGO.
etc.

Now, in the command handler, we check whether --block flag is specified. There's a common practice to use singular form for flags that accept multiple values; the flag should be specified multiple times:

profilecli query series --block=A

profilecli query series \
  --block=A \
  --block=B \

Next, let's make query subcommand to support storage backend configuration (this is very easy).

Finally, let's remove query-blocks subcommand.


Alternatively, we could extend the existing admin blocks subcommand:

profilecli admin block query profile
profilecli admin block query series
profilecli admin block query go-pgo
etc.

However, I believe query X --block=A is more intuitive. On the other hand profilecli admin block query is more correct from the semantics standpoint.

Copy link
Contributor Author

@alsoba13 alsoba13 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a draft for both solutions:

I'm not including support for storage backend configuration yet. I'll do it once I've chose one.

I agree extending profilecli query may be more elegant in terms of usage. But on the other hand, I think it makes doc more difficult to express.

Here we have the --help. Marking with > those flags for pyroscope server mode, with < those flags for single blocks (--block required), and finally marking with = those flags that are used for both modes:

profilecli query profile --help
Request merged profile.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
>      --url="http://localhost:4040"  
                                 URL of the profile store.
>      --tenant-id=""             The tenant ID to be used for the X-Scope-OrgID header.
>      --username=""              The username to be used for basic auth.
>      --password=""              The password to be used for basic auth.
>      --protocol=connect         The protocol to be used for communicating with the server.
>      --from="now-1h"            Beginning of the query.
>      --to="now"                 End of the query.
=      --query="{}"               Label selector to query.
<      --local-path="./data/anonymous/local"  
                                 Path to blocks directory.
<      --bucket-name=BUCKET-NAME  The name of the object storage bucket.
<      --object-store-type="gcs"  The type of the object storage (e.g., gcs).
<      --block=BLOCK ...          Block ids to query on (accepts multiples)
<      --block-tenant-id=BLOCK-TENANT-ID  
                                 Tenant id of the queried block for remote bucket
=      --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
=      --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"  
                                 Profile type to query.
=      --stacktrace-selector=STACKTRACE-SELECTOR ...  
                                 Only query locations with those symbols. Provide multiple times starting with the root

So at the end you have a set of shared flags, some that will need some values if you use --block, and others that are only needed in the absence of that flag. But I think it's difficult to explain that you have multiple submodes here, without writing multiple clarifications.

So that makes me lean more towards admin blocks query solution, which in terms of docs feels more precise:

profilecli admin blocks query --help
Request merged profile on local/remote block.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
      --path="./data/anonymous/local"  
                                 Path to blocks directory
      --bucket-name=BUCKET-NAME  The name of the object storage bucket.
      --object-store-type="gcs"  The type of the object storage (e.g., gcs).
      --block=BLOCK ...          Block ids to query on (accepts multiples)
      --tenant-id=TENANT-ID      Tenant id of the queried block for remote bucket
      --query="{}"               Label selector to query.
      --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
      --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"  
                                 Profile type to query.
      --stacktrace-selector=STACKTRACE-SELECTOR ...  
                                 Only query locations with those symbols. Provide multiple times starting with the root

@alsoba13 alsoba13 force-pushed the alsoba13/query-blocks-merge branch from dade773 to 15a6074 Compare October 10, 2024 07:27
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file was introduced unintentionally time ago

feat: profilecli query-blocks merge
@alsoba13 alsoba13 force-pushed the alsoba13/query-blocks-merge branch from 15a6074 to d9902ea Compare October 14, 2024 07:29
Copy link
Collaborator

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alsoba13 alsoba13 merged commit 86427c6 into main Oct 14, 2024
18 checks passed
@alsoba13 alsoba13 deleted the alsoba13/query-blocks-merge branch October 14, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants