Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(profilecli): clarifying the CLI help message and documentation for aggregate-callees #3638

Merged
merged 5 commits into from
Oct 23, 2024

Conversation

JansonLv
Copy link
Contributor

In the default configuration related to profilecli query go-pgo, the exported PGO file does not optimize or improve performance. Using --no-aggregate-callees results in a performance boost, but the prompt for the aggregate-callees parameter is not user-friendly and may mislead users into thinking they should use --aggregate-callees=false. Therefore, I suggest changing the default value of aggregate-callees to false and adding relevant prompts.

@JansonLv JansonLv requested a review from a team as a code owner October 18, 2024 03:08
@CLAassistant
Copy link

CLAassistant commented Oct 18, 2024

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kolesnikovae
Copy link
Collaborator

kolesnikovae commented Oct 18, 2024

Hi @JansonLv, thanks for the contribution!

Indeed, I believe a clarification could improve UX. The --no-X semantics is enforced with the CLI framework we use, and I think I should have added a clarification regarding --no-aggregate-callees.

However, I have a couple of questions:

In the default configuration related to profilecli query go-pgo, the exported PGO file does not optimize or improve performance. Using --no-aggregate-callees results in a performance boost

I'd like to learn more about the case:

  • Which Go version you're using
  • Can we get profiles exported with and without aggregate-callees option? I understand that profiles may include sensitive data; but dropping strings from the pprof file should help (I can provide you a script that does this)

For the context: the go-pgo optimization is inspired by the discussion and the snippet by Michael Pratt (Go team). I suspect that in the most recent Go version, it may be harmful. Another possibility is that there is a bug in how the callees are being aggregated.

@kolesnikovae kolesnikovae self-requested a review October 18, 2024 06:35
@kolesnikovae kolesnikovae self-assigned this Oct 18, 2024
@JansonLv
Copy link
Contributor Author

JansonLv commented Oct 18, 2024

@kolesnikovae Hi, thank you for your reply. I am using version go1.22.7.

@kolesnikovae
Copy link
Collaborator

Thank you @JansonLv! I'll double check that our go-pgo works as expected on go tip. In the meantime, you can find the script in the cmd/trimstrings branch – the simplest way to run it is to clone pyroscope repo and run

go install ./cmd/trimstrings

@kolesnikovae kolesnikovae changed the title fix the PGO issue, modify the default value of aggregate-callees and … fix(plrofilecli): disable aggregate-callees in go-pgo query by default Oct 18, 2024
@kolesnikovae kolesnikovae changed the title fix(plrofilecli): disable aggregate-callees in go-pgo query by default fix(profilecli): disable aggregate-callees in go-pgo query by default Oct 18, 2024
@JansonLv
Copy link
Contributor Author

@kolesnikovae thanks,Please check if this is the file you are looking for.
with_agg_profile_clean.pb.gz
without_agg_profile_clean.pb.gz

@JansonLv
Copy link
Contributor Author

hi,@kolesnikovae Sorry, that was generated by another project; you can review the information from these two files, and I have modified the trimstrings code, processing only part of the information.
with_agg_profile_clean.pb.gz
without_agg_profile_clean.pb.gz

@kolesnikovae
Copy link
Collaborator

kolesnikovae commented Oct 18, 2024

Thank you so much for providing the samples!

Profiles look good, however they contain surprisingly few samples ~800 and ~1000 samples correspondingly (16.64s of CPU time in both). Idea of go-pgo query was to reduce the profile size and speed up Go builds – usually, when you merge profiles over a period of time (e.g., a couple of days), they include millions of samples, which becomes a problem.

UPD: Same for the second pair: 756/1223/17.45s. I'd suggest querying a longer period of time.

I'm wondering how you've measured the impact of PGO – is it possible to check the compiler logs?

UPD2: correct command:

go build -pgo=auto -gcflags="-m -d=pgodebug=99" . 2>&1 | grep -e "inlining call to" -e "PGO devirtualizing call to" | wc -l

@JansonLv
Copy link
Contributor Author

JansonLv commented Oct 18, 2024

@kolesnikovae I obtained this through stress testing the service and did not use any related Go commands. Thank you for your guidance.

@kolesnikovae
Copy link
Collaborator

kolesnikovae commented Oct 18, 2024

Thank you for sharing this! I'd like to figure out why aggregation affects PGO results, because this is really helpful feature and I wouldn't like disabling it by default.

In the log I see use of CGO and lots of undefined reference errors and that both gcc and ld failed – I'm curious if the build was successful. Could you please build the program with -m -d=pgodebug=99 added to -gcflags? Something like:

go build -pgo=auto -gcflags="-m -d=pgodebug=99" . 2>&1 | grep -e "inlining call to" -e "PGO devirtualizing call to" | wc -l

This will tell us how many optimisations Go compiler made (very roughy), and estimate impact of the PGO (not the app performance).

As for load/stress tests – as far as I understand from the log, it might involve IO (message broker), which would very likely dominate the result. Usually, the expected improvement (reduce in CPU consumption) is within 2-5%. Also, please note that PGO won't help with C code.

@JansonLv
Copy link
Contributor Author

JansonLv commented Oct 18, 2024

hi,@kolesnikovae I'm so sorry to keep you waiting for so long. It took me some time to set up the relevant environment locally.

In the log I see use of CGO and lots of undefined reference errors and that both gcc and ld failed – I'm curious if the build was successful.

There is indeed a problem, which is why I didn't add the required tag for github.com/confluentinc/confluent-kafka-go/v2/kafka."

in the end, the result is still 0, even though I used the one obtained through the /debug/pprof/profile. I also want to figure out why this is happening.

I think we can keep the aggregation for now, as it will take some time to investigate this issue.

Enjoy your weekend

./main.go:6:13: PGO devirtualize considering call cmd.Execute()
{"Pkg":"main","Pos":"/mnt/c/Users/bodhi/worker/server/main.go:6:13","Caller":"main.main","Direct":true,"Interface":false,"Weight":0,"Hottest":"server/cmd.Execute","HottestWeight":0,"Devirtualized":"","DevirtualizedWeight":0}
hot-callsite-thres-from-CDF=0.02697599136768276
hot-cg before inline in dot format:
digraph G {
forcelabels=true;
"main.main" [color=black, style=solid, label="main.main"];
"server/cmd.Execute" [color=black, style=solid, label="server/cmd.Execute"];
edge [color=black, style=solid];
"main.main" -> "server/cmd.Execute" [label="0.00"];
}
./main.go:5:6: can inline main

@kolesnikovae
Copy link
Collaborator

Hi @JansonLv, no worries at all – I'm here to help. Thank you for sharing this! Please let me know if I can help you in any way.

In the meantime, I believe that clarifying the CLI help message and documentation (in ./docs) would improve the user experience. If you're willing to contribute, your effort would be very welcome :) Otherwise, I can take care of this

@JansonLv
Copy link
Contributor Author

Hi,@kolesnikovae, I'm very honored to contribute to Grafana and improve the user experience. I will make the necessary modifications and commit them as soon as possible. I'm also glad to have your help.

@JansonLv JansonLv requested a review from a team as a code owner October 19, 2024 04:36
@JansonLv
Copy link
Contributor Author

Hi, @kolesnikovae, I have submitted the code, but I am uncertain whether the last sentence ‘Try both options to see which gives better for your PGO’ is suitable. I hope to get your opinion on this.

Wish you a pleasant weekend

@JansonLv JansonLv changed the title fix(profilecli): disable aggregate-callees in go-pgo query by default docs(profilecli): clarifying the CLI help message and documentation for aggregate-callees Oct 21, 2024
Co-authored-by: Anton Kolesnikov <anton.e.kolesnikov@gmail.com>
@JansonLv
Copy link
Contributor Author

hi, @kolesnikovae , I noticed that the hot-callsite-thres-from-CDF flag, which is related to PGO (Profile-Guided Optimization), may vary depending on the profile. So, would the parameters obtained from the original profile differ from those obtained after getting PGO through Pyroscope and then performing PGO compilation? It should be related to keep-locations, but there shouldn't be a connection if it's large enough; theoretically, it shouldn't have anything to do with aggregate-callees either, right?

@kolesnikovae
Copy link
Collaborator

I think that CDF will be affected in any case. This is what is going on (simplified):

  • a PGO profile is a graph { node, edge }, where each node represents a call site, and edges (function A calls B) have weights.
  • PGO finds "hot nodes": nodes connected by the most heavy edges that give 99% (default) of the total weight.
  • hot-callsite-thres-from-CDF is the weight of the edge that made it to go over the threshold.

Thus, the aggregation and trimming will affect the hot-callsite-thres-from-CDF, because they change the graph. Nevertheless, its impact on the PGO results is expected to be negligible. As far as I can see, hot-callsite-thres-from-CDF is only used in debug purposes

@JansonLv
Copy link
Contributor Author

@kolesnikovae , Thank you for explanation, I will spend the next two weeks comparing them in practice online to decide whether to continue exploring further.

@kolesnikovae
Copy link
Collaborator

Thank you for your patience, @JansonLv! I'm very interested in the results of your research, and I would greatly appreciate it if you could share your experiences on how Pyroscope could be enhanced to better support PGO – your insights would be highly valuable!

Copy link
Collaborator

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JansonLv
Copy link
Contributor Author

JansonLv commented Oct 22, 2024

Thank you for your patience, @kolesnikovae

Based on online practice, it has been demonstrated that using the --no—aggregate-callees parameter is more likely to bring about performance improvements.

We have two web services:

one of these services involves a lot of I/O processing and has high latency. In this case, PGO did not result in any noticeable improvement.

For the other web service, which only has a small number of I/O requests, when we initially deployed it with the --aggregate-callees flag, there was almost no observable improvement. However, during the second deployment with the --no—aggregate-callees flag, the results were significantly better, with CPU load decreasing by about 15%.

This situation aligns with my expectations but seems to be outside of yours. It appears that we need to delve deeper into this issue.

Given this, should we reopen an issue to specifically discuss this problem? As this is not something that can be resolved quickly.

This is a comparison before (--aggregate-callees) and after (--no—aggregate-callees) the deployment:

image

@kolesnikovae
Copy link
Collaborator

kolesnikovae commented Oct 22, 2024

Hi @JansonLv, of course, please feel free to create a new issue dedicated to the problem. It would be great if we could take a look at the compilation log with pgodebug turned on

@JansonLv
Copy link
Contributor Author

Thank you for your patience, @kolesnikovae,I have added two compilation logs, but they are too complex, and I may not be able to provide much insight. See if they can be of any help to you.

#3645

Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the docs!

@kolesnikovae kolesnikovae merged commit 860d824 into grafana:main Oct 23, 2024
18 checks passed
@knylander-grafana knylander-grafana added the type/docs Improvements for doc docs. Used by Docs team for project management label Oct 24, 2024
@JansonLv JansonLv deleted the fix/pgo-aggregate-callees branch November 19, 2024 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/docs Improvements for doc docs. Used by Docs team for project management
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants