
Should jvm.gc.duration histogram have any default buckets? #274

Closed
trask opened this issue Aug 18, 2023 · 15 comments · Fixed by #317
@trask
Member

trask commented Aug 18, 2023

Currently the jvm.gc.duration histogram has this bucket definition:

This metric SHOULD be specified with
ExplicitBucketBoundaries
of [] (single bucket histogram capturing count, sum, min, max).

Opening this as a tracking issue since it has come up as an open question from the semantic convention working group.
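For illustration, a zero-boundary histogram still records useful summary statistics. A minimal sketch of what such an aggregation captures (hypothetical, not the OpenTelemetry SDK's actual aggregator):

```java
// Hypothetical sketch of a histogram with ExplicitBucketBoundaries of []:
// no per-bucket counts, just the summary statistics count, sum, min, and max.
public class SingleBucketHistogram {
    long count = 0;
    double sum = 0.0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    void record(double seconds) {
        count++;
        sum += seconds;
        min = Math.min(min, seconds);
        max = Math.max(max, seconds);
    }

    public static void main(String[] args) {
        SingleBucketHistogram h = new SingleBucketHistogram();
        // Made-up sample GC durations, in seconds.
        for (double d : new double[] {0.004, 0.120, 0.030}) {
            h.record(d);
        }
        System.out.println(h.count + " " + h.sum + " " + h.min + " " + h.max);
    }
}
```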

@jack-berg
Member

My point of view:

  • Most users will just want summary metrics for this (min, max, sum, count), and a single-bucket histogram fulfills this
  • Users that care can use views to upgrade to a histogram with bucket boundaries that reflect the thresholds they care about
  • It will be hard to find a default set of bucket boundaries that is useful in all GC situations. Not impossible, but it will take some thoughtful analysis including data from real-world systems. And even after all that, we'll never get an answer that satisfies everyone.
  • If we insist on having default buckets, perhaps we keep the boundaries simple and informed by defaults for JVM GCs. For example, the G1 GC has a default of -XX:MaxGCPauseMillis=200. If we set the bucket boundaries to be [200], users can know the percentage of GCs which met the goal for desired maximum pause time.
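The arithmetic behind that last bullet can be sketched as follows. This is purely illustrative: the `fractionMeetingGoal` helper and the sample durations are made up, not part of any SDK.

```java
// With a single boundary of [0.2] (G1's default -XX:MaxGCPauseMillis=200,
// expressed in seconds), the lower bucket's count divided by the total count
// is the fraction of GCs that met the pause-time goal.
public class PauseGoal {
    static double fractionMeetingGoal(double[] durations, double goalSeconds) {
        long underGoal = 0;
        for (double d : durations) {
            if (d <= goalSeconds) underGoal++; // falls into the [0, 0.2] bucket
        }
        return (double) underGoal / durations.length;
    }

    public static void main(String[] args) {
        double[] sample = {0.05, 0.15, 0.30, 0.10, 0.45}; // made-up durations
        System.out.println(fractionMeetingGoal(sample, 0.2));
    }
}
```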

@jack-berg
Member

And more generally, how do we think about the tradeoff between metric size and value when adding more histogram buckets?

On one end of the spectrum, we could have very small buckets (i.e. one bucket per millisecond). These would produce infeasibly large payloads, but offer high density for computing percentiles. On the other end, we have histograms with zero buckets, which have the smallest payload but offer no ability to compute percentiles. Then we have the messy middle ground, where we try to choose a set of bucket boundaries that balances payload size with having buckets useful for computing percentiles.

In the case of http...duration, there was some prior art in the Prometheus bucket boundaries which made the conversation straightforward. But prior art will often not be available.

Without a general set of guidelines for making this decision, I suspect each new proposed histogram metric will repeat this conversation.
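The tradeoff can be made concrete with a toy explicit-bucket aggregator: N boundaries yield N + 1 bucket counts, so the payload grows with every boundary added, while a percentile estimate can only be resolved to a bucket's upper bound, so fewer buckets mean coarser percentiles. A sketch (the `BucketCounter` class is hypothetical, invented for illustration):

```java
import java.util.Arrays;

// Hypothetical explicit-bucket aggregator illustrating the payload/percentile
// tradeoff: N boundaries -> N + 1 bucket counts, and zero boundaries collapse
// to a single count (summary statistics only).
public class BucketCounter {
    final double[] boundaries; // ascending upper bounds, in seconds
    final long[] counts;       // boundaries.length + 1 buckets

    BucketCounter(double... boundaries) {
        this.boundaries = boundaries;
        this.counts = new long[boundaries.length + 1];
    }

    void record(double value) {
        int idx = Arrays.binarySearch(boundaries, value);
        // An exact match lands in the bucket whose upper bound it equals;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        counts[idx >= 0 ? idx : -idx - 1]++;
    }

    // Coarse percentile estimate: the upper bound of the bucket where the
    // cumulative count reaches fraction p of the total. Finer buckets give
    // finer answers; the overflow bucket has no finite upper bound.
    double percentileUpperBound(double p) {
        long total = Arrays.stream(counts).sum();
        long target = (long) Math.ceil(p * total);
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            cumulative += counts[i];
            if (cumulative >= target) {
                return i < boundaries.length ? boundaries[i] : Double.POSITIVE_INFINITY;
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```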

@trask
Member Author

trask commented Aug 28, 2023

We discussed this in last week's Java SIG meeting.

One thing that we confirmed is that the GC durations (which are emitted by MemoryPoolMXBean#getUsage()) encompass the entire GC cycle, not only the "application pause" phase(s), which means they aren't related to the -XX:MaxGCPauseMillis value.

Still trying to reach out to more JVM folks who may have idea(s) for bucket boundaries here.

@trask
Member Author

trask commented Aug 28, 2023

Just reiterating what I think is the goal: if possible, to have a small number of buckets (<=3?) that would be useful to a majority of applications for identifying long GCs worth drilling into.

However, I'm wondering whether this is possible, since it's generally the long "application pauses" (e.g. what you would get from GC logs) that are useful to drill into, and I'm just not sure we get that from these GC metrics.

@kittylyst

The problem is, I don't think we can meaningfully characterize "a majority of applications" (or, for that matter, "long GC pauses") - there's just too much difference between workloads.

So I'm +1 on @jack-berg's original comment.

@trask
Member Author

trask commented Aug 30, 2023

What about 3 super generic bucket boundaries:

  • 0.1 seconds
  • 1 second
  • 10 seconds

Do we think this could be used to answer some basic questions for a reasonable set of applications, e.g. show me some long old (or young) GC events?

One possible advantage of having a couple of buckets is that it makes it more visible to users that this is a histogram they can tune further if they want.

(sorry, just trying to play out all possibilities before we make the decision to not have any buckets)

@kittylyst

I wouldn't even like to guess what percentage of application processes would be automatically killed (e.g. by k8s) if they experienced a 10s GC STW event.

My feeling is that the domain of possible workloads is just too complex for any single set of defaults to make sense.

Curse you, JVM, for being so applicable to such a wide range of possible execution parameters!

@jackshirazi

We can leave out all the low-latency applications: they know about monitoring the GC pause latencies and either do it another way or will configure the buckets as they need. The remaining applications are broadly those that need reasonable inter-service pause times (typically these need pauses to be under 25ms), those that need reasonable user-interaction pause times (pauses need to be under 250ms), and throughput applications that need to avoid a timeout (common ones are 5/10/30 seconds, usually because of proxy or comms issues at those boundaries). So for me these give slightly uneven boundaries of 0.025, 0.25, and 2.5 seconds.

Complicating this is that the young-gen times are STW, but as we've seen, the old-gen ones are not necessarily, so we'd get some high values that don't actually matter.

I'm fine with no histogram buckets. If there's a single bucket, I'd go for 250ms.

@kittylyst

Sound advice - maybe this should go into the documentation if we decide to go with a single bucket? In fact, a general writeup of the consensus might also be helpful.

@breedx-splk
Contributor

breedx-splk commented Aug 31, 2023

If there's a single bucket, I'd go for 250ms

Might be pedantic, but if there's a single boundary then technically there are 2 buckets, right? The data above the boundary and the data below it.

In any case, I appreciate the pragmatism in this discussion. I do think that @jackshirazi has slightly better numbers (and reasons for choosing them) vs. @trask's .1/1/10s. I especially think that it's important to have something in the lower tens of millis, in part because the threshold of human visual perception is around 12ms.

I'm +1 for 0.025, 0.25, and 2.5s.

@trask
Member Author

trask commented Aug 31, 2023

How important do we think limiting buckets for cost is?

This has got me wondering about using the same buckets as http durations, since it gives nice coverage of the range of interesting timings discussed above.

[ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ]

@PeterF778

I don't want to be too picky, but what is the point of having a boundary of 0 if we know the values we observe cannot be negative?

@trask
Member Author

trask commented Sep 1, 2023

hm, I'm not sure, I just opened #298 to get more attention to this question

@trask
Member Author

trask commented Nov 9, 2023

Proposal: [0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]

Reasoning:

@jack-berg
Member

I'd rather go with the original [0.01, .1, 1, 10]. It's easy to make the argument that more buckets are useful, but subjectively, speaking from my own intuition, I think that fewer buckets will suffice for most users. If we go with more buckets, more often than not, users who look closely will opt to reduce the number of buckets. In contrast, if we go with fewer buckets, more often than not, I think users will be content and stick with the default.

I wouldn't block the proposal for more buckets, but I do think less is best in this case.


7 participants