
Adds stats for the tier 2 optimizer #109329


Closed
markshannon opened this issue Sep 12, 2023 · 8 comments
Comments

@markshannon
Member

markshannon commented Sep 12, 2023

Currently we have no stats for anything regarding the tier 2 optimizer.
Without them we are making too many guesses about what we should be doing.

The performance numbers tell us that things aren't working as well as they should, although they aren't working too badly either.
However, performance numbers tell us nothing about why that is, or what is happening.

For example, #109038 should have increased the important ratio of (number of uops executed)/(traces started) but we have no idea if it actually did.

We need the following stats soon (a rough counter sketch follows the lists below):

  • Total micro-ops executed
  • Total number of traces started
  • Total number of traces created
  • Optimization attempts

The following stats would also be nice, but are less urgent:

  • Per-uop execution counts, like we have for tier 1 instructions.
  • Exit reason counts: polymorphism vs branch mis-prediction
  • A histogram of uops executed per trace
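
A minimal sketch of the kind of counters described above; this is not the actual pystats layout, and all names here (`Tier2Stats`, `OPT_STAT_INC`, the field names) are hypothetical:

```c
/* Hypothetical sketch only -- not the real pystats structures. */
#include <stdint.h>

typedef struct {
    uint64_t uops_executed;          /* total micro-ops executed in tier 2 */
    uint64_t traces_started;         /* entries into tier 2 execution */
    uint64_t traces_created;         /* traces successfully projected */
    uint64_t optimization_attempts;  /* times the optimizer was invoked */
} Tier2Stats;

static Tier2Stats _tier2_stats;

/* Incremented from the tier 2 interpreter, analogous to STAT_INC. */
#define OPT_STAT_INC(field) (_tier2_stats.field++)
```

With counters like these, the ratio mentioned above is just uops_executed / traces_started, computed when the stats are dumped and summarized.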


@brandtbucher
Member

We've been collecting and dumping stats for all of the counters in your first list for three months now... but they seem to be ignored in the summarize_stats.py script:

  • Optimization uops executed
  • Optimization traces executed
  • Optimization traces created
  • Optimization attempts

@mdboom, maybe you know why these aren't being included in the markdown summaries? From a quick skim, it looks like it might be because we ignore any counters that don't start with "Calls to", "Frame", "GC", or "Object". Maybe we should rework this to not ignore new counters?

@brandtbucher
Member

I'd also like to see the reasons why trace projection stopped (see the counter sketch after this list):

  • Trace too long (anecdotally, I think this is the leading reason... for example, all five of nbody's traces should be 68-286 uops, but we cap them at 64, which means no loops are closed)
  • Unsupported opcode (maybe even with counters for each offender)
  • Inner loop found
  • Too many frame pushes (currently capped at a depth of 5)
  • Too many frame pops (if we return from the original frame)
  • Recursive function call (I don't think we bail on mutual recursion, but detecting that shouldn't be too hard since we have all of the code objects handy)
  • Call to unknown code object
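
A minimal sketch of what per-reason counters could look like; the enum values and names below are hypothetical, not the actual implementation:

```c
/* Hypothetical sketch: one counter per reason trace projection stopped. */
#include <stdint.h>

typedef enum {
    STOP_TRACE_TOO_LONG,
    STOP_UNSUPPORTED_OPCODE,
    STOP_INNER_LOOP,
    STOP_TOO_MANY_FRAME_PUSHES,
    STOP_TOO_MANY_FRAME_POPS,
    STOP_RECURSIVE_CALL,
    STOP_UNKNOWN_CALLEE,
    STOP_REASON_COUNT
} TraceStopReason;

static uint64_t _stop_reason_counts[STOP_REASON_COUNT];

#define STOP_STAT_INC(reason) (_stop_reason_counts[(reason)]++)
```

Per-offender counts for unsupported opcodes could use a per-opcode array instead of the single STOP_UNSUPPORTED_OPCODE bucket.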

@gvanrossum, any other stats we might like to see?

@gvanrossum
Member

Those lists look pretty exhaustive. I’d also count anything that I deemed worthy of a DPRINTF call.

@mdboom
Contributor

mdboom commented Sep 25, 2023

Maybe we should rework this to not ignore new counters?

Seems plausible. Let's coordinate on how to get this done -- I'm happy to take this on as I have a bit more time these days.

@mdboom
Contributor

mdboom commented Sep 26, 2023

I have a PR for the basics up, and then I will tackle some of the other suggestions in smaller chunks.

Some questions:

  • Currently the Tier 2 interpreter affects the results for the Tier 1 interpreter by virtue of calling STAT_INC. Is that just by accident of how we got here? I would assume we want to have completely separate opcode execution counts for the Tier 1 and Tier 2 interpreters, but thought I should confirm.

  • "A histogram of uops executed per trace" -- I assume this is a histogram of the number of uops executed per trace, not a count-by-type-of-uop-per-trace?

@gvanrossum
Member

  • Currently the Tier 2 interpreter affects the results for the Tier 1 interpreter by virtue of calling STAT_INC. Is that just by accident of how we got here?

I think it would be more useful to have separate counters per tier. But it would be somewhat complex to do that -- the header files that define them would have to check for TIER_ONE and TIER_TWO (only one of these should be defined).
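
A rough sketch of the kind of per-tier split being described, assuming hypothetical counter arrays and a hypothetical macro name (the real STAT_INC macro takes different arguments; this only illustrates the #if-based dispatch):

```c
/* Hypothetical sketch: route a counter increment to a per-tier array
 * depending on which of TIER_ONE / TIER_TWO is defined when the
 * interpreter body is compiled. All names here are made up. */
#include <stdint.h>

extern uint64_t _tier1_counts[256];
extern uint64_t _tier2_counts[512];

#if defined(TIER_ONE) && defined(TIER_TWO)
#  error "define only one of TIER_ONE and TIER_TWO"
#elif defined(TIER_ONE)
#  define TIER_STAT_INC(op) (_tier1_counts[(op)]++)
#elif defined(TIER_TWO)
#  define TIER_STAT_INC(op) (_tier2_counts[(op)]++)
#endif
```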

"A histogram of uops executed per trace" [...]

Sounds like that question is for @brandtbucher.

@brandtbucher
Member

brandtbucher commented Sep 27, 2023

I assume this is a histogram of the number of uops executed per trace, not a count-by-type-of-uop-per-trace?

Yeah, we want to see the distribution in number of uops executed per entry into tier two.

That's helpful because we can execute many uops in traces that are statically short (if they close a hot loop) and few uops in traces that are statically long (if we deopt quickly). We want to optimize for "uops executed in tier two before deopting", not necessarily "trace projection length".

To ease implementation, it's fine to bucket these. The distribution will probably be heavily skewed towards small numbers (since we tend to enter and exit "bad" traces more often), so maybe powers of ten or two would work best?
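
A minimal sketch of the power-of-two bucketing suggested above (hypothetical names, not the actual implementation):

```c
/* Hypothetical sketch: histogram of uops executed per entry into tier 2,
 * bucketed by powers of two. Bucket i roughly covers 2**i .. 2**(i+1)-1
 * uops (bucket 0 also catches zero). */
#include <stdint.h>

#define UOP_HIST_BUCKETS 32

static uint64_t _uops_per_trace_hist[UOP_HIST_BUCKETS];

static void
record_uops_executed(uint64_t n)
{
    int bucket = 0;
    while (n >>= 1) {
        bucket++;
    }
    if (bucket >= UOP_HIST_BUCKETS) {
        bucket = UOP_HIST_BUCKETS - 1;
    }
    _uops_per_trace_hist[bucket]++;
}
```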

mdboom added a commit to mdboom/cpython that referenced this issue Oct 5, 2023
gvanrossum pushed a commit that referenced this issue Oct 31, 2023
This keeps a separate 'miss' counter for each micro-opcode, incremented whenever a guard uop takes a deoptimization side exit.
@markshannon
Member Author

This is all working nicely now.
