-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support in crossgen2 for 32-byte alignment #32602
Conversation
Cc @mjsabby since the steady state throughput achievable with R2R is in his area of interest. |
Thanks @MichalStrehovsky. Sounds good. What is the reason for not doing Tier1 code in Crossgen2 by default? |
From looking at how the Tier1 flag is used within RyuJIT, it seems like up until the 32byte alignment pull request, setting the flag made no difference for the generated code (it does try to enable FAST_CODE but since Crossgen2 could set it (not sure if it would be good to set it by default - the default still tries to balance size and speed), but it's more of a question of how |
Right, overloading JIT/EE interface flags like this always ends up poorly. TIER1 flag should be set only when we are running as a JIT and producing TIER1 code.
That's something we may want to fix. |
This hits on some broader issues:
To express crossgen2's "fastest code possible" option, |
We might still be interested in that when we do perf tests comparing R2R code with jitted code - so maybe this pull request on its own could be still useful (we can just whack the Tiet-1 flag in for testing, but we need to pipe through the resulting alignments).
Do you know if there's an issue tracking that? I can make one if there isn't. |
I think we can use #7751, it has most of the previous discussion on this topic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, while this has no effect due to a lack of triggering the jit to produce the higher alignments, I believe that we should attempt to respect the jit's request for alignment as much as possible. (In case something comes along like, copying Vector256 constants into aligned memory for improved vector performance work or something.
I believe the current opt for size/opt for speed arguments to crossgen2 actually flow into the CorJitFlag.CORJIT_FLAG_SIZE_OPT flag that controls swapping between MinimumFunctionAlignment and OptimumFunctionAlignment. I believe we should probably have a separate conversation about the idea of optimize for size/speed/general purpose, as I think this is a relatively small portion of that discussion.
Thanks David! I think this code will also be useful when we want to do perf measurements comparing with JITted code. Additional stability would be useful. |
What's in this pull request won't take effect in crossgen2 right now because crossgen2 doesn't set the Tier-1 flag. This pull request is more of a discussion point that will either see another commit, or will be closed.
32-byte alignment was added in #2249, affecting Tier-1 code only. One of the other pull requests that are open right now was discussing
compCodeOpt
and that made me realize this might be worth revisiting: I wanted to switch the check in JIT fromemitComp->opts.jitFlags->IsSet(JitFlags::JIT_FLAG_TIER1)
tocompCodeOpt() == FAST_CODE
(so that this takes effect for crossgen2 as well), only to find out that compCodeOpt is currently hardcoded to return BLENDED.Do we have an issue tracking what we want to do with
compCodeOpt
? There are customers who rely solely on R2R and giving them a knob that says "give me fastest code possible" might be useful for them. Crossgen2 has that knob, it just seems like it doesn't do anything.Alternatively, we could turn on the Tier-1 flag in crossgen2 when the user requests optimizing for time.
Cc @dotnet/crossgen-contrib @AndyAyersMS