Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64/SVE: Add support to handle predicate registers as callee-trash #104065
Arm64/SVE: Add support to handle predicate registers as callee-trash #104065
Changes from 5 commits
b23e311
a15808b
570583f
a4b687c
2a175f9
c19234e
08f3748
1e69fd3
bcfd8a8
bb97d80
5535c69
775db15
a9b64a8
0d91f29
387ceef
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, I wonder if this code (for SVE vectors) should be refactored call out to an emit_R_R_I function instead of falling into the non-sve code below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some guarantee that all the predicate temps end up adjacent on this list? Otherwise it seems like this indexing scheme might not work out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see below, we iterate over all the
type
and calltmpPreAllocateTemps
with the number of slots we need for that type.runtime/src/coreclr/jit/lsra.cpp
Lines 7814 to 7828 in 2b4a4bc
In
tmpPreAllocateTemps()
, we iterate through the number of slots we want to allocate and create them:runtime/src/coreclr/jit/regset.cpp
Lines 694 to 708 in 2b4a4bc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere I should just zero it out if we are not running on SVE machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the TP cost coming from the additional killed registers? I assume that's because we don't have a predicate registers equivalent of
compFloatingPointUsed
.I wonder if you could just add a case for predicate registers here:
runtime/src/coreclr/jit/lsrabuild.cpp
Line 3094 in 55f2bc6
And then during allocation, mask out the predicate registers when processing kills if no predicate registers were used.
We would still be creating additional RegRecords though, but maybe this helps a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I guess we were creating those
RegRecord
s even before this PR, so I imagine it would help quite a bit for this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am already doing it in https://github.com/dotnet/runtime/pull/104065/files#diff-ad66a6bcf1fd550d5ad10d995c03218afbbc39463d36e1f2a224f9ca070a2f99R858-R860. Predicate registers exist only in presence of floating point usage. Yes, we do the newly added extra predicate registers in
processKills()
and that's what show up impacting TP. For non-sve arm64 machine, we don't have to iterate through them.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although note that when we altjit, we say that "sve capability enable", so we will see predicate registers and will process them during kills. The TP information will be misleading for those cases, but I will add this anyway so that on non-sve arm64 machine, we do not process them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the use of predicate registers is going to be much more rare than using float registers, hence adding this extra check would help regardless.
I don't see a good reason to try optimizing for non-SVE machines. In the future we would expect most arm64 machines to be SVE enabled, right?
I think we should rather optimize for the common case of "predicate registers not used". It should be possible now that we are only creating one
RefTypeKill
per call.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Agree. I will do a separate pass for it. #104157 to track it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, I think I will just revert bcfd8a8 and will do it properly in #104157