8336464: C2: Force CastX2P to be a two-address instruction #20159
This patch forces `CastX2P` to be a two-address instruction, so that C2 can allocate the same register for dst and src. Then we can remove the instruction completely from the assembly.

The motivation comes from some cast operations like `castPP`. For ADLC, the difference between `castPP` and `CastX2P` is that `CastX2P` always has different types for dst and src. We can force ADLC to generate an extra `two_adr()` for `CastX2P`, as it does automatically for `castPP`, which tells the register allocator that the instruction needs the same register for dst and src.

However, RA and GCM in C2 sometimes don't work as we expect. For example, the assembly for the existing code is:

```
ldp x10, x11, [x17,#136]
add x10, x10, x15
add x11, x11, x10
ldr x12, [x17,#152]
str x16, [x10]
add x10, x12, x15
str x16, [x11]
str x16, [x10]
```

After applying the patch independently, the assembly is:

```
ldr x10, [x16,#136]   <--- 1
add x10, x10, x15
ldr x11, [x16,#144]   <--- 2
mov x13, x10          <--- 3
str x17, [x13]
ldr x12, [x16,#152]
add x10, x11, x10
str x17, [x10]
add x10, x12, x15
str x17, [x10]
```

C2 generates an entirely extra `mov` (see 3), and we even lose the chance to merge the loads into a load pair (see 1 and 2). That's terrible. Although this scenario would disappear after combining with openjdk#20157, I'm still not sure if this patch is worthwhile.
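To make the goal of the description concrete, here is a minimal toy sketch in C++. It is not HotSpot or ADLC code: the `Cast` struct and `emit_cast` function are hypothetical names invented for illustration. It only models the idea that a cast which reinterprets bits emits nothing when its input and output land in the same register, and degrades to a `mov` otherwise.

```cpp
#include <string>
#include <vector>

// Toy model, NOT HotSpot code: a cast that only reinterprets a 64-bit
// integer as a pointer needs no machine instruction when the allocator
// assigns the same register to its input and its output.
struct Cast {
    int dst;  // register number chosen for the result
    int src;  // register number holding the input
};

// Returns the assembly lines the cast would emit under this model.
std::vector<std::string> emit_cast(const Cast& c) {
    if (c.dst == c.src) {
        return {};  // same register: the cast disappears from the output
    }
    // Different registers: the cast survives as a register-to-register move.
    return {"mov x" + std::to_string(c.dst) + ", x" + std::to_string(c.src)};
}
```

Under this model, `emit_cast(Cast{10, 10})` yields no instructions, while `emit_cast(Cast{13, 10})` yields a single `mov x13, x10`, which has the same shape as the extra instruction marked 3 in the description.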
src/hotspot/share/adlc/output_h.cpp (outdated)

    uint matching_input = instr->two_address(_globalNames);

    #if defined(AARCH64)
    // Allocate the same register for src and dst, then we can remove
This isn't so much AArch64 specific as specific to any machine that doesn't have separate address and data registers. x86 prefers some registers to form addresses, but for others perhaps either a target macro or a callback function in $cpu.ad.
Matcher::use_same_src_and_dest_reg_for_CastX2P ?
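One way to read this suggestion is as a per-platform predicate that replaces the `#if defined(AARCH64)` guard. The following is a hedged sketch only: the hook name comes from the comment above, but `SharedRegisterFilePort`, `SplitRegisterFilePort`, `Opcode`, and `forced_two_address_input` are hypothetical names, and the actual patch may be structured quite differently.

```cpp
// Hypothetical sketch, NOT the actual patch. Each "port" stands in for a
// platform's Matcher and answers whether CastX2P should reuse its source
// register for the destination.
struct SharedRegisterFilePort {
    // Assumption: integers and pointers live in the same register file.
    static constexpr bool use_same_src_and_dest_reg_for_CastX2P() { return true; }
};

struct SplitRegisterFilePort {
    // Assumption: separate address and data registers, so no forcing.
    static constexpr bool use_same_src_and_dest_reg_for_CastX2P() { return false; }
};

enum class Opcode { CastX2P, CastPP, Other };

// Decides which input (1-based, 0 means none) must share the output register.
// `matching_input` models the value already computed automatically.
template <typename Port>
unsigned forced_two_address_input(Opcode op, unsigned matching_input) {
    if (matching_input != 0) {
        return matching_input;  // keep the automatically detected result
    }
    if (op == Opcode::CastX2P && Port::use_same_src_and_dest_reg_for_CastX2P()) {
        return 1;  // force the first input to share the output register
    }
    return 0;
}
```

The point of the design is that shared code asks the platform a question instead of testing a CPU macro, so other ports can opt in without touching the generator.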
Thanks for your review, @theRealAph . Updated it in the new commit.
vnkozlov
left a comment
I am not sure about this change. The reason we keep src and dst in different registers for different types is most likely for cases when src could be used in other operations. Overwriting the src register may give you more spills than before.
If there are no other src usages RA should handle this I think in shared code.
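The concern above can be illustrated with a simplified live-range model. This is a toy sketch under stated assumptions, not C2's register allocator: `LiveRange`, `can_share_register`, and `copies_forced_by_two_address` are hypothetical names, and the model ignores copy-aware coalescing entirely.

```cpp
// Toy model, NOT C2's allocator. Positions are instruction indices.
struct LiveRange {
    int def;       // index of the instruction that defines the value
    int last_use;  // index of the last instruction that reads the value
};

// In this simplified model, dst can take over src's register only when
// src is not read after the instruction that defines dst (the cast).
bool can_share_register(const LiveRange& src, const LiveRange& dst) {
    return src.last_use <= dst.def;
}

// Number of extra copies a forced two-address constraint costs here:
// if src is still needed after the cast, it must first be copied elsewhere.
int copies_forced_by_two_address(const LiveRange& src, const LiveRange& dst) {
    return can_share_register(src, dst) ? 0 : 1;
}
```

With src live over `{0, 3}` and the cast defining dst at index 3, the registers can be shared for free. With src live over `{0, 5}`, the same constraint costs one copy in this model, and that additional value is what can raise register pressure.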
Thanks for your review, @vnkozlov. In my initial idea, if we keep In the final code, we can remove If we keep In the final code, it will be: I thought that keeping But I tried some hand-written cases, shown in the PR description, which contradicted my expectations. So I'm also not sure about it. Anyway, it's a try and comments are welcome :)
One idea I had a long time ago (JDK-6768706) is to add mach instructions with complex matching rules which use CastX2P/CastP2X in common patterns. Then you can avoid moves between registers.
I thought reusing one of the inputs for the destination is the default, and we have to add TEMP to rules to prevent this from happening. So I don't understand why sometimes the register allocator doesn't reuse the register when there are no other uses. There is a concept of "chain rule" in ADLC that I don't quite understand, but I suspect that it is related.
@fg1417 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Thanks for all your reviews and comments! I'll come back when I find a better way. Now, I'd like to convert to draft :)
@fg1417 This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@fg1417 This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the
You may be over-reacting.
On the other hand, the move generated by
Hi @theRealAph, thanks for your comments. That quite makes sense to me and I learnt a lot! I tried some hand-written assembly benchmarks. As you expected, the performance results of
Yes, there may be no obvious performance uplift after removing these zero latency Thanks.
Reviewing

Using git

Checkout this PR locally:
    $ git fetch https://git.openjdk.org/jdk.git pull/20159/head:pull/20159
    $ git checkout pull/20159

Update a local copy of the PR:
    $ git checkout pull/20159
    $ git pull https://git.openjdk.org/jdk.git pull/20159/head

Using Skara CLI tools

Checkout this PR locally:
    $ git pr checkout 20159

View PR using the GUI difftool:
    $ git pr show -t 20159

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20159.diff