Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Class value null initial cache array #8

Closed

Conversation

plevart
Copy link
Contributor

@plevart plevart commented Aug 15, 2020

null initial cacheArray in ClassValueMap

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 15, 2020

Welcome to the OpenJDK organization on GitHub!

This repository is currently a read-only git mirror of the official Mercurial repository (located at https://hg.openjdk.java.net/). As such, we are not currently accepting pull requests here. If you would like to contribute to the OpenJDK project, please see https://openjdk.java.net/contribute/ on how to proceed.

This pull request will be automatically closed.

@bridgekeeper bridgekeeper bot closed this Aug 15, 2020
ameisen referenced this pull request in ameisen/jdk-mc Sep 7, 2020
JornVernee referenced this pull request in JornVernee/jdk Nov 14, 2020
lewurm added a commit to lewurm/openjdk that referenced this pull request Oct 6, 2021
Restore looks like this now:
```
  0x0000000106e4dfcc:   movk    x9, #0x5e4, lsl openjdk#16
  0x0000000106e4dfd0:   movk    x9, #0x1, lsl openjdk#32
  0x0000000106e4dfd4:   blr x9
  0x0000000106e4dfd8:   ldp x2, x3, [sp, openjdk#16]
  0x0000000106e4dfdc:   ldp x4, x5, [sp, openjdk#32]
  0x0000000106e4dfe0:   ldp x6, x7, [sp, openjdk#48]
  0x0000000106e4dfe4:   ldp x8, x9, [sp, openjdk#64]
  0x0000000106e4dfe8:   ldp x10, x11, [sp, openjdk#80]
  0x0000000106e4dfec:   ldp x12, x13, [sp, openjdk#96]
  0x0000000106e4dff0:   ldp x14, x15, [sp, openjdk#112]
  0x0000000106e4dff4:   ldp x16, x17, [sp, openjdk#128]
  0x0000000106e4dff8:   ldp x0, x1, [sp], openjdk#144
  0x0000000106e4dffc:   ldp xzr, x19, [sp], openjdk#16
  0x0000000106e4e000:   ldp x22, x23, [sp, openjdk#16]
  0x0000000106e4e004:   ldp x24, x25, [sp, openjdk#32]
  0x0000000106e4e008:   ldp x26, x27, [sp, openjdk#48]
  0x0000000106e4e00c:   ldp x28, x29, [sp, openjdk#64]
  0x0000000106e4e010:   ldp x30, xzr, [sp, openjdk#80]
  0x0000000106e4e014:   ldp x20, x21, [sp], openjdk#96
  0x0000000106e4e018:   ldur    x12, [x29, #-24]
  0x0000000106e4e01c:   ldr x22, [x12, openjdk#16]
  0x0000000106e4e020:   add x22, x22, #0x30
  0x0000000106e4e024:   ldr x8, [x28, openjdk#8]
```
e1iu pushed a commit to e1iu/jdk that referenced this pull request Mar 29, 2022
This patch optimizes the backend implementation of VectorMaskToLong for
AArch64, given a more efficient approach to mov value bits from
predicate register to general purpose register as x86 PMOVMSK[1] does,
by using BEXT[2] which is available in SVE2.

With this patch, the final code (input mask is byte type with
SPECIESE_512, generated on an SVE vector reg size of 512-bit QEMU
emulator) changes as below:

Before:

        mov     z16.b, p0/z, #1
        fmov    x0, d16
        orr     x0, x0, x0, lsr openjdk#7
        orr     x0, x0, x0, lsr openjdk#14
        orr     x0, x0, x0, lsr openjdk#28
        and     x0, x0, #0xff
        fmov    x8, v16.d[1]
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#8

        orr     x8, xzr, #0x2
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#16

        orr     x8, xzr, #0x3
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#24

        orr     x8, xzr, #0x4
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#32

        mov     x8, #0x5
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#40

        orr     x8, xzr, #0x6
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#48

        orr     x8, xzr, #0x7
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#56

After:

        mov     z16.b, p0/z, #1
        mov     z17.b, #1
        bext    z16.d, z16.d, z17.d
        mov     z17.d, #0
        uzp1    z16.s, z16.s, z17.s
        uzp1    z16.h, z16.h, z17.h
        uzp1    z16.b, z16.b, z17.b
        mov     x0, v16.d[0]

[1] https://www.felixcloutier.com/x86/pmovmskb
[2] https://developer.arm.com/documentation/ddi0602/2020-12/SVE-Instructions/BEXT--Gather-lower-bits-from-positions-selected-by-bitmask-

Change-Id: Ia983a20c89f76403e557ac21328f2f2e05dd08e0
e1iu pushed a commit to e1iu/jdk that referenced this pull request Apr 21, 2022
This patch optimizes the backend implementation of VectorMaskToLong for
AArch64, given a more efficient approach to mov value bits from
predicate register to general purpose register as x86 PMOVMSK[1] does,
by using BEXT[2] which is available in SVE2.

With this patch, the final code (input mask is byte type with
SPECIESE_512, generated on an SVE vector reg size of 512-bit QEMU
emulator) changes as below:

Before:

        mov     z16.b, p0/z, #1
        fmov    x0, d16
        orr     x0, x0, x0, lsr openjdk#7
        orr     x0, x0, x0, lsr openjdk#14
        orr     x0, x0, x0, lsr openjdk#28
        and     x0, x0, #0xff
        fmov    x8, v16.d[1]
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#8

        orr     x8, xzr, #0x2
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#16

        orr     x8, xzr, #0x3
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#24

        orr     x8, xzr, #0x4
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#32

        mov     x8, #0x5
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#40

        orr     x8, xzr, #0x6
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#48

        orr     x8, xzr, #0x7
        whilele p1.d, xzr, x8
        lastb   x8, p1, z16.d
        orr     x8, x8, x8, lsr openjdk#7
        orr     x8, x8, x8, lsr openjdk#14
        orr     x8, x8, x8, lsr openjdk#28
        and     x8, x8, #0xff
        orr     x0, x0, x8, lsl openjdk#56

After:

        mov     z16.b, p0/z, #1
        mov     z17.b, #1
        bext    z16.d, z16.d, z17.d
        mov     z17.d, #0
        uzp1    z16.s, z16.s, z17.s
        uzp1    z16.h, z16.h, z17.h
        uzp1    z16.b, z16.b, z17.b
        mov     x0, v16.d[0]

[1] https://www.felixcloutier.com/x86/pmovmskb
[2] https://developer.arm.com/documentation/ddi0602/2020-12/SVE-Instructions/BEXT--Gather-lower-bits-from-positions-selected-by-bitmask-

Change-Id: Ia983a20c89f76403e557ac21328f2f2e05dd08e0
gnu-andrew added a commit to gnu-andrew/jdk that referenced this pull request Jun 22, 2022
gnu-andrew added a commit to gnu-andrew/jdk that referenced this pull request Jun 22, 2022
gnu-andrew added a commit to gnu-andrew/jdk that referenced this pull request Jul 10, 2022
fg1417 pushed a commit to fg1417/jdk that referenced this pull request Aug 17, 2022
After JDK-8283091, the loop below can be vectorized partially.
Statement 1 can be vectorized but statement 2 can't.
```
// int[] iArr; long[] lArrFld; int i1,i2;
for (i1 = 6; i1 < 227; i1++) {
  iArr[i1] += lArrFld[i1]++; // statement 1
  iArr[i1 + 1] -= (i2++); // statement 2
}
```

But we got incorrect results because the vector packs of iArr are
scheduled incorrectly like:
```
...
load_vector XMM1,[R8 + openjdk#16 + R11 << openjdk#2]
movl    RDI, [R8 + openjdk#20 + R11 << openjdk#2] # int
load_vector XMM2,[R9 + openjdk#8 + R11 << openjdk#3]
subl    RDI, R11    # int
vpaddq  XMM3,XMM2,XMM0  ! add packedL
store_vector [R9 + openjdk#8 + R11 << openjdk#3],XMM3
vector_cast_l2x  XMM2,XMM2  !
vpaddd  XMM1,XMM2,XMM1  ! add packedI
addl    RDI, openjdk#228   # int
movl    [R8 + openjdk#20 + R11 << openjdk#2], RDI # int
movl    RBX, [R8 + openjdk#24 + R11 << openjdk#2] # int
subl    RBX, R11    # int
addl    RBX, openjdk#227   # int
movl    [R8 + openjdk#24 + R11 << openjdk#2], RBX # int
...
movl    RBX, [R8 + openjdk#40 + R11 << openjdk#2] # int
subl    RBX, R11    # int
addl    RBX, openjdk#223   # int
movl    [R8 + openjdk#40 + R11 << openjdk#2], RBX # int
movl    RDI, [R8 + openjdk#44 + R11 << openjdk#2] # int
subl    RDI, R11    # int
addl    RDI, openjdk#222   # int
movl    [R8 + openjdk#44 + R11 << openjdk#2], RDI # int
store_vector [R8 + openjdk#16 + R11 << openjdk#2],XMM1
...
```
simplified as:
```
load_vector iArr in statement 1
unvectorized loads/stores in statement 2
store_vector iArr in statement 1
```
We cannot pick the memory state from the first load for LoadI pack
here, as the LoadI vector operation must load the new values in memory
after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2).
We must take the memory state of the last load where we have assigned
new values ('iArr[i1 + 1] - (i2++)') to the iArr array.

In JDK-8240281, we picked the memory state of the first load. Different
from the scenario in JDK-8240281, the store, which is dependent on an
earlier load here, is in a pack to be scheduled and the LoadI pack
depends on the last_mem. As designed[2], to schedule the StoreI pack,
all memory operations in another single pack should be moved in the same
direction. We know that the store in the pack depends on one of loads in
the LoadI pack, so the LoadI pack should be scheduled before the StoreI
pack. And the LoadI pack depends on the last_mem, so the last_mem must
be scheduled before the LoadI pack and also before the store pack.
Therefore, we need to take the memory state of the last load for the
LoadI pack here.

To fix it, the pack adds additional checks while picking the memory state
of the first load. When the store locates in a pack and the load pack
relies on the last_mem, we shouldn't choose the memory state of the
first load but choose the memory state of the last load.

[1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
[2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232

Jira: ENTLLT-5482
Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8
CustomizedGitHooks: yes
JimLaskey pushed a commit to JimLaskey/jdk that referenced this pull request Nov 16, 2022
JimLaskey pushed a commit to JimLaskey/jdk that referenced this pull request Nov 16, 2022
caojoshua pushed a commit to caojoshua/jdk that referenced this pull request Mar 21, 2023
…njdk#8)

Enforce position-independent materialization.

We double down JVMState/SafepointNode from the original AllocateNode.
This ensures that materialization isn't dependent on current JVMState.

Co-authored-by: Xin Liu <xxinliu@amazon.com>
gnu-andrew added a commit to gnu-andrew/jdk that referenced this pull request Mar 31, 2023
gnu-andrew added a commit to gnu-andrew/jdk that referenced this pull request Mar 31, 2023
iklam added a commit to veresov/jdk that referenced this pull request Jun 21, 2023
… now. See Test : openjdk#8, WithAOT (with loop) for "LIT" + (String)b in PrelinkedStringConcat.java
robehn pushed a commit to robehn/jdk that referenced this pull request Aug 15, 2023
gnu-andrew added a commit to gnu-andrew/jdk that referenced this pull request Aug 18, 2023
fg1417 pushed a commit to fg1417/jdk that referenced this pull request Nov 21, 2023
…ng into ldp/stp on AArch64

Macro-assembler on aarch64 can merge adjacent loads or stores
into ldp/stp[1]. For example, it can merge:
```
str     w20, [sp, openjdk#16]
str     w10, [sp, openjdk#20]
```
into
```
stp     w20, w10, [sp, openjdk#16]
```

But C2 may generate a sequence like:
```
str     x21, [sp, openjdk#8]
str     w20, [sp, openjdk#16]
str     x19, [sp, openjdk#24] <---
str     w10, [sp, openjdk#20] <--- Before sorting
str     x11, [sp, openjdk#40]
str     w13, [sp, openjdk#48]
str     x16, [sp, openjdk#56]
```
We can't do any merging for non-adjacent loads or stores.

The patch is to sort the spilling or unspilling sequence in
the order of offset during instruction scheduling and bundling
phase. After that, we can get a new sequence:
```
str     x21, [sp, openjdk#8]
str     w20, [sp, openjdk#16]
str     w10, [sp, openjdk#20] <---
str     x19, [sp, openjdk#24] <--- After sorting
str     x11, [sp, openjdk#40]
str     w13, [sp, openjdk#48]
str     x16, [sp, openjdk#56]
```

Then macro-assembler can do ld/st merging:
```
str     x21, [sp, openjdk#8]
stp     w20, w10, [sp, openjdk#16] <--- Merged
str     x19, [sp, openjdk#24]
str     x11, [sp, openjdk#40]
str     w13, [sp, openjdk#48]
str     x16, [sp, openjdk#56]
```

To justify the patch, we run `HelloWorld.java`
```
public class HelloWorld {
    public static void main(String [] args) {
        System.out.println("Hello World!");
    }
}
```
with `java -Xcomp -XX:-TieredCompilation HelloWorld`.

Before the patch, macro-assembler can do ld/st merging for
3688 times. After the patch, the number of ld/st merging
increases to 3871 times, by ~5 %.

Tested tier1~3 on x86 and AArch64.

[1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079
jatin-bhateja pushed a commit to jatin-bhateja/jdk that referenced this pull request Mar 4, 2024
openjdk-notifier bot pushed a commit that referenced this pull request Jun 18, 2024
openjdk-notifier bot pushed a commit that referenced this pull request Jul 31, 2024
Simplify, erase, remove ClassOption.STRONG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant