
8342103: C2 compiler support for Float16 type and associated scalar operations #22754

Open: wants to merge 3 commits into base: master
Conversation

@jatin-bhateja (Member) commented Dec 15, 2024

Hi All,

This patch adds C2 compiler support for the various Float16 operations added by PR #22128.

Following is a summary of the changes included in this patch:

  1. Detection of various Float16 operations, through either inline expansion or pattern-folding idealizations.
  2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern-folding idealization.
  3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
    • These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
  4. New specialized IR nodes for Float16 operations, with associated idealizations and constant-folding routines.
  5. A new ideal type for constant and non-constant Float16 IR nodes. Please refer to the FAQs below for more details.
  6. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISAs generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value into a floating-point register and back.
  7. New idealization routines to optimize away redundant reinterpretation chains, e.g. HF2S + S2HF = HF.
  8. x86 backend implementation for all supported intrinsics.
  9. Functional and performance validation tests.
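To illustrate the widen-operate-narrow pattern that item 2 refers to, the following is a hedged sketch in plain Java using the standard Float.float16ToFloat/floatToFloat16 conversions (available since JDK 20) rather than the incubating Float16 class; the class and method names are illustrative, not part of the patch:

```java
public class Fp16AddSketch {
    // Scalar float16 add as Java source expresses it: widen both binary16
    // values to float, add in FP32, and narrow the result back to binary16
    // bits. Conceptually this is the IR shape
    //   ConvF2HF(AddF(ConvHF2F(x), ConvHF2F(y)))
    // which, per this patch, C2 can fold into a single FP16 add on capable
    // hardware (e.g. AVX512-FP16).
    static short addFp16(short x, short y) {
        float wx = Float.float16ToFloat(x);   // ConvHF2F
        float wy = Float.float16ToFloat(y);   // ConvHF2F
        return Float.floatToFloat16(wx + wy); // AddF + ConvF2HF
    }

    public static void main(String[] args) {
        short a = Float.floatToFloat16(1.5f);
        short b = Float.floatToFloat16(2.25f);
        // 1.5, 2.25, and 3.75 are all exactly representable in binary16.
        System.out.println(Float.float16ToFloat(addFp16(a, b)));
    }
}
```

Folding this pattern is semantically safe because a single FP16 operation on exact binary16 inputs rounds once, matching the widen/narrow sequence for these operations.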

Kindly review the patch and share your feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342103: C2 compiler support for Float16 type and associated scalar operations (Enhancement - P4)

Contributors

  • Paul Sandoz <psandoz@openjdk.org>
  • Bhavana Kilambi <bkilambi@openjdk.org>
  • Joe Darcy <darcy@openjdk.org>
  • Raffaello Giulietti <rgiulietti@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754
$ git checkout pull/22754

Update a local copy of the PR:
$ git checkout pull/22754
$ git pull https://git.openjdk.org/jdk.git pull/22754/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22754

View PR using the GUI difftool:
$ git pr show -t 22754

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22754.diff

Using Webrev

Link to Webrev Comment

@jatin-bhateja (Member, Author):

Some FAQs on the newly added ideal type for half-float IR nodes:

Q. Why not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type TypeH is special: its basic type is T_SHORT while its ideal register type is RegF. Thus the C2 type system views the associated IR node as a 16-bit short value, while the register allocator assigns it a floating-point register.

Q. What is the problem with ConF?
A. During auto-vectorization, replicating a ConF constrains the operational vector lane count to half of what could otherwise be used for a regular Float16 operation: only 16 floats fit into a 512-bit vector, which limits the lane count of every vector in its use-def chain. One possible way to address this would be a kludge in the auto-vectorizer that casts such constants to 16-bit values by analyzing their context. The newly defined Float16 constant node, ConH, instead carries an inherently 16-bit IEEE 754 binary16 encoding and can be packed efficiently to leverage the full target vector width.

All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, so the auto-vectorizer no longer needs special handling to prune their container type to short.
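The lane-count arithmetic behind the ConF answer can be sanity-checked with plain numbers (this is only an illustration of the 512-bit example above, not HotSpot code):

```java
public class LaneCountCheck {
    public static void main(String[] args) {
        int vectorBits = 512;
        // Replicating a 32-bit ConF yields 512 / 32 = 16 lanes,
        // while a 16-bit ConH yields 512 / 16 = 32 lanes, so a ConF
        // in the use-def chain halves the usable vector lane count.
        int floatLanes = vectorBits / Float.SIZE;
        int halfLanes  = vectorBits / Short.SIZE;
        System.out.println(floatLanes + " float lanes vs " + halfLanes + " half-float lanes");
    }
}
```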

@bridgekeeper bot commented Dec 15, 2024

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@jatin-bhateja (Member, Author):

/contributor add @PaulSandoz

@jatin-bhateja (Member, Author):

/contributor add @Bhavana-Kilambi

@openjdk bot commented Dec 15, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Paul Sandoz <psandoz@openjdk.org> successfully added.

@jatin-bhateja (Member, Author):

/contributor add @jddarcy

@jatin-bhateja (Member, Author):

/contributor add @rgiulietti

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Bhavana Kilambi <bkilambi@openjdk.org> successfully added.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Joe Darcy <darcy@openjdk.org> successfully added.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Raffaello Giulietti <rgiulietti@openjdk.org> successfully added.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

The openjdk bot added the graal (graal-dev@openjdk.org), hotspot (hotspot-dev@openjdk.org), and core-libs (core-libs-dev@openjdk.org) labels on Dec 15, 2024.
@jatin-bhateja (Member, Author):

/label add hotspot-compiler-dev

The openjdk bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) label on Dec 15, 2024.
@openjdk bot commented Dec 15, 2024

@jatin-bhateja
The hotspot-compiler label was successfully added.

jatin-bhateja marked this pull request as ready for review on December 15, 2024, 18:14.
The openjdk bot added the rfr (pull request is ready for review) label on Dec 15, 2024.
@mlbridge bot commented Dec 15, 2024

Webrevs

@eme64 (Contributor) left a comment:

Can you quickly summarize what tests you have, and what they test?

Comment on lines -44 to +49
@IR(applyIfCPUFeatureOr = {"f16c", "true", "avx512vl", "true", "zvfh", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "zvfh", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
Contributor:

It looks like this includes vector changes?
And this is pre-existing, but why are we using VECTOR_SIZE_ANY here? Can we not know the vector size? Maybe we can introduce a new tag max_float16 or max_hf, and do something like this:
IRNode.VECTOR_SIZE + "min(max_float, max_hf)", "> 0"

The downside of using ANY is that the exact size is not tested, which might mean the size is much smaller than ideal.

Member (Author):

Hi @eme64, the test modification looks OK to me; we intend to trigger these IR rules on non-AVX512-FP16 targets.
On an AVX512-FP16 target, the compiler will infer a scalar float16 add operation, which will not get auto-vectorized.

@jatin-bhateja (Member, Author):

Can you quickly summarize what tests you have, and what they test?

The patch includes functional and performance tests. As per your suggestions, the IR framework-based tests now cover various special cases for the constant-folding transformation. Let me know if you see any gaps.

@eme64 (Contributor) commented Dec 16, 2024

I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours.

@jatin-bhateja (Member, Author):


Validation details:

A) x86 backend changes
   - New assembler instructions.
   - Macro-assembly routines.
   Test point: test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java
     - This test is based on the TestNG framework and includes new DataProviders to generate test vectors.
     - Test vectors cover the entire float16 value range as well as special floating-point values (NaN, +Inf, -Inf, 0.0, and -0.0).

B) GVN transformations
   - Value transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Covers all the constant-folding scenarios for the add, sub, mul, div, sqrt, fma, min, and max operations addressed by this patch.
       - Also tests the special-case scenarios for each operation, as specified by the Java Language Specification.
   - Identity transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Covers identity transformations for ReinterpretS2HFNode and DivHFNode.
   - Idealization transforms
     Test points: test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java
                  test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - MulHF idealization, i.e. MulHF by 2 => AddHF
       - DivHF SRC, PoT(constant) => MulHF SRC, reciprocal(constant)
       - ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) =>
           ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))
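The two strength-reduction idealizations listed above rest on FP16 arithmetic identities that can be checked from plain Java with the JDK 20+ Float.floatToFloat16/float16ToFloat conversions. A hedged sketch (illustrative only; this is not the IR test itself, and the helper name fp16 is made up):

```java
public class Fp16IdealizationSketch {
    // Round a float result to binary16 and back, mimicking the rounding an
    // FP16 hardware operation would perform.
    static float fp16(float v) { return Float.float16ToFloat(Float.floatToFloat16(v)); }

    public static void main(String[] args) {
        float x = Float.float16ToFloat(Float.floatToFloat16(3.5f));

        // MulHF by 2 => AddHF: x * 2 and x + x round to the same binary16 value.
        System.out.println(fp16(x * 2.0f) == fp16(x + x));

        // DivHF by a power-of-two constant => MulHF by its reciprocal: the
        // reciprocal of a power of two (here 1/8 = 0.125) is exactly
        // representable, so the division and multiplication agree after rounding.
        System.out.println(fp16(x / 8.0f) == fp16(x * 0.125f));
    }
}
```

The power-of-two restriction in the DivHF transform matters: for a non-power-of-two divisor the reciprocal is inexact, and multiplying by it could round differently than the division.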

/**
* The class {@code Float16Math} contains intrinsic entry points corresponding
* to scalar numeric operations defined in Float16 class.
* @author
Member:

Please remove all author tags. We haven't used them in new code in the JDK for some time.

@@ -1401,8 +1412,15 @@ public static Float16 fma(Float16 a, Float16 b, Float16 c) {
// product is numerically exact in float before the cast to
// double; not necessary to widen to double before the
// multiply.
double product = (double)(a.floatValue() * b.floatValue());
return valueOf(product + c.doubleValue());
short fa = float16ToRawShortBits(a);
Member:

The new implementations of fma and sqrt are long and obscure compared to the current versions. That might be the price of intrinsification, but it would be helpful to at least have a comment explaining to the reader why the more obvious code was not used.
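The code comment in the fma diff above relies on the fact that the product of two binary16 values is exact in float: each binary16 significand has at most 11 bits, so the product needs at most 22 bits, which fits in float's 24-bit significand. A quick hedged check of that claim in plain Java (illustrative; not the JDK implementation):

```java
public class Fp16ProductExactness {
    public static void main(String[] args) {
        // A few binary16 values, including the largest finite one (65504)
        // and a value near the subnormal boundary.
        short[] samples = {
            Float.floatToFloat16(1.5f),
            Float.floatToFloat16(0.1f),     // rounds to the nearest binary16
            Float.floatToFloat16(65504.0f),
            Float.floatToFloat16(6.1e-5f),
        };
        boolean allExact = true;
        for (short sa : samples) {
            for (short sb : samples) {
                float a = Float.float16ToFloat(sa), b = Float.float16ToFloat(sb);
                // If the float product equals the double product, no rounding
                // occurred when multiplying in float.
                allExact &= ((double) (a * b) == (double) a * (double) b);
            }
        }
        System.out.println(allExact);
    }
}
```

This is why the diff only widens to double for the final addition with c, where a second rounding could otherwise change the result.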

Labels

  • core-libs (core-libs-dev@openjdk.org)
  • graal (graal-dev@openjdk.org)
  • hotspot (hotspot-dev@openjdk.org)
  • hotspot-compiler (hotspot-compiler-dev@openjdk.org)
  • rfr (pull request is ready for review)
3 participants