
8342103: C2 compiler support for Float16 type and associated scalar operations #22754

Open: wants to merge 3 commits into base: master
Conversation

@jatin-bhateja (Member) commented Dec 15, 2024

Hi All,

This patch adds C2 compiler support for the various Float16 operations added by PR #22128.

Following is a summary of the changes included in this patch:

  1. Detection of various Float16 operations, through either inline expansion or pattern-folding idealizations.
  2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern-folding idealization.
  3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
    • These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
  4. New specialized IR nodes for Float16 operations, with associated idealizations and constant-folding routines.
  5. A new ideal type for constant and non-constant Float16 IR nodes. Please refer to the FAQs below for more details.
  6. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISAs generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value into a floating-point register and back.
  7. New idealization routines to optimize away redundant reinterpretation chains, e.g. HF2S + S2HF = HF.
  8. x86 backend implementation for all supported intrinsics.
  9. Functional and performance validation tests.
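To illustrate the widen-operate-narrow pattern that item 2 refers to, the following is a hedged sketch in plain Java using the standard Float.float16ToFloat/floatToFloat16 conversions (available since JDK 20) rather than the incubating Float16 class; the class and method names are illustrative, not part of the patch:

```java
public class Fp16AddSketch {
    // Scalar float16 add as Java source expresses it: widen both binary16
    // values to float, add in FP32, and narrow the result back to binary16
    // bits. Conceptually this is the IR shape
    //   ConvF2HF(AddF(ConvHF2F(x), ConvHF2F(y)))
    // which, per this patch, C2 can fold into a single FP16 add on capable
    // hardware (e.g. AVX512-FP16).
    static short addFp16(short x, short y) {
        float wx = Float.float16ToFloat(x);   // ConvHF2F
        float wy = Float.float16ToFloat(y);   // ConvHF2F
        return Float.floatToFloat16(wx + wy); // AddF + ConvF2HF
    }

    public static void main(String[] args) {
        short a = Float.floatToFloat16(1.5f);
        short b = Float.floatToFloat16(2.25f);
        // 1.5, 2.25, and 3.75 are all exactly representable in binary16.
        System.out.println(Float.float16ToFloat(addFp16(a, b)));
    }
}
```

Folding this pattern is semantically safe because a single FP16 operation on exact binary16 inputs rounds once, matching the widen/narrow sequence for these operations.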

Kindly review the patch and share your feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342103: C2 compiler support for Float16 type and associated scalar operations (Enhancement - P4)

Contributors

  • Paul Sandoz <psandoz@openjdk.org>
  • Bhavana Kilambi <bkilambi@openjdk.org>
  • Joe Darcy <darcy@openjdk.org>
  • Raffaello Giulietti <rgiulietti@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754
$ git checkout pull/22754

Update a local copy of the PR:
$ git checkout pull/22754
$ git pull https://git.openjdk.org/jdk.git pull/22754/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22754

View PR using the GUI difftool:
$ git pr show -t 22754

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22754.diff

Using Webrev

Link to Webrev Comment

@jatin-bhateja (Member, Author):

Some FAQs on the newly added ideal type for half-float IR nodes:

Q. Why not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type TypeH is special: its basic type is T_SHORT while its ideal register type is RegF. Thus the C2 type system views the associated IR node as a 16-bit short value, while the register allocator assigns it a floating-point register.

Q. What is the problem with ConF?
A. During auto-vectorization, replicating a ConF constrains the operational vector lane count to half of what could otherwise be used for a regular Float16 operation: only 16 floats fit into a 512-bit vector, which limits the lane count of every vector in its use-def chain. One possible way to address this would be a kludge in the auto-vectorizer that casts such constants to 16-bit values by analyzing their context. The newly defined Float16 constant node, ConH, instead carries an inherently 16-bit IEEE 754 binary16 encoding and can be packed efficiently to leverage the full target vector width.

All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, so the auto-vectorizer no longer needs special handling to prune their container type to short.
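The lane-count arithmetic behind the ConF answer can be sanity-checked with plain numbers (this is only an illustration of the 512-bit example above, not HotSpot code):

```java
public class LaneCountCheck {
    public static void main(String[] args) {
        int vectorBits = 512;
        // Replicating a 32-bit ConF yields 512 / 32 = 16 lanes,
        // while a 16-bit ConH yields 512 / 16 = 32 lanes, so a ConF
        // in the use-def chain halves the usable vector lane count.
        int floatLanes = vectorBits / Float.SIZE;
        int halfLanes  = vectorBits / Short.SIZE;
        System.out.println(floatLanes + " float lanes vs " + halfLanes + " half-float lanes");
    }
}
```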

@bridgekeeper bot commented Dec 15, 2024

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@jatin-bhateja (Member, Author):

/contributor add @PaulSandoz

@jatin-bhateja (Member, Author):

/contributor add @Bhavana-Kilambi

@openjdk bot commented Dec 15, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Paul Sandoz <psandoz@openjdk.org> successfully added.

@jatin-bhateja (Member, Author):

/contributor add @jddarcy

@jatin-bhateja (Member, Author):

/contributor add @rgiulietti

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Bhavana Kilambi <bkilambi@openjdk.org> successfully added.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Joe Darcy <darcy@openjdk.org> successfully added.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Raffaello Giulietti <rgiulietti@openjdk.org> successfully added.

@openjdk bot commented Dec 15, 2024

@jatin-bhateja The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

The openjdk bot added the graal (graal-dev@openjdk.org), hotspot (hotspot-dev@openjdk.org), and core-libs (core-libs-dev@openjdk.org) labels on Dec 15, 2024.
@jatin-bhateja (Member, Author):

/label add hotspot-compiler-dev

The openjdk bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) label on Dec 15, 2024.
@openjdk bot commented Dec 15, 2024

@jatin-bhateja
The hotspot-compiler label was successfully added.

jatin-bhateja marked this pull request as ready for review on December 15, 2024, 18:14.
The openjdk bot added the rfr (pull request is ready for review) label on Dec 15, 2024.
@mlbridge bot commented Dec 15, 2024

Webrevs

@eme64 (Contributor) left a comment:

Can you quickly summarize what tests you have, and what they test?

Comment on lines -44 to +49
@IR(applyIfCPUFeatureOr = {"f16c", "true", "avx512vl", "true", "zvfh", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "zvfh", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
Contributor:

It looks like this includes vector changes?
And this is pre-existing, but why are we using VECTOR_SIZE_ANY here? Can we not know the vector size? Maybe we can introduce a new tag max_float16 or max_hf, and do something like this:
IRNode.VECTOR_SIZE + "min(max_float, max_hf)", "> 0"

The downside of using ANY is that the exact size is not tested, which might mean the size is much smaller than ideal.

Member (Author):

Hi @eme64, the test modification looks OK to me; we intend to trigger these IR rules on non-AVX512-FP16 targets.
On an AVX512-FP16 target, the compiler will infer a scalar float16 add operation, which will not get auto-vectorized.

@jatin-bhateja (Member, Author):

Can you quickly summarize what tests you have, and what they test?

The patch includes functional and performance tests. As per your suggestions, the IR framework-based tests now cover various special cases for the constant-folding transformation. Let me know if you see any gaps.

@eme64 (Contributor) commented Dec 16, 2024

I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours.

@jatin-bhateja (Member, Author):


Validation details:

A) x86 backend changes
   - New assembler instructions.
   - Macro-assembly routines.
   Test point: test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java
     - This test is based on the TestNG framework and includes new DataProviders to generate test vectors.
     - Test vectors cover the entire float16 value range as well as special floating-point values (NaN, +Inf, -Inf, 0.0, and -0.0).

B) GVN transformations
   - Value transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Covers all the constant-folding scenarios for the add, sub, mul, div, sqrt, fma, min, and max operations addressed by this patch.
       - Also tests the special-case scenarios for each operation, as specified by the Java Language Specification.
   - Identity transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Covers identity transformations for ReinterpretS2HFNode and DivHFNode.
   - Idealization transforms
     Test points: test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java
                  test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - MulHF idealization, i.e. MulHF by 2 => AddHF
       - DivHF SRC, PoT(constant) => MulHF SRC, reciprocal(constant)
       - ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) =>
           ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))
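The two strength-reduction idealizations listed above rest on FP16 arithmetic identities that can be checked from plain Java with the JDK 20+ Float.floatToFloat16/float16ToFloat conversions. A hedged sketch (illustrative only; this is not the IR test itself, and the helper name fp16 is made up):

```java
public class Fp16IdealizationSketch {
    // Round a float result to binary16 and back, mimicking the rounding an
    // FP16 hardware operation would perform.
    static float fp16(float v) { return Float.float16ToFloat(Float.floatToFloat16(v)); }

    public static void main(String[] args) {
        float x = Float.float16ToFloat(Float.floatToFloat16(3.5f));

        // MulHF by 2 => AddHF: x * 2 and x + x round to the same binary16 value.
        System.out.println(fp16(x * 2.0f) == fp16(x + x));

        // DivHF by a power-of-two constant => MulHF by its reciprocal: the
        // reciprocal of a power of two (here 1/8 = 0.125) is exactly
        // representable, so the division and multiplication agree after rounding.
        System.out.println(fp16(x / 8.0f) == fp16(x * 0.125f));
    }
}
```

The power-of-two restriction in the DivHF transform matters: for a non-power-of-two divisor the reciprocal is inexact, and multiplying by it could round differently than the division.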

/**
* The class {@code Float16Math} contains intrinsic entry points corresponding
* to scalar numeric operations defined in Float16 class.
* @author
Member:

Please remove all author tags. We haven't used them in new code in the JDK for some time.

@@ -1401,8 +1412,15 @@ public static Float16 fma(Float16 a, Float16 b, Float16 c) {
// product is numerically exact in float before the cast to
// double; not necessary to widen to double before the
// multiply.
double product = (double)(a.floatValue() * b.floatValue());
return valueOf(product + c.doubleValue());
short fa = float16ToRawShortBits(a);
Member:

The new implementations of fma and sqrt are long and obscure compared to the current versions. That might be the price of intrinsification, but it would be helpful to at least have a comment explaining to the reader why the more obvious code was not used.
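The code comment in the fma diff above relies on the fact that the product of two binary16 values is exact in float: each binary16 significand has at most 11 bits, so the product needs at most 22 bits, which fits in float's 24-bit significand. A quick hedged check of that claim in plain Java (illustrative; not the JDK implementation):

```java
public class Fp16ProductExactness {
    public static void main(String[] args) {
        // A few binary16 values, including the largest finite one (65504)
        // and a value near the subnormal boundary.
        short[] samples = {
            Float.floatToFloat16(1.5f),
            Float.floatToFloat16(0.1f),     // rounds to the nearest binary16
            Float.floatToFloat16(65504.0f),
            Float.floatToFloat16(6.1e-5f),
        };
        boolean allExact = true;
        for (short sa : samples) {
            for (short sb : samples) {
                float a = Float.float16ToFloat(sa), b = Float.float16ToFloat(sb);
                // If the float product equals the double product, no rounding
                // occurred when multiplying in float.
                allExact &= ((double) (a * b) == (double) a * (double) b);
            }
        }
        System.out.println(allExact);
    }
}
```

This is why the diff only widens to double for the final addition with c, where a second rounding could otherwise change the result.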

Labels

  • core-libs (core-libs-dev@openjdk.org)
  • graal (graal-dev@openjdk.org)
  • hotspot (hotspot-dev@openjdk.org)
  • hotspot-compiler (hotspot-compiler-dev@openjdk.org)
  • rfr (pull request is ready for review)
3 participants