Skip to content

Conversation

@jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Dec 20, 2021

Patch extends existing macrologic inferencing algorithm to handle masked logic operations.

Existing algorithm:

  1. Identify logic cone roots.
  2. Packs parent and logic child nodes into a MacroLogic node in bottom up traversal if input constraint are met.
    i.e. maximum number of inputs which a macro logic node can have.
  3. Perform symbolic evaluation of logic expression tree by assigning value corresponding to a truth table column
    to each input.
  4. Inputs along with encoded function together represents a macro logic node which mimics a truth table.

Modification:
Extended the packing algorithm to operate on both predicated or non-predicated logic nodes. Following
rules define the criteria under which nodes gets packed into a macro logic node:-

  1. Parent and both child nodes are all unmasked or masked with same predicates.
  2. Masked parent can be packed with left child if it is predicated and both have same prediates.
  3. Masked parent can be packed with right child if its un-predicated or has matching predication condition.
  4. An unmasked parent can be packed with an unmasked child.

New jtreg test case added with the patch exhaustively covers all the different combinations of predications of parent and
child nodes.

Following are the performance number for JMH benchmark included with the patch.

Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)

Benchmark ARRAYLEN Baseline (ops/s) Withopt (ops/s) Gain ( withopt/baseline)
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller 64 2365.421 5136.283 2.171403315
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller 128 2034.1 4073.381 2.002547072
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller 256 1568.694 2811.975 1.792558013
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller 512 883.261 1662.771 1.882536419
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller 1024 469.513 732.81 1.560787454
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller 64 273.049 552.106 2.022003377
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller 128 219.624 359.775 1.63814064
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller 256 131.649 182.23 1.384211046
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller 512 71.452 81.522 1.140933774
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller 1024 37.427 41.966 1.121276084
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller 64 2805.759 3383.16 1.205791374
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller 128 2069.012 2250.37 1.087654397
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller 256 1098.766 1101.996 1.002939661
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller 512 470.035 484.732 1.031267884
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller 1024 202.827 209.073 1.030794717
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 256 3435.989 4418.09 1.285827749
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 512 1524.803 1678.201 1.100601848
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 1024 972.501 1166.734 1.199725244
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 256 5980.85 7584.17 1.268075608
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 512 3258.108 3939.23 1.209054457
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 1024 1475.365 1511.159 1.024261115
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 256 4208.766 4220.678 1.002830283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 512 2056.651 2049.489 0.99651764
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 1024 1110.461 1116.448 1.005391455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 256 3259.348 3947.94 1.211266793
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 512 1515.147 1536.647 1.014190042
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 1024 911.58 1030.54 1.130498695
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 256 2034.611 2073.764 1.019243482
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 512 1110.659 1116.093 1.004892591
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 1024 559.269 559.651 1.000683034
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 256 3636.141 4446.505 1.222863745
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 512 1433.145 1681.261 1.173126934
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 1024 1000.107 1172.866 1.172740517
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 256 5568.313 7670.259 1.37748345
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 512 3350.108 3927.803 1.172440709
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 1024 1495.966 1541.56 1.030477965
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 256 4230.379 4282.154 1.012238856
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 512 2029.801 2049.638 1.009772879
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 1024 1108.738 1118.897 1.00916267
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 256 3802.801 3783.537 0.99493426
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 512 1546.244 1552.691 1.004169458
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 1024 1017.512 1020.075 1.002518889
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 256 4159.835 4527.676 1.088426825
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 512 1665.335 1733.04 1.040655484
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 1024 1150.319 1181.935 1.02748455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 256 6989.791 7382.883 1.056238019
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 512 3711.362 3911.921 1.054039191
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 1024 1540.341 1554.175 1.008981128
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 256 4164.559 4213.546 1.01176283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 512 2072.91 2079.105 1.002988552
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 1024 1112.678 1116.675 1.003592234
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 256 3702.998 3906.093 1.0548461
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 512 1536.571 1546.043 1.006164375
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 1024 996.906 1013.649 1.016794964
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 256 2045.594 2048.966 1.001648421
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 512 1111.933 1117.689 1.005176571
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 1024 559.971 561.144 1.002094751

Kindly review and share your feedback.

Best Regards,
Jatin


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273322: Enhance macro logic optimization for masked logic operations.

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893
$ git checkout pull/6893

Update a local copy of the PR:
$ git checkout pull/6893
$ git pull https://git.openjdk.java.net/jdk pull/6893/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6893

View PR using the GUI difftool:
$ git pr show -t 6893

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6893.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 20, 2021

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 20, 2021
@jatin-bhateja
Copy link
Member Author

/label add hotspot-compiler-dev

@openjdk
Copy link

openjdk bot commented Dec 20, 2021

@jatin-bhateja The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Dec 20, 2021
@openjdk
Copy link

openjdk bot commented Dec 20, 2021

@jatin-bhateja The hotspot-compiler label was already applied.

@mlbridge
Copy link

mlbridge bot commented Dec 20, 2021

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think whole "Bitwise operation packing optimization" code should be moved out from compile.cpp. May be to vectornode.cpp where MacroLogicVNode` code is located.

Copyright year should be updated to 2022 in all changed files.

return true;

case Op_MacroLogicV:
if(bt != T_INT && bt != T_LONG) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing VM_Version::supports_evex() check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vnkozlov, we already have that check (UseAVX < 3) in match_rule_supported routine which gets called from this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me test it before approval. You need second review.

And file RFE to move vector code from compile.cpp. Let do it separately from these changes.

@vnkozlov
Copy link
Contributor

vnkozlov commented Jan 5, 2022

compiler/vectorapi/TestMaskedMacroLogicVector.java test failed on Aarch64 machines:

Unrecognized VM option 'UseAVX=3'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

java.lang.RuntimeException: TestFramework flag VM exited with 1
	at compiler.lib.ir_framework.driver.FlagVMProcess.checkFlagVMExitCode(FlagVMProcess.java:135)
	at compiler.lib.ir_framework.driver.FlagVMProcess.start(FlagVMProcess.java:121)
	at compiler.lib.ir_framework.driver.FlagVMProcess.<init>(FlagVMProcess.java:63)
	at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:658)
	at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:322)
	at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:230)
	at compiler.vectorapi.TestMaskedMacroLogicVector.main(TestMaskedMacroLogicVector.java:837)

match(Set dst (MacroLogicV dst (Binary src2 (Binary src3 (Binary func mask)))));
format %{ "vternlog_masked $dst,$src2,$src3,$func,$mask\t! vternlog masked operation" %}
ins_encode %{
int vector_len = vector_length_encoding(this);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to name this as vlen_enc instead of vector_len.

Comment on lines 4165 to 4174
void C2_MacroAssembler::evpternlog(XMMRegister dst, int func, KRegister mask, XMMRegister src2, Address src3,
bool merge, BasicType bt, int vlen_enc) {
if (bt == T_INT) {
evpternlogd(dst, func, mask, src2, src3, true, vlen_enc);
} else {
assert(bt == T_LONG, "");
evpternlogq(dst, func, mask, src2, src3, true, vlen_enc);
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"merge" argument is not used in the method body.

Comment on lines 4155 to 4164
void C2_MacroAssembler::evpternlog(XMMRegister dst, int func, KRegister mask, XMMRegister src2, XMMRegister src3,
bool merge, BasicType bt, int vlen_enc) {
if (bt == T_INT) {
evpternlogd(dst, func, mask, src2, src3, true, vlen_enc);
} else {
assert(bt == T_LONG, "");
evpternlogq(dst, func, mask, src2, src3, true, vlen_enc);
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"merge" argument not used in method body.

Comment on lines 9771 to 9773
emit_int8(0x25);
emit_int8((unsigned char)(0xC0 | encode));
emit_int8(imm8);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use emit_int24() here.

Comment on lines 9738 to 9740
emit_int8(0x25);
emit_int8((unsigned char)(0xC0 | encode));
emit_int8(imm8);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use emit_int24() here.

@jatin-bhateja
Copy link
Member Author

I think whole "Bitwise operation packing optimization" code should be moved out from compile.cpp. May be to vectornode.cpp where MacroLogicVNode` code is located.

Hi @vnkozlov ,
Yes we can also extended AndV/OrV/XorV/AndVMask/OrVMask/XorVMask idealizations to perform macro logic folding,
current changes keeps the implementation clean and limited to one optimization stage.

Copyright year should be updated to 2022 in all changed files.

@jatin-bhateja
Copy link
Member Author

Hi @sviswa7 , @vnkozlov , your comments have been addressed.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Jan 5, 2022

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273322: Enhance macro logic optimization for masked logic operations.

Reviewed-by: kvn, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 36 new commits pushed to the master branch:

  • bc12381: 8279505: Update documentation for RETRY_COUNT and REPEAT_COUNT
  • 2dbb936: 8279339: (ch) Input/Output streams returned by Channels factory methods don't support concurrent read/write ops
  • 456bd1e: 8211004: javac is complaining about non-denotable types and refusing to generate the class file
  • 844dfb3: Merge
  • 564c8c6: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
  • 590fa9d: 8278612: [macos] test/jdk/java/awt/dnd/RemoveDropTargetCrashTest crashes with VoiceOver on macOS
  • 5cd9515: 8279525: ProblemList java/awt/GraphicsDevice/CheckDisplayModes.java on macosx-aarch64
  • 9d43d25: 8278897: Alignment of heap segments is not enforced correctly
  • 0f4807e: 8279222: Incorrect legacyMap.get in java.security.Provider after JDK-8276660
  • f8f9148: 8278948: compiler/vectorapi/reshape/TestVectorCastAVX1.java crashes in assembler
  • ... and 26 more: https://git.openjdk.java.net/jdk/compare/2a59ebbba391ee0d70604027081712f1c2dfd1fe...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 5, 2022

/**
* @test
* @bug 8273322
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs @key randomness as we use random number without a fixed seed here.
Please see:
https://openjdk.java.net/jtreg/faq.html#when-should-i-use-the-intermittent-or-randomness-keyword-in-a-test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

@jatin-bhateja
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Jan 6, 2022

Going to push as commit 8703f14.
Since your change was applied there have been 36 commits pushed to the master branch:

  • bc12381: 8279505: Update documentation for RETRY_COUNT and REPEAT_COUNT
  • 2dbb936: 8279339: (ch) Input/Output streams returned by Channels factory methods don't support concurrent read/write ops
  • 456bd1e: 8211004: javac is complaining about non-denotable types and refusing to generate the class file
  • 844dfb3: Merge
  • 564c8c6: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
  • 590fa9d: 8278612: [macos] test/jdk/java/awt/dnd/RemoveDropTargetCrashTest crashes with VoiceOver on macOS
  • 5cd9515: 8279525: ProblemList java/awt/GraphicsDevice/CheckDisplayModes.java on macosx-aarch64
  • 9d43d25: 8278897: Alignment of heap segments is not enforced correctly
  • 0f4807e: 8279222: Incorrect legacyMap.get in java.security.Provider after JDK-8276660
  • f8f9148: 8278948: compiler/vectorapi/reshape/TestVectorCastAVX1.java crashes in assembler
  • ... and 26 more: https://git.openjdk.java.net/jdk/compare/2a59ebbba391ee0d70604027081712f1c2dfd1fe...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 6, 2022
@openjdk openjdk bot closed this Jan 6, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 6, 2022
@openjdk
Copy link

openjdk bot commented Jan 6, 2022

@jatin-bhateja Pushed as commit 8703f14.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants