8273322: Enhance macro logic optimization for masked logic operations. #6893

jatin-bhateja · 2021-12-20T13:33:01Z

Patch extends existing macrologic inferencing algorithm to handle masked logic operations.

Existing algorithm:

Identify logic cone roots.
Packs parent and logic child nodes into a MacroLogic node in bottom up traversal if input constraint are met.
i.e. maximum number of inputs which a macro logic node can have.
Perform symbolic evaluation of logic expression tree by assigning value corresponding to a truth table column
to each input.
Inputs along with encoded function together represents a macro logic node which mimics a truth table.

Modification:
Extended the packing algorithm to operate on both predicated or non-predicated logic nodes. Following
rules define the criteria under which nodes gets packed into a macro logic node:-

Parent and both child nodes are all unmasked or masked with same predicates.
Masked parent can be packed with left child if it is predicated and both have same prediates.
Masked parent can be packed with right child if its un-predicated or has matching predication condition.
An unmasked parent can be packed with an unmasked child.

New jtreg test case added with the patch exhaustively covers all the different combinations of predications of parent and
child nodes.

Following are the performance number for JMH benchmark included with the patch.

Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)

Benchmark	ARRAYLEN	Baseline (ops/s)	Withopt (ops/s)	Gain ( withopt/baseline)
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller	64	2365.421	5136.283	2.171403315
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller	128	2034.1	4073.381	2.002547072
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller	256	1568.694	2811.975	1.792558013
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller	512	883.261	1662.771	1.882536419
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller	1024	469.513	732.81	1.560787454
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller	64	273.049	552.106	2.022003377
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller	128	219.624	359.775	1.63814064
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller	256	131.649	182.23	1.384211046
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller	512	71.452	81.522	1.140933774
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller	1024	37.427	41.966	1.121276084
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller	64	2805.759	3383.16	1.205791374
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller	128	2069.012	2250.37	1.087654397
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller	256	1098.766	1101.996	1.002939661
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller	512	470.035	484.732	1.031267884
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller	1024	202.827	209.073	1.030794717
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128	256	3435.989	4418.09	1.285827749
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128	512	1524.803	1678.201	1.100601848
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128	1024	972.501	1166.734	1.199725244
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256	256	5980.85	7584.17	1.268075608
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256	512	3258.108	3939.23	1.209054457
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256	1024	1475.365	1511.159	1.024261115
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512	256	4208.766	4220.678	1.002830283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512	512	2056.651	2049.489	0.99651764
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512	1024	1110.461	1116.448	1.005391455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256	256	3259.348	3947.94	1.211266793
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256	512	1515.147	1536.647	1.014190042
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256	1024	911.58	1030.54	1.130498695
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512	256	2034.611	2073.764	1.019243482
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512	512	1110.659	1116.093	1.004892591
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512	1024	559.269	559.651	1.000683034
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128	256	3636.141	4446.505	1.222863745
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128	512	1433.145	1681.261	1.173126934
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128	1024	1000.107	1172.866	1.172740517
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256	256	5568.313	7670.259	1.37748345
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256	512	3350.108	3927.803	1.172440709
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256	1024	1495.966	1541.56	1.030477965
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512	256	4230.379	4282.154	1.012238856
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512	512	2029.801	2049.638	1.009772879
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512	1024	1108.738	1118.897	1.00916267
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256	256	3802.801	3783.537	0.99493426
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256	512	1546.244	1552.691	1.004169458
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256	1024	1017.512	1020.075	1.002518889
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128	256	4159.835	4527.676	1.088426825
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128	512	1665.335	1733.04	1.040655484
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128	1024	1150.319	1181.935	1.02748455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256	256	6989.791	7382.883	1.056238019
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256	512	3711.362	3911.921	1.054039191
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256	1024	1540.341	1554.175	1.008981128
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512	256	4164.559	4213.546	1.01176283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512	512	2072.91	2079.105	1.002988552
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512	1024	1112.678	1116.675	1.003592234
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256	256	3702.998	3906.093	1.0548461
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256	512	1536.571	1546.043	1.006164375
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256	1024	996.906	1013.649	1.016794964
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512	256	2045.594	2048.966	1.001648421
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512	512	1111.933	1117.689	1.005176571
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512	1024	559.971	561.144	1.002094751

Kindly review and share your feedback.

Best Regards,
Jatin

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8273322: Enhance macro logic optimization for masked logic operations.

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to f101fff
Sandhya Viswanathan (@sviswa7 - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893
$ git checkout pull/6893

Update a local copy of the PR:
$ git checkout pull/6893
$ git pull https://git.openjdk.java.net/jdk pull/6893/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6893

View PR using the GUI difftool:
$ git pr show -t 6893

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6893.diff

bridgekeeper · 2021-12-20T13:33:41Z

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

jatin-bhateja · 2021-12-20T13:35:51Z

/label add hotspot-compiler-dev

openjdk · 2021-12-20T13:35:57Z

@jatin-bhateja The following labels will be automatically applied to this pull request:

core-libs
hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk · 2021-12-20T13:36:24Z

@jatin-bhateja The hotspot-compiler label was already applied.

mlbridge · 2021-12-20T13:39:44Z

Webrevs

vnkozlov

I think whole "Bitwise operation packing optimization" code should be moved out from compile.cpp. May be to vectornode.cpp where MacroLogicVNode` code is located.

Copyright year should be updated to 2022 in all changed files.

vnkozlov · 2022-01-04T02:21:35Z

src/hotspot/cpu/x86/x86.ad

      return true;

+    case Op_MacroLogicV:
+      if(bt != T_INT && bt != T_LONG) {


Missing VM_Version::supports_evex() check?

Hi @vnkozlov, we already have that check (UseAVX < 3) in match_rule_supported routine which gets called from this function.

vnkozlov

Let me test it before approval. You need second review.

And file RFE to move vector code from compile.cpp. Let do it separately from these changes.

vnkozlov · 2022-01-05T00:36:21Z

compiler/vectorapi/TestMaskedMacroLogicVector.java test failed on Aarch64 machines:

Unrecognized VM option 'UseAVX=3'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

java.lang.RuntimeException: TestFramework flag VM exited with 1
	at compiler.lib.ir_framework.driver.FlagVMProcess.checkFlagVMExitCode(FlagVMProcess.java:135)
	at compiler.lib.ir_framework.driver.FlagVMProcess.start(FlagVMProcess.java:121)
	at compiler.lib.ir_framework.driver.FlagVMProcess.<init>(FlagVMProcess.java:63)
	at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:658)
	at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:322)
	at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:230)
	at compiler.vectorapi.TestMaskedMacroLogicVector.main(TestMaskedMacroLogicVector.java:837)

sviswa7 · 2022-01-05T00:45:20Z

src/hotspot/cpu/x86/x86.ad

+  match(Set dst (MacroLogicV dst (Binary src2 (Binary src3 (Binary func mask)))));
+  format %{ "vternlog_masked $dst,$src2,$src3,$func,$mask\t! vternlog masked operation" %}
+  ins_encode %{
+    int vector_len = vector_length_encoding(this);


It would be good to name this as vlen_enc instead of vector_len.

sviswa7 · 2022-01-05T00:47:53Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+void C2_MacroAssembler::evpternlog(XMMRegister dst, int func, KRegister mask, XMMRegister src2, Address src3,
+                                   bool merge, BasicType bt, int vlen_enc) {
+  if (bt == T_INT) {
+    evpternlogd(dst, func, mask, src2, src3, true, vlen_enc);
+  } else {
+    assert(bt == T_LONG, "");
+    evpternlogq(dst, func, mask, src2, src3, true, vlen_enc);
+  }
+}
+


"merge" argument is not used in the method body.

sviswa7 · 2022-01-05T00:48:21Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+void C2_MacroAssembler::evpternlog(XMMRegister dst, int func, KRegister mask, XMMRegister src2, XMMRegister src3,
+                                   bool merge, BasicType bt, int vlen_enc) {
+  if (bt == T_INT) {
+    evpternlogd(dst, func, mask, src2, src3, true, vlen_enc);
+  } else {
+    assert(bt == T_LONG, "");
+    evpternlogq(dst, func, mask, src2, src3, true, vlen_enc);
+  }
+}
+


"merge" argument not used in method body.

sviswa7 · 2022-01-05T00:58:43Z

src/hotspot/cpu/x86/assembler_x86.cpp

+  emit_int8(0x25);
+  emit_int8((unsigned char)(0xC0 | encode));
+  emit_int8(imm8);


Please use emit_int24() here.

sviswa7 · 2022-01-05T00:59:21Z

src/hotspot/cpu/x86/assembler_x86.cpp

+  emit_int8(0x25);
+  emit_int8((unsigned char)(0xC0 | encode));
+  emit_int8(imm8);


Please use emit_int24() here.

jatin-bhateja · 2022-01-05T08:53:04Z

I think whole "Bitwise operation packing optimization" code should be moved out from compile.cpp. May be to vectornode.cpp where MacroLogicVNode` code is located.

Hi @vnkozlov ,
Yes we can also extended AndV/OrV/XorV/AndVMask/OrVMask/XorVMask idealizations to perform macro logic folding,
current changes keeps the implementation clean and limited to one optimization stage.

Copyright year should be updated to 2022 in all changed files.

jatin-bhateja · 2022-01-05T08:54:41Z

Hi @sviswa7 , @vnkozlov , your comments have been addressed.

vnkozlov

Looks good.

openjdk · 2022-01-05T18:18:20Z

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273322: Enhance macro logic optimization for masked logic operations.

Reviewed-by: kvn, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 36 new commits pushed to the master branch:

bc12381: 8279505: Update documentation for RETRY_COUNT and REPEAT_COUNT
2dbb936: 8279339: (ch) Input/Output streams returned by Channels factory methods don't support concurrent read/write ops
456bd1e: 8211004: javac is complaining about non-denotable types and refusing to generate the class file
844dfb3: Merge
564c8c6: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
590fa9d: 8278612: [macos] test/jdk/java/awt/dnd/RemoveDropTargetCrashTest crashes with VoiceOver on macOS
5cd9515: 8279525: ProblemList java/awt/GraphicsDevice/CheckDisplayModes.java on macosx-aarch64
9d43d25: 8278897: Alignment of heap segments is not enforced correctly
0f4807e: 8279222: Incorrect legacyMap.get in java.security.Provider after JDK-8276660
f8f9148: 8278948: compiler/vectorapi/reshape/TestVectorCastAVX1.java crashes in assembler
... and 26 more: https://git.openjdk.java.net/jdk/compare/2a59ebbba391ee0d70604027081712f1c2dfd1fe...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

sviswa7 · 2022-01-06T17:39:20Z

test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java

+
+/**
+ * @test
+ * @bug 8273322


Needs @key randomness as we use random number without a fixed seed here.
Please see:
https://openjdk.java.net/jtreg/faq.html#when-should-i-use-the-intermittent-or-randomness-keyword-in-a-test

jatin-bhateja · 2022-01-06T18:40:17Z

/integrate

openjdk · 2022-01-06T18:41:20Z

Going to push as commit 8703f14.
Since your change was applied there have been 36 commits pushed to the master branch:

bc12381: 8279505: Update documentation for RETRY_COUNT and REPEAT_COUNT
2dbb936: 8279339: (ch) Input/Output streams returned by Channels factory methods don't support concurrent read/write ops
456bd1e: 8211004: javac is complaining about non-denotable types and refusing to generate the class file
844dfb3: Merge
564c8c6: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
590fa9d: 8278612: [macos] test/jdk/java/awt/dnd/RemoveDropTargetCrashTest crashes with VoiceOver on macOS
5cd9515: 8279525: ProblemList java/awt/GraphicsDevice/CheckDisplayModes.java on macosx-aarch64
9d43d25: 8278897: Alignment of heap segments is not enforced correctly
0f4807e: 8279222: Incorrect legacyMap.get in java.security.Provider after JDK-8276660
f8f9148: 8278948: compiler/vectorapi/reshape/TestVectorCastAVX1.java crashes in assembler
... and 26 more: https://git.openjdk.java.net/jdk/compare/2a59ebbba391ee0d70604027081712f1c2dfd1fe...master

Your commit was automatically rebased without conflicts.

openjdk · 2022-01-06T18:41:35Z

@jatin-bhateja Pushed as commit 8703f14.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8273322: Enhance macro logic optimization for masked logic operations.

b14079e

openjdk bot added the rfr Pull request is ready for review label Dec 20, 2021

openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Dec 20, 2021

Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8273322

f8120ac

vnkozlov suggested changes Jan 4, 2022

View reviewed changes

Jatin Bhateja added 2 commits January 4, 2022 08:25

8273322: Updating copywrite header.

d18f504

8273322: Review comments resolution.

f101fff

vnkozlov reviewed Jan 4, 2022

View reviewed changes

sviswa7 reviewed Jan 5, 2022

View reviewed changes

vnkozlov approved these changes Jan 5, 2022

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Jan 5, 2022

8273322: Adding missing randomness key.

2d196f7

sviswa7 reviewed Jan 6, 2022

View reviewed changes

sviswa7 approved these changes Jan 6, 2022

View reviewed changes

openjdk bot added the integrated Pull request has been integrated label Jan 6, 2022

openjdk bot closed this Jan 6, 2022

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 6, 2022

vnkozlov mentioned this pull request Jan 6, 2022

8278868: Add x86 vectorization support for Long.bitCount() #6857

Closed

3 tasks

jatin-bhateja deleted the JDK-8273322 branch February 7, 2022 20:20

8273322: Enhance macro logic optimization for masked logic operations. #6893

8273322: Enhance macro logic optimization for masked logic operations. #6893

Uh oh!

Conversation

jatin-bhateja commented Dec 20, 2021 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

Uh oh!

bridgekeeper bot commented Dec 20, 2021

Uh oh!

jatin-bhateja commented Dec 20, 2021

Uh oh!

openjdk bot commented Dec 20, 2021

Uh oh!

openjdk bot commented Dec 20, 2021

Uh oh!

mlbridge bot commented Dec 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

vnkozlov commented Jan 5, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jatin-bhateja commented Jan 5, 2022

Uh oh!

jatin-bhateja commented Jan 5, 2022

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

openjdk bot commented Jan 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jatin-bhateja commented Jan 6, 2022

Uh oh!

openjdk bot commented Jan 6, 2022

Uh oh!

openjdk bot commented Jan 6, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

jatin-bhateja commented Dec 20, 2021 •

edited by openjdk bot

Loading

mlbridge bot commented Dec 20, 2021 •

edited

Loading

openjdk bot commented Jan 5, 2022 •

edited

Loading