
Commit c2c1373

Faye Gao authored and Fei Gao committed
8283091: Support type conversion between different data sizes in SLP
After JDK-8275317, C2's SLP vectorizer supports type conversion between types of the same data size. This patch adds support for conversions between different data sizes, such as:

  int <-> double
  float <-> long
  int <-> long
  float <-> double

A typical test case:

  int[] a;
  double[] b;
  for (int i = start; i < limit; i++) {
    b[i] = (double) a[i];
  }

The expected OptoAssembly code for one iteration looks like this:

  add R12, R2, R11, LShiftL #2
  vector_load V16,[R12, #16]
  vectorcast_i2d V16, V16 # convert I to D vector
  add R11, R1, R11, LShiftL #3 # ptr
  add R13, R11, #16 # ptr
  vector_store [R13], V16

To enable the vectorization, the patch solves the following problems in SLP.

There are three main operations in the case above: LoadI, ConvI2D and StoreD. Assuming that the vector length is 128 bits, how many scalar nodes should be packed together into a vector? If we decide this separately for each operation node, as SuperWord::combine_packs() did before the patch, a 128-bit vector holds 4 LoadI, 2 ConvI2D or 2 StoreD nodes. However, chaining these packs into one vector node sequence (loading 4 elements into a vector, then converting 2 elements, and finally storing those 2 elements) is invalid. Instead, we should look through the whole def-use chain and pick the minimum of these element counts, as SuperWord::max_vector_size_in_ud_chain() does in superword.cpp. In this case, we pack 2 LoadI, 2 ConvI2D and 2 StoreD nodes and generate a valid vector node sequence: load 2 elements, convert the 2 elements to another type, and store the 2 elements with the new type.

After this, the LoadI nodes no longer use the whole vector and only occupy part of it, so we adapt the code in SuperWord::get_vw_bytes_special() to this situation.

In SLP, we calculate a kind of alignment for each scalar node as a trace of its position in the whole vector. In this case, the alignments of the 2 LoadI nodes are 0 and 4, while the alignments of the 2 ConvI2D nodes are 0 and 8. Here 4 for LoadI and 8 for ConvI2D mean the same thing: each marks the second node in the whole vector, and the difference between 4 and 8 comes only from the data sizes. In this situation, we should remove the impact of different data sizes in SLP. For example, in the SuperWord::extend_packlist() stage, when SuperWord::follow_use_defs() determines whether a pair of def nodes can be packed, we remove the side effect of different data sizes by transforming the target alignment taken from the use node. Assuming that the vector length is 512 bits, if the ConvI2D use nodes have alignments of 16 and 24 and their def nodes, LoadI, have alignments of 8 and 12, these two LoadI nodes should be packed as a pair as well.

Similarly, when determining whether the vectorization is profitable, a type conversion between different data sizes takes a type of one size and produces a type of another size, hence the special checks on alignment and size, as done in SuperWord::is_vector_use().

After solving these problems, we successfully implemented the vectorization of type conversion between different data sizes.
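To make the alignment arithmetic above concrete, here is a minimal standalone C++ sketch. It is not HotSpot code: `lane_index`, `rescale_alignment` and `same_lane` are hypothetical helpers introduced for illustration, and the element sizes (4 bytes for int, 8 bytes for double) are passed in by hand instead of coming from `data_size()` or `type2aelembytes()`.

```cpp
#include <cassert>

// Hypothetical helper (not a HotSpot function): map a byte offset within the
// vector, which SLP calls the node's "alignment", to a lane index.
inline int lane_index(int align_bytes, int elem_bytes) {
  assert(elem_bytes > 0 && align_bytes % elem_bytes == 0);
  return align_bytes / elem_bytes;
}

// Same arithmetic as `align / data_size(s1) * data_size(t1)` in the patch:
// keep the lane index, but express it in the other node's element size.
inline int rescale_alignment(int align_bytes, int from_elem_bytes, int to_elem_bytes) {
  return lane_index(align_bytes, from_elem_bytes) * to_elem_bytes;
}

// Same comparison as the new check in SuperWord::is_vector_use(): a use and
// its def match when they sit in the same lane, whatever their byte offsets.
inline bool same_lane(int use_align, int use_elem_bytes,
                      int def_align, int def_elem_bytes) {
  return lane_index(use_align, use_elem_bytes) ==
         lane_index(def_align, def_elem_bytes);
}
```

With the numbers from the message, `rescale_alignment(16, 8, 4)` gives 8 and `rescale_alignment(24, 8, 4)` gives 12, which are the LoadI alignments that pair with ConvI2D nodes at 16 and 24.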
Here is the test data on NEON:

Before the patch:
Benchmark               (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F        523  avgt   15  216.431 ±  0.131  ns/op
VectorLoop.convertD2I        523  avgt   15  220.522 ±  0.311  ns/op
VectorLoop.convertF2D        523  avgt   15  217.034 ±  0.292  ns/op
VectorLoop.convertF2L        523  avgt   15  231.634 ±  1.881  ns/op
VectorLoop.convertI2D        523  avgt   15  229.538 ±  0.095  ns/op
VectorLoop.convertI2L        523  avgt   15  214.822 ±  0.131  ns/op
VectorLoop.convertL2F        523  avgt   15  230.188 ±  0.217  ns/op
VectorLoop.convertL2I        523  avgt   15  162.234 ±  0.235  ns/op

After the patch:
Benchmark               (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F        523  avgt   15  124.352 ±  1.079  ns/op
VectorLoop.convertD2I        523  avgt   15  557.388 ±  8.166  ns/op
VectorLoop.convertF2D        523  avgt   15  118.082 ±  4.026  ns/op
VectorLoop.convertF2L        523  avgt   15  225.810 ± 11.180  ns/op
VectorLoop.convertI2D        523  avgt   15  166.247 ±  0.120  ns/op
VectorLoop.convertI2L        523  avgt   15  119.699 ±  2.925  ns/op
VectorLoop.convertL2F        523  avgt   15  220.847 ±  0.053  ns/op
VectorLoop.convertL2I        523  avgt   15  122.339 ±  2.738  ns/op

Perf data on X86:

Before the patch:
Benchmark               (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F        523  avgt   15  279.466 ±  0.069  ns/op
VectorLoop.convertD2I        523  avgt   15  551.009 ±  7.459  ns/op
VectorLoop.convertF2D        523  avgt   15  276.066 ±  0.117  ns/op
VectorLoop.convertF2L        523  avgt   15  545.108 ±  5.697  ns/op
VectorLoop.convertI2D        523  avgt   15  745.303 ±  0.185  ns/op
VectorLoop.convertI2L        523  avgt   15  260.878 ±  0.044  ns/op
VectorLoop.convertL2F        523  avgt   15  502.016 ±  0.172  ns/op
VectorLoop.convertL2I        523  avgt   15  261.654 ±  3.326  ns/op

After the patch:
Benchmark               (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F        523  avgt   15  106.975 ±  0.045  ns/op
VectorLoop.convertD2I        523  avgt   15  546.866 ±  9.287  ns/op
VectorLoop.convertF2D        523  avgt   15   82.414 ±  0.340  ns/op
VectorLoop.convertF2L        523  avgt   15  542.235 ±  2.785  ns/op
VectorLoop.convertI2D        523  avgt   15   92.966 ±  1.400  ns/op
VectorLoop.convertI2L        523  avgt   15   79.960 ±  0.528  ns/op
VectorLoop.convertL2F        523  avgt   15  504.712 ±  4.794  ns/op
VectorLoop.convertL2I        523  avgt   15  129.753 ±  0.094  ns/op

Perf data on AVX512:

Before the patch:
Benchmark               (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F        523  avgt   15  282.984 ±  4.022  ns/op
VectorLoop.convertD2I        523  avgt   15  543.080 ±  3.873  ns/op
VectorLoop.convertF2D        523  avgt   15  273.950 ±  0.131  ns/op
VectorLoop.convertF2L        523  avgt   15  539.568 ±  2.747  ns/op
VectorLoop.convertI2D        523  avgt   15  745.238 ±  0.069  ns/op
VectorLoop.convertI2L        523  avgt   15  260.935 ±  0.169  ns/op
VectorLoop.convertL2F        523  avgt   15  501.870 ±  0.359  ns/op
VectorLoop.convertL2I        523  avgt   15  257.508 ±  0.174  ns/op

After the patch:
Benchmark               (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F        523  avgt   15   76.687 ±  0.530  ns/op
VectorLoop.convertD2I        523  avgt   15  545.408 ±  4.657  ns/op
VectorLoop.convertF2D        523  avgt   15  273.935 ±  0.099  ns/op
VectorLoop.convertF2L        523  avgt   15  540.534 ±  3.032  ns/op
VectorLoop.convertI2D        523  avgt   15  745.234 ±  0.053  ns/op
VectorLoop.convertI2L        523  avgt   15  260.865 ±  0.104  ns/op
VectorLoop.convertL2F        523  avgt   15   63.834 ±  4.777  ns/op
VectorLoop.convertL2I        523  avgt   15   48.183 ±  0.990  ns/op

Change-Id: I93e60fd956547dad9204ceec90220145c58a72ef
1 parent 8aba4de commit c2c1373

14 files changed: 1071 additions and 49 deletions

src/hotspot/share/opto/superword.cpp

Lines changed: 102 additions & 10 deletions
@@ -999,6 +999,12 @@ int SuperWord::get_vw_bytes_special(MemNode* s) {
     }
   }

+  // Check for special case where there is a type conversion between different data size.
+  int vectsize = max_vector_size_in_ud_chain(s);
+  if (vectsize < Matcher::max_vector_size(btype)) {
+    vw = MIN2(vectsize * type2aelembytes(btype), vw);
+  }
+
   return vw;
 }

@@ -1187,7 +1193,9 @@ bool SuperWord::stmts_can_pack(Node* s1, Node* s2, int align) {
   BasicType bt2 = velt_basic_type(s2);
   if(!is_java_primitive(bt1) || !is_java_primitive(bt2))
     return false;
-  if (Matcher::max_vector_size(bt1) < 2) {
+  BasicType longer_bt = longer_type_for_conversion(s1);
+  if (Matcher::max_vector_size(bt1) < 2 ||
+      (longer_bt != T_ILLEGAL && Matcher::max_vector_size(longer_bt) < 2)) {
     return false; // No vectors for this type
   }

@@ -1432,16 +1440,20 @@ bool SuperWord::follow_use_defs(Node_List* p) {

   if (s1->is_Load()) return false;

-  int align = alignment(s1);
-  NOT_PRODUCT(if(is_trace_alignment()) tty->print_cr("SuperWord::follow_use_defs: s1 %d, align %d", s1->_idx, align);)
+  NOT_PRODUCT(if(is_trace_alignment()) tty->print_cr("SuperWord::follow_use_defs: s1 %d, align %d", s1->_idx, alignment(s1));)
   bool changed = false;
   int start = s1->is_Store() ? MemNode::ValueIn : 1;
   int end = s1->is_Store() ? MemNode::ValueIn+1 : s1->req();
   for (int j = start; j < end; j++) {
+    int align = alignment(s1);
     Node* t1 = s1->in(j);
     Node* t2 = s2->in(j);
     if (!in_bb(t1) || !in_bb(t2))
       continue;
+    if (longer_type_for_conversion(s1) != T_ILLEGAL ||
+        longer_type_for_conversion(t1) != T_ILLEGAL) {
+      align = align / data_size(s1) * data_size(t1);
+    }
     if (stmts_can_pack(t1, t2, align)) {
       if (est_savings(t1, t2) >= 0) {
         Node_List* pair = new Node_List();
@@ -1485,12 +1497,18 @@ bool SuperWord::follow_def_uses(Node_List* p) {
       if (t2->Opcode() == Op_AddI && t2 == _lp->as_CountedLoop()->incr()) continue; // don't mess with the iv
       if (!opnd_positions_match(s1, t1, s2, t2))
         continue;
-      if (stmts_can_pack(t1, t2, align)) {
+      int adjusted_align = alignment(s1);
+      if (longer_type_for_conversion(s1) != T_ILLEGAL ||
+          longer_type_for_conversion(t1) != T_ILLEGAL) {
+        adjusted_align = adjusted_align / data_size(s1) * data_size(t1);
+      }
+      if (stmts_can_pack(t1, t2, adjusted_align)) {
         int my_savings = est_savings(t1, t2);
         if (my_savings > savings) {
           savings = my_savings;
           u1 = t1;
           u2 = t2;
+          align = adjusted_align;
         }
       }
     }
@@ -1683,8 +1701,7 @@ void SuperWord::combine_packs() {
   for (int i = 0; i < _packset.length(); i++) {
     Node_List* p1 = _packset.at(i);
     if (p1 != NULL) {
-      BasicType bt = velt_basic_type(p1->at(0));
-      uint max_vlen = Matcher::max_vector_size(bt); // Max elements in vector
+      uint max_vlen = max_vector_size_in_ud_chain(p1->at(0)); // Max elements in vector
       assert(is_power_of_2(max_vlen), "sanity");
       uint psize = p1->size();
       if (!is_power_of_2(psize)) {
@@ -2007,6 +2024,8 @@ bool SuperWord::implemented(Node_List* p) {
       } else {
         retValue = ReductionNode::implemented(opc, size, arith_type->basic_type());
       }
+    } else if (VectorNode::is_convert_opcode(opc)) {
+      retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0));
     } else {
       retValue = VectorNode::implemented(opc, size, velt_basic_type(p0));
     }
@@ -2558,12 +2577,11 @@ void SuperWord::output() {
         Node* in = vector_opd(p, 1);
         vn = VectorNode::make(opc, in, NULL, vlen, velt_basic_type(n));
         vlen_in_bytes = vn->as_Vector()->length_in_bytes();
-      } else if (opc == Op_ConvI2F || opc == Op_ConvL2D ||
-                 opc == Op_ConvF2I || opc == Op_ConvD2L) {
+      } else if (VectorNode::is_convert_opcode(opc)) {
         assert(n->req() == 2, "only one input expected");
         BasicType bt = velt_basic_type(n);
-        int vopc = VectorNode::opcode(opc, bt);
         Node* in = vector_opd(p, 1);
+        int vopc = VectorCastNode::opcode(in->bottom_type()->is_vect()->element_basic_type());
         vn = VectorCastNode::make(vopc, in, bt, vlen);
         vlen_in_bytes = vn->as_Vector()->length_in_bytes();
       } else if (is_cmov_pack(p)) {
@@ -2961,9 +2979,26 @@ bool SuperWord::is_vector_use(Node* use, int u_idx) {
     return true;
   }

-
   if (u_pk->size() != d_pk->size())
     return false;
+
+  if (longer_type_for_conversion(use) != T_ILLEGAL) {
+    // type conversion takes a type of a kind of size and produces a type of
+    // another size - hence the special checks on alignment and size.
+    for (uint i = 0; i < u_pk->size(); i++) {
+      Node* ui = u_pk->at(i);
+      Node* di = d_pk->at(i);
+      if (ui->in(u_idx) != di) {
+        return false;
+      }
+      if (alignment(ui) / type2aelembytes(velt_basic_type(ui)) !=
+          alignment(di) / type2aelembytes(velt_basic_type(di))) {
+        return false;
+      }
+    }
+    return true;
+  }
+
   for (uint i = 0; i < u_pk->size(); i++) {
     Node* ui = u_pk->at(i);
     Node* di = d_pk->at(i);
@@ -3196,6 +3231,63 @@ void SuperWord::compute_max_depth() {
   }
 }

+BasicType SuperWord::longer_type_for_conversion(Node* n) {
+  int opcode = n->Opcode();
+  switch (opcode) {
+    case Op_ConvD2I:
+    case Op_ConvI2D:
+    case Op_ConvF2D:
+    case Op_ConvD2F: return T_DOUBLE;
+    case Op_ConvF2L:
+    case Op_ConvL2F:
+    case Op_ConvL2I:
+    case Op_ConvI2L: return T_LONG;
+    case Op_ConvI2F: {
+      BasicType src_t = velt_basic_type(n->in(1));
+      if (src_t == T_BYTE || src_t == T_SHORT) {
+        return T_FLOAT;
+      }
+      return T_ILLEGAL;
+    }
+    case Op_ConvF2I: {
+      BasicType dst_t = velt_basic_type(n);
+      if (dst_t == T_BYTE || dst_t == T_SHORT) {
+        return T_FLOAT;
+      }
+      return T_ILLEGAL;
+    }
+  }
+  return T_ILLEGAL;
+}
+
+int SuperWord::max_vector_size_in_ud_chain(Node* n) {
+  BasicType bt = velt_basic_type(n);
+  BasicType vt = bt;
+
+  // find the longest type among def nodes.
+  uint start, end;
+  VectorNode::vector_operands(n, &start, &end);
+  for (uint i = start; i < end; ++i) {
+    Node* input = n->in(i);
+    if (!in_bb(input)) continue;
+    BasicType newt = longer_type_for_conversion(input);
+    vt = (newt == T_ILLEGAL) ? vt : newt;
+  }
+
+  // find the longest type among use nodes.
+  for (uint i = 0; i < n->outcnt(); ++i) {
+    Node* output = n->raw_out(i);
+    if (!in_bb(output)) continue;
+    BasicType newt = longer_type_for_conversion(output);
+    vt = (newt == T_ILLEGAL) ? vt : newt;
+  }
+
+  int max = Matcher::max_vector_size(vt);
+  // If now there is no vectors for the longest type, the nodes with the longest
+  // type in the def-use chain are not packed in SuperWord::stmts_can_pack.
+  return max < 2 ? Matcher::max_vector_size(bt) : max;
+}
+
 //-------------------------compute_vector_element_type-----------------------
 // Compute necessary vector element type for expressions
 // This propagates backwards a narrower integer type when the
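
The lane-count rule behind max_vector_size_in_ud_chain() and the matching cap in get_vw_bytes_special() can be illustrated with a small standalone model. This is a simplified sketch, not HotSpot code: `max_lanes`, `lanes_in_chain` and `capped_vector_width` are hypothetical names, the vector width is a plain parameter rather than a `Matcher` query, the element sizes of the chain are listed by hand instead of being found through the node's inputs and uses, and the fallback for targets without vectors of the longest type is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for Matcher::max_vector_size(): how many elements of
// the given size fit into one vector register of `vector_bytes` bytes.
inline int max_lanes(int vector_bytes, int elem_bytes) {
  assert(elem_bytes > 0 && vector_bytes % elem_bytes == 0);
  return vector_bytes / elem_bytes;
}

// Lane count shared by every node in a def-use chain: the widest element
// type in the chain allows the fewest lanes, so it decides the pack size.
inline int lanes_in_chain(int vector_bytes,
                          std::initializer_list<int> elem_bytes_in_chain) {
  assert(elem_bytes_in_chain.size() > 0);
  int widest = std::max(elem_bytes_in_chain);
  return max_lanes(vector_bytes, widest);
}

// Models `vw = MIN2(vectsize * type2aelembytes(btype), vw)`: a memory node
// with a narrower element type only occupies part of the vector.
inline int capped_vector_width(int vector_bytes, int elem_bytes, int chain_lanes) {
  return std::min(chain_lanes * elem_bytes, vector_bytes);
}
```

For the LoadI, ConvI2D, StoreD example from the commit message on a 128-bit (16-byte) vector, `lanes_in_chain(16, {4, 8, 8})` is 2, and `capped_vector_width(16, 4, 2)` is 8, so the LoadI pack uses only half of the register.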

src/hotspot/share/opto/superword.hpp

Lines changed: 5 additions & 1 deletion
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007, 2020, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2007, 2022, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -528,6 +528,10 @@ class SuperWord : public ResourceObj {
   void bb_insert_after(Node* n, int pos);
   // Compute max depth for expressions from beginning of block
   void compute_max_depth();
+  // Return the longer type for type-conversion node and return illegal type for other nodes.
+  BasicType longer_type_for_conversion(Node* n);
+  // Find the longest type in def-use chain for packed nodes, and then compute the max vector size.
+  int max_vector_size_in_ud_chain(Node* n);
   // Compute necessary vector element type for expressions
   void compute_vector_element_type();
   // Are s1 and s2 in a pack pair and ordered as s1,s2?

src/hotspot/share/opto/vectornode.cpp

Lines changed: 31 additions & 10 deletions
@@ -226,15 +226,6 @@ int VectorNode::opcode(int sopc, BasicType bt) {
     return Op_StoreVector;
   case Op_MulAddS2I:
     return Op_MulAddVS2VI;
-  case Op_ConvI2F:
-    return Op_VectorCastI2X;
-  case Op_ConvL2D:
-    return Op_VectorCastL2X;
-  case Op_ConvF2I:
-    return Op_VectorCastF2X;
-  case Op_ConvD2L:
-    return Op_VectorCastD2X;
-
   default:
     return 0; // Unimplemented
   }
@@ -365,6 +356,26 @@ bool VectorNode::is_shift_opcode(int opc) {
   }
 }

+bool VectorNode::is_convert_opcode(int opc) {
+  switch (opc) {
+    case Op_ConvI2F:
+    case Op_ConvL2D:
+    case Op_ConvF2I:
+    case Op_ConvD2L:
+    case Op_ConvI2D:
+    case Op_ConvL2F:
+    case Op_ConvL2I:
+    case Op_ConvI2L:
+    case Op_ConvF2L:
+    case Op_ConvD2F:
+    case Op_ConvF2D:
+    case Op_ConvD2I:
+      return true;
+    default:
+      return false;
+  }
+}
+
 bool VectorNode::is_shift(Node* n) {
   return is_shift_opcode(n->Opcode());
 }
@@ -1118,11 +1129,21 @@ int VectorCastNode::opcode(BasicType bt, bool is_signed) {
     case T_FLOAT: return Op_VectorCastF2X;
     case T_DOUBLE: return Op_VectorCastD2X;
     default:
-      assert(false, "unknown type: %s", type2name(bt));
+      assert(bt == T_CHAR || bt == T_BOOLEAN, "unknown type: %s", type2name(bt));
       return 0;
   }
 }

+bool VectorCastNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) {
+  if (is_java_primitive(dst_type) &&
+      (vlen > 1) && is_power_of_2(vlen) &&
+      Matcher::vector_size_supported(dst_type, vlen)) {
+    int vopc = VectorCastNode::opcode(src_type);
+    return vopc > 0 && Matcher::match_rule_supported_vector(vopc, vlen, dst_type);
+  }
+  return false;
+}
+
 Node* VectorCastNode::Identity(PhaseGVN* phase) {
   if (!in(1)->is_top()) {
     BasicType in_bt = in(1)->bottom_type()->is_vect()->element_basic_type();

src/hotspot/share/opto/vectornode.hpp

Lines changed: 2 additions & 1 deletion
@@ -80,6 +80,7 @@ class VectorNode : public TypeNode {
   static VectorNode* make_mask_node(int vopc, Node* n1, Node* n2, uint vlen, BasicType bt);

   static bool is_shift_opcode(int opc);
+  static bool is_convert_opcode(int opc);

   static bool is_vshift_cnt_opcode(int opc);

@@ -1469,7 +1470,7 @@ class VectorCastNode : public VectorNode {

   static VectorCastNode* make(int vopc, Node* n1, BasicType bt, uint vlen);
   static int opcode(BasicType bt, bool is_signed = true);
-  static bool implemented(BasicType bt, uint vlen);
+  static bool implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type);

   virtual Node* Identity(PhaseGVN* phase);
 };
