-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support to mono for v128 constants #81902
Conversation
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
fe252c3
to
cbee606
Compare
cbee606
to
f3f0340
Compare
case MONO_PATCH_INFO_X128: | ||
case MONO_PATCH_INFO_X128_GOT: | ||
encode_value (((guint32 *)patch_info->data.target) [(MINI_LS_WORD_IDX * 2) + MINI_LS_WORD_IDX], p, &p); | ||
encode_value (((guint32 *)patch_info->data.target) [(MINI_LS_WORD_IDX * 2) + MINI_MS_WORD_IDX], p, &p); | ||
encode_value (((guint32 *)patch_info->data.target) [(MINI_MS_WORD_IDX * 2) + MINI_LS_WORD_IDX], p, &p); | ||
encode_value (((guint32 *)patch_info->data.target) [(MINI_MS_WORD_IDX * 2) + MINI_MS_WORD_IDX], p, &p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, there isn't any SIMD support for BE architectures today.
It's possibly this needs to track the underlying element type and splat each T
in memory correctly instead.
else if (patch_info->type == MONO_PATCH_INFO_X128) | ||
code_size += 16 + 15; /* sizeof (Vector128<T>) + alignment */ | ||
else if (patch_info->type == MONO_PATCH_INFO_R8) | ||
code_size += 8 + 15; /* sizeof (double) + alignment */ | ||
if (patch_info->type == MONO_PATCH_INFO_R4) | ||
else if (patch_info->type == MONO_PATCH_INFO_R4) | ||
code_size += 4 + 15; /* sizeof (float) + alignment */ | ||
if (patch_info->type == MONO_PATCH_INFO_GC_CARD_TABLE_ADDR) | ||
else if (patch_info->type == MONO_PATCH_INFO_GC_CARD_TABLE_ADDR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the FIXME comment below, the alignment here only needs to be 8
for double
and 4
for float
. But some other logic was utilizing R4/R8 for packed SIMD instructions.
That should ideally be fixed and those updated to properly use X128 constants to save on space and help ensure correctness.
src/mono/mono/mini/mini-llvm.c
Outdated
if (!strcmp(class_name, "Vector64`1") || !strcmp (class_name, "Vector128`1") || !strcmp (class_name, "Vector256`1") || !strcmp (class_name, "Vector512`1")) { | ||
MonoType *element_type = mono_class_get_context (ins->klass)->class_inst->type_argv [0]; | ||
etype = element_type->type; | ||
ecount = mono_class_value_size (ins->klass, NULL) / mono_class_value_size (mono_class_from_mono_type_internal(element_type), NULL); | ||
} else if (!strcmp(class_name, "Vector4") || !strcmp(class_name, "Plane") || !strcmp(class_name, "Quaternion")) { | ||
etype = MONO_TYPE_R4; | ||
ecount = 4; | ||
} else { | ||
g_assert_not_reached (); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to leave a more general comment that Mono currently has to do these class name and type checks to determine things.
It would be beneficial if there was a way to track that the simdType
and simdBaseType
. In RyuJIT, we have TYP_SIMD8/12/16/32/64
and we then track the primitive base type as a separate field.
This allows better codegen, better specialization, and all-over cheaper checks. Even if this was just some helper method that cached a klass
to simd info
lookup, I think it would improve/simplify things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a simd_class_to_llvm_type () function which can be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reference @vargaz!
I'm not sure that really addresses the backing issue of there still needing to be string comparisons to do the checks in the first place and it doesn't help things on the MonoJIT side.
My comment was more about the general cost of going from a klass
to the respective SimdType
and SimdBaseType
and how if Mono had a different way to track this it could be done cheaply once at import.
Subsequent handling, including simd_class_to_llvm_type
could then be made cheaper.
case MONO_PATCH_INFO_X128_GOT: | ||
return hash | (guint32)*(double*)ji->data.target; | ||
case MONO_PATCH_INFO_R8_GOT: | ||
return hash | (guint32)*(double*)ji->data.target; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the hashes are currently being built for R4/R8, but they don't necessarily make "sense" to me.
This isn't a "good" hash since it just or's
bits together, it doesn't take into account the "upper bits", and it relies on undefined behavior in the case the double
or float
is NaN
/Infinity
/out of range
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the best, but its just used during JITting/AOTing, so it's not a problem in practice.
case MONO_PATCH_INFO_X128: { | ||
guint8 *pos, *patch_pos; | ||
guint32 target_pos; | ||
|
||
/* The SSE opcodes require a 16 byte alignment */ | ||
code = (guint8*)ALIGN_TO (code, 16); | ||
|
||
pos = cfg->native_code + patch_info->ip.i; | ||
if (IS_REX (pos [0])) { | ||
patch_pos = pos + 4; | ||
target_pos = GPTRDIFF_TO_UINT32 (code - pos - 8); | ||
} | ||
else { | ||
patch_pos = pos + 3; | ||
target_pos = GPTRDIFF_TO_UINT32 (code - pos - 7); | ||
} | ||
|
||
memcpy (code, patch_info->data.target, 16); | ||
code += 16; | ||
|
||
*(guint32*)(patch_pos) = target_pos; | ||
|
||
remove = TRUE; | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original handling I added was the cause of the failures.
MonoJIT doesn't support the VEX encoding today (only the legacy encoding) and the R4/R8
patch point logic is effectively hardcoded to movss/movsd
.
For that scenario, we have:
- 1-byte prefix of
0xF3
(float) or0xF2
(double) - Optional REX prefix of
0x44
if using extended registers (XMM8-XMM15) - 2-byte opcode of
0x0F 0x010
- 1-byte modR/M byte
- 4-byte RIP relative offset
So this the R4/R8
logic was getting the offset of the RIP relative offset so it could be patched.
For X128
we currently hardcode this to movups
and so we instead have:
- Optional REX prefix of
0x44
if using extended registers (XMM8-XMM15) - 2-byte opcode of
0x0F 0x010
- 1-byte modR/M byte
- 4-byte RIP relative offset
Thus it's 1-byte smaller. We could but don't want to emit movupd
or movdqu
as that's 1-byte larger for no benefit.
We could emit movaps
instead, but there isn't any benefit to that on modern machines (anything since ~2011) and it strictly requires that the address be aligned, which is overall less flexible.
When the VEX support is added this all becomes a constant 4-bytes
before the RIP relative offset is encoded (3-byte VEX encoded opcode + modR/M byte).
1afa16f
to
cef739e
Compare
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
extra-platforms failures are all networking or other flaky tests that are also seeing failures in the mainline. |
This resolves #81482
Notably it adds the general support for the feature, but doesn't make use of it "everywhere" that it could. It doesn't add verbose constant folding support nor does it recognize the more general cases of
_Create
,_CreateScalar
, and_CreateScalarUnsafe
yet. Those should be handled in a separate PR.