Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support to mono for v128 constants #81902

Merged
merged 7 commits into from
Feb 15, 2023

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Feb 9, 2023

This resolves #81482

Notably it adds the general support for the feature, but doesn't make use of it "everywhere" that it could. It doesn't add verbose constant folding support nor does it recognize the more general cases of _Create, _CreateScalar, and _CreateScalarUnsafe yet. Those should be handled in a separate PR.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost assigned tannergooding Feb 9, 2023
@tannergooding tannergooding force-pushed the numerics-rewrite branch 6 times, most recently from fe252c3 to cbee606 Compare February 9, 2023 19:55
@tannergooding tannergooding marked this pull request as ready for review February 9, 2023 22:29
Comment on lines +7008 to +7013
case MONO_PATCH_INFO_X128:
case MONO_PATCH_INFO_X128_GOT:
encode_value (((guint32 *)patch_info->data.target) [(MINI_LS_WORD_IDX * 2) + MINI_LS_WORD_IDX], p, &p);
encode_value (((guint32 *)patch_info->data.target) [(MINI_LS_WORD_IDX * 2) + MINI_MS_WORD_IDX], p, &p);
encode_value (((guint32 *)patch_info->data.target) [(MINI_MS_WORD_IDX * 2) + MINI_LS_WORD_IDX], p, &p);
encode_value (((guint32 *)patch_info->data.target) [(MINI_MS_WORD_IDX * 2) + MINI_MS_WORD_IDX], p, &p);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, there isn't any SIMD support for BE architectures today.

It's possibly this needs to track the underlying element type and splat each T in memory correctly instead.

Comment on lines +8154 to +8160
else if (patch_info->type == MONO_PATCH_INFO_X128)
code_size += 16 + 15; /* sizeof (Vector128<T>) + alignment */
else if (patch_info->type == MONO_PATCH_INFO_R8)
code_size += 8 + 15; /* sizeof (double) + alignment */
if (patch_info->type == MONO_PATCH_INFO_R4)
else if (patch_info->type == MONO_PATCH_INFO_R4)
code_size += 4 + 15; /* sizeof (float) + alignment */
if (patch_info->type == MONO_PATCH_INFO_GC_CARD_TABLE_ADDR)
else if (patch_info->type == MONO_PATCH_INFO_GC_CARD_TABLE_ADDR)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the FIXME comment below, the alignment here only needs to be 8 for double and 4 for float. But some other logic was utilizing R4/R8 for packed SIMD instructions.

That should ideally be fixed and those updated to properly use X128 constants to save on space and help ensure correctness.

Comment on lines 7853 to 7862
if (!strcmp(class_name, "Vector64`1") || !strcmp (class_name, "Vector128`1") || !strcmp (class_name, "Vector256`1") || !strcmp (class_name, "Vector512`1")) {
MonoType *element_type = mono_class_get_context (ins->klass)->class_inst->type_argv [0];
etype = element_type->type;
ecount = mono_class_value_size (ins->klass, NULL) / mono_class_value_size (mono_class_from_mono_type_internal(element_type), NULL);
} else if (!strcmp(class_name, "Vector4") || !strcmp(class_name, "Plane") || !strcmp(class_name, "Quaternion")) {
etype = MONO_TYPE_R4;
ecount = 4;
} else {
g_assert_not_reached ();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to leave a more general comment that Mono currently has to do these class name and type checks to determine things.

It would be beneficial if there was a way to track that the simdType and simdBaseType. In RyuJIT, we have TYP_SIMD8/12/16/32/64 and we then track the primitive base type as a separate field.

This allows better codegen, better specialization, and all-over cheaper checks. Even if this was just some helper method that cached a klass to simd info lookup, I think it would improve/simplify things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a simd_class_to_llvm_type () function which can be used here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reference @vargaz!

I'm not sure that really addresses the backing issue of there still needing to be string comparisons to do the checks in the first place and it doesn't help things on the MonoJIT side.

My comment was more about the general cost of going from a klass to the respective SimdType and SimdBaseType and how if Mono had a different way to track this it could be done cheaply once at import.

Subsequent handling, including simd_class_to_llvm_type could then be made cheaper.

Comment on lines +1284 to 1287
case MONO_PATCH_INFO_X128_GOT:
return hash | (guint32)*(double*)ji->data.target;
case MONO_PATCH_INFO_R8_GOT:
return hash | (guint32)*(double*)ji->data.target;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the hashes are currently being built for R4/R8, but they don't necessarily make "sense" to me.

This isn't a "good" hash since it just or's bits together, it doesn't take into account the "upper bits", and it relies on undefined behavior in the case the double or float is NaN/Infinity/out of range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the best, but its just used during JITting/AOTing, so it's not a problem in practice.

Comment on lines +8264 to +8288
case MONO_PATCH_INFO_X128: {
guint8 *pos, *patch_pos;
guint32 target_pos;

/* The SSE opcodes require a 16 byte alignment */
code = (guint8*)ALIGN_TO (code, 16);

pos = cfg->native_code + patch_info->ip.i;
if (IS_REX (pos [0])) {
patch_pos = pos + 4;
target_pos = GPTRDIFF_TO_UINT32 (code - pos - 8);
}
else {
patch_pos = pos + 3;
target_pos = GPTRDIFF_TO_UINT32 (code - pos - 7);
}

memcpy (code, patch_info->data.target, 16);
code += 16;

*(guint32*)(patch_pos) = target_pos;

remove = TRUE;
break;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original handling I added was the cause of the failures.

MonoJIT doesn't support the VEX encoding today (only the legacy encoding) and the R4/R8 patch point logic is effectively hardcoded to movss/movsd.

For that scenario, we have:

  • 1-byte prefix of 0xF3 (float) or 0xF2 (double)
  • Optional REX prefix of 0x44 if using extended registers (XMM8-XMM15)
  • 2-byte opcode of 0x0F 0x010
  • 1-byte modR/M byte
  • 4-byte RIP relative offset

So this the R4/R8 logic was getting the offset of the RIP relative offset so it could be patched.


For X128 we currently hardcode this to movups and so we instead have:

  • Optional REX prefix of 0x44 if using extended registers (XMM8-XMM15)
  • 2-byte opcode of 0x0F 0x010
  • 1-byte modR/M byte
  • 4-byte RIP relative offset

Thus it's 1-byte smaller. We could but don't want to emit movupd or movdqu as that's 1-byte larger for no benefit.

We could emit movaps instead, but there isn't any benefit to that on modern machines (anything since ~2011) and it strictly requires that the address be aligned, which is overall less flexible.

When the VEX support is added this all becomes a constant 4-bytes before the RIP relative offset is encoded (3-byte VEX encoded opcode + modR/M byte).

@fanyang-mono
Copy link
Member

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tannergooding
Copy link
Member Author

extra-platforms failures are all networking or other flaky tests that are also seeing failures in the mainline.

@tannergooding tannergooding merged commit db5dfad into dotnet:main Feb 15, 2023
@tannergooding tannergooding deleted the numerics-rewrite branch February 15, 2023 13:57
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add handling for SIMD constants in Mono
3 participants