Add ref T overloads for hardware intrinsic functions which takes T* #36182
Comments
I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label. |
These were, IIRC, explicitly disregarded because they want intrinsics to be explicit in nature. You can work around it with reference casting, which generates pretty great asm:
public Vector128<T> ReadVector128<T>(ref T start) where T : struct
    => Unsafe.As<T, Vector128<T>>(ref start);
You can see this approach being used in related convo:
|
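A minimal sketch of the reference-casting workaround described above, for readers who want to try it. It is a variation rather than the commenter's exact code: it goes through `Unsafe.ReadUnaligned` so that no alignment is assumed for `start`. The class name `RefVectorHelpers` and the method `SumFirstFour` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class RefVectorHelpers
{
    // Hypothetical helper: reads 16 bytes starting at 'start' as a Vector128<T>.
    // No pinning is needed because the read goes through a managed reference.
    // The caller must guarantee that at least 16 readable bytes follow 'start'.
    public static Vector128<T> ReadVector128<T>(ref T start) where T : struct
        => Unsafe.ReadUnaligned<Vector128<T>>(ref Unsafe.As<T, byte>(ref start));

    // Illustrative caller: sums the first four elements of an array.
    public static int SumFirstFour(int[] values)
    {
        if (values.Length < 4)
            throw new ArgumentException("Need at least four elements.", nameof(values));

        Vector128<int> v = ReadVector128(ref values[0]);
        return v.GetElement(0) + v.GetElement(1) + v.GetElement(2) + v.GetElement(3);
    }
}
```

The helper performs no bounds checking, so it is only as safe as the length check its caller does before taking the reference.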
Tagging subscribers to this area: @tannergooding |
I thought we had an existing issue for this, but I can't seem to find it right now...
Pinning is relatively low overhead, it effectively just involves storing a value to the stack and zeroing that location when done. While it can have overhead if you are pinning in a tight loop, it is no more expensive than stack spilling in general. Given the most likely scenario with HWIntrinsics is you are working with large amounts of data, pinning your data once upfront should be negligible in cost. Pinning when you have smaller amounts of data may have a larger relative impact, but it would ideally still not be noticeable due to being on small amounts of data. Instead, it would likely only be an issue if you are working with a large number of small inputs. The downside to pinning is that it can inhibit the GC and indirectly impact performance if a compaction happens. However,
This could somewhat be mitigated by taking a You could mitigate the Last time I discussed this with @jkotas and @CarolEidt, it was determined that exposing overloads that took |
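The "pin once upfront" pattern mentioned above can be sketched as follows. This is an illustration under assumptions, not code from the thread: `PinOnceExample` and `Sum` are hypothetical names, and the project must be compiled with unsafe code enabled. The array is pinned a single time outside the loop, so the pinning cost is paid once and not per iteration.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class PinOnceExample
{
    // Hypothetical example: sums an int array, using SSE2 when available.
    public static unsafe int Sum(int[] values)
    {
        int sum = 0;
        int i = 0;

        if (Sse2.IsSupported && values.Length >= 4)
        {
            Vector128<int> acc = Vector128<int>.Zero;

            // Pin once for the whole loop instead of once per iteration.
            fixed (int* p = values)
            {
                for (; i <= values.Length - 4; i += 4)
                {
                    acc = Sse2.Add(acc, Sse2.LoadVector128(p + i));
                }
            }

            // Horizontal sum of the four lanes.
            for (int lane = 0; lane < 4; lane++)
            {
                sum += acc.GetElement(lane);
            }
        }

        // Scalar tail (or the whole array when SSE2 is unavailable).
        for (; i < values.Length; i++)
        {
            sum += values[i];
        }

        return sum;
    }
}
```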
I also meant System.Runtime.Intrinsics.X86.Avx2.GatherXXX.
I think hardware intrinsics are already inherently unsafe. BTW, the Unsafe.As<T, Vector> approach could cause a DataMisalignedException on ARM platforms. |
To the best of my knowledge, the only APIs in This is the same reason we don't use |
Yeah, the ref T overloads could be named Unsafe.
So why not make intrinsic functions an exception?
If we could avoid pinning in a tight loop, we could use SIMD operations effectively with extern methods. |
Can you provide concrete examples (with pseudo-code) of scenarios that would benefit from a |
@GrabYourPitchforks, we use The use-case definitely exists, I think it's just a question of whether the existing support via |
Right, I'm aware of our internal use cases. My main point was that the existing API surface (even if you have to mix + match API calls) appears to be sufficient for our needs. I can't think offhand of a scenario which is blocked because of missing APIs. |
@GrabYourPitchforks the one thing i would say is that |
Right, there should be nothing blocked by these APIs not existing. It's mostly a question of convenience. |
I said...
The missing ref T overloads for GatherXXX are obviously a blocking scenario. This is not replaceable with any method, such as |
No, but it is replaceable with Nothing is blocked, it just isn't as "ideal". |
So then, I didn't mean it was a question of convenience. |
#36323 (comment) is another case where If we had the
Unsafe.SkipInit(out result);
Sse.StoreVector128(ref result.M11, row1);
Sse.StoreVector128(ref result.M21, row2);
Sse.StoreVector128(ref result.M31, row3);
Sse.StoreVector128(ref result.M41, row4); |
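The snippet above uses the proposed `ref` overloads. For comparison, here is a sketch of how the same row stores might be written with APIs that already exist, going through `Unsafe.WriteUnaligned`. The names `MatrixStoreSketch`, `FromRows`, and `Store` are hypothetical, and the code assumes that each row of `System.Numerics.Matrix4x4` is laid out as four contiguous floats.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class MatrixStoreSketch
{
    // Hypothetical helper: builds a Matrix4x4 from four row vectors without pinning.
    public static Matrix4x4 FromRows(
        Vector128<float> row1, Vector128<float> row2,
        Vector128<float> row3, Vector128<float> row4)
    {
        // Skip zero-initialization; every field is overwritten below.
        Unsafe.SkipInit(out Matrix4x4 result);

        Store(ref result.M11, row1);
        Store(ref result.M21, row2);
        Store(ref result.M31, row3);
        Store(ref result.M41, row4);
        return result;
    }

    // Writes 16 bytes starting at 'destination'. Assumes the four floats of a
    // row are contiguous in memory, starting at 'destination'.
    private static void Store(ref float destination, Vector128<float> value)
        => Unsafe.WriteUnaligned(ref Unsafe.As<float, byte>(ref destination), value);
}
```

This is more verbose than the proposed overloads, which matches the thread's framing that the gap is one of convenience for this particular case.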
I didn't previously mean convenience, but it is good. |
@tannergooding I marked this as "Future". But should we keep it at all, or should we close it because we're unlikely to ever implement it? |
We've had a few asks on this now so I don't think we should close it outright. There are scenarios where this would be beneficial and where it might simplify some code, such as the various places we are using We should probably do some more analysis to determine how frequently this is needed and what scenarios would benefit. |
This has been minimally done via the new We could potentially expose other |
Currently, there are some hardware intrinsic functions which take T*.
But requiring T* means requiring pinning.
Obviously, hardware intrinsic functions are used for performance,
but pinning lowers performance.