-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lack of coordination and overlapping work while implementing Intel hardware intrinsics #9656
Comments
@sdmaclea I would be very interested and eager to help in implementing Arm intrinsics. |
dotnet/coreclr#16183 will make the implementation of imm-intrinsic much simpler. SIMD shift* intrinsics are very special on semantics and encoding, so that implementing AVX2/SS2 counterparts can definitively avoid duplicating efforts. |
I'm sorry if my response was confusing. I had intended from https://github.com/dotnet/coreclr/issues/16153#issuecomment-362666958 (which was made ~30 minutes after the comment you linked) to indicate that this work was already done on my box, other than rebasing/commit cleanup, and that any other work on it would likely be considered duplicate. I had also meant to indicate that working on getting dotnet/coreclr#15777 resolved and moved to use the existing templates (where applicable) would be a good direction that wouldn't duplicate any work I'm currently doing. I am currently focused on getting tests added (using templates where possible, which also involves creating new templates to cover the remaining tests) and validating that the implemented intrinsics don't have any obvious bugs, so that they don't have any issues with 2.1. |
I think there are currently two categories of intrinsics:
The former mostly require adding a new entry to the table-driven framework, adding a line to the test-generation script, and clicking a button (this is generally 3-5 lines of code per intrinsic). These are easy for a single person to go and implement and will generally not overlap with anyone else's work (assuming two people aren't going to be working on the same ISA). The latter (such as those implemented by dotnet/coreclr#16183) will almost certainly overlap between a PR attempting to add the new infrastructure and a PR attempting to implement an entire ISA or even two PRs attempting to implement separate ISAs. Since intrinsics in this category or other general infrastructure changes will almost always have a cross-ISA impact, it is difficult to prevent work duplicating in many cases. Sometimes the APIs which fall into the latter category will be obvious. Other times, they will only come about after someone has started to work on them (as indicated by the multiple discussions had on various PRs). I believe the APIs implemented by dotnet/coreclr#16183 are the latter (there was a design decision to have a fallback for indirect calls, and it become clear that additional infrastructure work would be required). |
I had assumed that you are still going to complete dotnet/coreclr#15777 and most of the Sse2 ISA (unless you state otherwise). Many of these intrinsics (such as most scalar intrinsics) still fall into the "trivial" category. There are likely still some intrinsics in the SSE2 ISA which fall into the "infrastructure" category (any that takes a non-fixed immediate, for example, probably falls under or should wait for dotnet/coreclr#16183). I'm not sure what all APIs these would be off the top of my head, but it likely won't be clear until someone ends up starting that work and finding out (it could probably be assumed that any from the SSE API which are not currently table driven fall into this category, however). |
@tannergooding Thank you for detailed explanation. My intention is to finish Sse2 and continue to work on any other ISA if this will be helpful. However, I would just like to better communicate with team and avoid duplicated work. My proposal would be to communicate with others clearly intent of implementing part of ISA and do it in advance. This should be enough to get work well coordinated. Closing as main point seems to be resolved. |
I will not be working on Arm32 intrinsics. So the playing field is wide open. Although Arm64 framework/implementation may guide Arm32. As for Arm64 intrinsic your help would be much appreciated. 2.1 Intrinsics will be limited by time.
|
@4creators @fiigii @sdmaclea @tannergooding - I sincerely apologize for the issues arising from lack of coordination. I should have seen this earlier, and addressed it. Going forward, I propose:
Please let me know if there are other things I/we can do to help this go smoothly. And thanks for all the great work (and patience) thus far! |
@CarolEidt Thank you very much for support. |
Thank you for information about your plans. I would like to finish first Intel ISAs with @fiigii and @tannergooding and than start working on Arm unless you will need to help with reaching targets for 2.1 release - than I could start work in parallel to Intel ISA immediately.. |
@4creators Sure. 2.1 target is as much as I can get approved and implemented. Arm64 will not gate 2.1 release. |
@tannergooding @fiigii @CarolEidt
Unfortunately, we have quite a bit of duplicated work. Can we avoid in future unnecessary waste of our most precious resource which is time? I would appreciate it as we could be more efficient as a team in implementing intrinsics with a bit more of coordination and clear declarations of work item intents by team.
My impression was that we have agreed yesterday with @tannergooding ( https://github.com/dotnet/coreclr/issues/16153#issuecomment-362658148 and following answer ) that I will port tests to the new template and actually I have did that yesterday except for implementation of Load** which I wanted to do today/tomorrow or switch to older template not requiring
Load
instructions. In the meantime @tannergooding submitted dotnet/coreclr#16192 which duplicated my work.@fiigii has implemented in dotnet/coreclr#16183 quite a bit of instructions which were already implemented in dotnet/coreclr#15777 but required only porting to table driven import and codegen.
The exact proposal for scope of my work was put forward in dotnet/coreclr#15585 (comment) and specifically was created according to @fiigii request of implementing first simple intrinsics and leaving
Load
andStore
to be implemented by @fiigii .My proposal would be that I would finish scalar and special memory Sse2 intrinsics implementation, plus
Insert
,Extract
,Shuffle
leaving all remaining work to team. If team deems it appropriate I could continue work on next ISAs/tests as mutually agreed.The text was updated successfully, but these errors were encountered: