Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of coordination and overlapping work while implementing Intel hardware intrinsics #9656

Closed
4creators opened this issue Feb 4, 2018 · 11 comments

Comments

@4creators
Copy link
Contributor

@tannergooding @fiigii @CarolEidt

Unfortunately, we have quite a bit of duplicated work. Can we avoid in future unnecessary waste of our most precious resource which is time? I would appreciate it as we could be more efficient as a team in implementing intrinsics with a bit more of coordination and clear declarations of work item intents by team.

My impression was that we have agreed yesterday with @tannergooding ( https://github.com/dotnet/coreclr/issues/16153#issuecomment-362658148 and following answer ) that I will port tests to the new template and actually I have did that yesterday except for implementation of Load** which I wanted to do today/tomorrow or switch to older template not requiring Load instructions. In the meantime @tannergooding submitted dotnet/coreclr#16192 which duplicated my work.

@fiigii has implemented in dotnet/coreclr#16183 quite a bit of instructions which were already implemented in dotnet/coreclr#15777 but required only porting to table driven import and codegen.

The exact proposal for scope of my work was put forward in dotnet/coreclr#15585 (comment) and specifically was created according to @fiigii request of implementing first simple intrinsics and leaving Load and Store to be implemented by @fiigii .

My proposal would be that I would finish scalar and special memory Sse2 intrinsics implementation, plus Insert, Extract, Shuffle leaving all remaining work to team. If team deems it appropriate I could continue work on next ISAs/tests as mutually agreed.

@4creators
Copy link
Contributor Author

@sdmaclea I would be very interested and eager to help in implementing Arm intrinsics.

@fiigii
Copy link
Contributor

fiigii commented Feb 4, 2018

dotnet/coreclr#16183 will make the implementation of imm-intrinsic much simpler. SIMD shift* intrinsics are very special on semantics and encoding, so that implementing AVX2/SS2 counterparts can definitively avoid duplicating efforts.

@tannergooding
Copy link
Member

My impression was that we have agreed yesterday

I'm sorry if my response was confusing. I had intended from https://github.com/dotnet/coreclr/issues/16153#issuecomment-362666958 (which was made ~30 minutes after the comment you linked) to indicate that this work was already done on my box, other than rebasing/commit cleanup, and that any other work on it would likely be considered duplicate.

I had also meant to indicate that working on getting dotnet/coreclr#15777 resolved and moved to use the existing templates (where applicable) would be a good direction that wouldn't duplicate any work I'm currently doing.

I am currently focused on getting tests added (using templates where possible, which also involves creating new templates to cover the remaining tests) and validating that the implemented intrinsics don't have any obvious bugs, so that they don't have any issues with 2.1.

@tannergooding
Copy link
Member

I think there are currently two categories of intrinsics:

  • Those that can be implemented trivially
  • Those that still require infrastructure changes

The former mostly require adding a new entry to the table-driven framework, adding a line to the test-generation script, and clicking a button (this is generally 3-5 lines of code per intrinsic). These are easy for a single person to go and implement and will generally not overlap with anyone else's work (assuming two people aren't going to be working on the same ISA).

The latter (such as those implemented by dotnet/coreclr#16183) will almost certainly overlap between a PR attempting to add the new infrastructure and a PR attempting to implement an entire ISA or even two PRs attempting to implement separate ISAs. Since intrinsics in this category or other general infrastructure changes will almost always have a cross-ISA impact, it is difficult to prevent work duplicating in many cases.

Sometimes the APIs which fall into the latter category will be obvious. Other times, they will only come about after someone has started to work on them (as indicated by the multiple discussions had on various PRs). I believe the APIs implemented by dotnet/coreclr#16183 are the latter (there was a design decision to have a fallback for indirect calls, and it become clear that additional infrastructure work would be required).

@tannergooding
Copy link
Member

The exact proposal for scope of my work was put forward in dotnet/coreclr#15585 (comment) and specifically was created according to @fiigii request of implementing first simple intrinsics and leaving Load and Store to be implemented by @fiigii .

My proposal would be that I would finish scalar and special memory Sse2 intrinsics implementation, plus Insert, Extract, Shuffle leaving all remaining work to team. If team deems it appropriate I could continue work on next ISAs/tests as mutually agreed.

I had assumed that you are still going to complete dotnet/coreclr#15777 and most of the Sse2 ISA (unless you state otherwise). Many of these intrinsics (such as most scalar intrinsics) still fall into the "trivial" category.

There are likely still some intrinsics in the SSE2 ISA which fall into the "infrastructure" category (any that takes a non-fixed immediate, for example, probably falls under or should wait for dotnet/coreclr#16183). I'm not sure what all APIs these would be off the top of my head, but it likely won't be clear until someone ends up starting that work and finding out (it could probably be assumed that any from the SSE API which are not currently table driven fall into this category, however).

@4creators
Copy link
Contributor Author

@tannergooding Thank you for detailed explanation. My intention is to finish Sse2 and continue to work on any other ISA if this will be helpful. However, I would just like to better communicate with team and avoid duplicated work.

My proposal would be to communicate with others clearly intent of implementing part of ISA and do it in advance. This should be enough to get work well coordinated.

Closing as main point seems to be resolved.

@sdmaclea
Copy link
Contributor

sdmaclea commented Feb 5, 2018

@sdmaclea I would be very interested and eager to help in implementing Arm intrinsics.

@4creators

I will not be working on Arm32 intrinsics. So the playing field is wide open. Although Arm64 framework/implementation may guide Arm32.

As for Arm64 intrinsic your help would be much appreciated. 2.1 Intrinsics will be limited by time.

git grep theEmit | grep REG_V | less shows the set of SIMD assembly instructions which have emitters. Many of these do not yet have intrinsiscs.... There are many more emitters missing...

@CarolEidt
Copy link
Contributor

@4creators @fiigii @sdmaclea @tannergooding - I sincerely apologize for the issues arising from lack of coordination. I should have seen this earlier, and addressed it. Going forward, I propose:

  • I have created a "Hardware Intrinsics" project on this repo, where we can track issues and status. As new issues are added they can be associated with that project. I haven't used projects much, so bear with me as I figure it out.
  • As/before you begin to work on new intrinsics or tests, ensure that there is an issue created for it, and ping me so that I can tag it to the project and assign it to you.

Please let me know if there are other things I/we can do to help this go smoothly. And thanks for all the great work (and patience) thus far!

@4creators
Copy link
Contributor Author

@CarolEidt Thank you very much for support.

@4creators
Copy link
Contributor Author

@sdmaclea

Thank you for information about your plans. I would like to finish first Intel ISAs with @fiigii and @tannergooding and than start working on Arm unless you will need to help with reaching targets for 2.1 release - than I could start work in parallel to Intel ISA immediately..

@sdmaclea
Copy link
Contributor

sdmaclea commented Feb 5, 2018

@4creators Sure. 2.1 target is as much as I can get approved and implemented. Arm64 will not gate 2.1 release.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants