Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Reduce struct copies in ValueTask.GetAwaiter #22735

Closed
wants to merge 2 commits into from

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Feb 21, 2019

Work around for part of https://github.com/dotnet/coreclr/issues/18542#issuecomment-465621796

Using Unsafe casting rather than .ctors

Cuts

; Lcl frame size = 408
G_M43654_IG04:
       mov      rdx, bword ptr [rbp+10H]
       mov      rcx, gword ptr [rdx+8]
       lea      rdx, bword ptr [rbp-E0H]
       xor      r8, r8
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+64]
       call     qword ptr [rax+40]PipeReader:ReadAsync(struct):struct:this

G_M43654_IG05:
       vmovdqu  xmm0, qword ptr [rbp-E0H]
       vmovdqu  qword ptr [rbp-158H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-D0H]
       vmovdqu  qword ptr [rbp-148H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-C0H]
       vmovdqu  qword ptr [rbp-138H], xmm0
       mov      rdx, qword ptr [rbp-B0H]
       mov      qword ptr [rbp-128H], rdx

G_M43654_IG06:
       vmovdqu  xmm0, qword ptr [rbp-158H]
       vmovdqu  qword ptr [rbp-120H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-148H]
       vmovdqu  qword ptr [rbp-110H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-138H]
       vmovdqu  qword ptr [rbp-100H], xmm0
       mov      rdx, qword ptr [rbp-128H]
       mov      qword ptr [rbp-F0H], rdx

G_M43661_IG07:
       vmovdqu  xmm0, qword ptr [rbp-120H]
       vmovdqu  qword ptr [rbp-A8H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-110H]
       vmovdqu  qword ptr [rbp-98H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-100H]
       vmovdqu  qword ptr [rbp-88H], xmm0
       mov      rdx, qword ptr [rbp-F0H]
       mov      qword ptr [rbp-78H], rdx

G_M43663_IG08:
       mov      rsi, gword ptr [rbp-A8H]
       test     rsi, rsi
       jne      SHORT G_M43663_IG09
       mov      edi, 1
       jmp      SHORT G_M43663_IG11

; Total bytes of code 1525, prolog size 59 for method <ProcessSends>d__26:MoveNext():this

To

; Lcl frame size = 296
G_M43654_IG04:
       mov      rdx, bword ptr [rbp+10H]
       mov      rcx, gword ptr [rdx+8]
       lea      rdx, bword ptr [rbp-E0H]
       xor      r8, r8
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+64]
       call     qword ptr [rax+40]PipeReader:ReadAsync(struct):struct:this

G_M43664_IG05:
       vmovdqu  xmm0, qword ptr [rbp-E0H]
       vmovdqu  qword ptr [rbp-A8H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-D0H]
       vmovdqu  qword ptr [rbp-98H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-C0H]
       vmovdqu  qword ptr [rbp-88H], xmm0
       mov      rdx, qword ptr [rbp-B0H]
       mov      qword ptr [rbp-78H], rdx

G_M43664_IG06:
       mov      rsi, gword ptr [rbp-A8H]
       test     rsi, rsi
       jne      SHORT G_M43664_IG07
       mov      edi, 1
       jmp      SHORT G_M43664_IG09

; Total bytes of code 1401, prolog size 59 for method <ProcessSends>d__26:MoveNext():this

Acceptable workaround?

/cc @stephentoub @jkotas @CarolEidt @AndyAyersMS @mikedn

@stephentoub
Copy link
Member

Does this make a massive difference in a key end-to-end benchmark? If not, I'd really prefer waiting for a proper fix.

@jkotas
Copy link
Member

jkotas commented Feb 21, 2019

Changing ValueTaskAwaiter(ValueTask value) to ValueTaskAwaiter(in ValueTask value) should help too (maybe not as much), without unsafe code.

@stephentoub
Copy link
Member

Changing ValueTaskAwaiter(ValueTask value) to ValueTaskAwaiter(in ValueTask value) should help too

I'd be fine with that, assuming it does help. Presumably that could be done for the configured-variations as well.

@benaadams
Copy link
Member Author

benaadams commented Feb 21, 2019

Changing ValueTaskAwaiter(ValueTask value) to ValueTaskAwaiter(in ValueTask value) should help too (maybe not as much)

Gets rid of one of the copies

       mov      rdx, bword ptr [rbp+10H]
       mov      rcx, gword ptr [rdx+8]
       lea      rdx, bword ptr [rbp-E0H]
       xor      r8, r8
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+64]
       call     qword ptr [rax+40]PipeReader:ReadAsync(struct):struct:this

G_M43655_IG05:
       vmovdqu  xmm0, qword ptr [rbp-E0H]
       vmovdqu  qword ptr [rbp-120H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-D0H]
       vmovdqu  qword ptr [rbp-110H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-C0H]
       vmovdqu  qword ptr [rbp-100H], xmm0
       mov      rdx, qword ptr [rbp-B0H]
       mov      qword ptr [rbp-F0H], rdx

G_M43661_IG06:
       vmovdqu  xmm0, qword ptr [rbp-120H]
       vmovdqu  qword ptr [rbp-A8H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-110H]
       vmovdqu  qword ptr [rbp-98H], xmm0
       vmovdqu  xmm0, qword ptr [rbp-100H]
       vmovdqu  qword ptr [rbp-88H], xmm0
       mov      rdx, qword ptr [rbp-F0H]
       mov      qword ptr [rbp-78H], rdx

G_M43663_IG07:
       mov      rsi, gword ptr [rbp-A8H]
       test     rsi, rsi
       jne      SHORT G_M43663_IG08
       mov      edi, 1
       jmp      SHORT G_M43663_IG10

Will do a PR for that

@benaadams
Copy link
Member Author

Safe version: #22738

@benaadams
Copy link
Member Author

Presumably that could be done for the configured-variations as well.

Looks like ConfiguredValueTaskAwaitable; ConfiguredValueTaskAwaiter and ValueTask are all the same layout, so this could work too 😉

@stephentoub
Copy link
Member

Looks like ConfiguredValueTaskAwaitable; ConfiguredValueTaskAwaiter and ValueTask are all the same layout

Ah, right, I forgot I stored the bool in the ValueTask itself to take advantage of the padding space.

@benaadams
Copy link
Member Author

If not, I'd really prefer waiting for a proper fix.

Ideally the Jit would use the return assignment as the stack it gave to the method call and remove the final copy so end up as:

G_M43654_IG04:
       mov      rdx, bword ptr [rbp+10H]
       mov      rcx, gword ptr [rdx+8]
       lea      rdx, bword ptr [rbp-A8H]   ; A8H rather than E0H
       xor      r8, r8
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+64]
       call     qword ptr [rax+40]PipeReader:ReadAsync(struct):struct:this

G_M43664_IG06:
       mov      rsi, gword ptr [rbp-A8H]
       test     rsi, rsi
       jne      SHORT G_M43664_IG07
       mov      edi, 1
       jmp      SHORT G_M43664_IG09

@benaadams
Copy link
Member Author

Ideally the Jit would use the return assignment as the stack it gave to the method call and remove the final copy so end up as...

@mikedn points out in "Out-Returns" dotnet/csharplang#2217 (comment) that causes compilations if the called method throws an exception and it ends up half initialized.

@mikedn
Copy link

mikedn commented Feb 21, 2019

I see, structs containing a single field... these tend to give JIT headaches. Still, some of this stuff could probably be fixed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants