Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64/Windows] Simplify JIT_MemSet & JIT_MemCpy #17537

Merged
merged 1 commit into from
Apr 13, 2018

Conversation

sdmaclea
Copy link

JIT_Memset alignment code was definitly broken for some
unaligned cases

JIT_MemCpy likely had the same issue

Simplify implementation to reduce maintenance burden

@janvorli @jkotas PTAL
@dotnet/arm64-contrib FYI

This is the Windows equivalent of #17536

Should be required to fix probable issues on windows like #17167, #17168, & #17169

JIT_Memset alignment code was definitly broken for some
unaligned cases

JIT_MemCpy likely had the same issue

Simplify implementation to reduce maintenance burden
Copy link
Author

@sdmaclea sdmaclea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a Windows Arm64 platform to test this. Relying on CI

; We need to provide our own implementation of memset instead of using the ones in crt because crt implementation does not gurantee
; that aligned 8/4/2 - byte memory will be written atomically. This is required because members in a struct can be read atomically
; and their values should be written atomically.
;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove misleading bogus comment. there is no reason to believe these accesses need to be aligned on arm64 Windows when they are not aligned on other platforms

; // Use math to avoid introducing more unpredictable branches
; // Due to inherent mod in lsr, ~7 is used instead of ~0 to handle count == 0
; // Note logic will fail is count >= (1 << 61). But this exceeds max physical memory for arm64
; uint8_t align = (dst & 0x7) & (~uint64_t(7) >> (countLeadingZeros(count) mod 64))
Copy link
Author

@sdmaclea sdmaclea Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were two bugs in the pseudo code for calculating align. The correct calculation should have been

uint8_t align = 
    (8 - (dst & 0x7)) & (~uint64_t(7) >> (countLeadingZeros(count/2) % 64))

This escaped due to incomplete unit testing. Test coverage also failed to hit the problematic corners.

b JIT_MemSet_0xa8
JIT_MemSet_0xa0
stp x8, x8, [x0], #16
JIT_MemSet_0xa8
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete code to match pseudo code

;
; count += dc_zva_size;
; }
;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove attempts to align dst and attempts to use dc zva.

Based on comments elsewhere the distribution of use cases tends to be biased to toward small objects. 2013 experiments showed 95% smaller than 64 bytes. Use of dc zva add complexity and a series of branches which may not be predictable. It could not be used for A53, A57 or Centriq until the length was 128 bytes in this design.

Removing dc zva is probably a win. Adding it was probably a mistake on my part.

ands w1, w1, #0xff
orr w1, w1, w1, lsl #8
orr w1, w1, w1, lsl #0x10
orr x1, x1, x1, lsl #0x20
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped using w8. Use w1 instead.

subs x2, x2, #16
bge JIT_MemSet_0xa0

JIT_MemSet_0xb0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used readable labels

; count-=4;
; }
; }
;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove attempts to align

str w8, [x0], #4
sub x2, x2, #4
b JIT_MemCpy_0xa8
JIT_MemCpy_0xa0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting to match pseudo code.

@adityamandaleeka
Copy link
Member

adityamandaleeka commented Apr 13, 2018

LGTM, thanks for simplifying the code and fixing the bugs around alignment.

@jkotas jkotas merged commit bba18ae into dotnet:master Apr 13, 2018
@sdmaclea sdmaclea deleted the PR-ARM64-WINDOWS-MEMx-Simplify branch April 16, 2018 16:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants