-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aarch64: Implement I128 Loads and Stores #2985
aarch64: Implement I128 Loads and Stores #2985
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks good -- I very much like the clean generalization of the amode selection.
Just one request for a few more tests below but otherwise LGTM.
; run: %i128_stack_store_load_big_offset(0, -1) == true | ||
; run: %i128_stack_store_load_big_offset(0x01234567_89ABCDEF, 0xFEDCBA98_76543210) == true | ||
; run: %i128_stack_store_load_big_offset(0x06060606_06060606, 0xA00A00A0_0A00A00A) == true | ||
; run: %i128_stack_store_load_big_offset(0xC0FFEEEE_DECAFFFF, 0xDECAFFFF_C0FFEEEE) == true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some test cases (compile-tests probably) that do basic load.i128
/ store.i128
as well? At least right now, stack_store
/ stack_load
expand to use these under the hood, but it's conceivable (especially if we try to address the codegen improvement ideas you suggest, i.e. treat sp-relative addresses differently) that this might not be true later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad this needed revisiting, because the previous code had a bug that only showed up with load.i128
!
I've added the separate load.i128
/store.i128
run tests, but i'm using stack_addr
to get a valid address. I think this may be a problem in the future, where we may just end up optimizing that pattern(stack_addr
+load
) into a stack_load
.
Do we have a better way of getting a R+W address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for adding this (and I'm glad it found an issue)!
I think this is fine -- there isn't, afaik, a way to allocate some other memory to read/write in a run-test (there's no runtime to call to dynamically allocate, for instance). It's possible we might try to optimize stack_addr
+ load
/store
in the future (to use sp
directly) but if we do that, we'll see test failures on these compile tests and then work out another solution at that point. Compile-test-only coverage is probably fine at that point, IMHO.
The previous address calculation code had a bug where we tried to add offsets into a temporary register before defining it, causing the regalloc to complain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
; run: %i128_stack_store_load_big_offset(0, -1) == true | ||
; run: %i128_stack_store_load_big_offset(0x01234567_89ABCDEF, 0xFEDCBA98_76543210) == true | ||
; run: %i128_stack_store_load_big_offset(0x06060606_06060606, 0xA00A00A0_0A00A00A) == true | ||
; run: %i128_stack_store_load_big_offset(0xC0FFEEEE_DECAFFFF, 0xDECAFFFF_C0FFEEEE) == true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for adding this (and I'm glad it found an issue)!
I think this is fine -- there isn't, afaik, a way to allocate some other memory to read/write in a run-test (there's no runtime to call to dynamically allocate, for instance). It's possible we might try to optimize stack_addr
+ load
/store
in the future (to use sp
directly) but if we do that, we'll see test failures on these compile tests and then work out another solution at that point. Compile-test-only coverage is probably fine at that point, IMHO.
Hey, this implements loading and storing i128 values in the aarch64 backend