-
Notifications
You must be signed in to change notification settings - Fork 12.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP] Fix stack corruption due to argument mismatch #96386
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-offload Author: Sushant Gokhale (sushgokh) ChangesWhile lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper. At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming it has the argument count listed in the signature. AARCH64 asm sequence for this is as follows (removed unrelated insns): ` __tgt_target_data_update_nowait_mapper: ldp x29, x30, sp, #16 // 16-byte Folded Reload On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above). Full diff: https://github.com/llvm/llvm-project/pull/96386.diff 2 Files Affected:
diff --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 323dee41630f2..968589e866334 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -388,8 +388,7 @@ void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId,
void __tgt_target_data_update_nowait_mapper(
ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
void **Args, int64_t *ArgSizes, int64_t *ArgTypes, map_var_info_t *ArgNames,
- void **ArgMappers, int32_t DepNum, void *DepList, int32_t NoAliasDepNum,
- void *NoAliasDepList);
+ void **ArgMappers);
// Performs the same actions as data_begin in case ArgNum is non-zero
// and initiates run of offloaded region on target platform; if ArgNum
diff --git a/offload/src/interface.cpp b/offload/src/interface.cpp
index 763b051cc6d77..27562388c2f11 100644
--- a/offload/src/interface.cpp
+++ b/offload/src/interface.cpp
@@ -207,8 +207,7 @@ EXTERN void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId,
EXTERN void __tgt_target_data_update_nowait_mapper(
ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
void **Args, int64_t *ArgSizes, int64_t *ArgTypes, map_var_info_t *ArgNames,
- void **ArgMappers, int32_t DepNum, void *DepList, int32_t NoAliasDepNum,
- void *NoAliasDepList) {
+ void **ArgMappers) {
OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
targetData<TaskAsyncInfoWrapperTy>(
Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames,
|
offload/src/interface.cpp
Outdated
@@ -207,8 +207,7 @@ EXTERN void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId, | |||
EXTERN void __tgt_target_data_update_nowait_mapper( | |||
ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase, | |||
void **Args, int64_t *ArgSizes, int64_t *ArgTypes, map_var_info_t *ArgNames, | |||
void **ArgMappers, int32_t DepNum, void *DepList, int32_t NoAliasDepNum, | |||
void *NoAliasDepList) { | |||
void **ArgMappers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all unused? Seems weird that we're not using them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original design was from libomp
task interface, but we never use it.
However, I think we might want to keep them, as I definitely had a patch long time ago that uses them. A better fix would be to modify the front end to emit null value correspondingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I think we might want to keep them, as I definitely had a patch long time ago that uses them. A better fix would be to modify the front end to emit null value correspondingly.
All the nowait calls, listed below, suffer from the same discrepancy
__tgt_target_data_begin_nowait_mapper
__tgt_target_data_end_nowait_mapper
__tgt_target_data_update_nowait_mapper
while their counterparts, derived here, only have 9 arguments
So, I am thinking of modifying all the nowait calls while addressing this specific issue.
@shiltian If you still think that modifying the front-end would be better here, could you tell me what changes are exactly required in the OffloadingArgs
here ? I hope this is what you are suggesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If it is nowait, push extra null values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, don't you need to update clang tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
While lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper. At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming it has the argument count listed in the signature. AARCH64 asm sequence for this is as follows (removed unrelated insns): .omp_task_entry..108: sub sp, sp, llvm#32 stp x29, x30, sp, llvm#16 // 16-byte Folded Spill add x29, sp, llvm#16 str x8, sp, llvm#8. // stack canary str xzr, [sp] bl __tgt_target_data_update_nowait_mapper __tgt_target_data_update_nowait_mapper: sub sp, sp, llvm#32 stp x29, x30, sp, llvm#16 // 16-byte Folded Spill add x29, sp, llvm#16 str x8, sp, llvm#8 // stack canary // Sibcall argument setup adrp x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb ldr x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb] stp x9, x8, x29, llvm#16 adrp x8, .L.str.8 add x8, x8, :lo12:.L.str.8 str x8, x29, llvm#32. <==. This is the insn that erases $fp ldp x29, x30, sp, llvm#16 // 16-byte Folded Reload add sp, sp, llvm#32 // Sibcall b ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments. This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above). Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
While lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper. At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming it has the argument count listed in the signature. AARCH64 asm sequence for this is as follows (removed unrelated insns): ` .omp_task_entry..108: sub sp, sp, llvm#32 stp x29, x30, sp, llvm#16 // 16-byte Folded Spill add x29, sp, llvm#16 str x8, sp, llvm#8. // stack canary str xzr, [sp] bl __tgt_target_data_update_nowait_mapper __tgt_target_data_update_nowait_mapper: sub sp, sp, llvm#32 stp x29, x30, sp, llvm#16 // 16-byte Folded Spill add x29, sp, llvm#16 str x8, sp, llvm#8 // stack canary // Sibcall argument setup adrp x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb ldr x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb] stp x9, x8, x29, llvm#16 adrp x8, .L.str.8 add x8, x8, :lo12:.L.str.8 str x8, x29, llvm#32. <==. This is the insn that erases $fp ldp x29, x30, sp, llvm#16 // 16-byte Folded Reload add sp, sp, llvm#32 // Sibcall b ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD ` On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments. This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above). Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
While lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper.
At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData() is converted to a sibcall assuming it has the argument count listed in the signature.
AARCH64 asm sequence for this is as follows (removed unrelated insns):
`
.omp_task_entry..108:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8. // stack canary
str xzr, [sp]
bl __tgt_target_data_update_nowait_mapper
__tgt_target_data_update_nowait_mapper:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8 // stack canary
// Sibcall argument setup
adrp x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
stp x9, x8, x29, #16
adrp x8, .L.str.8
add x8, x8, :lo12:.L.str.8
str x8, x29, #32. <==. This is the insn that erases $fp
ldp x29, x30, sp, #16 // 16-byte Folded Reload
add sp, sp, #32
// Sibcall
b ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`
On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.
This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above).