-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
【PIR OpTest Fix No.30】 test_assign_pos_op #62863
【PIR OpTest Fix No.30】 test_assign_pos_op #62863
Conversation
你的PR提交成功,感谢你对开源项目的贡献! |
单看代码看不出什么,麻烦 @xingmingyyj 帮忙看下呢 |
@xingmingyyj 麻烦帮忙看一下这种单测出现结果错误可能是哪里导致的呢,非常感谢! |
在看了 |
打开FLAGS_PIR_OPTEST和FLAGS_PIR_OPTEST_WHITE_LIST的测试逻辑是先在旧IR下执行一得到旧IR下的执行结果,然后再在新IR下执行一次,对比二者的结果。这部分逻辑在test/legacy_test/op_test.py里。现在在旧IR下执行时,在assign_pos的kernel里会修改输入 template <typename T>
__global__ void AssignPos(T* cum_count,
const T* numbers,
T* out,
int64_t limit) {
CUDA_KERNEL_LOOP(i, limit) {
int number_idx = numbers[i];
if (number_idx > -1) {
int p = phi::CudaAtomicAdd(cum_count + number_idx, -1);
out[p - 1] = i;
}
}
} 这样第二次在新IR下执行时使用的 |
我明白了,也即assign_pos在旧IR执行kernel时在使用cum_count进行原子加操作时改变了内存地址上的数值,导致后来新IR复用此地址cum_count作为输入时已经不是原本的cum_count,非常感谢大佬的解答,我以后也多关注一下算子方面,之前一直只想着类型问题了/(ㄒoㄒ)/~~ |
是这样的,这类kernel相关的问题还是第一次遇到,之前遇到的精度问题一般都是考虑数值类型或者kernel选择不一致。 |
所以大佬觉得有必要在单测文件中将同一个inputs分成两个不同的place再分别feed_var给新旧IR吗,这是否能解决这个问题呢? |
这里我觉着也可能是kernel的问题,cum_count能不能被修改呢?如果修改是合理的话,就是现在加入no_check就可以了,当作特殊的测试处理。这个需要和研发老师@kangguangli确认一下。 |
好嘞了解了 |
这个任务不计划对kernel的实现做任何修改,专注于算子注册上面。对于发现的问题,可以提个issue单独讨论下。对于这个算子,需要分布式同学来确认下,我看提交记录,是 @ForFishes 合入的assign_pos,@ForFishes 可以确认下这里对cum_count的修改是否符合预期吗? |
Co-authored-by: kangguangli <kangguangli@hotmail.com>
好嘞了解了,已经修改了,非常感谢! |
可以rerun一下CI,推进合入了 |
phi::errors::InvalidArgument( | ||
"The dtype of the cum_count_dtype, eff_num_len and " | ||
"X should be same as int64")); | ||
out->set_dtype(X_dtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里不需要设置ddim吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
原本是空的,这里我觉得不改变现状比较好,后续有优化需求单独做
Sorry to inform you that edc986c's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually. |
* fix assign_pos * update fix * fix bug * fix codestyle * Update paddle/phi/api/yaml/op_compat.yaml Co-authored-by: kangguangli <kangguangli@hotmail.com> --------- Co-authored-by: kangguangli <kangguangli@hotmail.com>
* fix assign_pos * update fix * fix bug * fix codestyle * Update paddle/phi/api/yaml/op_compat.yaml Co-authored-by: kangguangli <kangguangli@hotmail.com> --------- Co-authored-by: kangguangli <kangguangli@hotmail.com>
PR Category
Execute Infrastructure
PR Types
Devs
Description
PIR Op单测修复
修复单测
test_assign_pos_op
修复后打开
FLAGS_enable_pir_in_executor
单测是否通过:否修复后未打开FLAGS_PIR_OPTEST和FLAGS_PIR_OPTEST_WHITE_LIST可通过全部单侧,打开两个FLAG后三个单侧中的一个TestAssignPosOpInt64无法通过,怀疑问题出在eff_num_len的处理上,使用astype对结果无改变,使用np.ascontiguousarray可对结果有改变但仍无法通过单侧,麻烦看一下这个是否是转换中出现的问题~