add index_put api #52886
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
… clear_add_index_put_api
… clear_add_index_put_api
… less than x.dims
… clear_add_index_put_api
#include "paddle/phi/kernels/index_put_grad_kernel.h" | ||
#include <numeric> | ||
#include "paddle/phi/backends/gpu/gpu_context.h" | ||
#include "paddle/phi/backends/gpu/gpu_launch_config.h" |
These GPU-related header files don't need to be included in the CPU kernel.

done
int64_t offset = 0;

for (size_t i = 0; i < Rank; ++i) {
  cur_ix = (int64_t(*(indices[i] + idx)));
Use static_cast for the type conversion.
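For example, a minimal sketch of the suggested change (loop context copied from the diff above):

for (size_t i = 0; i < Rank; ++i) {
  // static_cast makes the intended conversion to int64_t explicit
  cur_ix = static_cast<int64_t>(*(indices[i] + idx));
}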
done
value_grad->dtype(),
false,
&value_grad_dims_without1);
phi::ReshapeInferKernel<Context>(
The goal here is to Resize value_grad, right? Since value_grad is an output, wouldn't calling value_grad->Resize(...) directly be enough?
The size of value_grad must not be changed via Resize: the dims of value_grad determine the shape of the backward gradient and have to stay consistent with the shape of the forward value.
But doesn't your ReshapeInferKernel call also modify the shape of value_grad? What I mean is: call value_grad->Resize once more at L190 to set the shape of value_grad directly again, which also avoids the memcpy inside ReshapeInferKernel.
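A rough sketch of this alternative (variable names such as value_grad_dims are assumed, not taken from the PR):

// work with the squeezed dims in place of calling ReshapeInferKernel
value_grad->Resize(value_grad_dims_without1);
// ... gradient computation / accumulation ...
value_grad->Resize(value_grad_dims);  // restore the forward value's shape; no extra memcpy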
The ReshapeInferKernel call here does not itself modify the shape of value_grad.
template <typename T, size_t Rank>
void set_zero_kernel(const int64_t N,
                     const int64_t** indices,
                     phi::Array<int64_t, Rank> stride,
There is no need to use phi::Array in a CPU kernel; a const std::vector<int64_t>& or const phi::DDim& works just as well and also avoids the copy.
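A sketch of the suggested signature (the remaining parameters and the body are assumed/elided):

template <typename T>
void set_zero_kernel(const int64_t N,
                     const int64_t** indices,
                     const phi::DDim& stride,
                     const phi::DDim& shape,
                     T* out) {
  // same body as before; stride and shape are now passed by const reference,
  // so no copy is made and Rank no longer needs to be a template parameter
}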
done
}
}

template <typename T, typename Context, size_t Rank>
Don't make Rank a template parameter of the CPU kernel; your unit-test coverage failure is caused precisely by Rank.
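A sketch of why the compile-time Rank hurts coverage (the dispatch below is hypothetical, not quoted from the PR):

switch (x.dims().size()) {
  case 1: index_put_kernel<T, Context, 1>(dev_ctx, ...); break;
  case 2: index_put_kernel<T, Context, 2>(dev_ctx, ...); break;
  // ... one case per supported rank; every case is a separate instantiation
  // that the unit tests would have to exercise to reach full coverage ...
}
// A single runtime-rank CPU implementation removes the untested branches:
index_put_kernel<T, Context>(dev_ctx, ..., /*rank=*/x.dims().size());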
done
#include "paddle/phi/kernels/expand_kernel.h" | ||
#include "paddle/phi/kernels/nonzero_kernel.h" | ||
#include "paddle/phi/kernels/reshape_kernel.h" | ||
#include "paddle/phi/kernels/split_kernel.h" |
Don't include this many header files.

done
paddle/phi/kernels/index_put_utils.h
#include <vector>
#include "paddle/fluid/memory/malloc.h"
#include "paddle/fluid/memory/memcpy.h"
Don't include header files from under fluid; use the ones under the phi directory instead.

done
@@ -916,6 +916,7 @@ set_tests_properties(test_imperative_selected_rows_to_lod_tensor
                     PROPERTIES TIMEOUT 200)
set_tests_properties(test_index_select_op PROPERTIES TIMEOUT 120)
set_tests_properties(test_index_add_op PROPERTIES TIMEOUT 120)
set_tests_properties(test_index_put_op PROPERTIES TIMEOUT 120)
Does this TIMEOUT really have to be set? What is the default?
Without it, CI would time out. I confirmed with the CI team: the default value is very small, apparently no more than 15s; at the time CI reported a timeout once it exceeded 15s, but I'm not sure of the exact default.
@@ -0,0 +1,826 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
2022 -> 2023

done
python/paddle/tensor/manipulation.py
""" | ||
assert len(indices) != 0, "indices can't be empty" |
In dynamic graph mode the assert can be omitted; let the operator do the check internally.

done
Sorry to inform you that 7b71a3a's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
paddle/phi/infermeta/multiary.cc
@@ -3249,6 +3249,21 @@ void MoeInferMeta(const MetaTensor& x,
  out->set_layout(x.layout());
}

void IndexPutInferMeta(const MetaTensor& x,
Place the InferMeta functions in alphabetical order.

done
#pragma once

#include <vector>
#include "paddle/phi/common/place.h"
place.h doesn't look like it needs to be included.

done
#pragma once

#include <vector>
#include "paddle/phi/common/place.h"
Same as above.

done
  infer_meta :
    func : IndexPutInferMeta
  kernel :
    func : index_put
The inputs x and indices have different data types, so you need to specify which one's data type is used to select the kernel, via the data_type keyword; see how the index_sample entry right after this one is written.
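Following the index_sample entry, the op definition would gain a data_type field, roughly like this (other fields abbreviated; a sketch, not the final yaml):

- op : index_put
  ...
  infer_meta :
    func : IndexPutInferMeta
  kernel :
    func : index_put
    data_type : x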
done
const int64_t* pd_indices[7];
for (size_t i = 0; i < indices_v.size(); ++i) {
  pd_indices[i] = indices_v[i]->data<int64_t>();
}
Since L108 - L111 are used again later, move them up to L98 and delete the duplicated code at L121 - L124.

done
T* out = dev_ctx.template Alloc<T>(p_res);
range_kernel<T>(N, out);
return res;
}
It's not that the CPU and GPU kernels duplicate each other; there is also duplication between the forward and backward passes. Resolve it with templates plus macros, or by giving the functions different names.
const int64_t** indices,
const phi::DDim& stride,
const phi::DDim& shape,
int64_t isSingleValTensor,
isSingleValTensor -> is_single_val_tensor

done
DenseTensor* out) {
  auto* x_data = x.data<T>();
  auto* val_data = value.data<T>();
  bool isInitialized = out->initialized();
isInitialized -> is_initialized

done
#include "paddle/phi/kernels/reshape_kernel.h" | ||
#include "paddle/phi/kernels/split_kernel.h" | ||
|
||
namespace phi { |
Add an extra namespace funcs layer.
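A minimal sketch of the change (only the declaration of one helper from this header is shown):

namespace phi {
namespace funcs {

// helpers in index_put_utils.h, e.g. GetRangeTensor, move one level deeper,
// so call sites become phi::funcs::GetRangeTensor(...)
template <typename T, typename Context>
phi::DenseTensor GetRangeTensor(const Context& dev_ctx, int64_t N, phi::DataType dtype);

}  // namespace funcs
}  // namespace phi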
done
phi::DenseTensor res_tensor(tensor.dtype());
res_tensor.Resize(res_dim);
ExpandKernel<T, Context>(
    dev_ctx, mid_tensor, IntArray(phi::vectorize(res_dim)), &res_tensor);
Calling Reshape and Expand here both incurs a memcpy; in fact only the corresponding dims need to be obtained.
I also initially wondered whether a plain Resize would be enough. Earlier, when replacing

tmp_indices_v.emplace_back(DenseTensor(phi::DataType::INT64).Resize(phi::make_ddim({nonzero_indices.dims()[0],1})));

with

tmp_indices_v.emplace_back(DenseTensor(phi::DataType::INT64).Resize(phi::make_ddim({nonzero_indices.dims()[0]})));

I tried whether Resize could save some of the reshape operations. But here I'm constrained by needing a src tensor and a dst tensor that satisfy the expand relation, and I can't modify the tensor object because it is a const reference, so the two copies here are probably necessary.
int64_t** indices,
phi::Array<int64_t, Rank> stride,
phi::Array<int64_t, Rank> shape,
int64_t isSingleValTensor,
isSingleValTensor -> is_single_val_tensor

done
T* out = dev_ctx.template Alloc<T>(p_res); \
range_kernel<T>(N, out);                   \
return res;                                \
}
I really didn't expect you to write the whole function as a macro. The simplest suggested fix is as follows:
template <typename T>
void range_kernel(int64_t N, T* out) {
...
}
template <typename T, typename Context>
phi::DenseTensor GetRangeTensor(const Context& dev_ctx, int64_t N, phi::DataType dtype) {
...
}
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
template <typename T>
__global__ void range_cuda_kernel(int64_t N, T* out) {
...
}
template <typename T, typename Context>
phi::DenseTensor GetRangeCudaTensor(
const Context& dev_ctx, int64_t N, phi::DataType dtype) {
...
}
#endif
done
LGTM

LGTM
python/paddle/tensor/manipulation.py
    Args:
        x (Tensor) : The Source Tensor. Supported data types are int32, int64, float16, float32, float64, bool.
        indices (Tensor): The tuple of Tensor containing the indices to index.
This should be "List / tuple of Tensor".
It is a tuple of Tensor here, matching the usage of the corresponding torch API.
done
… clear_add_index_put_api
python/paddle/tensor/manipulation.py
        indices (Tuple of Tensor): The tuple of Tensor containing the indices to index.
            The data type of ``tensor in indices`` must be int32, int64 or bool
        value (Tensor): The tensor used to be assigned to x.
        accummulate (Bool): Whether the elements in values are added to x
Parameters that have a default value need to be marked as optional, together with what the default is.
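For example, the entry could read: "accumulate (bool, optional): Whether the elements in values are added to x. Default: False."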
done

def index_put(x, indices, value, accumulate=False, name=None):
    """
    Outplace version of ``index_put_`` API, the output Tensor will be inplaced with input ``x``.
Normally we would say that index_put_ is the inplace version of index_put; can this be phrased the other way around? Could the docs PM please confirm? @sunzhongkai588

    Returns:
        Tensor, same dimention and dtype with x.
    Examples:
The rendered result on the official website needs to be checked; wait for PR-CI-Paddle-Doc-Preview to finish.
LGTM for docs
LGTM for set_tests_properties(test_index_put_op PROPERTIES TIMEOUT 120)
LGTM for set_tests_properties(test_index_put_op PROPERTIES TIMEOUT 120)
PR types
New features
PR changes
APIs
Description
This PR adds the index_put and index_put_ APIs for Paddle; please refer to the PaddlePaddle API docs for details.
(Supplementary note: due to some problems with the Paddle framework's indexing mechanism, Paddle's indexing paths are much slower than Torch's. A full refactor of indexing will take time, so some advanced-indexing cases with poor performance are extracted first for optimization and exposed to users as the Paddle APIs index_put and index_put_.
Advanced indexing means using tensors as subscripts to index a tensor. For the cases supported by the index_put API, its performance is far better than indexing directly through Paddle's existing C-order indexing path.)
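A minimal usage sketch of the new API (shapes and values chosen only for illustration):

import paddle

x = paddle.zeros([3, 3])
value = paddle.to_tensor([10.0, 20.0, 30.0])
ix0 = paddle.to_tensor([0, 1, 2])
ix1 = paddle.to_tensor([2, 1, 0])

# out-of-place: returns a new tensor with out[ix0[k], ix1[k]] = value[k]
out = paddle.index_put(x, (ix0, ix1), value)
# in-place variant; accumulate=True adds value into the indexed positions of x
x = paddle.index_put_(x, (ix0, ix1), value, True)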