Random op #3060
Conversation
paddle/framework/operator.h
Outdated
@@ -88,7 +88,7 @@ class OperatorBase {

   /// Net will call this function to Run an op.
   virtual void Run(const std::shared_ptr<Scope>& scope,
-                   const platform::DeviceContext& dev_ctx) const = 0;
+                   platform::DeviceContext& dev_ctx) const = 0;
Because cuRAND's random generator is bound to a stream, and the stream only lives inside CUDADeviceContext, the seed can only be set and the generator created inside the Op, which violates the rule that an Op must not modify the context during Run. Therefore the const constraint was removed.
I tried one workaround: a single global static generator. The random_seed could then be specified just once when the CUDADeviceContext is created, but a global Generator cannot be shared, otherwise the randomness is broken.
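For context, a minimal sketch of the pattern under discussion, using the cuRAND host API; the holder class and its member names are illustrative, not code from this PR:

#include <cuda_runtime.h>
#include <curand.h>

// Each CUDADeviceContext would lazily create one generator bound to its
// own stream; creating it mutates the context, which is why the const
// qualifier on Run()'s DeviceContext parameter had to go.
class CudaRandHolder {  // hypothetical helper
 public:
  curandGenerator_t Get(cudaStream_t stream, unsigned long long seed) {
    if (gen_ == nullptr) {
      curandCreateGenerator(&gen_, CURAND_RNG_PSEUDO_DEFAULT);
      curandSetPseudoRandomGeneratorSeed(gen_, seed);
      curandSetStream(gen_, stream);  // generator is tied to this stream
    }
    return gen_;
  }

 private:
  curandGenerator_t gen_ = nullptr;
};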
If the seed is fixed, the generated sequence is identical as well.
Currently both Paddle and caffe2 also set a seed up front and never change it afterwards. @hedaoyuan, could you advise whether this is a problem?
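As a standalone illustration of that point (not PR code): two standard generators seeded with the same fixed value produce exactly the same sequence.

#include <iostream>
#include <random>

int main() {
  std::mt19937 g1(42), g2(42);  // same fixed seed
  std::normal_distribution<float> d1(0.f, 1.f), d2(0.f, 1.f);
  for (int i = 0; i < 3; ++i) {
    // prints pairs of identical values: the streams match exactly
    std::cout << d1(g1) << " == " << d2(g2) << "\n";
  }
  return 0;
}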
The per-device seed on the GPU only needs to be set once, but the seeds must differ across devices; otherwise every device will produce the same random numbers.
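A tiny sketch of one way to satisfy that constraint; base_seed and device_id are illustrative parameters, not this PR's API:

// Derive a distinct seed per device from one base seed, so no two
// CUDADeviceContexts produce the same random stream.
unsigned long long SeedForDevice(unsigned long long base_seed, int device_id) {
  return base_seed + static_cast<unsigned long long>(device_id);
}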
Fix done.
Is a const DeviceContext really necessary? Functions such as cublas_handle need to modify the context at runtime.
paddle/operators/random_op.h
Outdated
auto place = context.GetPlace();
if (platform::is_cpu_place(place)) {
  Gaussian(
      dynamic_cast<platform::CPUDeviceContext*>(context.device_context_),
dynamic_cast can be slow and should be avoided. The reason dynamic_cast is currently still needed here is the member variable in KernelContext:
const platform::DeviceContext& device_context_;
A base-class DeviceContext reference is used to hold the derived CPUDeviceContext and CUDADeviceContext, so the concrete derived type information is lost and can no longer be used to specialize templates or overload functions. The only option left here is an explicit dynamic_cast.
So, could we consider replacing the member variable above with:
using DeviceContext = boost::variant<CPUDeviceContext, CUDADeviceContext>;
const DeviceContext device_context_;
and unifying the Gaussian interface for CPU and GPU as:
bool Gaussian(const DeviceContext& ctx,
              T* output,
              const int size,
              const T& mean,
              const T& std,
              const T& seed) { ... }
switching between the GPU and CPU implementations inside the function.
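A self-contained sketch of the proposed variant dispatch, with stand-in context types and the actual sampling bodies elided:

#include <boost/variant.hpp>

struct CPUDeviceContext {};   // stand-ins for the real context classes
struct CUDADeviceContext {};

using DeviceContext = boost::variant<CPUDeviceContext, CUDADeviceContext>;

// The visitor recovers the concrete context type at compile time, so
// Gaussian needs no dynamic_cast.
template <typename T>
struct GaussianVisitor : public boost::static_visitor<bool> {
  bool operator()(const CPUDeviceContext& ctx) const {
    // ... fill the output with std::normal_distribution on the CPU ...
    return true;
  }
  bool operator()(const CUDADeviceContext& ctx) const {
    // ... fill the output with curandGenerateNormal on the GPU ...
    return true;
  }
};

template <typename T>
bool Gaussian(const DeviceContext& ctx, T* output, const int size,
              const T& mean, const T& std, const T& seed) {
  return boost::apply_visitor(GaussianVisitor<T>{}, ctx);
}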
The Place and DeviceContext types map one-to-one, so the DeviceContext type can be determined at compile time from the Place type.
Declare in the .h:
template <typename Place, typename T>
class RandomOpKernel : public framework::OpKernel {
 public:
  void Compute(const framework::KernelContext& context) const override;
};
then partially specialize in the .cc and .cu respectively:
template <typename T>
class RandomOpKernel<CPUPlace, T> : public framework::OpKernel {
 public:
  void Compute(const framework::KernelContext& context) const override {
    auto* dc = static_cast<const platform::CPUDeviceContext*>(
        context.device_context_);
    // ...
  }
};
and in the .cu:
template <typename T>
class RandomOpKernel<GPUPlace, T> : public framework::OpKernel {
 public:
  void Compute(const framework::KernelContext& context) const override {
    auto* dc = static_cast<const platform::CUDADeviceContext*>(
        context.device_context_);
    // ...
  }
};
Thanks for your comments! Template specialization fixes it elegantly. Fix done.
paddle/operators/random_op.cc
Outdated
  }
};

class RandomOpMaker : public framework::OpProtoAndCheckerMaker {
"random" is too general. There can be many different types of random. We should use a more specific name such as "GaussionRandom"
Done.
    static_cast<const platform::CPUDeviceContext*>(context.device_context_);
// the generator needs to modify the context
auto g = const_cast<platform::CPUDeviceContext*>(ctx)->RandGenerator();
std::normal_distribution<T> distribution(mean, std);
Only std::normal_distribution is used here. Do you plan to implement other distributions soon, like the uniform distribution?
std::uniform_real_distribution<T> uniform(0.0, 1.0);
cuRAND also supports curandGenerateUniform.
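For reference, a minimal host-side sketch of curandGenerateUniform, with illustrative names and error handling omitted:

#include <cstddef>
#include <curand.h>

// Fills d_out (device memory) with n floats uniformly distributed
// in (0, 1].
void FillUniform(float* d_out, size_t n, unsigned long long seed) {
  curandGenerator_t gen;
  curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, seed);
  curandGenerateUniform(gen, d_out, n);
  curandDestroyGenerator(gen);
}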
paddle/platform/device_context.h
Outdated
@@ -40,7 +41,10 @@ class DeviceContext {

 class CPUDeviceContext : public DeviceContext {
  public:
   typedef std::mt19937 random_generator_type;
-  CPUDeviceContext() { eigen_device_.reset(new Eigen::DefaultDevice()); }
+  CPUDeviceContext() {
+    random_seed_ = std::chrono::system_clock::now().time_since_epoch().count();
So now we are not able to manually specify the random seed?
Deleted the seeding function in the context; it was the place where we needed to modify the execution context. For now, seed setting has moved into the random op. We need a global configuration to unify the random seed in the future; maybe we can fix it in the next PR.
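One possible shape for that future unification, sketched with a hypothetical seed attribute where zero means "derive from the clock"; this is an assumption, not the PR's code:

#include <chrono>
#include <random>

// Honor a user-specified seed for reproducibility; fall back to a
// clock-derived seed when the attribute is left at 0.
std::mt19937 MakeGenerator(unsigned int seed_attr) {
  unsigned int seed =
      seed_attr != 0
          ? seed_attr
          : static_cast<unsigned int>(
                std::chrono::system_clock::now().time_since_epoch().count());
  return std::mt19937(seed);
}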
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    T mean = static_cast<T>(context.op_.GetAttr<T>("mean"));
    T std = static_cast<T>(context.op_.GetAttr<T>("std"));
Aren't "mean" and "std" always float?
See https://github.com/PaddlePaddle/Paddle/pull/3060/files#diff-86aabb5beb3f333cf191423101025d1fR67
Just unified the style with the uniform random operator. Reverted these to the old code.
  }
  std::mt19937 g(seed);
  std::normal_distribution<T> distribution(mean, std);
  for (int i = 0; i < framework::product(tensor->dims()); ++i) {
Can we accelerate the code by calculating framework::product() before the loop? I'm not sure whether the compiler will do that optimization for us.
The compiler can't optimize this at all; the tensor shape is only known at runtime. Done.
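The hoisted form looks roughly like this; data stands in for the tensor's mutable buffer:

// Compute the element count once instead of on every iteration.
const int size = framework::product(tensor->dims());
for (int i = 0; i < size; ++i) {
  data[i] = distribution(g);
}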
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    T mean = static_cast<T>(context.op_.GetAttr<T>("mean"));
    T std = static_cast<T>(context.op_.GetAttr<T>("std"));
Done.
Although the PR is not perfect, we have spent lots of time on it and it's becoming larger. So I prefer to merge it for now and refine the code in later PRs.
There are two parts that still need discussion. One is the dynamic_cast part of implementing polymorphism; the other is removing the const qualifier from the Op.Run interface.