
Random op #3060

Merged — 37 commits merged into PaddlePaddle:develop on Aug 9, 2017

Conversation

@dzhwinter dzhwinter (Contributor) commented Jul 25, 2017

There are two parts that need to be discussed.
One is the dynamic_cast used to implement polymorphism. The other is removing the const qualifier from the Op::Run interface.

@@ -88,7 +88,7 @@ class OperatorBase {

   /// Net will call this function to Run an op.
   virtual void Run(const std::shared_ptr<Scope>& scope,
-                   const platform::DeviceContext& dev_ctx) const = 0;
+                   platform::DeviceContext& dev_ctx) const = 0;
dzhwinter (Contributor, Author):

Because CUDA's randGenerator is bound to a stream, and the stream only lives inside CUDADeviceContext, the seed has to be set and the generator created inside the Op. That violates the rule that the Op must not modify the context during Run, so the const qualifier was removed.

I tried a workaround with a single global static generator: random_seed can be specified just once when the CUDADeviceContext is created, but a global generator cannot be used, otherwise the randomness becomes problematic.
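A minimal sketch of the conflict, with simplified, illustrative names (not the actual Paddle classes): the curand generator has to be created on the context's stream, so creating it lazily during Run mutates the context, which a const Run signature forbids.

#include <cuda_runtime.h>
#include <curand.h>

// Hypothetical, simplified device context: the generator is created lazily
// and bound to the context's stream, so creating it mutates the context.
class CUDADeviceContext {
 public:
  explicit CUDADeviceContext(unsigned long long seed) : seed_(seed) {
    cudaStreamCreate(&stream_);
  }

  // Non-const: the first call creates the generator on this context's stream.
  curandGenerator_t RandGenerator() {
    if (generator_ == nullptr) {
      curandCreateGenerator(&generator_, CURAND_RNG_PSEUDO_DEFAULT);
      curandSetStream(generator_, stream_);
      curandSetPseudoRandomGeneratorSeed(generator_, seed_);
    }
    return generator_;
  }

 private:
  cudaStream_t stream_{};
  curandGenerator_t generator_{nullptr};
  unsigned long long seed_;
};

// An op whose Run takes `const CUDADeviceContext&` cannot call RandGenerator()
// without dropping const (as this PR does) or making generator_ mutable.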

Member:

If the seed is fixed, the generated sequence is always the same.
Currently both Paddle and Caffe2 set a seed ahead of time and never change it afterwards. @hedaoyuan, could you advise whether this approach has any problem?

@hedaoyuan hedaoyuan (Contributor) commented Jul 26, 2017:

The per-device seed on the GPU only needs to be set once, but you must make sure each device gets a different seed; otherwise every device will produce the same random numbers.
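A hedged sketch of that per-device seeding (the helper and the seed offset are illustrative, not the PR's code):

#include <cuda_runtime.h>
#include <curand.h>
#include <vector>

// Create one curand generator per GPU, each seeded once with a distinct
// value, so different devices do not produce identical random sequences.
std::vector<curandGenerator_t> CreatePerDeviceGenerators(
    unsigned long long base_seed) {
  int device_count = 0;
  cudaGetDeviceCount(&device_count);

  std::vector<curandGenerator_t> generators(device_count);
  for (int dev = 0; dev < device_count; ++dev) {
    cudaSetDevice(dev);
    curandCreateGenerator(&generators[dev], CURAND_RNG_PSEUDO_DEFAULT);
    // Offset the seed by the device id so every device gets a different one.
    curandSetPseudoRandomGeneratorSeed(generators[dev], base_seed + dev);
  }
  return generators;
}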

dzhwinter (Contributor, Author):

fix done.

dzhwinter (Contributor, Author):

Is a const DeviceContext necessary at all? Functions such as cublas_handle need to modify the context at run time.

auto place = context.GetPlace();
if (platform::is_cpu_place(place)) {
Gaussian(
dynamic_cast<platform::CPUDeviceContext*>(context.device_context_),
@JiayiFeng JiayiFeng (Collaborator) commented Jul 25, 2017:

dynamic_cast can be inefficient and should be avoided.
The reason dynamic_cast is still needed here is this member variable in KernelContext:

const platform::DeviceContext& device_context_;

A reference to the base class DeviceContext is used to hold the derived CPUDeviceContext / CUDADeviceContext. The concrete type information of the derived class is lost, so it cannot be used to specialize templates or overload functions; the only option here is an explicit dynamic_cast.

So, could we consider replacing the member variable above with:

using DeviceContext = boost::variant<CPUDeviceContext, CUDADeviceContext>;
const DeviceContext device_context_;

unifying the Gaussian function's interface for CPU and GPU as:

bool Gaussian(const DeviceContext& ctx,
              T* output,
              const int size,
              const T& mean,
              const T& std,
              const T& seed) { ... }

and switching between the GPU and CPU implementations inside the function, as in the sketch below.
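A minimal sketch of that in-function switch, assuming the boost::variant alias above; the stub context types and the GaussianCPU/GaussianGPU helpers are placeholders, not Paddle code:

#include <boost/variant.hpp>

// Stub types standing in for paddle::platform's device contexts.
struct CPUDeviceContext {};
struct CUDADeviceContext {};

using DeviceContext = boost::variant<CPUDeviceContext, CUDADeviceContext>;

// Hypothetical per-device implementations (placeholders only).
template <typename T>
bool GaussianCPU(const CPUDeviceContext&, T*, int, T, T, T) { return true; }
template <typename T>
bool GaussianGPU(const CUDADeviceContext&, T*, int, T, T, T) { return true; }

template <typename T>
bool Gaussian(const DeviceContext& ctx, T* output, const int size,
              const T& mean, const T& std, const T& seed) {
  // boost::get on a pointer to the variant returns nullptr when the held
  // type does not match, giving a single place to switch CPU vs. GPU.
  if (const auto* cpu = boost::get<CPUDeviceContext>(&ctx)) {
    return GaussianCPU(*cpu, output, size, mean, std, seed);
  }
  const auto* gpu = boost::get<CUDADeviceContext>(&ctx);
  return GaussianGPU(*gpu, output, size, mean, std, seed);
}

A boost::static_visitor would work as well; the two-way boost::get check simply keeps the sketch short.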

Member:

The Place and DeviceContext types correspond one-to-one, so the DeviceContext type can be determined at compile time from the Place type.
Declare in the .h:

template <typename Place, typename T>
class RandomOpKernel : public framework::OpKernel {
 public:
  void Compute(const framework::KernelContext& context) const override;
};

Then partially specialize it in the .cc:

template <typename T>
class RandomOpKernel<CPUPlace, T> : public framework::OpKernel {
 public:
  void Compute(const framework::KernelContext& context) const override {
    auto* dc =
        static_cast<const CPUDeviceContext*>(&context.device_context_);
    // ... use dc ...
  }
};

and in the .cu:

template <typename T>
class RandomOpKernel<GPUPlace, T> : public framework::OpKernel {
 public:
  void Compute(const framework::KernelContext& context) const override {
    auto* dc =
        static_cast<const CUDADeviceContext*>(&context.device_context_);
    // ... use dc ...
  }
};
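The compile-time Place-to-DeviceContext correspondence mentioned above could also be written as a small trait; the trait name below is illustrative, not part of the framework:

// Hypothetical trait mapping a Place type to its DeviceContext type.
template <typename Place>
struct DeviceContextOf;

template <>
struct DeviceContextOf<CPUPlace> {
  using type = CPUDeviceContext;
};

template <>
struct DeviceContextOf<GPUPlace> {
  using type = CUDADeviceContext;
};

// Inside a kernel templated on Place, the cast then needs no dynamic_cast:
//   using Ctx = typename DeviceContextOf<Place>::type;
//   auto* dc = static_cast<const Ctx*>(&context.device_context_);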

dzhwinter (Contributor, Author):

Thanks for your comments! Template specialization fixes it elegantly. Done.

}
};

class RandomOpMaker : public framework::OpProtoAndCheckerMaker {
Collaborator:

"random" is too general. There can be many different types of random. We should use a more specific name such as "GaussionRandom"

dzhwinter (Contributor, Author):

Done.

static_cast<const platform::CPUDeviceContext*>(context.device_context_);
// generator need to modify context
auto g = const_cast<platform::CPUDeviceContext*>(ctx)->RandGenerator();
std::normal_distribution<T> distribution(mean, std);
@gangliao gangliao (Contributor) commented Jul 31, 2017:

Only the std normal distribution is here. Do you plan to implement other distributions soon, like the uniform distribution?

std::uniform_real_distribution<T> uniform(0.0, 1.0);

Contributor:

curand also supports curandGenerateUniform.
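For reference, a minimal sketch of generating uniform random numbers with curand (buffer allocation omitted; this is not the code in the PR):

#include <cuda_runtime.h>
#include <curand.h>

// Fill `size` floats on the device with values uniformly drawn from (0, 1].
void UniformRandomGPU(float* device_output, size_t size,
                      unsigned long long seed) {
  curandGenerator_t gen;
  curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, seed);
  curandGenerateUniform(gen, device_output, size);
  curandDestroyGenerator(gen);
}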

dzhwinter (Contributor, Author):

@reyoung implemented the uniform random generator with thrust;
see #3293 for details.

@@ -40,7 +41,10 @@ class DeviceContext {
 class CPUDeviceContext : public DeviceContext {
  public:
+  typedef std::mt19937 random_generator_type;
-  CPUDeviceContext() { eigen_device_.reset(new Eigen::DefaultDevice()); }
+  CPUDeviceContext() {
+    random_seed_ = std::chrono::system_clock::now().time_since_epoch().count();
Collaborator:

So now we are not able to manually specify the random seed?

dzhwinter (Contributor, Author):

Deleted the seeding function in the context; it was the place where we had to modify the execution context.
For now, setting the seed has moved into the random op. We need a global configuration to unify the random seed in the future; maybe we can fix that in the next PR.

public:
void Compute(const framework::ExecutionContext& context) const override {
T mean = static_cast<T>(context.op_.GetAttr<T>("mean"));
T std = static_cast<T>(context.op_.GetAttr<T>("std"));
@JiayiFeng JiayiFeng (Collaborator) commented Aug 8, 2017:

dzhwinter (Contributor, Author):

Just unified the style with the uniform random operator. Reverted these changes back to the old code.

}
std::mt19937 g(seed);
std::normal_distribution<T> distribution(mean, std);
for (int i = 0; i < framework::product(tensor->dims()); ++i) {
Collaborator:

Can we accelerate the code by calculating framework::product() before the loop? I'm not sure whether the compiler will do the optimization for us.

dzhwinter (Contributor, Author):

The compiler can't optimize this at all; the tensor shape is only known at run time. Done.
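The change amounts to hoisting the element count out of the loop condition, roughly as follows (a sketch of the fix, not the exact committed code; `data` stands for the tensor's mutable output buffer):

// Compute the element count once instead of on every iteration.
const int size = framework::product(tensor->dims());
std::mt19937 g(seed);
std::normal_distribution<T> distribution(mean, std);
for (int i = 0; i < size; ++i) {
  data[i] = distribution(g);
}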

public:
void Compute(const framework::ExecutionContext& context) const override {
T mean = static_cast<T>(context.op_.GetAttr<T>("mean"));
T std = static_cast<T>(context.op_.GetAttr<T>("std"));
Collaborator:

dzhwinter (Contributor, Author):

Done.

@JiayiFeng JiayiFeng (Collaborator) left a review comment:

Although the PR is not perfect, we have spent lots of time on it and it's becoming larger. So I prefer to merge it for now and refine the code in later PRs.

@dzhwinter dzhwinter merged commit 56faf51 into PaddlePaddle:develop Aug 9, 2017