add MKLDNN_DEVICE #3712
Conversation
This PR is still too long. Of its 11 commits, the following 3 are not closely related to MKLDNN_DEVICE and could be sent as separate PRs (keep it as-is this time):
paddle/gserver/layers/MKLDNNLayer.h
if (useGpu_ == true) {
  LOG(WARNING) << "Do not support GPU yet, will change to useGpu = false";
  useGpu_ = false;
}
Lines 89-92: when useGpu_ is true, fail directly instead of converting it to false.
CHECK(!useGpu_);
OK, done
paddle/gserver/layers/MKLDNNLayer.h
/**
 * Is previous layer MKLDNN type.
 * Otherwise, only support otherdevice CPU device.
Otherwise, only support the previous layer using CPU device.
Thanks, done
}
real* iData = getInputValue(0, CPU_DEVICE)->getData();
// update input data
// since it might be changed if this is after data layer
- In the previous PR I did not look closely at this comment: what does "it might be changed if this is after data layer" mean? Can input_data change?
- Does updateData really need a wrapper? Isn't calling set_data_handle directly enough?
Re: "what does 'it might be changed if this is after data layer' mean? Can input_data change?"
That is because the data layer's output buffer keeps changing, i.e. the pointer changes, so the data needs to be updated. I found this quite a while ago; if the previous layer's output buffer address were stable, this update would not be needed.
Re: "Does updateData really need a wrapper? Isn't calling set_data_handle directly enough?"
I still think wrapping it is better, mainly to keep the function naming style consistent.
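For reference, a minimal sketch of what the wrapper could look like, assuming MKLDNNMatrix exposes the underlying mkldnn::memory (the actual code in MKLDNNMatrix.h may differ):
// Thin wrapper kept only for naming-style consistency with MKLDNNMatrix.
// Caution: it only swaps the raw pointer; the buffer size is not checked.
void updateData(void* data) { set_data_handle(data); }

// Typical use: refresh the handle every iteration, because the data layer
// may re-allocate its output buffer and the cached pointer becomes stale.
real* iData = getInputValue(0, CPU_DEVICE)->getData();
inVal_->updateData(iData);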
/**
 * Set deviceId of the params used in this layer.
 */
void setParamsDevice(int id, const ParameterMap& parameterMap) {
This function is very similar to Layer::init, which contains
CHECK_EQ(parameter->getDeviceId(), getDeviceId());
The parameter already carries deviceId information, so after setDevice, is setParamsDevice still needed?
It is precisely because Layer::init runs
CHECK_EQ(parameter->getDeviceId(), getDeviceId());
that the params need to be set up in advance.
setDevice here really means setLayerDevice, but since it already lives inside the layer, that name was not used.
I don't quite follow this: Paddle also supports mixed CPU+GPU configurations without calling setParamsDevice in advance. Why is it needed here?
First, the relevant logic is:
if (useGpu_ && FLAGS_parallel_nn) {
/* gpu environment is specified by device property */
deviceId_ = config_.device();
if (deviceId_ < 0) {
useGpu_ = false;
}
}
So deviceId is generally only set when using GPU together with FLAGS_parallel_nn; in the common case it stays -1.
Moreover, the params used by MKLDNN here are actually just the CPU params, so there is no need to explicitly set them to -2 on the Python side; setting them uniformly inside MKLDNNLayer avoids failing the check in Layer::init.
In other words, when using mkldnn, setting paddle.init(use_mkldnn=True) is enough; there is no need to set FLAGS_parallel_nn or to manually set a deviceId on every Python layer.
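A rough sketch of what setParamsDevice could do along these lines (the iteration over config_.inputs() and the Parameter::setDevice call are assumptions; the real MKLDNNLayer.h may differ):
// Force the device id of every parameter this layer uses, so that the
// CHECK_EQ(parameter->getDeviceId(), getDeviceId()) in Layer::init passes
// without any extra Python-side configuration.
void setParamsDevice(int id, const ParameterMap& parameterMap) {
  for (auto& inputConfig : config_.inputs()) {
    if (!inputConfig.has_input_parameter_name()) continue;
    const std::string& name = inputConfig.input_parameter_name();
    auto it = parameterMap.find(name);
    CHECK(it != parameterMap.end()) << "Cannot find parameter " << name;
    it->second->setDevice(id);  // assumed Parameter API
  }
}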
paddle/math/MKLDNNMatrix.cpp
if (m == nullptr) {
  size_t height = dims[0];
  size_t width = cnts / dims[0];
  // LOG(INFO) << height << "," << width;
Remove line 34.
thx, done
      << "Only support other device is CPU yet";
  }
  return outputOtherDevice_.size() == 0;
}
The nextIsMKLDNN function should be placed right after prevIsMKLDNN.
OK, done
// fc cpu output value do not need convert
cpuOutput.value = output_.value;
- In Paddle this conversion is done automatically in Layer::copyOutputToOtherDevice, so why write it again separately here? Written this way, a lot of information is also missing, e.g. the SequenceStartPosition info.
- Is nextIsMKLDNN needed at all? Conversion between layers with different deviceIds is handled in the Layer base class.
Re: "this conversion is done automatically in Layer::copyOutputToOtherDevice"
Layer only provides the interface; it is not called automatically. It is used only in ParallelNeuralNetwork.cpp, and dataLayer.h overrides it.
Moreover, it does not set the image height/width, and, importantly, here I assign the pointer rather than copyFrom, which differs from the base class (the fc output needs no reorder; otherwise an mkldnn reorder would be required, e.g. conv needs extra handling here).
Since mkldnn does not handle sequences yet, that info is not copied, but I can certainly add it; it will be useful later.
So I will add a copyOutputInfoToOtherDevice interface in MKLDNNLayer that copies everything except the data, and a convertOutputToOtherDevice that actually performs the conversion; for FC this just shares the pointer.
done
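A minimal sketch of the copyOutputInfoToOtherDevice interface described above (the Argument accessors for image size and sequence info are assumptions):
// Copy image size and sequence info (everything except the data itself)
// from the MKLDNN output to the outputs seen by other-device layers.
void copyOutputInfoToOtherDevice() {
  for (size_t i = 0; i < outputOtherDevice_.size(); ++i) {
    outputOtherDevice_[i].setFrameHeight(output_.getFrameHeight());
    outputOtherDevice_[i].setFrameWidth(output_.getFrameWidth());
    outputOtherDevice_[i].sequenceStartPositions = output_.sequenceStartPositions;
    outputOtherDevice_[i].subSequenceStartPositions =
        output_.subSequenceStartPositions;
    outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims;
  }
}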
    hasBias ? MKLDNNMatrix::create(bias, {oc_}, format::x, engine_) : nullptr;
outVal_ = MKLDNNMatrix::create(out, {bs_, oc_}, format::nc, engine_);

// change original output value to mkldnn output value
"original output value"? What format does that refer to?
Here "original output" means changing the original output into a pointer that can be cast to MKLDNNMatrixPtr, so that if the next layer is an MKLDNN layer it can cast it directly and obtain the information it needs.
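For illustration, this is roughly how a following MKLDNN layer could recover that information (the surrounding lines are a sketch; MKLDNN_DEVICE and getPrimitiveDesc come from this PR):
// Cast the previous MKLDNN layer's output value back to MKLDNNMatrixPtr,
// so its primitive descriptor can be reused instead of guessing the format.
const MatrixPtr& in = getInputValue(0, MKLDNN_DEVICE);
MKLDNNMatrixPtr dnnIn = std::dynamic_pointer_cast<MKLDNNMatrix>(in);
CHECK(dnnIn) << "the previous output value should be an MKLDNNMatrix";
mkldnn::memory::primitive_desc inPD = dnnIn->getPrimitiveDesc();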
const MatrixPtr& out = getOutput(CPU_DEVICE).grad;
// fc do not need to convert from cpu device since output always nc
// only need create from cpu device
outGrad_ = MKLDNNMatrix::create(out, outVal_->getPD());
}
- Merging top diffs is done in Layer::waitAndMergeOutputGrad; why write it separately here?
- "top diffs" is Caffe terminology; please update the comment (same applies below).
Re: "Merging top diffs is done in Layer::waitAndMergeOutputGrad; why write it separately here?"
The base class's waitAndMergeOutputGrad is specifically for the ParallelNeuralNetwork case and cannot be used here; besides, my version does not wait, while the base class version works together with threads. Also, what I wrote here will eventually be implemented with mkldnn::sum, so the base class function probably cannot be used directly.
Re: "top diffs" being Caffe terminology and updating the comment:
done
  // TODO(TJ): use outputMaps_ ways when merge topdiff done
} else {
  inGrad_ = MKLDNNMatrix::create(in, inVal_->getPD());
}
}
Lines 230-249: the two branches can be merged:
bool device = prevIsMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE;
const MatrixPtr& in = getInputGrad(0, device);
if (in == nullptr) return;
if (getInput(0, device).getAllCount() > 1) {
// TODO(TJ): use outputMaps_ ways when merge topdiff done
} else {
inGrad_ = MKLDNNMatrix::create(in, inVal_->getPD());
}
Done, but device cannot be a bool; it should be an int.
Thanks. done
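For reference, the merged form with int instead of bool, as agreed above, reads roughly (prevIsMKLDNN is later renamed in this review):
int device = prevIsMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE;
const MatrixPtr& in = getInputGrad(0, device);
if (in == nullptr) return;
if (getInput(0, device).getAllCount() > 1) {
  // TODO(TJ): use outputMaps_ ways when merge topdiff done
} else {
  inGrad_ = MKLDNNMatrix::create(in, inVal_->getPD());
}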
paddle/gserver/layers/MKLDNNLayer.h
protected:
/**
 * If next layer only has MKLDNN type.
 * Otherwise, only support otherdevice CPU device.
Originally the intent was for the name to match prevIsMKLDNN, not to use information from the next layer; the comment does need fixing.
That said, reflecting "only" in the name is also nice, though if we change it, prevIsMKLDNN should be changed along with it.
Renamed to prevIsOnlyMKLDNN and nextIsOnlyMKLDNN.
/**
 * Update the memory data handle.
 * Caution: This will not check the buffer size of the data,
 * it should be covered by user.
Same as above.
paddle/math/MKLDNNMatrix.h
/**
 * Get primitive descriptor.
 */
mkldnn::memory::primitive_desc getPD() { return this->get_primitive_desc(); }
My original thinking here was:
- PD and MD are used a lot in mkldnn and have become fairly common vocabulary there; besides, each function has a corresponding comment above it.
- With more verbose names, many of these one-line functions would get very long and wrap several times, which does not look great.
But let's just rename them anyway.
done
paddle/math/MKLDNNMatrix.h
/**
 * Get memory descriptor.
 */
mkldnn::memory::desc getMD() { return getPD().desc(); }
done
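For reference, the renamed accessors would look roughly like this (getPrimitiveDesc appears later in this review; getMemoryDesc is the assumed counterpart of getMD):
/**
 * Get the memory primitive descriptor.
 */
mkldnn::memory::primitive_desc getPrimitiveDesc() {
  return this->get_primitive_desc();
}

/**
 * Get the memory descriptor.
 */
mkldnn::memory::desc getMemoryDesc() { return getPrimitiveDesc().desc(); }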
paddle/math/MKLDNNMatrix.h
protected:
/**
 * Do once reorder supported inplace.
done
const MatrixPtr& bias = hasBias ? biases_->getW() : nullptr;
const MatrixPtr& out = output_.value;

if (prevIsOnlyMKLDNN()) {
How about renaming these to inputIsOnlyMKLDNN and outputIsOnlyMKLDNN?
- A layer has no information about the next layer.
- prev and next mirror each other, so rename both to input/output.
Sure, done.
const MatrixPtr& out = getOutput(CPU_DEVICE).grad;
// fc do not need to convert from cpu device since output always nc
// only need create from cpu device
outGrad_ = MKLDNNMatrix::create(out, outVal_->getPrimitiveDesc());
Lines 197-207: these two branches can also be merged into the form used at line 241:
int device = nextIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE;
const MatrixPtr& out = getOutput(device).grad;
outGrad_ = MKLDNNMatrix::create(out, outVal_->getPrimitiveDesc());
Also, what does "nc" at the end of the comment on line 205 mean? Is it the nc format?
OK, thanks.
Yes, it is the format; I will make the comment more detailed.
paddle/gserver/layers/MKLDNNLayer.h
/**
 * copy image size and sequence info to other device
 */
void copyOutputInfoToOtherDevice() {
It would be good to add a @note to this comment explaining why Layer::copyOutputToOtherDevice is not used. My understanding is that the value cannot be copied over directly because the format differs.
OK. This only copies the basic info, not the data, so it is not quite the same as Layer::copyOutputToOtherDevice.
But I can add a note.
// fc cpu output value do not need convert
// just share point
outputOtherDevice_[i].value = output_.value;
++cnt;
For MKLDNN layers, are there many places where the value needs converting? If most layers just do
outputOtherDevice_[i].value = output_.value;
then put that into a base-class function, and let the layers that really need a conversion override it; that would be cleaner.
The check that there cannot be more than one CPU device should also go into the base-class function. And why can there not be more than one?
It is not the case that most layers just do outputOtherDevice_[i].value = output_.value; other layers need different handling. FC can share directly because its output is always in nc format, the same as Paddle's CPU-device format. Once there are more layers I can tidy this up again.
As for not having more than one CPU device: in theory I do not think multiple should appear, but I am worried my current analysis is incomplete (e.g. whether an RNN case would be affected), so I emit a warning. Even if there were multiple, each would still share the pointer. This point is also specific to the FC layer.
If the check is needed, it should also go into MKLDNNLayer's convertOutputToOtherDevice; this can be changed in the next PR.
Thanks. done
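A rough sketch of the FC-specific conversion discussed in this thread, based on the snippets above (the class name and the deviceId check inside the loop are assumptions):
void MKLDNNFcLayer::convertOutputToOtherDevice() {
  copyOutputInfoToOtherDevice();
  int cnt = 0;
  for (size_t i = 0; i < outputOtherDevice_.size(); ++i) {
    if (outputOtherDevice_[i].deviceId != CPU_DEVICE) {
      continue;  // assumed field; only the CPU-device output is handled here
    }
    // fc cpu output value do not need convert
    // just share point
    outputOtherDevice_[i].value = output_.value;
    ++cnt;
  }
  if (cnt > 1) {
    LOG(WARNING) << "Should not have more than one CPU device for now";
  }
}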
LGTM. The last two commit messages, "refine" and "rename", are a bit terse; please write them in more detail next time.
add MKLDNN_DEVICE and MKLDNNMatrix to handle internal mkldnn::memory