
Refine MKLDNNLayer and MKLDNNTester #4011

Merged
merged 4 commits into PaddlePaddle:develop from the refine branch on Sep 12, 2017

Conversation

tensor-tang
Contributor

The additions for #4008:

  1. remove copyOutputInfoToOtherDevice
  2. move forward and backward into MKLDNNLayer
  3. add reshapeInput() and reshapeOutput() in MKLDNNLayer and use them in the child layers (see the sketch after this list)
  4. add an UpdateCallback for MKLDNNTester, since the weights were not updated before
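As a rough illustration of items 2 and 3, here is a hedged sketch of what the shared helpers in MKLDNNLayer might look like; the bodies below are assembled from snippets quoted later in this review and are illustrative, not the merged code.

// Sketch only: shape handling shared by MKLDNNLayer so that child layers
// (fc, conv, ...) no longer duplicate it.
void MKLDNNLayer::reshapeInput(int& batchsize, int& height, int& width) {
  const Argument& input = inputLayers_[0]->getOutput();
  batchsize = input.getBatchSize();
  // keep the previous height/width when the input carries no frame size
  if (input.getFrameHeight() > 0) height = input.getFrameHeight();
  if (input.getFrameWidth() > 0) width = input.getFrameWidth();
}

void MKLDNNLayer::reshapeOutput(size_t height, size_t width) {
  output_.setFrameHeight(height);
  output_.setFrameWidth(width);
  // the child layer's own reshape() still decides oc_/oh_/ow_
}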

if (biases_ && biases_->getWGrad()) {
biases_->getParameterPtr()->incUpdate(callback);
}
void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) {
Contributor

The implementation of updateWeights looks the same for every MKLDNNXXXLayer, so there should be no need for each subclass to re-implement it.

Contributor Author

The consideration here is that not every MKLDNNLayer necessarily has weights, e.g. pooling.
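To make that concrete, a hedged sketch (not the merged code) of why updateWeights stays in the subclasses: a layer with parameters such as MKLDNNFcLayer forwards the callback to its weight and bias, while a layer without weights simply would not need the override.

// Sketch: fc has parameters to update, so it overrides updateWeights.
void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) {
  weight_->getParameterPtr()->incUpdate(callback);
  if (biases_ && biases_->getWGrad()) {
    biases_->getParameterPtr()->incUpdate(callback);
  }
}

// A hypothetical pooling layer (name illustrative) has no weights at all,
// so it would either skip the override or keep it empty.
void MKLDNNPoolLayer::updateWeights(const UpdateCallback& /*callback*/) {}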

REGISTER_TIMER_INFO("FwActTimer", getName().c_str());
forwardActivation();
}
inVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
Contributor

inVal_ is already defined in MKLDNNLayer.h, so line 227 could also be moved into the parent class.

Contributor Author

It is not easy to move this into the parent class: different subclasses do not necessarily use this approach, so each subclass is left responsible for it itself. It just happens that fc uses this variable here; conv, for example, would use a different one.
That is also why the parent class only provides an empty implementation, considering that some future layers may not need to override this function at all; that decision is left to each subclass.
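A hedged sketch of the arrangement described above, put together from the snippets quoted in this thread: the parent class exposes an empty virtual updateInputData(), and only subclasses that need it (fc, via inVal_) override it.

// MKLDNNLayer.h (sketch): default is a no-op, so layers that do not need the
// hook simply do not override it.
virtual void updateInputData() {}

// MKLDNNFcLayer.cpp (sketch): fc refreshes inVal_ because the CPU input
// buffer address may change between iterations when the input is a data layer.
void MKLDNNFcLayer::updateInputData() {
  inVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
}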

{
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
copySeqInfoToOutputs();
CHECK(!inputLayers_.empty());
Contributor

Line 114 should run after the check on line 115. Otherwise, if inputLayers_ is empty, line 114 will core dump right away.

Contributor Author

The original reasoning was that the function on line 114 tolerates an empty inputLayers_ (it simply returns), so it will not core dump; only the code after line 115 must not see it empty, hence this order.

That said, the forward function as a whole does expect the input not to be empty, so moving the check to the front is also fine.

Contributor

I see that copySeqInfoToOutputs itself already checks whether inputLayers_ is empty. Since forward as a whole cannot accept an empty input, line 115 should be moved before line 114, and the empty-input check inside copySeqInfoToOutputs can then be dropped, keeping only the needSequenceInfo_ check.

Contributor Author

We can move the CHECK to the front, but I would not recommend removing the check inside the function, since that is the function's own guarantee.
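Summing up the agreed ordering, a hedged sketch of the forward entry with the CHECK moved to the front while copySeqInfoToOutputs keeps its own guard; the surrounding code is assumed from the snippet quoted above, not copied from the merged diff.

void MKLDNNLayer::forward(PassType passType) {
  {
    REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
    CHECK(!inputLayers_.empty());  // fail fast: forward as a whole needs an input
    copySeqInfoToOutputs();        // keeps its own needSequenceInfo_ / empty-input guard
    // ... rest of forward unchanged
  }
}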

CHECK(!inputLayers_.empty());
size_t elemenCnt = inputLayers_[0]->getOutput().value->getElementCnt();
if (inputElemenCnt_ != elemenCnt) {
inputElemenCnt_ = elemenCnt;
Contributor

I do not quite follow this part, specifically the inputElemenCnt_ variable. Why use the element count of the value matrix to control lines 119-122?
If this is done for reshape, resetFwd and convertWeightsFromPaddle because the previous layer is not an MKLDNN layer, could some other variable be used instead?

Contributor Author

"If this is done for reshape, resetFwd and convertWeightsFromPaddle because the previous layer is not an MKLDNN layer?"

The consideration here has nothing to do with whether the previous layer is MKLDNN or not; these steps are needed whenever the previous layer's output size changes.

Before this change, they were triggered only by a change in batch size. Now the trigger is the input size, which subsumes the batch size: conv allows the image size to change, and that change does not show up in the batch size, so overall it is enough to watch for a change in the input size inputElemenCnt_.

Contributor

Please put this explanation into a code comment.

Contributor Author

OK~
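For reference, a hedged sketch of the guard being discussed together with the kind of code comment requested above; the variable names follow the quoted diff, the rest is illustrative.

// Reshape, resetFwd and convertWeightsFromPaddle are needed only when the
// total input size changes. The element count subsumes the batch size: for
// conv, an image-size change alters the element count even when the batch
// size stays the same.
size_t elemenCnt = inputLayers_[0]->getOutput().value->getElementCnt();
if (inputElemenCnt_ != elemenCnt) {
  inputElemenCnt_ = elemenCnt;
  reshape();
  resetFwd();
  convertWeightsFromPaddle();
}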

* Update input value data when input layer is "data" type.
* Since the input value data address might be changed.
*/
virtual void updateInputData() {}
Contributor

reshape, resetFwd, resetBwd and updateInputData had better take parameters:

  • That way the reader can tell what each function consumes and modifies without looking at the implementation. Right now they are all empty, which makes the code hard to follow; the meaning of each parameter can also be documented in the comments.
  • Since the subclasses will implement resetFwd, resetBwd and so on, fixing the interface in the parent class will make future subclasses more consistent.

Contributor Author

I agree with this point. The original hesitation was that quite a few parameters would be needed and they are all already defined in the parent class, so spelling them out as parameters felt redundant, e.g. reshape(int &bs, int &ih, int &iw, int &oh, int &ow).

Still, it is worth adding the parameters that are necessary and self-explanatory; I will update it shortly and then you can review again.
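A hedged sketch of what the parameterized hooks could look like after the change agreed on here; the reshape signature is the one quoted just above, the other parameter lists are placeholders, not the merged API.

// Sketch: giving the hooks explicit parameters makes the data flow visible
// from the declarations alone and keeps future subclasses consistent.
virtual void reshape(int& bs, int& ih, int& iw, int& oh, int& ow) = 0;  // shapes in/out
virtual void resetFwd() = 0;       // could take the forward pipeline and value buffers
virtual void resetBwd() = 0;       // could take the backward pipeline and grad buffers
virtual void updateInputData() {}  // stays optional with a no-op default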

CHECK(dnnLayer_);
// for comparison with Paddle reference results,
// need manually add cpu device output for test
dnnLayer_->addOutputArgument(-1);
Contributor

Use CPU_DEVICE instead of -1.

@@ -109,20 +113,21 @@ void MKLDNNTester::randomBotDatas() {

void MKLDNNTester::randomTopDiffs() {
refLayer_->getOutputGrad()->randomizeUniform();
dnnLayer_->getOutputGrad()->copyFrom(*(refLayer_->getOutputGrad()));
VLOG(lvl_) << "Random dom Backward Input, TopDiff: ";
dnnLayer_->getOutput(-1).grad->copyFrom(*(refLayer_->getOutputGrad()));
Contributor

Use CPU_DEVICE instead of -1.

Contributor Author

That would require including layer.h; it is not defined here.

Contributor

Then include it; hard-coding -1 is not good for future maintenance.

Contributor Author

OK~
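A small hedged sketch of the resulting change; the include line is an assumption (the review only says CPU_DEVICE lives in layer.h), and the substitution of the hard-coded -1 is the point.

#include "Layer.h"  // assumed include; this is where CPU_DEVICE is defined

// before: a magic number that is easy to break if the convention ever changes
dnnLayer_->addOutputArgument(-1);
// after: self-documenting and maintained in one place
dnnLayer_->addOutputArgument(CPU_DEVICE);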

VLOG(MKLDNN_ALL) << "Check Forward";
printTopDatas();
double delta = compareMatrix(dnnLayer_->getOutput(-1).value,
Contributor

Use CPU_DEVICE instead of -1.

Contributor Author

Same as above.

const VectorPtr& grad = parameters_[n][i]->getBuf(PARAMETER_GRADIENT);
if (grad) {
grad->zeroMem();
if (id == n || id == parameters_.size()) {
Contributor

I do not quite understand why line 198, i.e. clearWgtDiffs(id), was added. Was there any problem with clearing everything as before?

Contributor Author

Oh, it is not a problem with the original code.
This just adds a parameter so the clearing can be done selectively, and unifies the function interface.
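A hedged sketch of the selective clear being described, built around the quoted snippet: passing a specific id clears only that layer's parameter gradients, while passing parameters_.size() (a natural default) clears everything, so the old behaviour is still covered by the same interface.

// Sketch: id picks one parameter group; id == parameters_.size() means "all".
void MKLDNNTester::clearWgtDiffs(size_t id) {
  CHECK_LE(id, parameters_.size());
  for (size_t n = 0; n < parameters_.size(); ++n) {
    if (id == n || id == parameters_.size()) {
      for (size_t i = 0; i < parameters_[n].size(); ++i) {
        const VectorPtr& grad = parameters_[n][i]->getBuf(PARAMETER_GRADIENT);
        if (grad) {
          grad->zeroMem();
        }
      }
    }
  }
}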

@luotao1
Contributor
luotao1 commented Sep 11, 2017

Does the current implementation assume that each MKLDNNXXXLayer has only one input layer?

Contributor Author
@tensor-tang tensor-tang left a comment

Yes, for now we only consider the single-input case.

@tensor-tang
Contributor Author

Done

@tensor-tang tensor-tang force-pushed the refine branch 2 times, most recently from e980a6d to 7f7fa32 on September 12, 2017 02:15
Contributor
@luotao1 luotao1 left a comment

LGTM

@luotao1 luotao1 merged commit 171fee2 into PaddlePaddle:develop Sep 12, 2017
@tensor-tang tensor-tang deleted the refine branch September 12, 2017 02:57