
[Pten] Refactor the implementation of custom operator #37122

Merged · 30 commits · Nov 15, 2021

Conversation

@chenwhql (Contributor) commented Nov 11, 2021

PR types

Function optimization

PR changes

OPs

Describe

[Pten] Refactor the implementation of custom operator

The Paddle Tensor compute library (hereafter "pten") needs to provide more C++ computation APIs for custom operators, so that the cost of developing custom operators can be reduced further. This requires connecting the custom operator implementation with pten. This PR makes the following changes:

  1. Merge the original custom-operator Tensor implementation into the pten API Tensor, replacing the relevant methods with the compute library's implementations and removing the original custom-operator Tensor. This reduces the number of externally visible Tensor concepts and lowers long-term maintenance cost; going forward, custom operators, the dynamic graph, and the Python & C++ APIs will share a single pten API Tensor.
  2. Merge the original custom-operator C++ APIs into the compute library, removing the original DataType API and reusing pten's DataType.
  3. Update the Tensor conversion logic in the original custom_operator.cc to hook into pten.
  4. Adjust the externally exposed header files, making the pten API the primary external interface.
  5. Reorganize the corresponding unit tests to match the pten implementation.
  6. For compatibility, temporarily keep some data structures and methods of the original custom operator, such as PlaceType and the methods that use it; these will be gradually deprecated.

After this integration of custom operators with pten, developing custom operators becomes noticeably easier, and the amount of code needed for complex kernels drops significantly. Take the forward kernel of a linear operator as an example:

1. Previous approach (hand-written basic implementation logic; fairly complex)

Note: this is only the main body of paddle's internal matmul forward kernel; it does not include linear's add computation, whose internal implementation runs to over a thousand more lines (#37034), too much to paste here. The sample code has not been compiled and is only meant to illustrate the change in code volume. Moreover, writing an efficient matmul and elementwise_add inside an external custom operator is very difficult, for reasons including but not limited to:

  • On one hand, the code calls paddle's internal wrappers around third-party libraries such as eigen and blas, so the actual amount of code far exceeds what is listed here.
  • On the other hand, external custom operators currently cannot use our internally wrapped eigen and blas methods. Even if users reimplemented them externally, performance would suffer because global host and device memory management would not be unified.
std::vector<paddle::Tensor> CustomLinearForward(const paddle::Tensor& x,
                                              const paddle::Tensor& weight,
                                              const paddle::Tensor& bias) {
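  // NOTE: the body below is pasted from paddle's internal matmul forward
  // kernel purely to compare code volume (see the note above); identifiers
  // such as X, Y, Out, dev_ctx, trans_x, trans_y, flag, T and DeviceContext
  // come from that kernel's context and are not declared by this signature.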
  PADDLE_ENFORCE_NE(paddle::framework::product(x.dims()),
                    0,
                    paddle::platform::errors::InvalidArgument(
                        "The Input(X) dims size must not be equal to 0,"
                        " but received dims size is 0. "));
  PADDLE_ENFORCE_NE(paddle::framework::product(y.dims()),
                    0,
                    paddle::platform::errors::InvalidArgument(
                        "The Input(Y) dims size must not be equal to 0,"
                        " but received dims size is 0. "));
  const std::vector<std::int64_t> x_dims = vectorize(X.dims());
  const std::vector<std::int64_t> y_dims = vectorize(Y.dims());

  const int x_ndim = x_dims.size();
  const int y_ndim = y_dims.size();

  // Get data ptr
  const T* x_data = X.data<T>();
  const T* y_data = Y.data<T>();

  if (x_ndim == 1 && y_ndim == 1) {
    PADDLE_ENFORCE_EQ(
        X.numel(),
        Y.numel(),
        paddle::platform::errors::InvalidArgument(
            "X's numbers must be equal to Y's numbers,"
            "when X/Y's dims =1. But received X has [%d] elements,"
            "received Y has [%d] elements",
            X.numel(),
            Y.numel()));
    VLOG(3) << "MatMul's case 1";
    Out->Resize({1});
    Out->mutable_data<T>();
    auto out_eigen = EigenScalar<T>::From(*Out);
    auto x_eigen = EigenVector<T>::Flatten(X);
    auto y_eigen = EigenVector<T>::Flatten(Y);

    auto& dev = *dev_ctx.eigen_device();
    if (flag) {
      out_eigen.device(dev) = (x_eigen * y_eigen).sum() + out_eigen;
    } else {
      out_eigen.device(dev) = (x_eigen * y_eigen).sum();
    }
    return;
  }

  auto blas = paddle::operators::math::GetBlas<DeviceContext, T>(dev_ctx);

  if (x_ndim == 1) {
    const int N = X.numel();
    if (trans_y) {
      PADDLE_ENFORCE_EQ(y_dims[y_ndim - 1],
                        N,
                        paddle::platform::errors::InvalidArgument(
                            "Input(Y) has error dim."
                            "Y'dims[%d] must be equal to %d"
                            "But received Y'dims[%d] is %d",
                            y_ndim - 1,
                            N,
                            y_ndim - 1,
                            y_dims[y_ndim - 1]));
    } else {
      PADDLE_ENFORCE_EQ(y_dims[y_ndim - 2],
                        N,
                        paddle::platform::errors::InvalidArgument(
                            "Input(Y) has error dim."
                            "Y'dims[%d] must be equal to %d"
                            "But received Y'dims[%d] is %d",
                            y_ndim - 2,
                            N,
                            y_ndim - 2,
                            y_dims[y_ndim - 2]));
    }
    std::vector<std::int64_t> out_dims(y_ndim - 1);
    if (trans_y) {
      std::copy_n(y_dims.cbegin(), y_ndim - 1, out_dims.begin());
    } else {
      std::copy_n(y_dims.cbegin(), y_ndim - 2, out_dims.begin());
      out_dims.back() = y_dims.back();
    }
    Out->Resize(paddle::framework::make_ddim(out_dims));
    Out->mutable_data<T>();
    if (trans_y) {
      const int M = Y.numel() / N;
      VLOG(3) << "MatMul's case 2";
      blas.GEMV(false,
                M,
                N,
                static_cast<T>(1),
                y_data,
                x_data,
                static_cast<T>(flag),
                Out->mutable_data<T>());
    } else {
      const int M = y_dims[y_ndim - 1];
      const int batch_size = Y.numel() / (M * N);
      if (batch_size == 1) {
        VLOG(3) << "MatMul's case 3";
        blas.GEMV(true,
                  N,
                  M,
                  static_cast<T>(1),
                  y_data,
                  x_data,
                  static_cast<T>(flag),
                  Out->mutable_data<T>());
      } else {
        VLOG(3) << "MatMul's case 4";
        blas.BatchedGEMM(CblasTrans,
                         CblasNoTrans,
                         M,
                         1,
                         N,
                         static_cast<T>(1),
                         y_data,
                         x_data,
                         static_cast<T>(flag),
                         Out->mutable_data<T>(),
                         batch_size,
                         M * N,
                         0);
      }
    }
    return;
  }

  if (y_ndim == 1) {
    const int N = Y.numel();
    if (trans_x) {
      PADDLE_ENFORCE_EQ(x_dims[x_ndim - 2],
                        N,
                        paddle::platform::errors::InvalidArgument(
                            "Input(X) has error dim."
                            "X'dims[%d] must be equal to %d"
                            "But received X'dims[%d] is %d",
                            x_ndim - 2,
                            N,
                            x_ndim - 2,
                            x_dims[x_ndim - 2]));
    } else {
      PADDLE_ENFORCE_EQ(x_dims[x_ndim - 1],
                        N,
                        paddle::platform::errors::InvalidArgument(
                            "Input(X) has error dim."
                            "X'dims[%d] must be equal to %d"
                            "But received X'dims[%d] is %d",
                            x_ndim - 1,
                            N,
                            x_ndim - 1,
                            x_dims[x_ndim - 1]));
    }
    std::vector<std::int64_t> out_dims(x_ndim - 1);
    if (trans_x) {
      std::copy_n(x_dims.cbegin(), x_ndim - 2, out_dims.begin());
      out_dims.back() = x_dims.back();
    } else {
      std::copy_n(x_dims.cbegin(), x_ndim - 1, out_dims.begin());
    }
    Out->Resize(paddle::framework::make_ddim(out_dims));
    Out->mutable_data<T>();

    if (trans_x) {
      const int M = x_dims[x_ndim - 1];
      const int batch_size = X.numel() / (M * N);
      if (batch_size == 1) {
        VLOG(3) << "MatMul's case 5";
        blas.GEMV(true,
                  N,
                  M,
                  static_cast<T>(1),
                  x_data,
                  y_data,
                  static_cast<T>(flag),
                  Out->mutable_data<T>());
      } else {
        VLOG(3) << "MatMul's case 6";
        blas.BatchedGEMM(CblasTrans,
                         CblasNoTrans,
                         M,
                         1,
                         N,
                         static_cast<T>(1),
                         x_data,
                         y_data,
                         static_cast<T>(flag),
                         Out->mutable_data<T>(),
                         batch_size,
                         M * N,
                         0);
      }
    } else {
      const int M = X.numel() / N;
      VLOG(3) << "MatMul's case 7";
      blas.GEMV(false,
                M,
                N,
                static_cast<T>(1),
                x_data,
                y_data,
                static_cast<T>(flag),
                Out->mutable_data<T>());
    }
    return;
  }

  const int M = trans_x ? x_dims[x_ndim - 1] : x_dims[x_ndim - 2];
  const int K = trans_x ? x_dims[x_ndim - 2] : x_dims[x_ndim - 1];
  if (trans_y) {
    PADDLE_ENFORCE_EQ(y_dims[y_ndim - 1],
                      K,
                      paddle::platform::errors::InvalidArgument(
                          "Input(Y) has error dim."
                          "Y'dims[%d] must be equal to %d"
                          "But received Y'dims[%d] is %d",
                          y_ndim - 1,
                          K,
                          y_ndim - 1,
                          y_dims[y_ndim - 1]));
  } else {
    PADDLE_ENFORCE_EQ(y_dims[y_ndim - 2],
                      K,
                      paddle::platform::errors::InvalidArgument(
                          "Input(Y) has error dim."
                          "Y'dims[%d] must be equal to %d"
                          "But received Y'dims[%d] is %d",
                          y_ndim - 2,
                          K,
                          y_ndim - 2,
                          y_dims[y_ndim - 2]));
  }
  const int N = trans_y ? y_dims[y_ndim - 2] : y_dims[y_ndim - 1];
  const int ndim = (std::max)(x_ndim, y_ndim);
  std::vector<std::int64_t> x_broadcast_dims(ndim);
  std::vector<std::int64_t> y_broadcast_dims(ndim);
  std::vector<std::int64_t> out_broadcast_dims(ndim);

  GetBroadcastFromDims(x_ndim - 2,
                       x_dims.data(),
                       y_ndim - 2,
                       y_dims.data(),
                       x_broadcast_dims.data(),
                       y_broadcast_dims.data(),
                       out_broadcast_dims.data());
  out_broadcast_dims[ndim - 2] = M;
  out_broadcast_dims[ndim - 1] = N;

  Out->Resize(paddle::framework::make_ddim(out_broadcast_dims));
  Out->mutable_data<T>();

  const int batch_dim = ndim - 2;
  // broadcast message
  const bool is_broadcast_dims =
      !std::equal(x_broadcast_dims.cbegin(),
                  x_broadcast_dims.cbegin() + batch_dim,
                  y_broadcast_dims.cbegin());

  const std::int64_t x_batch_size =
      std::accumulate(x_broadcast_dims.cbegin(),
                      x_broadcast_dims.cbegin() + batch_dim,
                      1LL,
                      std::multiplies<std::int64_t>());
  const std::int64_t y_batch_size =
      std::accumulate(y_broadcast_dims.cbegin(),
                      y_broadcast_dims.cbegin() + batch_dim,
                      1LL,
                      std::multiplies<std::int64_t>());
  const std::int64_t out_batch_size =
      std::accumulate(out_broadcast_dims.cbegin(),
                      out_broadcast_dims.cbegin() + batch_dim,
                      1LL,
                      std::multiplies<std::int64_t>());
  if (out_batch_size == 0) return;
  if (x_batch_size == 1 && y_batch_size == 1) {
    VLOG(3) << "MatMul's case 8";
    blas.GEMM(trans_x ? CblasTrans : CblasNoTrans,
              trans_y ? CblasTrans : CblasNoTrans,
              M,
              N,
              K,
              static_cast<T>(1),
              x_data,
              y_data,
              static_cast<T>(flag),
              Out->mutable_data<T>());
  } else if (x_batch_size == 1) {
    if (M == 1 && trans_y) {
      VLOG(3) << "MatMul's case 9";
      blas.GEMV(false,
                y_batch_size * N,
                K,
                static_cast<T>(1),
                y_data,
                x_data,
                static_cast<T>(flag),
                Out->mutable_data<T>());
    } else {
      VLOG(3) << "MatMul's case 10";
      blas.BatchedGEMM(trans_x ? CblasTrans : CblasNoTrans,
                       trans_y ? CblasTrans : CblasNoTrans,
                       M,
                       N,
                       K,
                       static_cast<T>(1),
                       x_data,
                       y_data,
                       static_cast<T>(flag),
                       Out->mutable_data<T>(),
                       out_batch_size,
                       0,
                       K * N);
    }
  } else if (y_batch_size == 1) {
    if (!trans_x) {
      VLOG(3) << "MatMul's case 11";
      blas.GEMM(CblasNoTrans,
                trans_y ? CblasTrans : CblasNoTrans,
                x_batch_size * M,
                N,
                K,
                static_cast<T>(1),
                x_data,
                y_data,
                static_cast<T>(flag),
                Out->mutable_data<T>());
    } else {
      VLOG(3) << "MatMul's case 12";
      blas.BatchedGEMM(CblasTrans,
                       trans_y ? CblasTrans : CblasNoTrans,
                       M,
                       N,
                       K,
                       static_cast<T>(1),
                       x_data,
                       y_data,
                       static_cast<T>(flag),
                       Out->mutable_data<T>(),
                       out_batch_size,
                       M * K,
                       0);
    }
  } else if (!is_broadcast_dims) {
    VLOG(3) << "MatMul's case 13";
    blas.BatchedGEMM(trans_x ? CblasTrans : CblasNoTrans,
                     trans_y ? CblasTrans : CblasNoTrans,
                     M,
                     N,
                     K,
                     static_cast<T>(1),
                     x_data,
                     y_data,
                     static_cast<T>(flag),
                     Out->mutable_data<T>(),
                     out_batch_size,
                     M * K,
                     K * N);
  } else {
    // in the case, can't use stridedgemm
    std::vector<const T*> x_ptr(out_batch_size);
    std::vector<const T*> y_ptr(out_batch_size);
    std::vector<T*> out_ptr(out_batch_size);
    std::vector<std::int64_t> index(batch_dim, 0);
    for (std::int64_t i = 0; i < out_batch_size; ++i) {
      // using the index to get offset
      const std::int64_t x_index =
          GetIndexMessage(batch_dim, x_broadcast_dims.data(), index.data());
      const std::int64_t y_index =
          GetIndexMessage(batch_dim, y_broadcast_dims.data(), index.data());

      x_ptr[i] = x_data + x_index * M * K;
      y_ptr[i] = y_data + y_index * K * N;
      out_ptr[i] = Out->mutable_data<T>() + i * M * N;
      IndexIncreaseFromDims(batch_dim, out_broadcast_dims.data(), index.data());
    }
    VLOG(3) << "MatMul's case 14";
    blas.BatchedGEMM(trans_x ? CblasTrans : CblasNoTrans,
                     trans_y ? CblasTrans : CblasNoTrans,
                     M,
                     N,
                     K,
                     static_cast<T>(1),
                     x_ptr.data(),
                     y_ptr.data(),
                     static_cast<T>(flag),
                     out_ptr.data(),
                     out_batch_size);
  }

  // The Add part still involves a lot of code, omitted here
}

2. This PR's approach (written in one line, supports multiple devices, verified in unit tests)

Note: the C++ API used here stays consistent with the corresponding Python-side API in all of the following respects:

  • Interface path (experimental is temporarily inserted after paddle and will be removed later)
  • Naming
  • Parameter list (minus the name parameter)
  • Supported parameter types
  • API functionality

Users can therefore use the C++ API by directly consulting the Python API documentation, with no extra learning cost, and its performance has already been tuned by paddle's own engineers.

// The linear implemented here requires that bias be passed in
std::vector<paddle::Tensor> PtenLinearForward(const paddle::Tensor& x,
                                              const paddle::Tensor& weight,
                                              const paddle::Tensor& bias) {
  return {paddle::experimental::add(paddle::experimental::matmul(x, weight), bias)};
}
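For context, registering this kernel as a custom operator would use the existing custom-op macros roughly as sketched below. This is a minimal illustration, not code from this PR: the op name custom_linear and the two inference functions are assumptions, and the shape inference assumes 2-D x and weight.

#include "paddle/extension.h"

// Hypothetical shape inference: x is [batch, in], weight is [in, out].
std::vector<std::vector<int64_t>> LinearInferShape(
    std::vector<int64_t> x_shape,
    std::vector<int64_t> weight_shape,
    std::vector<int64_t> bias_shape) {
  return {{x_shape[0], weight_shape[1]}};
}

// Hypothetical dtype inference: the output follows x's dtype.
std::vector<paddle::DataType> LinearInferDtype(paddle::DataType x_dtype,
                                               paddle::DataType weight_dtype,
                                               paddle::DataType bias_dtype) {
  return {x_dtype};
}

PD_BUILD_OP(custom_linear)
    .Inputs({"X", "Weight", "Bias"})
    .Outputs({"Out"})
    .SetKernelFn(PD_KERNEL(PtenLinearForward))
    .SetInferShapeFn(PD_INFER_SHAPE(LinearInferShape))
    .SetInferDtypeFn(PD_INFER_DTYPE(LinearInferDtype));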

Of course, both approaches above are currently supported; merging this PR does not stop the original style from working.

TODO items

  1. The custom-operator C++ API is an officially exposed interface, so its compatibility must be preserved. This PR takes that into account, but given the workload, part of the job is split out. The reshape, copy_to, slice, and cast methods of the original custom-operator Tensor were implementations written specifically for custom operators; after this integration they will directly reuse pten's C++ APIs and kernels. Since the relevant pten kernels are still being migrated, these APIs will be completed in a follow-up PR (before 11.20); this PR temporarily disables them.
  2. The newly introduced APIs temporarily use paddle::experimental as the namespace prefix, since they are still experimental and subject to change. The original custom-operator APIs were opened up too hastily, directly exposing some insufficiently considered data structures and APIs to users that now cannot be dropped for compatibility reasons, for example the original PlaceType, Tensor's incomplete constructors, and Tensor::stream(). Deprecated warnings will be added to these poorly designed interfaces later.

@XieYunshen (Contributor) previously approved these changes Nov 13, 2021:

LGTM
The unit-test reorganization was flagged as removing unit tests.

Comment on lines 75 to 77
PD_THROW("Data type ",
static_cast<int>(data_type),
" is not supported by tensor.");
Contributor:

The error message style here is slightly inconsistent with line 176 below. Are both styles acceptable?

Author:

Done, quotes were added here as well.
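Presumably the adjusted message now quotes the offending value, along these lines (a sketch of the style, not the verbatim diff):

PD_THROW("Data type `",
         static_cast<int>(data_type),
         "` is not supported by tensor.");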

Comment on lines +156 to +162
* @brief Return the shape (dimensions) of Tensor.
* The compatible method of `Tensor::dims()`.
* This is a deprecated method and may be removed in the future!
*
* @return std::vector<int64_t>
*/
std::vector<int64_t> shape() const;
Contributor:

If the shape function is removed in the future, won't the C++ Tensor interface become inconsistent with Python's?

Author:

The plan here is not to remove the interface but to do an incompatible upgrade, with a concrete prompt given in the interface later.

Contributor:

shape is better than dims

Author:

The shape interface is currently kept for compatibility as well.

bool is_cpu() const;

/**
* @brief Determine whether the tensor device is CPU
Contributor:

CPU here should be GPU.

Author:

done, thx

*/
bool is_cpu() const { return paddle::platform::is_cpu_place(place()); }
bool is_cuda() const { return paddle::platform::is_gpu_place(place()); }
paddle::platform::Place inner_place() const;
Contributor:

If the place() interface above is removed in the future, can inner_place() here be renamed to place()?

Author:

Yes, that also requires an incompatible upgrade.

* This is a deprecated method and may be removed in the future!
*
* @tparam T
* @param target_place of target place, of which the tensor will copy to.
Contributor:

The grammar here reads a bit oddly.

Author:

done, thx

* @return None
* @brief Transfer the current Tensor to the specified device and return.
*
* @param place of target place, of which the tensor will copy to.
Contributor:

Same as above.

Author:

done, thx


/**
* @brief Determine whether Tensor is initialized.
* This is a deprecated method and may be removed in the future!
Contributor:

For these interfaces slated for removal, could the comments mention the replacement for the current interface?

Author:

A warning will be emitted directly when these interfaces are called; comments don't mean much to users. This will be supplemented later; for now this wording is used uniformly.

@Aurelius84 (Contributor) previously approved these changes Nov 15, 2021:

LGTM overall


template <typename T>
T *Tensor::mutable_data() {
if (impl_->type_info().name() == "DenseTensor") {
Contributor:

Not important: could the DenseTensor check here be factored into a separate function later? I see it called in several places; if the implementation changes later, the interface layer then wouldn't need to change.

Author:

Done, thx. For now the check is wrapped in an inline helper function; a better approach can be considered later.
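The helper is presumably along the lines of the sketch below; the parameter type is an assumption, and the diff later shows it as detail::IsDenseTensor with a planned move to pten::DenseTensor::classof() to avoid RTTI:

namespace detail {
// Sketch of the factored-out check; centralizing it means the interface
// layer does not change if the underlying test is later reimplemented.
inline bool IsDenseTensor(
    const std::shared_ptr<pten::TensorBase>& tensor_impl) {
  return tensor_impl->type_info().name() == "DenseTensor";
}
}  // namespace detail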

DataType dtype = DataType::UNDEFINED,
Backend backend = Backend::UNDEFINED,
DataLayout layout = DataLayout::UNDEFINED);
PD_DLL_DECL Tensor full(const std::vector<int64_t>& shape,
Contributor:

What does PD_DLL_DECL mean, and why is it needed?

Author:

On Windows, symbols have to be exported manually, otherwise they cannot be found at runtime; this differs from Unix.
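For reference, such a macro typically follows the standard Windows export/import pattern, roughly as below; this is a generic sketch, and the guard macro name is an assumption rather than paddle's exact definition:

#if defined(_WIN32)
#ifdef PADDLE_API_EXPORT  // assumed guard, defined when building the library
#define PD_DLL_DECL __declspec(dllexport)
#else
#define PD_DLL_DECL __declspec(dllimport)
#endif
#else
#define PD_DLL_DECL  // expands to nothing on non-Windows platforms
#endif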

@chenwhql chenwhql dismissed stale reviews from Aurelius84 and XieYunshen via 5d10182 November 15, 2021 03:41
int numel = x.size();
int block = 512;
int grid = (numel + block - 1) / block;
PD_DISPATCH_FLOATING_AND_HALF_TYPES(
x.type(), "relu_cuda_forward_kernel", ([&] {
auto cpu_input = x.copy_to<data_t>(paddle::PlaceType::kCPU);
Contributor:

Shouldn't copy_to be tested here?

Author:

copy_to is temporarily disabled, as are reshape, cast, and slice; they will be added back in the next PR, along with the corresponding unit tests.

Author:

See the explanation in the TODO items of the PR description.

PD_DLL_DECL int RegisterSymbolsFor##name() { return 0; }

#define PT_DECLARE_API(name) \
extern int RegisterSymbolsFor##name(); \
Contributor:

Add PD_DLL_DECL here as well.

Author:

done, thx


#define PT_DECLARE_API(name) \
extern int RegisterSymbolsFor##name(); \
UNUSED static int use_pten_api_##name = RegisterSymbolsFor##name()
Contributor:

What does UNUSED do?

Author:

[screenshot of the UNUSED macro definition]
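The screenshot presumably shows the usual warning-suppression attribute: the static registration variable exists only for its initializer's side effect, so the compiler must be told it is deliberately unused. A generic sketch:

#if defined(__GNUC__) || defined(__clang__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif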

@zhwesky2010 (Contributor) left a comment:

Externally exposed free functions and classes all need PD_DLL_DECL on their declarations.


namespace detail {

inline bool IsDenseTensor(
Contributor:

[TODO] This will later be changed to a no-rtti form:
pten::DenseTensor::classof(derived_a.get());

Author:

OK

@@ -20,4 +25,5 @@ endif()
if(WITH_XPU)
set(PTEN_DEPS ${PTEN_DEPS} manipulation_xpu)
endif()

Contributor:

remove additional blank line

Author:

This is unimportant; the blank line just visually separates this block from the one above.

platform::errors::InvalidArgument(
"TensorImpl with nullptr is not supported"));
}
explicit Tensor(const PlaceType& place);
Contributor:

Is this a good way to stay compatible with the original custom tensor? These two constructors are not safe.

Author:

Neither constructor is safe, but they are needed for compatibility now. Deprecated warnings will be added later, and they can be removed after some time, for example in release 2.4.

*/
paddle::experimental::DataType type() const { return impl_->data_type(); }
DataType type() const;
Contributor:

type is not a good name... how about var_type

Author:

1. The type interface is also kept for custom-operator compatibility. 2. var_type is not a good name either, since there is no Variable in the current system.


} // namespace detail

paddle::platform::DeviceContext* GetDeviceContextByBackend(
Contributor:

const&

Author:

An enum is a POD type and does not need to be passed by const reference. When the custom-operator attributes were designed earlier, int and float were required to be passed as const&, which also wasn't really necessary.
[screenshot]
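In other words, small trivially-copyable types are best passed by value; a sketch of the two styles (the Backend parameter type here is inferred from the thread, not quoted from the diff):

// Preferred: pass enums (and int, float, etc.) by value.
paddle::platform::DeviceContext* GetDeviceContextByBackend(Backend backend);
// Unnecessary: a const reference to a POD adds indirection and saves nothing.
paddle::platform::DeviceContext* GetDeviceContextByBackend(const Backend& backend);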

*
* @return int64_t
*/
int64_t size() const;
Contributor:

I'd prefer using size() instead of numel

Author:

OK, I'll remove numel in the next PR.

Author:

see #37237


@JiabinYang (Contributor) left a comment:

LGTM

@raindrops2sea (Collaborator) left a comment:

LGTM
