sparse_fc supported #41770
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
Sorry to inform you that 35d16f1's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
FusePassBase::Init(name_scope, graph);
GraphPatternDetector gpd;

patterns::DenseFC dense_fc_pattern(gpd.mutable_pattern(),
In op_convert, some mul ops are converted into fc ops. Should that case also be converted to sparse fc?
For example: in distributed inference, the mul and elementwise ops get separated by communication ops, so they cannot be fused into an FC op; instead the standalone mul op is converted into an FC op by the converter.
That case should be solved on the distributed side; it is probably not something to handle at our level.
if (w_name.find("sparse_2_4") != w_name.npos) {
  // fake op
  OpDesc desc(fc_op->Block());
  desc.SetType("sparse_fc");
Does this name need to be distinguished from other sparsity schemes, e.g. the block sparsity that may be added later?
The name can be changed later once it is settled; users are not aware of it.
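For context, a minimal sketch of how the pass handler might swap the matched dense fc node for the fake sparse_fc op; the node names (fc_node, input, weight, bias, out) and the exact input/output keys are illustrative assumptions, not the PR's actual code:

```cpp
// Sketch only: replace the matched fc node with a fake sparse_fc node.
OpDesc desc(fc_node->Op()->Block());
desc.SetType("sparse_fc");
desc.SetInput("Input", {input->Name()});
desc.SetInput("W", {weight->Name()});  // weight name contains "sparse_2_4"
desc.SetInput("Bias", {bias->Name()});
desc.SetOutput("Out", {out->Name()});
auto* sparse_fc_node = graph->CreateOpNode(&desc);  // adds the node to the graph
IR_NODE_LINK_TO(input, sparse_fc_node);
IR_NODE_LINK_TO(weight, sparse_fc_node);
IR_NODE_LINK_TO(bias, sparse_fc_node);
IR_NODE_LINK_TO(sparse_fc_node, out);
GraphSafeRemoveNodes(graph, {fc_node});  // drop the original fc op
```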
//
// \brief Pattern looking for dense fc.
//
struct DenseFC : public PatternBase {
What is the difference between this newly added pattern and the patterns::FC already defined in this file? Why not use patterns::FC directly?
patterns::FC looks for mul + elementwise_add, so it is not the same.
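To make the distinction concrete, here is a hedged sketch of what a single-op DenseFC pattern could look like with the GraphPatternDetector API (patterns::FC instead links a mul node and an elementwise_add node); names are illustrative:

```cpp
// Sketch only: a DenseFC-style pattern matches one already-fused fc op.
PDNode* BuildDenseFC(PDPattern* pattern, const std::string& name_scope) {
  auto* fc_op = pattern->NewNode(name_scope + "/fc")->assert_is_op("fc");
  auto* fc_input = pattern->NewNode(name_scope + "/in")
                       ->AsInput()
                       ->assert_is_op_input("fc", "Input");
  auto* fc_weight = pattern->NewNode(name_scope + "/w")
                        ->AsInput()
                        ->assert_is_op_input("fc", "W");
  auto* fc_out = pattern->NewNode(name_scope + "/out")
                     ->AsOutput()
                     ->assert_is_op_output("fc", "Out");
  fc_op->LinksFrom({fc_input, fc_weight}).LinksTo({fc_out});
  return fc_out;
}
```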
Overall this looks fine; the accuracy and GPU-memory bugs will be fixed by @minghaoBD in the next PR.
In addition, @minghaoBD should follow up on:
- the weight-sharing issue
- supporting fusing only the dequantization, with FP16 output
"SpmmPluginDynamic only supports weight of type [FLOAT|HALF]")); | ||
nvinfer1::DataType weight_type; | ||
if (precision_ == nvinfer1::DataType::kINT8) { | ||
weight_type = nvinfer1::DataType::kFLOAT; |
In TensorRT's IConvolutionLayer, FP32 weights are currently passed in and TRT converts them to Int8 internally, which is fairly consistent with the current implementation. However:
- later, to optimize Paddle Inference's memory usage, the weights may be stored as FP16 values at the pass-analysis stage;
- passing in Int8 weights should also be supported, i.e. quantization is not done inside this plugin and the plugin does not need to compute the weights' abs_max; the abs_max (a.k.a. weight scale) shipped with the quantized model can be used directly.
Both points can be supported in follow-up PRs.
- OK; once storing FP16 weights is supported, the converter side will be adapted.
- OK, I will add a scale input to the plugin's constructor. I will include it in the PR that upgrades to cuSPARSELt 0.3.0; otherwise there would be a conflict between channel-wise quantization providing a scale_vec and 0.2.0 not supporting a scale_vec.
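A hypothetical sketch of that constructor change (class and member names are illustrative, not the actual SpmmPluginDynamic signature): the weight scale arrives from the quantized model instead of being computed from abs_max inside the plugin.

```cpp
#include <vector>

// Hypothetical sketch only: the scale is an externally supplied input.
class SpmmPluginSketch {
 public:
  SpmmPluginSketch(const std::vector<char>& weight, float weight_scale)
      : weight_(weight), weight_scale_(weight_scale) {}  // no abs_max scan

 private:
  std::vector<char> weight_;
  float weight_scale_;  // taken from the model's quantization metadata
};
```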
std::vector<char> weight_host(element_size_ * out_dim_ * k_);
convertAndCopy(weight, weight_type, weight_host.data());
void* weight_dev{nullptr};
cudaMalloc(reinterpret_cast<void**>(&weight_dev),
- Passing in Int8 weights should also be supported, i.e. quantization is not done inside this plugin and the plugin does not need to compute the weights' abs_max; the abs_max (a.k.a. weight scale) shipped with the quantized model can be used directly.

As noted above, this can be supported in a follow-up PR. Another reason not to do quantization inside the plugin is that the plugin currently allocates and manages device memory itself, which makes it inconvenient to share parameters (device memory) with other plugins.
      std::abs(reinterpret_cast<const float*>(weight_host.data())[i]);
  max_weight = std::max(max_weight, local_abs);
}
weight_scale_ = max_weight / 127.0f;
Channel-wise quantization should also be supported here. Quantizing the weights channel-wise affects the dequantization logic:
- if the dequantization is fused into the cusparseLtMatmul call, the computation logic must be adjusted, mainly the gemm alpha parameter;
- if the dequantization is not fused into the cusparseLtMatmul call, the plugin's current computation logic is unaffected.
@minghaoBD should follow up on this in a later PR.
Regarding channel-wise quantization: cuSPARSELt 0.2.0 only supports a scalar float alpha; the vector form is only supported from 0.3.0.
Point 1 has already been taken into account in the plugin code for 0.3.0, where the alpha computation logic will be adjusted.
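A minimal sketch of the per-channel alpha assembly being discussed, assuming weight_scales holds one scale per output channel (illustrative names, not the plugin code); the per-tensor case collapses to the single scalar alpha = input_scale * weight_scale / output_scale already used in the kernel.

```cpp
#include <cstddef>
#include <vector>

// Sketch only: with cuSPARSELt 0.3.0's vector alpha, each output channel j
// gets its own dequant/requant factor folded into the matmul.
std::vector<float> MakeAlphaVec(float input_scale,
                                const std::vector<float>& weight_scales,
                                float output_scale) {
  std::vector<float> alpha(weight_scales.size());
  for (std::size_t i = 0; i < weight_scales.size(); ++i) {
    alpha[i] = input_scale * weight_scales[i] / output_scale;
  }
  return alpha;
}
```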
2. (Int8) Calculate scale and scale the weight (on host)
3. Copy weight to device
4. Compress the weight (on device)
5. Copy the compressed weight to host
Is this step necessary? Can't cusparseLtMatmul take the data on the device directly?
Is it only for serialization?
Serialization and deserialization need compressed_weight_.
- It is unclear whether variables that live on the device can be serialized and deserialized?
- Is it necessary to optimize this CPU memory?
- Probably not.
- No need to optimize this host memory.
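For reference, a minimal sketch of why the host copy matters: serialization writes into a host buffer, so the compressed weight bytes must already live on the host. The helper name is an assumption, mirroring the deserialize_value_size call quoted below.

```cpp
#include <cstddef>
#include <cstring>

// Sketch only: serialize helpers memcpy from host memory, so a device-only
// compressed weight could not be written out directly.
void SerializeCompressedWeight(char** buffer, const void* weight_compressed,
                               std::size_t compressed_size) {
  std::memcpy(*buffer, weight_compressed, compressed_size);  // host -> host
  *buffer += compressed_size;
}
```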
activation_(activation) {
/*
1. Copy the compressed weight (on host)
2. Copy the compressed weight to device
I don't see code for this step in the current constructor; this step happens in the clone function, right?
Ah, the comment for this step was not deleted. I will remove it in PR 2.
weight_compressed_ = new char[compressed_size_];
deserialize_value_size(&data, &length, weight_compressed_, compressed_size_);
cudaMalloc(reinterpret_cast<void**>(&weight_compressed_dev_),
           compressed_size_);
For the weight-sharing case, this part needs attention.
}
const nvinfer1::PluginTensorDesc& prev = inOut[pos - 1];

return in.type == prev.type && in.format == prev.format;
The current implementation fuses both the dequantization and the quantization into the cusparseLtMatmul call (by setting alpha).
Could fusing only the dequantization, with FP16 output, be supported?
    weight_compressed_dev_, &beta, output, output, workSpace, &stream, 1);
return status != CUSPARSE_STATUS_SUCCESS;
} else if (inputDesc->type == nvinfer1::DataType::kINT8) {
  alpha = inputDesc->scale * weight_scale_ / outputDesc->scale;
The current implementation fuses both the dequantization and the quantization into the cusparseLtMatmul call (by setting alpha).
Could fusing only the dequantization, with FP16 output, be supported?
The following op may only support FP16, so this would save a type-conversion op.
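To spell out the two alternatives (a sketch under assumed names, not the plugin code): with INT8 output both dequantization and requantization fold into alpha, as in the quoted line above; with FP16 output only the dequantization factor is needed.

```cpp
// Sketch only: alpha choices for the INT8 sparse matmul.
float AlphaInt8Out(float input_scale, float weight_scale, float output_scale) {
  return input_scale * weight_scale / output_scale;  // dequantize + requantize
}
float AlphaFp16Out(float input_scale, float weight_scale) {
  return input_scale * weight_scale;  // dequantize only; output stays FP16
}
```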
Please add a unit test.
There will be a unit test in the next PR.
PR types
New features
PR changes
OPs
Describe
Sparse inference support. Build with -DWITH_CUSPARSELT=ON to enable cuSPARSELt support.
1. In the pass stage, find FC ops whose weight name contains the keyword "sparse_2_4" and replace them with the fake op sparse_fc.
2. sparse_fc is not a real op that users can use directly.
3. sparse_fc calls cuSPARSELt to perform sparse inference.