sparse_fc supported #41770

Closed
wants to merge 21 commits into from

Conversation

Contributor

@b3602sss commented Apr 13, 2022

PR types

New features

PR changes

OPs

Describe

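Adds sparse inference support. Build with -DWITH_CUSPARSELT=ON to enable cusparselt support.
1. During the pass stage, find FC ops whose weight name contains the keyword "sparse_2_4" and replace them with a fake op: sparse_fc.
2. sparse_fc is not a real op that users can use directly.
3. sparse_fc calls cusparseLT to perform sparse inference.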
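
The name-based selection in step 1 can be sketched as a standalone function. This is only an illustration of the keyword match, not the pass itself: `SelectFcOpType` is a hypothetical helper, and the assumption that non-matching ops keep the type "fc" is ours.

```cpp
#include <string>

// Hypothetical helper (not part of the PR): returns the op type that the
// pass would assign to an FC op, judging only by the weight name.
inline std::string SelectFcOpType(const std::string& weight_name) {
  // "sparse_2_4" is the keyword named in the PR description.
  const bool is_sparse = weight_name.find("sparse_2_4") != std::string::npos;
  // Assumption: ops without the keyword stay ordinary "fc" ops.
  return is_sparse ? "sparse_fc" : "fc";
}
```

Because the decision depends only on the weight name, the model exporter has to tag the pruned weights with the keyword for the replacement to take effect.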

@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@b3602sss changed the title from "Fc" to "sparse_fc supported" Apr 20, 2022
@paddle-bot-old

paddle-bot-old bot commented May 1, 2022

Sorry to inform you that the CIs for 35d16f1 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

FusePassBase::Init(name_scope, graph);
GraphPatternDetector gpd;

patterns::DenseFC dense_fc_pattern(gpd.mutable_pattern(),
Contributor

In op_convert, some mul ops are converted to fc ops. Should those also be converted to sparse fc?

For example:
In distributed inference, mul and elementwise are separated by a communication op, so they cannot be fused into an FC op. Instead, the mul op alone is converted to an FC op by the converter.

Contributor Author

That case should be solved on the distributed side; it probably isn't something to handle at our level.

if (w_name.find("sparse_2_4") != w_name.npos) {
// fake op
OpDesc desc(fc_op->Block());
desc.SetType("sparse_fc");
Contributor

Does this name need to be distinguished from other sparsity schemes, such as block sparsity, which might be added later?

Contributor Author

This can be changed later once it has been decided; users never see it.

//
// \brief Pattern looking for dense fc.
//
struct DenseFC : public PatternBase {
Contributor

How does this new pattern differ from the patterns::FC already defined in this file? Why not use patterns::FC directly?

Contributor Author

patterns::FC looks for mul + elementwise, so it is not the same.

paddle/fluid/inference/tensorrt/convert/sparse_fc_op.cc (4 review threads, outdated and resolved)
wanghaoshuang previously approved these changes Jun 2, 2022
Contributor

@wanghaoshuang left a comment

Overall there are no major problems. The accuracy and GPU memory bugs will be fixed by @minghaoBD in the next PR.
In addition, @minghaoBD should follow up on:

  1. Weight sharing
  2. Supporting fusion of dequantization only, with FP16 output

"SpmmPluginDynamic only supports weight of type [FLOAT|HALF]"));
nvinfer1::DataType weight_type;
if (precision_ == nvinfer1::DataType::kINT8) {
weight_type = nvinfer1::DataType::kFLOAT;
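Contributor

With TensorRT's IConvolutionLayer, FP32 weights are currently passed in and TRT converts them to Int8 internally, which is fairly consistent with the current implementation. However:

  1. To optimize PaddleInference memory later on, weights may be stored as FP16 during the pass analysis stage.
  2. Passing in Int8 weights should be supported. That is, quantization would not be done inside this plugin, and the plugin would not need to compute the weight's abs_max; it could simply use the abs_max (also called the weight scale) that ships with the quantized model.
    Both points can be supported in follow-up PRs.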
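
Point 2 can be illustrated with a small sketch in which the scale comes from the caller instead of being recomputed from the weights. `QuantizeWithScale` is a hypothetical helper, and the symmetric int8 convention (real value ≈ int8 value × scale, with scale = abs_max / 127) is an assumption taken from the `weight_scale_ = max_weight / 127.0f` line in the code under review.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the PR): quantizes FP32 weights to int8
// using a scale supplied by the caller, e.g. abs_max / 127 read from a
// quantized model. Assumes scale > 0.
inline std::vector<int8_t> QuantizeWithScale(const std::vector<float>& weights,
                                             float scale) {
  std::vector<int8_t> quantized(weights.size());
  for (std::size_t i = 0; i < weights.size(); ++i) {
    // Round to the nearest integer step, then clamp to the symmetric range.
    float v = std::round(weights[i] / scale);
    v = std::min(127.0f, std::max(-127.0f, v));
    quantized[i] = static_cast<int8_t>(v);
  }
  return quantized;
}
```

With the scale passed in, the plugin no longer needs a pass over the weights to find abs_max, which is the simplification the reviewer asks for.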

Contributor

@minghaoBD Jun 2, 2022

  1. OK. Once storing fp16 weights is supported, the converter side can be adapted.
  2. OK. I will add a scale input to the plugin's constructor. I will include it in the PR that updates to cusparseLT 0.3.0; otherwise there would be a conflict between channel-wise quantization providing a scale_vec and 0.2.0 not supporting scale_vec.

std::vector<char> weight_host(element_size_ * out_dim_ * k_);
convertAndCopy(weight, weight_type, weight_host.data());
void* weight_dev{nullptr};
cudaMalloc(reinterpret_cast<void**>(&weight_dev),
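Contributor

  1. Passing in Int8 weights should be supported. That is, quantization would not be done inside this plugin, and the plugin would not need to compute the weight's abs_max; it could simply use the abs_max (also called the weight scale) that ships with the quantized model.
    Both points can be supported in follow-up PRs.

Another reason not to quantize inside the plugin: GPU memory is currently allocated and managed inside the plugin, which makes it hard to share parameters (GPU memory) with other plugins.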

std::abs(reinterpret_cast<const float*>(weight_host.data())[i]);
max_weight = std::max(max_weight, local_abs);
}
weight_scale_ = max_weight / 127.0f;
Contributor

Channel-wise quantization should also be supported here. Quantizing the weights channel-wise affects the dequantization logic:

  1. If dequantization is fused into the cusparseLtMatmul interface, the computation logic needs to be adjusted, mainly the alpha parameter of the gemm.
  2. If dequantization is not fused into the cusparseLtMatmul interface, the current plugin's computation logic is unaffected.

@minghaoBD should follow up on the above in a later PR.
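
The difference between the per-tensor scale in the snippet under review and the channel-wise variant requested here can be sketched as follows. Both functions are hypothetical, and the row-major `[out_dim, k]` weight layout (one row per output channel) is an assumption rather than something the thread states.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-tensor scale: one abs-max over all weights, divided by 127,
// matching `weight_scale_ = max_weight / 127.0f` in the reviewed code.
inline float PerTensorScale(const std::vector<float>& w) {
  float max_abs = 0.0f;
  for (float v : w) max_abs = std::max(max_abs, std::abs(v));
  return max_abs / 127.0f;
}

// Channel-wise scales: one abs-max per output channel.
// Assumption: w is row-major with shape [out_dim, k].
inline std::vector<float> PerChannelScales(const std::vector<float>& w,
                                           std::size_t out_dim,
                                           std::size_t k) {
  std::vector<float> scales(out_dim, 0.0f);
  for (std::size_t o = 0; o < out_dim; ++o) {
    float max_abs = 0.0f;
    for (std::size_t i = 0; i < k; ++i) {
      max_abs = std::max(max_abs, std::abs(w[o * k + i]));
    }
    scales[o] = max_abs / 127.0f;
  }
  return scales;
}
```

With channel-wise scales, a single scalar alpha can no longer undo the weight quantization for every output column, which is why the fused case in point 1 needs a per-channel alpha.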

Contributor

Regarding channel-wise: cusparseLt 0.2.0 only supports a float alpha; a vector alpha is only supported from 0.3.0.

Point 1 has already been taken into account in the plugin code for 0.3.0, which will change how alpha is computed.

2. (Int8) Calculate scale and scale the weight (on host)
3. Copy weight to device
4. Compress the weight (on device)
5. Copy the compressed weight to host
Contributor

Is this step necessary? Can't cusparseLtMatmul take data on the device directly?
Is it only for serialization?

Contributor

@minghaoBD Jun 2, 2022

Serialization and deserialization need compressed_weight_.

  1. I am not sure whether variables on the device can be serialized and deserialized.
  2. Is it necessary to optimize this CPU memory?

Contributor

  1. Probably not.
  2. There is no need to optimize this host memory.
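
The reason for keeping a host copy can be sketched as a round trip through a byte buffer: the compressed weight is written as a size followed by raw bytes, and read back the same way. The helpers below are hypothetical and are not the plugin's serialize/deserialize utilities; CUDA calls are left out so the sketch stays host-only.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: appends n raw bytes to a byte buffer.
inline void AppendBytes(std::vector<char>* out, const void* src,
                        std::size_t n) {
  const char* p = static_cast<const char*>(src);
  out->insert(out->end(), p, p + n);
}

// Writes the host copy of the compressed weight as [size][bytes].
inline std::vector<char> SerializeCompressedWeight(
    const std::vector<char>& compressed_host) {
  std::vector<char> buf;
  const std::size_t n = compressed_host.size();
  AppendBytes(&buf, &n, sizeof(n));
  AppendBytes(&buf, compressed_host.data(), n);
  return buf;
}

// Reads the host copy back. Returns an empty vector on a malformed buffer.
inline std::vector<char> DeserializeCompressedWeight(
    const std::vector<char>& buf) {
  std::size_t n = 0;
  if (buf.size() < sizeof(n)) return {};
  std::memcpy(&n, buf.data(), sizeof(n));
  if (buf.size() - sizeof(n) < n) return {};
  std::vector<char> compressed_host(n);
  if (n > 0) std::memcpy(compressed_host.data(), buf.data() + sizeof(n), n);
  // A real plugin would now allocate a device buffer and copy these bytes
  // to the GPU before running the matmul.
  return compressed_host;
}
```

The host buffer is only touched at build, serialize, and deserialize time, which fits the conclusion above that it does not need further optimization.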

activation_(activation) {
/*
1. Copy the compressed weight (on host)
2. Copy the compressed weight to device
Contributor

I don't see any code for this step in the current constructor. Isn't this step done in the clone function?

Contributor

Oh, the comment for this step was not removed. I will delete it in PR2.

weight_compressed_ = new char[compressed_size_];
deserialize_value_size(&data, &length, weight_compressed_, compressed_size_);
cudaMalloc(reinterpret_cast<void**>(&weight_compressed_dev_),
compressed_size_);
Contributor

This spot needs attention for the case where weights are shared.

}
const nvinfer1::PluginTensorDesc& prev = inOut[pos - 1];

return in.type == prev.type && in.format == prev.format;
Contributor

The current implementation fuses both dequantization and quantization into the cusparseMatMul interface (by setting alpha).
Could it support fusing only dequantization, with FP16 output?

Contributor

@minghaoBD Jun 2, 2022

The cusparseLT library requires the input and output to have the same datatype, i.e. it does not support int8 input with fp16 output.
[image]

So we can only wait for NVIDIA to update the library.

weight_compressed_dev_, &beta, output, output, workSpace, &stream, 1);
return status != CUSPARSE_STATUS_SUCCESS;
} else if (inputDesc->type == nvinfer1::DataType::kINT8) {
alpha = inputDesc->scale * weight_scale_ / outputDesc->scale;
Contributor

The current implementation fuses both dequantization and quantization into the cusparseMatMul interface (by setting alpha).
Could it support fusing only dequantization, with FP16 output?

Subsequent ops may only support FP16, so this would save a type-conversion op.
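
The two choices of alpha discussed here can be written down side by side. `FusedAlpha` follows the expression in the snippet under review (`inputDesc->scale * weight_scale_ / outputDesc->scale`); `DequantOnlyAlpha` and `DotInt8` are hypothetical helpers added for illustration, and the sketch is a host-side reference, not a cusparseLT call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dequantize + requantize fused into alpha: the int32 accumulator is mapped
// straight into the int8 output domain.
inline float FusedAlpha(float input_scale, float weight_scale,
                        float output_scale) {
  return input_scale * weight_scale / output_scale;
}

// Dequantize only: the accumulator is mapped back to real values, which
// could then be stored as FP16 if the library allowed mixed types.
inline float DequantOnlyAlpha(float input_scale, float weight_scale) {
  return input_scale * weight_scale;
}

// Reference int8 dot product with an int32 accumulator.
inline int32_t DotInt8(const std::vector<int8_t>& a,
                       const std::vector<int8_t>& b) {
  int32_t acc = 0;
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return acc;
}
```

Multiplying the accumulator by `DequantOnlyAlpha` gives the real-valued result; dividing that by the output scale reproduces what `FusedAlpha` yields, so the only difference between the two variants is whether the output stays in the int8 domain.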

@b3602sss b3602sss closed this Jun 2, 2022
@b3602sss b3602sss reopened this Jun 2, 2022
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Jun 2, 2022
@PaddlePaddle PaddlePaddle unlocked this conversation Jun 2, 2022
Contributor

@qingqing01 left a comment

Please add unit tests.

paddle/fluid/inference/api/paddle_pass_builder.cc (review thread, outdated and resolved)
paddle/fluid/inference/tensorrt/convert/sparse_fc_op.cc (review thread, outdated and resolved)
@b3602sss
Contributor Author

b3602sss commented Jun 2, 2022

Please add unit tests.

The unit tests are in the next PR.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@b3602sss b3602sss closed this Aug 23, 2022
5 participants