sparse_fc supported #41770
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
Sorry to inform you that 35d16f1's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
FusePassBase::Init(name_scope, graph);
GraphPatternDetector gpd;

patterns::DenseFC dense_fc_pattern(gpd.mutable_pattern(),
In op_convert, some mul ops are converted into fc ops. Should that case also be converted to sparse fc?
For example: in distributed inference, the mul and elementwise ops get separated by communication ops, so they cannot be fused into an FC op; instead the standalone mul op is converted into an FC op by the converter.
That case should be solved on the distributed side; it is probably not something to handle at our level.
if (w_name.find("sparse_2_4") != w_name.npos) {
  // fake op
  OpDesc desc(fc_op->Block());
  desc.SetType("sparse_fc");
Does this name need to be distinguished from other sparsity schemes, e.g. the block sparsity that may be added later?
The name can be changed later once it is settled; users are not aware of it.
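For context, a minimal sketch of how the pass handler might swap the matched dense fc node for the fake sparse_fc op; the node names (fc_node, input, weight, bias, out) and the exact input/output keys are illustrative assumptions, not the PR's actual code:

```cpp
// Sketch only: replace the matched fc node with a fake sparse_fc node.
OpDesc desc(fc_node->Op()->Block());
desc.SetType("sparse_fc");
desc.SetInput("Input", {input->Name()});
desc.SetInput("W", {weight->Name()});  // weight name contains "sparse_2_4"
desc.SetInput("Bias", {bias->Name()});
desc.SetOutput("Out", {out->Name()});
auto* sparse_fc_node = graph->CreateOpNode(&desc);  // adds the node to the graph
IR_NODE_LINK_TO(input, sparse_fc_node);
IR_NODE_LINK_TO(weight, sparse_fc_node);
IR_NODE_LINK_TO(bias, sparse_fc_node);
IR_NODE_LINK_TO(sparse_fc_node, out);
GraphSafeRemoveNodes(graph, {fc_node});  // drop the original fc op
```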
//
// \brief Pattern looking for dense fc.
//
struct DenseFC : public PatternBase {
What is the difference between this newly added pattern and the patterns::FC already defined in this file? Why not use patterns::FC directly?
patterns::FC looks for mul + elementwise_add, so it is not the same.
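To make the distinction concrete, here is a hedged sketch of what a single-op DenseFC pattern could look like with the GraphPatternDetector API (patterns::FC instead links a mul node and an elementwise_add node); names are illustrative:

```cpp
// Sketch only: a DenseFC-style pattern matches one already-fused fc op.
PDNode* BuildDenseFC(PDPattern* pattern, const std::string& name_scope) {
  auto* fc_op = pattern->NewNode(name_scope + "/fc")->assert_is_op("fc");
  auto* fc_input = pattern->NewNode(name_scope + "/in")
                       ->AsInput()
                       ->assert_is_op_input("fc", "Input");
  auto* fc_weight = pattern->NewNode(name_scope + "/w")
                        ->AsInput()
                        ->assert_is_op_input("fc", "W");
  auto* fc_out = pattern->NewNode(name_scope + "/out")
                     ->AsOutput()
                     ->assert_is_op_output("fc", "Out");
  fc_op->LinksFrom({fc_input, fc_weight}).LinksTo({fc_out});
  return fc_out;
}
```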
Overall this looks fine; the accuracy and GPU-memory bugs will be fixed by @minghaoBD in the next PR.
In addition, @minghaoBD should follow up on:
- the weight-sharing issue
- supporting fusing only the dequantization, with FP16 output
"SpmmPluginDynamic only supports weight of type [FLOAT|HALF]")); | ||
nvinfer1::DataType weight_type; | ||
if (precision_ == nvinfer1::DataType::kINT8) { | ||
weight_type = nvinfer1::DataType::kFLOAT; |
In TensorRT's IConvolutionLayer, FP32 weights are currently passed in and TRT converts them to Int8 internally, which is fairly consistent with the current implementation. However:
- later, to optimize Paddle Inference's memory usage, the weights may be stored as FP16 values at the pass-analysis stage;
- passing in Int8 weights should also be supported, i.e. quantization is not done inside this plugin and the plugin does not need to compute the weights' abs_max; the abs_max (a.k.a. weight scale) shipped with the quantized model can be used directly.
Both points can be supported in follow-up PRs.
- OK; once storing FP16 weights is supported, the converter side will be adapted.
- OK, I will add a scale input to the plugin's constructor. I will include it in the PR that upgrades to cuSPARSELt 0.3.0; otherwise there would be a conflict between channel-wise quantization providing a scale_vec and 0.2.0 not supporting a scale_vec.
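A hypothetical sketch of that constructor change (class and member names are illustrative, not the actual SpmmPluginDynamic signature): the weight scale arrives from the quantized model instead of being computed from abs_max inside the plugin.

```cpp
#include <vector>

// Hypothetical sketch only: the scale is an externally supplied input.
class SpmmPluginSketch {
 public:
  SpmmPluginSketch(const std::vector<char>& weight, float weight_scale)
      : weight_(weight), weight_scale_(weight_scale) {}  // no abs_max scan

 private:
  std::vector<char> weight_;
  float weight_scale_;  // taken from the model's quantization metadata
};
```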
std::vector<char> weight_host(element_size_ * out_dim_ * k_);
convertAndCopy(weight, weight_type, weight_host.data());
void* weight_dev{nullptr};
cudaMalloc(reinterpret_cast<void**>(&weight_dev),
- Passing in Int8 weights should also be supported, i.e. quantization is not done inside this plugin and the plugin does not need to compute the weights' abs_max; the abs_max (a.k.a. weight scale) shipped with the quantized model can be used directly.

As noted above, this can be supported in a follow-up PR. Another reason not to do quantization inside the plugin is that the plugin currently allocates and manages device memory itself, which makes it inconvenient to share parameters (device memory) with other plugins.
      std::abs(reinterpret_cast<const float*>(weight_host.data())[i]);
  max_weight = std::max(max_weight, local_abs);
}
weight_scale_ = max_weight / 127.0f;
Channel-wise quantization should also be supported here. Quantizing the weights channel-wise affects the dequantization logic:
- if the dequantization is fused into the cusparseLtMatmul call, the computation logic must be adjusted, mainly the gemm alpha parameter;
- if the dequantization is not fused into the cusparseLtMatmul call, the plugin's current computation logic is unaffected.
@minghaoBD should follow up on this in a later PR.
Regarding channel-wise quantization: cuSPARSELt 0.2.0 only supports a scalar float alpha; the vector form is only supported from 0.3.0.
Point 1 has already been taken into account in the plugin code for 0.3.0, where the alpha computation logic will be adjusted.
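A minimal sketch of the per-channel alpha assembly being discussed, assuming weight_scales holds one scale per output channel (illustrative names, not the plugin code); the per-tensor case collapses to the single scalar alpha = input_scale * weight_scale / output_scale already used in the kernel.

```cpp
#include <cstddef>
#include <vector>

// Sketch only: with cuSPARSELt 0.3.0's vector alpha, each output channel j
// gets its own dequant/requant factor folded into the matmul.
std::vector<float> MakeAlphaVec(float input_scale,
                                const std::vector<float>& weight_scales,
                                float output_scale) {
  std::vector<float> alpha(weight_scales.size());
  for (std::size_t i = 0; i < weight_scales.size(); ++i) {
    alpha[i] = input_scale * weight_scales[i] / output_scale;
  }
  return alpha;
}
```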
2. (Int8) Calculate scale and scale the weight (on host)
3. Copy weight to device
4. Compress the weight (on device)
5. Copy the compressed weight to host
Is this step necessary? Can't cusparseLtMatmul take the data on the device directly?
Is it only for serialization?
Serialization and deserialization need compressed_weight_.
- It is unclear whether variables that live on the device can be serialized and deserialized?
- Is it necessary to optimize this CPU memory?
- Probably not.
- No need to optimize this host memory.
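For reference, a minimal sketch of why the host copy matters: serialization writes into a host buffer, so the compressed weight bytes must already live on the host. The helper name is an assumption, mirroring the deserialize_value_size call quoted below.

```cpp
#include <cstddef>
#include <cstring>

// Sketch only: serialize helpers memcpy from host memory, so a device-only
// compressed weight could not be written out directly.
void SerializeCompressedWeight(char** buffer, const void* weight_compressed,
                               std::size_t compressed_size) {
  std::memcpy(*buffer, weight_compressed, compressed_size);  // host -> host
  *buffer += compressed_size;
}
```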
activation_(activation) {
/*
1. Copy the compressed weight (on host)
2. Copy the compressed weight to device
I don't see code for this step in the current constructor; this step happens in the clone function, right?
Ah, the comment for this step was not deleted. I will remove it in PR 2.
weight_compressed_ = new char[compressed_size_];
deserialize_value_size(&data, &length, weight_compressed_, compressed_size_);
cudaMalloc(reinterpret_cast<void**>(&weight_compressed_dev_),
           compressed_size_);
For the weight-sharing case, this part needs attention.
}
const nvinfer1::PluginTensorDesc& prev = inOut[pos - 1];

return in.type == prev.type && in.format == prev.format;
The current implementation fuses both the dequantization and the quantization into the cusparseLtMatmul call (by setting alpha).
Could fusing only the dequantization, with FP16 output, be supported?
    weight_compressed_dev_, &beta, output, output, workSpace, &stream, 1);
return status != CUSPARSE_STATUS_SUCCESS;
} else if (inputDesc->type == nvinfer1::DataType::kINT8) {
  alpha = inputDesc->scale * weight_scale_ / outputDesc->scale;
The current implementation fuses both the dequantization and the quantization into the cusparseLtMatmul call (by setting alpha).
Could fusing only the dequantization, with FP16 output, be supported?
The following op may only support FP16, so this would save a type-conversion op.
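To spell out the two alternatives (a sketch under assumed names, not the plugin code): with INT8 output both dequantization and requantization fold into alpha, as in the quoted line above; with FP16 output only the dequantization factor is needed.

```cpp
// Sketch only: alpha choices for the INT8 sparse matmul.
float AlphaInt8Out(float input_scale, float weight_scale, float output_scale) {
  return input_scale * weight_scale / output_scale;  // dequantize + requantize
}
float AlphaFp16Out(float input_scale, float weight_scale) {
  return input_scale * weight_scale;  // dequantize only; output stays FP16
}
```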
Please add a unit test.
There will be a unit test in the next PR.
PR types
New features
PR changes
OPs
Describe
Sparse inference support. Build with -DWITH_CUSPARSELT=ON to enable cuSPARSELt support.
1. In the pass stage, find FC ops whose weight name contains the keyword "sparse_2_4" and replace them with the fake op sparse_fc.
2. sparse_fc is not a real op that users can use directly.
3. sparse_fc calls cuSPARSELt to perform sparse inference.