
[Inference] Append attn FP8 quant #9328

Merged: 12 commits into PaddlePaddle:develop on Nov 4, 2024

Conversation

ckl117 (Contributor) commented Oct 28, 2024

PR types

Others

PR changes

Others

Description

append attn now supports FP8 e4m3 quantization.
When the custom operators are compiled, the FP8 cutlass gemm kernels are generated automatically, and default FP8 cutlass GEMM configurations are added.
The FP8 network construction is unified into FusedBlockMultiTransformer to simplify future maintenance.
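
For context, here is a minimal sketch of per-tensor FP8 E4M3 quantization with the quant_max_bound/quant_min_bound clamping the kernels receive; the function name, the scale handling, and the dtype string are illustrative assumptions, not this PR's actual API.

import paddle

# Largest finite magnitude representable in FP8 E4M3.
QUANT_MAX_BOUND = 448.0
QUANT_MIN_BOUND = -448.0

def fp8_e4m3_quant(x: paddle.Tensor, scale: float) -> paddle.Tensor:
    """Scale, clamp to the E4M3 range, then cast to FP8 (illustrative)."""
    q = paddle.clip(x * scale, QUANT_MIN_BOUND, QUANT_MAX_BOUND)
    # Assumes a Paddle build that ships the float8_e4m3fn dtype.
    return q.astype("float8_e4m3fn")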

paddle-bot bot commented Oct 28, 2024

Thanks for your contribution!

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 0% with 319 lines in your changes missing coverage. Please review.

Project coverage is 52.97%. Comparing base (66c5d65) to head (b50da65).
Report is 2 commits behind head on develop.

Current head b50da65 differs from pull request most recent head deb4651

Please upload reports for the commit deb4651 to get more accurate results.

Files with missing lines Patch % Lines
...dlenlp/experimental/transformers/llama/modeling.py 0.00% 129 Missing ⚠️
...dlenlp/experimental/transformers/qwen2/modeling.py 0.00% 124 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py 0.00% 66 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9328      +/-   ##
===========================================
+ Coverage    52.24%   52.97%   +0.72%     
===========================================
  Files          673      673              
  Lines       109100   107355    -1745     
===========================================
- Hits         56998    56868     -130     
+ Misses       52102    50487    -1615     


Comment on lines +2873 to +2874
def compute_activation(self, ffn1_out, i):
return ffn1_out
Collaborator commented:

Is the FP8 activation fused?

ckl117 (Contributor, Author) replied:

The cutlass FP8 dual gemm implemented as a custom operator fuses the activation.
The cuBLASLt FP8 gemm implemented with Paddle computes the activation inside compute_ffn1, so the inherited method here can simply be a no-op.
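
For illustration, a hedged sketch of the cuBLASLt path just described; everything except compute_activation's signature (quoted above) is an assumed name, not the PR's actual code.

import paddle
import paddle.nn.functional as F

class CublasltFP8Layer:
    def compute_ffn1(self, x, ffn1_weight, i):
        # cuBLASLt path: the activation (SwiGLU-style here) is computed
        # right after the GEMM, inside compute_ffn1 itself.
        out = paddle.matmul(x, ffn1_weight)
        gate, up = paddle.chunk(out, 2, axis=-1)
        return F.silu(gate) * up

    def compute_activation(self, ffn1_out, i):
        # Activation already applied in compute_ffn1, so this hook is a no-op.
        return ffn1_out

In the cutlass dual-gemm path, by contrast, the fused kernel applies the activation in its epilogue, so no Python-side activation step is needed at all.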

@@ -321,6 +323,7 @@ __global__ void multi_query_append_attention_kernel(
smooth_weight,
q_base_seq_id_this_block,
q_head_idx,
quant_max_bound,quant_min_bound,
Collaborator commented:

Please format the C++ code.

ckl117 (Contributor, Author) replied:

done

yuanlehome (Collaborator) left a comment:

Please also add the llama refactoring.

yuanlehome previously approved these changes Oct 30, 2024
ckl117 closed this Nov 1, 2024
ckl117 reopened this Nov 1, 2024
yuanlehome merged commit 5217a3b into PaddlePaddle:develop Nov 4, 2024
11 of 12 checks passed