Support XPU for auto-parallel LLaMa #9796
base: develop
Conversation
Codecov Report

❌ Attention: Your patch check has failed because the patch coverage (6.45%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #9796     +/-   ##
===========================================
+ Coverage    52.06%   52.20%   +0.14%
===========================================
  Files          734      730       -4
  Lines       116591   115836     -755
===========================================
- Hits         60703    60475     -228
+ Misses       55888    55361     -527
```

☔ View full report in Codecov by Sentry.
```python
if get_env_device() in ["npu", "mlu", "intel_hpu"]:
    x = paddle.to_tensor(0.0, dtype="float32")
    y = paddle.to_tensor(paddle.finfo(dtype).min, dtype="float32")
    expanded_attn_mask = paddle.where(expanded_attn_mask.cast("bool"), x, y).astype(dtype)
elif get_env_device() == "xpu":
    # XPU uses a fixed fill value and keeps the mask in float32
    # (see the review discussion below).
    x = paddle.to_tensor(0.0, dtype="float32")
    y = paddle.to_tensor(-1.7005809656952787e38, dtype="float32")
    expanded_attn_mask = paddle.where(expanded_attn_mask.cast("bool"), x, y)
elif get_env_device() == "gcu":
    min_val = paddle.finfo(dtype).min
    x = paddle.to_tensor(0.0, dtype=dtype)
    y = paddle.to_tensor(min_val, dtype=dtype)
    expanded_attn_mask = paddle.where(expanded_attn_mask.cast("bool"), x, y).astype(dtype)
```
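As a toy illustration (not part of the diff; the tensor values are made up), the `paddle.where` pattern above turns a boolean attention mask into an additive float mask:

```python
import paddle

# Boolean mask: True = position may be attended, False = masked out.
mask = paddle.to_tensor([[True, True, False]])

x = paddle.to_tensor(0.0, dtype="float32")
# XPU-specific fill value from the diff above; other devices use
# paddle.finfo(dtype).min instead.
y = paddle.to_tensor(-1.7005809656952787e38, dtype="float32")

# Attended positions become 0.0, masked positions become the large
# negative value; the result stays float32 (no .astype), as XPU requires.
additive_mask = paddle.where(mask, x, y)
print(additive_mask)  # [[0., 0., -1.7005810e+38]]
```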
How does the mask generation differ between the different devices?
The mask generation logic is the same as here: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/llama/modeling.py#L1606.

XPU needs a different mask for the following two reasons (see the sketch after this list):

- The flash_attention kernel implemented on XPU differs from the GPU one, which may lead to numeric overflow when the mask value is too small. Therefore a specific mask value `-1.7005809656952787e38` is needed. @runzhech is fixing this issue, and once it is fixed we can use `paddle.finfo(dtype).min` as on GPU.
- The flash_attention kernel on XPU requires the mask input to be `float32`, so the `astype(dtype)` cast cannot be added in the XPU mask generation.
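A minimal sketch (a hypothetical helper, not code from the PR) that condenses these per-device rules; `device` stands in for the result of `get_env_device()`:

```python
import paddle

def mask_fill_values(device: str, dtype):
    """Return (kept_value, masked_value, cast_dtype) for attention-mask
    generation; cast_dtype is None when the mask must stay float32."""
    if device == "xpu":
        # XPU flash_attention may overflow with finfo(dtype).min, so a
        # fixed float32 fill value is used until the kernel fix lands,
        # and the mask is left in float32 (no astype).
        return (paddle.to_tensor(0.0, dtype="float32"),
                paddle.to_tensor(-1.7005809656952787e38, dtype="float32"),
                None)
    # GPU-like default: the dtype's minimum, with a cast back to dtype.
    return (paddle.to_tensor(0.0, dtype="float32"),
            paddle.to_tensor(paddle.finfo(dtype).min, dtype="float32"),
            dtype)
```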
PR types
New features
PR changes
Models
Description
Adapts the Llama model for XPU auto-parallel training. Currently only dynamic graph mode with pure data parallelism (allreduce communication only) is supported.

Depends on the framework PR: PaddlePaddle/Paddle#70997
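For context, a minimal sketch of what pure data parallelism looks like under Paddle's dynamic-graph auto-parallel API (the mesh size and tensor shapes are illustrative; the actual training entry point lives in the PaddleNLP scripts):

```python
import paddle
import paddle.distributed as dist

# 1-D process mesh with a single "dp" axis: pure data parallelism.
mesh = dist.ProcessMesh([0, 1, 2, 3], dim_names=["dp"])

# Shard only the batch dimension of the input across the mesh; model
# weights stay replicated, so gradient allreduce is the only
# communication -- matching the "pure dp" scope described above.
x = paddle.randn([8, 1024], dtype="float32")
x = dist.shard_tensor(x, mesh, [dist.Shard(0)])
```

Such a script would be launched with `python -m paddle.distributed.launch` and requires the framework changes from PaddlePaddle/Paddle#70997.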