[Operator] Fix embedding op backward with bf16 #309

zhzhcookie · 2024-11-21T02:18:39Z

PR Category

Operator

Type of Change

Bug Fix

Description

Fix the backward pass of the embedding op for bf16 inputs. Before calling atomic_add on bf16 data, we convert both the pointer and the data to fp32.
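As a rough illustration of what accumulating in fp32 changes numerically, here is a minimal pure-Python sketch of an embedding backward (a scatter-add of output-gradient rows into the weight gradient). It is not the FlagGems kernel: the helpers `to_fp32`, `to_bf16` and `embedding_backward_reference` are hypothetical names, bf16 is emulated by rounding float32 bit patterns, and the PR does not state whether the cast is motivated by atomic support or by precision.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 pattern.

    Uses round-to-nearest-even; NaN and infinity are not handled here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def embedding_backward_reference(indices, grad_out, num_embeddings, accumulate_in_fp32):
    """Scatter-add grad_out rows into a weight gradient (hypothetical reference).

    accumulate_in_fp32=True keeps the running sum in float32 and rounds to
    bf16 once at the end; False rounds to bf16 after every single addition.
    """
    dim = len(grad_out[0])
    grad_weight = [[0.0] * dim for _ in range(num_embeddings)]
    for row, idx in enumerate(indices):
        for col in range(dim):
            total = grad_weight[idx][col] + grad_out[row][col]
            if accumulate_in_fp32:
                grad_weight[idx][col] = to_fp32(total)
            else:
                grad_weight[idx][col] = to_bf16(total)
    if accumulate_in_fp32:
        # Single final rounding back to the bf16 storage type.
        grad_weight = [[to_bf16(v) for v in r] for r in grad_weight]
    return grad_weight


# 300 lookups of the same row, each contributing a gradient of 1.0.
indices = [0] * 300
grad_out = [[1.0] for _ in indices]
fp32_path = embedding_backward_reference(indices, grad_out, 2, accumulate_in_fp32=True)
bf16_path = embedding_backward_reference(indices, grad_out, 2, accumulate_in_fp32=False)
print(fp32_path[0][0], bf16_path[0][0])
```

In this emulation the bf16 running sum stops growing once it reaches 256, because the next representable bf16 value is 258 and 257 rounds back down, while the fp32 running sum reaches 300 and survives the final rounding.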

Issue

Resolves The embedding backward BF16 bug repeatedly encountered while getting "LLaVA single-GPU + gems" to run #305

Progress

Change is properly reviewed (1 reviewer required, 2 recommended).
Change responds to an issue.
Change is fully covered by a UT.

Performance

tongxin · 2024-11-22T01:18:46Z

src/flag_gems/ops/embedding.py

@@ -70,11 +70,15 @@ def embedding_backward_kernel(
    if not HAS_PADDING_IDX:
        grad_in += row_idx * N
        embedding_grad = tl.load(grad_out + cols, mask, other=0.0)
+        if tl.constexpr(embedding_grad.dtype.is_bf16()):
+            embedding_grad = embedding_grad.to(tl.float32)
        tl.atomic_add(grad_in + cols, embedding_grad, mask=mask)


Atomically updating gradients from a very large number of CTAs may cause contention. Could we reduce it by accumulating updates locally in a GSL-style kernel?
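To make the suggestion concrete, the following is a small CPU-side simulation, not a Triton kernel. It assumes "GSL" means a grid-stride loop, which the comment does not spell out, and the function `grid_stride_backward` and its parameters are hypothetical. Each simulated program walks the looked-up rows with a stride of `num_programs`, sums gradients per destination row locally, and then issues one atomic row update per distinct index it touched.

```python
from collections import defaultdict


def grid_stride_backward(indices, grad_out, num_embeddings, num_programs):
    """Simulate a grid-stride embedding backward with local accumulation.

    Returns the weight gradient and the number of simulated atomic row
    updates. A naive one-program-per-row kernel would issue len(indices).
    """
    dim = len(grad_out[0])
    grad_weight = [[0.0] * dim for _ in range(num_embeddings)]
    atomic_adds = 0
    for pid in range(num_programs):
        # Partial sums private to this program, keyed by embedding index.
        local = defaultdict(lambda: [0.0] * dim)
        for row in range(pid, len(indices), num_programs):
            acc = local[indices[row]]
            for col in range(dim):
                acc[col] += grad_out[row][col]
        # Flush: one atomic row update per distinct index seen locally.
        for idx, acc in local.items():
            for col in range(dim):
                grad_weight[idx][col] += acc[col]
            atomic_adds += 1
    return grad_weight, atomic_adds


indices = [3, 3, 3, 3, 1, 3, 3, 3]
grad_out = [[1.0, 2.0] for _ in indices]
grad_weight, atomic_adds = grid_stride_backward(indices, grad_out, 4, num_programs=2)
print(atomic_adds, "atomic row updates instead of", len(indices))
```

The saving depends on how often an index repeats within one program's share of the rows. How a real GPU kernel would hold these partial sums is left open here.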

[Operator] Fix embedding op backward with bf16

413561b
