[NVIDIA] Use custom grad accumulation for FP8 params #3623
Conversation
Also, cc. @mingxu1067
Note, the failed tests on …
@kaixih it looks good, but we will have to wait for JAX to push a new release to PyPI for the tests to pass (according to your comment).
@kaixih thanks! Seems that …
@kaixih do you mind fixing the CI errors?
It seems the CI is still on jax 0.4.23 (the failed test). I re-tested on my machine with jax 0.4.24, and the tests pass.
@zhangqiaorjc @cgarciae Can you help check whether jax has already been updated to 0.4.24 or later?
Based on the output of the test run, it seems not.
As a quick fix, maybe add `venv/bin/python3 -m pip install -U jax jaxlib` after flax/.github/workflows/build.yml line 120 (at commit daf06ea).
@cgarciae Do you mean I should add this line in this PR?
Created a PR so you can rebase when merged. |
Force-pushed from 3e31661 to dd004c2.
@zhangqiaorjc It seems all tests pass now. Can you take another look or reassign? Thanks.
@cgarciae Is this PR's merge blocked on any internal error?
This pull request introduces a custom data type rule for FP8 parameters to implement custom gradient accumulation. Specifically, when an FP8 parameter is reused, autograd accumulates the gradients from each use; in this case we want that accumulation to be a maximum operation rather than the default addition.
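As a rough illustration (a minimal sketch, not this PR's actual implementation; the toy loss and the helper `max_accumulated_grad` are hypothetical), the following JAX snippet contrasts the default sum-accumulation of a reused parameter's gradient with a max-style combination:

```python
import jax
import jax.numpy as jnp

# Default autodiff behavior: when a parameter (`scale`) is used in two
# places, the cotangents from each use are accumulated by addition.
def loss_with_reuse(scale, x):
    y1 = x * scale          # first use of `scale`
    y2 = (x + 1.0) * scale  # second use of `scale`
    return jnp.sum(y1) + jnp.sum(y2)

# Illustrative max-style accumulation: take the gradient of each use
# separately and combine the results with a maximum. This only mimics
# the accumulation behavior described above; it is not the PR's API.
def max_accumulated_grad(scale, x):
    g1 = jax.grad(lambda s: jnp.sum(x * s))(scale)
    g2 = jax.grad(lambda s: jnp.sum((x + 1.0) * s))(scale)
    return jnp.maximum(g1, g2)

x = jnp.arange(4.0)
print(jax.grad(loss_with_reuse)(1.0, x))  # 16.0 = sum(x) + sum(x + 1): gradients summed
print(max_accumulated_grad(1.0, x))       # 10.0 = max(sum(x), sum(x + 1)): gradients maxed
```

Max accumulation is the natural choice here because the values carried through these FP8 parameter gradients are typically amax-style scaling statistics, for which summing contributions from different uses would not be meaningful.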