
NOTE: lpmm's 4-bit AdamW does not support BF16 weights.

### Note on compile times

There are two approaches to compiling the optimizer step in low-bit optimizers:

1. Compile the optimizer step for a single parameter, i.e. `torch.compile(single_param_adam)`, and call it in a loop over all parameters.
2. Compile the optimizer step for all parameters at once, i.e. `torch.compile(param_groups_adam)`.

Currently, Adam8bit and AdamFp8 use approach (2) with static shapes, since it runs faster (though it compiles much more slowly), while Adam4bit uses approach (1) with dynamic shapes, since approach (2) causes excessive memory usage for Adam4bit. Approach (1) requires dynamic shapes to avoid hitting the recompilation limit, because parameters come in many different shapes.
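A minimal sketch of the two approaches, using plain fp32 Adam math for illustration. `_adam_math` and its signature are assumptions for this sketch, not this folder's code; the actual low-bit optimizers also dequantize and re-quantize the optimizer state inside the step, which is omitted here:

```python
import torch

def _adam_math(p, grad, step, exp_avg, exp_avg_sq, lr, beta1, beta2, eps):
    # Plain fp32 Adam update for one parameter tensor.
    exp_avg.lerp_(grad, 1 - beta1)
    exp_avg_sq.lerp_(grad.square(), 1 - beta2)
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_correction2).sqrt().add_(eps)
    p.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)

# Approach (1): compile the per-parameter step once with dynamic shapes and
# call it in a Python loop. One small graph is reused for every parameter,
# so compile time stays low.
single_param_adam = torch.compile(_adam_math, fullgraph=True, dynamic=True)

# Approach (2): compile a function that updates all parameters, so the whole
# optimizer step becomes one large static-shape graph. Runtime is faster,
# but compile time grows with the number of parameters.
@torch.compile(fullgraph=True)
def param_groups_adam(param_states, lr, beta1, beta2, eps):
    for p, grad, step, exp_avg, exp_avg_sq in param_states:
        _adam_math(p, grad, step, exp_avg, exp_avg_sq, lr, beta1, beta2, eps)
```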

## Optimizer CPU offload

This folder also implements optimizer CPU offload (i.e. ZeRO-Offload) for single-GPU training. For multi-GPU training, you can use FSDP's built-in CPU offload.
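As a rough usage sketch (the `CPUOffloadOptimizer` wrapper and its `offload_gradients` argument are assumptions; check the rest of this README for the exact signature):

```python
import torch
import torch.nn as nn
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer  # assumed import path

model = nn.Linear(1024, 1024).cuda()

# Parameters stay on GPU; optimizer states are kept in CPU memory and the
# optimizer step runs on CPU, as in ZeRO-Offload.
optim = CPUOffloadOptimizer(
    model.parameters(),
    torch.optim.AdamW,       # base optimizer class to run on CPU
    offload_gradients=True,  # assumed flag: also move gradients to CPU
)

x = torch.randn(32, 1024, device="cuda")
model(x).sum().backward()
optim.step()
optim.zero_grad()
```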