Update main README.md with more current float8 speedup (#816)
vkuzo authored and HDCharles committed Sep 8, 2024
1 parent d26bcb6 commit b2f4cfa
Showing 1 changed file with 1 addition and 1 deletion: README.md
@@ -79,7 +79,7 @@ model = qat_quantizer.convert(model)
 
 [torchao.float8](torchao/float8) implements training recipes with the scaled float8 dtypes, as laid out in https://arxiv.org/abs/2209.05433.
 
-With ``torch.compile`` on, initial results show throughput speedups of up to **1.2x on small scale (8 GPUs) LLaMa pretraining jobs**. And you can validate the benchmarks [here](./torchao/float8/README.md#benchmarking)
+With ``torch.compile`` on, current results show throughput speedups of up to **1.5x on 128 H100 GPU LLaMa 3 70B pretraining jobs** ([details](https://dev-discuss.pytorch.org/t/enabling-float8-all-gather-in-fsdp2/2359))
 
 ```python
 from torchao.float8 import convert_to_float8_training
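The scaled float8 recipe referenced in the diff (https://arxiv.org/abs/2209.05433) works by computing a per-tensor scale from the tensor's absolute maximum so that values fill the e4m3 dynamic range. A minimal pure-Python sketch of that scaling math, independent of torchao (all names here are illustrative, not torchao's API):

```python
# Sketch of per-tensor float8 (e4m3) scaling as described in the FP8 paper.
# Illustrative only; torchao's convert_to_float8_training handles this internally.

E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3


def compute_scale(values):
    """Scale factor that maps the tensor's amax onto the e4m3 range."""
    amax = max(abs(v) for v in values)
    return E4M3_MAX / amax if amax > 0 else 1.0


def fake_quantize(values):
    """Scale into the fp8 range, clamp, then unscale (simulated round trip;
    mantissa rounding of a real fp8 cast is omitted for clarity)."""
    scale = compute_scale(values)
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    return [s / scale for s in scaled]


xs = [0.001, -3.2, 17.5, -0.75]
ys = fake_quantize(xs)
```

Because the scale is recomputed from the current amax, large and small tensors alike use the full fp8 range, which is what makes the dtype viable for training despite its narrow precision.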
