[Quantization + FSDP] Support quantize_() for DTensor #803
Comments
Yeah, this came up in some discussions with inference providers like SGLang as well.
@jerryzh168 @kwen2501 is this addressed now with the quantize + distributed inference composability work?
This is not addressed yet. I think this is a training use case, which we can explore in 2025 H1 together with @vkuzo. We also need a guide on how DTensor composes with quantization in both inference and training use cases.
While trying out INT8 mixed-precision pretraining (#748) with torchtitan, I came across an issue: if the model is FSDP-sharded, quantize_() won't work. The fix would be to add extra logic to handle DTensor, similar to what FP8 does in ao/torchao/float8/float8_tensor.py (lines 161 to 183 at f5703b0).
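For context, here is a minimal sketch of the kind of DTensor handling being asked for, not torchao's actual implementation. It assumes the parameter arrives as a DTensor when the model is FSDP-sharded, quantizes only the local shard, and re-wraps it with the original device mesh and placements, mirroring the float8 pattern referenced above. The helper names (`to_quantized`, `quantize_weight`) are hypothetical stand-ins.

```python
# Minimal sketch (assumptions noted below), not the torchao implementation:
# when a model is FSDP-sharded, its parameters are DTensors, so a quantization
# transform must be applied to the local shard and the result re-wrapped with
# the same device mesh and placements.
import torch
from torch.distributed.tensor import DTensor  # torch.distributed._tensor on older PyTorch


def to_quantized(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder: in torchao this would construct the quantized
    # tensor subclass (e.g. an int8 weight-only layout) from a plain tensor.
    return t


def quantize_weight(weight: torch.Tensor) -> torch.Tensor:
    if isinstance(weight, DTensor):
        # Quantize only the local shard, then re-wrap so the sharding
        # metadata (device mesh and placements) is preserved.
        local_q = to_quantized(weight.to_local())
        return DTensor.from_local(
            local_q,
            device_mesh=weight.device_mesh,
            placements=weight.placements,
            run_check=False,
        )
    return to_quantized(weight)
```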