[Quantization + FSDP] Support quantize_() for DTensor #803
Comments
Yeah, this came up in some discussions with inference providers like SGLang as well.
@jerryzh168 @kwen2501 is this addressed now with the quantize + distributed inference composability work?
This is not addressed yet. I think this is a training use case, which we can explore in 2025 H1 together with @vkuzo. We also need a guide on how DTensor composes with quantization in both inference and training use cases.
While trying out INT8 mixed-precision pretraining (#748) with torchtitan, I came across an issue: if the model is FSDP-sharded, quantize_() won't work. The fix would be to add extra logic to handle DTensor, similar to what FP8 does in ao/torchao/float8/float8_tensor.py (lines 161 to 183 at f5703b0).
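For context, here is a minimal sketch of the kind of DTensor handling being asked for, not torchao's actual implementation. It assumes the parameter arrives as a DTensor when the model is FSDP-sharded, quantizes only the local shard, and re-wraps it with the original device mesh and placements, mirroring the float8 pattern referenced above. The helper names (`to_quantized`, `quantize_weight`) are hypothetical stand-ins.

```python
# Minimal sketch (assumptions noted below), not the torchao implementation:
# when a model is FSDP-sharded, its parameters are DTensors, so a quantization
# transform must be applied to the local shard and the result re-wrapped with
# the same device mesh and placements.
import torch
from torch.distributed.tensor import DTensor  # torch.distributed._tensor on older PyTorch


def to_quantized(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder: in torchao this would construct the quantized
    # tensor subclass (e.g. an int8 weight-only layout) from a plain tensor.
    return t


def quantize_weight(weight: torch.Tensor) -> torch.Tensor:
    if isinstance(weight, DTensor):
        # Quantize only the local shard, then re-wrap so the sharding
        # metadata (device mesh and placements) is preserved.
        local_q = to_quantized(weight.to_local())
        return DTensor.from_local(
            local_q,
            device_mesh=weight.device_mesh,
            placements=weight.placements,
            run_check=False,
        )
    return to_quantized(weight)
```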