[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale #5353

mgoin · 2024-06-07T22:12:56Z

BREAKING CHANGE: Because there can be both input and output scales (kv_scale is an example of an "output" scale for k_proj + v_proj), the name act_scale is ambiguous. To be explicit and to properly support future formats, we propose renaming act_scale -> input_scale.

This breaks our existing checkpoints, which still use act_scale, and checkpoints in the new format will not load in previous releases of vLLM. We think this is the right time to make the change, before we "finalize" the beta with v0.5.0 next week.
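To make the scope of the rename concrete, here is a minimal sketch on a handful of key names. The tensor names and shard filename are illustrative assumptions, not taken from a real checkpoint, and `rename_scale_keys` is a hypothetical helper. Only keys containing act_scale change; weight_scale and kv_scale keys are left alone.

```python
def rename_scale_keys(weight_map, old='act_scale', new='input_scale'):
    """Return a copy of `weight_map` with `old` replaced by `new` in every key."""
    return {name.replace(old, new): location
            for name, location in weight_map.items()}


# Illustrative (assumed) tensor names mapped to an assumed shard filename
shard = 'model-00001-of-00002.safetensors'
weight_map = {
    'model.layers.0.self_attn.q_proj.weight': shard,
    'model.layers.0.self_attn.q_proj.weight_scale': shard,
    'model.layers.0.self_attn.q_proj.act_scale': shard,
    'model.layers.0.self_attn.kv_scale': shard,
}

renamed = rename_scale_keys(weight_map)
for name in renamed:
    print(name)
```

The number of keys and the shard locations are unchanged; only the one act_scale name becomes input_scale.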

Here is a script for rewriting checkpoints with act_scale to use the new input_scale format:

import safetensors.torch
import json
import os
import argparse

def rename_tensors_in_directory(directory):

    for filename in os.listdir(directory):
        # Handle safetensors index files
        if filename.endswith('.safetensors.index.json'):
            # Load the index file
            index_file_path = os.path.join(directory, filename)
            print(f"Updating keys in {index_file_path}")
            with open(index_file_path, 'r') as f:
                index = json.load(f)

            # Rename the tensor keys, which live under "weight_map"; the
            # top-level keys ("metadata", "weight_map") must stay as they are
            index['weight_map'] = {
                name.replace('act_scale', 'input_scale'): location
                for name, location in index.get('weight_map', {}).items()
            }

            # Write the updated index file
            with open(index_file_path, 'w') as f:
                json.dump(index, f, indent=2)

        # Handle safetensors files with data
        elif filename.endswith('.safetensors'):
            # Load the tensors and the header metadata from the safetensors file
            data_file_path = os.path.join(directory, filename)
            print(f"Updating keys in {data_file_path}")
            tensors = safetensors.torch.load_file(data_file_path)
            with safetensors.safe_open(data_file_path, framework='pt') as f:
                metadata = f.metadata()

            # Rename tensors
            renamed_tensors = {}
            for name, tensor in tensors.items():
                new_name = name.replace('act_scale', 'input_scale')
                renamed_tensors[new_name] = tensor

            # Save the modified tensors to the same safetensors file, keeping
            # the original header metadata (e.g. {"format": "pt"}) if present
            safetensors.torch.save_file(renamed_tensors, data_file_path, metadata=metadata)
        
        else:
            skipped_file_path = os.path.join(directory, filename)
            print(f"Skipping {skipped_file_path}")
        
    print(f"Tensors renamed and overwritten in the directory {directory}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Rename act_scale tensors to input_scale in safetensors files.')
    parser.add_argument('directory', type=str, help='The directory containing the safetensors files and index file.')

    args = parser.parse_args()
    rename_tensors_in_directory(args.directory)
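
After running the script, it can be worth confirming that no act_scale keys remain. The following is a minimal sketch, not part of this PR. It reads only the JSON header of each .safetensors file (an 8-byte little-endian length followed by that many bytes of JSON, per the safetensors format), so no tensor data is loaded. The function names `read_tensor_names` and `find_stale_keys` are hypothetical.

```python
import json
import os
import struct


def read_tensor_names(path):
    """Return the tensor names stored in a .safetensors file header."""
    with open(path, 'rb') as f:
        # The first 8 bytes hold the header size as a little-endian uint64
        (header_size,) = struct.unpack('<Q', f.read(8))
        header = json.loads(f.read(header_size))
    # "__metadata__" is an optional free-form entry, not a tensor
    return [name for name in header if name != '__metadata__']


def find_stale_keys(directory, old='act_scale'):
    """Map each .safetensors file in `directory` to the keys still containing `old`."""
    stale = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.safetensors'):
            continue
        names = read_tensor_names(os.path.join(directory, filename))
        leftover = [name for name in names if old in name]
        if leftover:
            stale[filename] = leftover
    return stale
```

An empty dictionary from `find_stale_keys(checkpoint_dir)` suggests every data file was converted. The index file is plain JSON and can be checked separately by searching its "weight_map" keys.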

vllm/model_executor/layers/quantization/fp8.py

comaniac

LGTM

robertgshaw2-redhat · 2024-06-08T17:06:06Z

Thanks Michael!

Change FP8 checkpoint format from act_scale -> input_scale

a4c438f

tlrmchlsmth approved these changes Jun 7, 2024

Format

bdb5300

pcmoritz reviewed Jun 7, 2024

vllm/model_executor/layers/quantization/fp8.py (outdated, resolved)

pcmoritz reviewed Jun 7, 2024

vllm/model_executor/layers/quantization/fp8.py (outdated, resolved)

comaniac approved these changes Jun 7, 2024

pcmoritz approved these changes Jun 7, 2024

mgoin added 5 commits June 7, 2024 19:49

Update fp8.py

7fa4086

Update fp8.py

6e5754d

Poke

f31ef0c

Merge branch 'upstream-main' into fp8-input-scale

809faa1

Fix comment

cbb8d78

mgoin merged commit c09dade into vllm-project:main Jun 8, 2024
103 checks passed

mgoin deleted the fp8-input-scale branch June 8, 2024 17:54

robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 9, 2024 (0cea2c2)

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jun 10, 2024 (48f0e31)

mgoin mentioned this pull request Jun 10, 2024

[Model][Hardware][NV] Add support for ModelOpt static scaling checkpoints #5387

Closed

robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 11, 2024 (f8fe956)

joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024 (227f85f)

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024 (91b0e2a)

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024 (1b430b6)

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024 (77891b8)

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024 (8f8e2ed)
