[WIP] Autogptq checkpoint conversion support #82

Draft
rahul-tuli wants to merge 6 commits into main from autogptq-checkpoint-support

Conversation

@rahul-tuli rahul-tuli (Member) commented Jun 13, 2024

Fixes #153

@Satrat Satrat left a comment


I think we should discuss the converter design in more detail with Rob and Michael. I like the approach of modular transformations, but I think it would be ideal if this could more closely match the compressors design so we could simplify the code. The end goal would be making it easy for a third party to implement a new converter.
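A minimal sketch of the registry-style direction being suggested, assuming a pattern similar to how compressors are registered; all class and method names below are illustrative, not the actual compressed-tensors API:

from abc import ABC, abstractmethod
from typing import Dict, Type

import torch


class BaseConverter(ABC):
    # illustrative base class: subclasses register themselves by name, so a
    # third party only has to implement translate() and register the class
    _registry: Dict[str, Type["BaseConverter"]] = {}

    @classmethod
    def register(cls, name: str):
        def _wrapper(subclass: Type["BaseConverter"]) -> Type["BaseConverter"]:
            cls._registry[name] = subclass
            return subclass

        return _wrapper

    @classmethod
    def load_from_registry(cls, name: str) -> "BaseConverter":
        return cls._registry[name]()

    @abstractmethod
    def translate(
        self, state_dict: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Translate a source-format state dict to compressed-tensors format."""


@BaseConverter.register("exllama")
class ExllamaConverter(BaseConverter):
    def translate(self, state_dict):
        ...  # the only piece a new converter needs to provide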



class ConverterNames(Enum):
    EXLLAMA_TO_COMPRESSED_TENSOR = "exllama_to_compressed_tensor"

I think the "_TO_COMPRESSED_TENSOR" part of this can be removed; it makes the name too wordy. We can assume that we are always going from another format to compressed-tensors.
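For example (illustrative only):

from enum import Enum


class ConverterNames(Enum):
    EXLLAMA = "exllama"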


filepath_: Path = Path(filepath)
if not save_dir:
    save_dir = "compressed_tensors_model"

nit: could this be a constant at the top of the file rather than in the function?
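Something along these lines, for example (the constant name is illustrative):

# module level, near the imports
_DEFAULT_SAVE_DIR = "compressed_tensors_model"

# inside the function
save_dir_: Path = Path(save_dir or _DEFAULT_SAVE_DIR)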

save_dir_: Path = Path(save_dir)
save_dir_.mkdir(exist_ok=True, parents=True)

metadata = {"format": "pt", "source": "Created by SparseML"}

We should change this to "Created by compressed-tensors" instead.

Comment on lines 84 to 138
        for file in filepath_.glob("*.safetensors"):
            _LOGGER.info(f"Loading file: {file}")
            new_state_dict = {}
            state_dict: Iterable[StateDictType] = load_safetensors_state_dict(
                file, by_layers=True
            )
            for layer_state_dict in state_dict:
                new_state_dict.update(
                    cls.translate(state_dict=layer_state_dict, **kwargs)
                )

            if new_state_dict:
                save_file(
                    new_state_dict,
                    filename=save_dir_ / file.name,
                    metadata=metadata,
                )
        _copy_non_safetensor_files_(filepath_, save_dir_)
        _update_quantization_config(filepath_, save_dir_)

    elif filepath_.is_file():
        new_state_dict = {}
        state_dict: Iterable[StateDictType] = load_safetensors_state_dict(
            filepath_, by_layers=True
        )
        for layer_state_dict in state_dict:
            new_state_dict.update(cls.translate(state_dict=layer_state_dict))

        save_file(
            new_state_dict, filename=save_dir_ / filepath_.name, metadata=metadata
        )

    return str(save_dir_)

A lot of this logic can be simplified by using the existing get_weight_mappings helper in compressed_tensors along with safetensors .get_weight() function
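A sketch of that direction, assuming get_weight_mappings returns a mapping from each parameter name to the safetensors file that stores it (exact import path and signature may differ):

from collections import defaultdict

from safetensors import safe_open

from compressed_tensors.utils.safetensors_load import get_weight_mappings

# group parameter names by the safetensors file that contains them
weight_mappings = get_weight_mappings(str(filepath_))
params_per_file = defaultdict(list)
for name, safetensors_file in weight_mappings.items():
    params_per_file[safetensors_file].append(name)

new_state_dict = {}
for safetensors_file, names in params_per_file.items():
    with safe_open(safetensors_file, framework="pt", device="cpu") as f:
        state_dict = {name: f.get_tensor(name) for name in names}
        new_state_dict.update(cls.translate(state_dict=state_dict, **kwargs))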

Comment on lines +109 to +132
for layer_state_dict in state_dict:
    new_state_dict.update(cls.translate(state_dict=layer_state_dict))

Why is this applied one parameter at a time? I thought that a translation would happen for the whole state_dict, not a single layer. There could be a case where a translation needs multiple keys to be completed.
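For example, an exllama/GPTQ translation typically needs several related tensors at once; a rough sketch (the key suffixes and the _unpack_gptq helper are hypothetical):

from typing import Dict

import torch


def translate(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # a single translated weight is derived from several source keys, so the
    # translation has to see them together rather than one key at a time
    new_state_dict = {}
    for key, value in state_dict.items():
        if key.endswith(".qweight"):
            prefix = key[: -len(".qweight")]
            new_state_dict.update(
                _unpack_gptq(  # hypothetical helper combining related tensors
                    prefix,
                    qweight=value,
                    scales=state_dict[f"{prefix}.scales"],
                    qzeros=state_dict[f"{prefix}.qzeros"],
                )
            )
        elif key.endswith((".scales", ".qzeros")):
            continue  # already consumed alongside the matching qweight
        else:
            new_state_dict[key] = value
    return new_state_dict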

Comment on lines 172 to 186
for file in source_dir.glob("*"):
    if file.suffix != ".safetensors":
        _LOGGER.info(f"Copying file: {file} to {dest_dir}")
        shutil.copy(file, dest_dir / file.name)

FYI this will copy over "model.safetensors.index.json", not sure if this is desired
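If the index file should not be carried over, one option is an explicit skip list (sketch):

SKIPPED_FILES = {"model.safetensors.index.json"}

for file in source_dir.glob("*"):
    if file.suffix != ".safetensors" and file.name not in SKIPPED_FILES:
        _LOGGER.info(f"Copying file: {file} to {dest_dir}")
        shutil.copy(file, dest_dir / file.name)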

:param source_dir: The directory containing the original config.json file
:param dest_dir: The directory to save the updated config.json file
"""
from sparseml.transformers import SparseAutoConfig

We should definitely not be importing from sparseml in this repo; can we just use AutoConfig?
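For example, a sketch using the transformers API directly to load, strip, and re-save the config (assuming the goal is to drop quantization_config, as discussed below):

from transformers import AutoConfig

config = AutoConfig.from_pretrained(source_dir)
if hasattr(config, "quantization_config"):
    delattr(config, "quantization_config")
config.save_pretrained(dest_dir)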

shutil.copy(file, dest_dir / file.name)


def _update_quantization_config(source_dir: Path, dest_dir: Path):

nit: could we rename this fn to better clarify what it does? "update" feels too generic when what we are actually doing is removing the quantization config.

Comment on lines 196 to 271
def load_safetensors_state_dict(
    file_path: str, by_layers: bool = True
) -> Iterator[Dict[str, torch.Tensor]]:
    """
    Load a safetensors file from disk

    :param file_path: path to the safetensors file
    :param by_layers: if True, yield the state dict one layer at a time,
        otherwise yield the full state dict at once
    :return: iterator of dictionaries of safetensors data
    """
    with safe_open(file_path, framework="pt", device="cpu") as f:
        if by_layers:
            current_layer = None
            layer_data = {}
            for key in sorted(f.keys()):
                layer_name, param_name = key.split(".", 1)
                if current_layer is None:
                    current_layer = layer_name
                elif layer_name != current_layer:
                    yield layer_data
                    current_layer = layer_name
                    layer_data = {}
                layer_data[key] = f.get_tensor(key)
            if layer_data:
                yield layer_data
        else:
            yield {key: f.get_tensor(key) for key in f.keys()}

Why is this needed over the existing helper functions in safetensors_load and the safetensors .get_weights function? If we do still need it, we should move it to safetensors_load.py.

Comment on lines +40 to +47
def is_gptq_quantization_target(key: str) -> bool:
    """
    Assumes self_attn and mlp are the only quantization targets
    in model layers of the state_dict.
    :param key: The key of the state_dict
    :return: True if the key is a quantization target, False otherwise
    """
    return "model.layers" in key and ("self_attn" in key or "mlp" in key)

Why are we keying off of names here? I would think we want to instead check whether specific quantization keys exist in the state_dict like we do for the compressors. You could use the get_nested_weight_mappings helper
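For example, keying off the GPTQ-specific parameter names instead of module names (the suffix list is illustrative):

from typing import Dict

import torch

GPTQ_PARAM_SUFFIXES = (".qweight", ".qzeros", ".scales", ".g_idx")


def is_gptq_quantization_target(layer_state_dict: Dict[str, torch.Tensor]) -> bool:
    """Detect GPTQ layers by their packed parameters rather than module names."""
    return any(key.endswith(GPTQ_PARAM_SUFFIXES) for key in layer_state_dict)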

@rahul-tuli rahul-tuli force-pushed the autogptq-checkpoint-support branch from 97b8e15 to 282d96e on June 27, 2024 at 14:15
@rahul-tuli rahul-tuli force-pushed the autogptq-checkpoint-support branch from 282d96e to f45b4a1 on June 27, 2024 at 14:16
Commit: Add tests for the same
@rahul-tuli rahul-tuli force-pushed the autogptq-checkpoint-support branch from f45b4a1 to ed95dfc on June 27, 2024 at 14:17
@markurtz markurtz self-assigned this Oct 18, 2024
@markurtz markurtz (Member) commented

Assigning to myself for now to tackle after the refactors for observers and lifecycle are completed, and ideally handle AutoAWQ as well, as filed in #153.

Successfully merging this pull request may close these issues.

Does it support importing the AWQ model and then exporting it in the compressed-tensor format?