[WIP] Autogptq checkpoint conversion support #82
base: main
Conversation
I think we should discuss the converter design in more detail with Rob and Michael. I like the approach of modular transformations, but I think it would be ideal if this could more closely match the compressors design so we could simplify the code. The end goal would be making it easy for a third party to implement a new converter.
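For concreteness, a minimal sketch of what a compressor-style, registry-based converter interface could look like; every name below (BaseConverter, ExllamaConverter, register, load_from_registry) is illustrative, not the existing compressed-tensors API:

from abc import ABC, abstractmethod
from typing import Dict, Type

import torch


class BaseConverter(ABC):
    """Converts a third-party checkpoint state_dict into compressed-tensors format."""

    # simple name -> class registry so third parties can plug in new converters
    _registry: Dict[str, Type["BaseConverter"]] = {}

    @classmethod
    def register(cls, name: str):
        def _wrap(subclass: Type["BaseConverter"]) -> Type["BaseConverter"]:
            cls._registry[name] = subclass
            return subclass

        return _wrap

    @classmethod
    def load_from_registry(cls, name: str) -> "BaseConverter":
        return cls._registry[name]()

    @abstractmethod
    def translate(
        self, state_dict: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        """Map source-format keys/tensors onto compressed-tensors keys/tensors."""
        raise NotImplementedError


@BaseConverter.register("exllama")
class ExllamaConverter(BaseConverter):
    def translate(
        self, state_dict: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        # format-specific key renames / tensor repacking would go here
        return dict(state_dict)

A third party would then only need to subclass BaseConverter, implement translate, and register the subclass under a new name.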
class ConverterNames(Enum):
    EXLLAMA_TO_COMPRESSED_TENSOR = "exllama_to_compressed_tensor"
I think the "TO_COMPRESSED_TENSOR" part of this can be removed; it makes the name too wordy. We can assume that we are always going from another format to compressed tensors.
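i.e., something along the lines of (illustrative):

from enum import Enum


class ConverterNames(Enum):
    EXLLAMA = "exllama"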
filepath_: Path = Path(filepath)
if not save_dir:
    save_dir = "compressed_tensors_model"
nit: could this be a constant at the top of the file rather than in the function?
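For example (the constant name is just a suggestion):

# module-level default instead of a string hidden inside the function
DEFAULT_SAVE_DIR = "compressed_tensors_model"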
save_dir_: Path = Path(save_dir)
save_dir_.mkdir(exist_ok=True, parents=True)

metadata = {"format": "pt", "source": "Created by SparseML"}
We should change this to "Created by compressed-tensors" instead.
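e.g. (exact wording up to the authors):

metadata = {"format": "pt", "source": "Created by compressed-tensors"}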
    for file in filepath_.glob("*.safetensors"):
        _LOGGER.info(f"Loading file: {file}")
        new_state_dict = {}
        state_dict: Iterable[StateDictType] = load_safetensors_state_dict(
            file, by_layers=True
        )
        for layer_state_dict in state_dict:
            new_state_dict.update(
                cls.translate(state_dict=layer_state_dict, **kwargs)
            )

        if new_state_dict:
            save_file(
                new_state_dict,
                filename=save_dir_ / file.name,
                metadata=metadata,
            )
    _copy_non_safetensor_files_(filepath_, save_dir_)
    _update_quantization_config(filepath_, save_dir_)

elif filepath_.is_file():
    new_state_dict = {}
    state_dict: Iterable[StateDictType] = load_safetensors_state_dict(
        file, by_layers=True
    )
    for layer_state_dict in state_dict:
        new_state_dict.update(cls.translate(state_dict=layer_state_dict))

    save_file(
        new_state_dict, save_path=save_dir_ / filepath_.name, metadata=metadata
    )

return str(save_dir_)
A lot of this logic can be simplified by using the existing get_weight_mappings helper in compressed_tensors along with the safetensors .get_weight() function.
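A rough sketch of the suggested simplification, assuming get_weight_mappings lives in compressed_tensors.utils.safetensors_load and returns a {param_name: safetensors_file_path} mapping as it does for the compressors (the convert_checkpoint wrapper is purely illustrative):

from pathlib import Path

from safetensors import safe_open
from safetensors.torch import save_file

from compressed_tensors.utils.safetensors_load import get_weight_mappings


def convert_checkpoint(model_path: str, save_path: str) -> None:
    # maps every parameter name to the safetensors file that stores it
    weight_mappings = get_weight_mappings(model_path)

    new_state_dict = {}
    for name, file_with_weight in weight_mappings.items():
        # in practice, group names by file to avoid reopening the same file
        with safe_open(file_with_weight, framework="pt", device="cpu") as f:
            new_state_dict[name] = f.get_tensor(name)

    # translate() could then run once over the full state_dict before saving
    save_file(new_state_dict, str(Path(save_path) / "model.safetensors"))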
for layer_state_dict in state_dict:
    new_state_dict.update(cls.translate(state_dict=layer_state_dict))
Why is this applied one layer at a time? I thought that a translation would happen for the whole state_dict, not a single layer. There could be a case where a translation needs multiple keys to be completed.
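For example, a whole-state_dict translate makes multi-key transformations straightforward; the target key names below (weight_packed, weight_scale, weight_zero_point) are illustrative, not necessarily the compressed-tensors naming:

from typing import Dict

import torch


def translate(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    new_state_dict = {}
    for key, value in state_dict.items():
        if key.endswith(".qweight"):
            prefix = key[: -len(".qweight")]
            # all related GPTQ tensors are visible at once, so a transformation
            # that needs qweight together with its scales/zeros can happen here
            new_state_dict[prefix + ".weight_packed"] = value
            new_state_dict[prefix + ".weight_scale"] = state_dict[prefix + ".scales"]
            new_state_dict[prefix + ".weight_zero_point"] = state_dict[prefix + ".qzeros"]
        elif key.endswith((".scales", ".qzeros", ".g_idx")):
            continue  # consumed alongside the matching qweight above
        else:
            new_state_dict[key] = value
    return new_state_dict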
for file in source_dir.glob("*"):
    if file.suffix != ".safetensors":
        _LOGGER.info(f"Copying file: {file} to {dest_dir}")
        shutil.copy(file, dest_dir / file.name)
FYI, this will copy over "model.safetensors.index.json"; not sure if this is desired.
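If that is not desired, the loop could skip the index explicitly, e.g. (reusing the names from the snippet above):

for file in source_dir.glob("*"):
    if file.suffix == ".safetensors" or file.name == "model.safetensors.index.json":
        continue
    _LOGGER.info(f"Copying file: {file} to {dest_dir}")
    shutil.copy(file, dest_dir / file.name)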
    :param source_dir: The directory containing the original config.json file
    :param dest_dir: The directory to save the updated config.json file
    """
    from sparseml.transformers import SparseAutoConfig
We should definitely not be importing from sparseml in this repo; can we just use AutoConfig?
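e.g.:

from transformers import AutoConfig

# reads config.json straight from the source directory
config = AutoConfig.from_pretrained(source_dir)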
shutil.copy(file, dest_dir / file.name)


def _update_quantization_config(source_dir: Path, dest_dir: Path):
nit: could we rename this function to clarify what it does? "update" feels too generic when what we are actually doing is removing the quantization config.
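e.g. something like (name is only a suggestion):

from pathlib import Path


def _remove_quantization_config(source_dir: Path, dest_dir: Path):
    ...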
def load_safetensors_state_dict(
    file_path: str, by_layers: bool = True
) -> Iterator[Tuple[str, Dict[str, torch.Tensor]]]:
    """
    Load a safetensors file from disk

    :param file_path: path to the safetensors file
    :param by_layers: if True, return an iterator with dictionaries of
        safetensors data by layer
    :return: Iterator of dictionary of safetensors data or iterator of
        dictionaries by layers
    """
    with safe_open(file_path, framework="pt", device="cpu") as f:
        if by_layers:
            current_layer = None
            layer_data = {}
            for key in sorted(f.keys()):
                layer_name, param_name = key.split(".", 1)
                if current_layer is None:
                    current_layer = layer_name
                elif layer_name != current_layer:
                    yield current_layer, layer_data
                    current_layer = layer_name
                    layer_data = {}
                layer_data[key] = f.get_tensor(key)
            if layer_data:
                yield layer_data
        else:
            yield {key: f.get_tensor(key) for key in f.keys()}
Why is this needed over the existing helper functions in safetensors_load and the safetensors .get_weights function? If we do still need it, we should move it to safetensors_load.py.
def is_gptq_quantization_target(key: str) -> bool:
    """
    Assumes self_attn and mlp are the only quantization targets
    in model layers of the state_dict.

    :param key: The key of the state_dict
    :return: True if the key is a quantization target, False otherwise
    """
    return "model.layers" in key and ("self_attn" in key or "mlp" in key)
Why are we keying off of names here? I would think we want to instead check whether specific quantization keys exist in the state_dict, like we do for the compressors. You could use the get_nested_weight_mappings helper.
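A minimal sketch of key-based detection; the function and constant names are illustrative, and the existing get_nested_weight_mappings helper could serve the same purpose:

from typing import Dict, Set

import torch

# parameter names AutoGPTQ emits for a quantized Linear layer
GPTQ_PARAM_SUFFIXES = (".qweight", ".qzeros", ".scales", ".g_idx")


def gptq_quantization_targets(state_dict: Dict[str, torch.Tensor]) -> Set[str]:
    """Return the module prefixes that carry GPTQ quantization parameters."""
    targets = set()
    for key in state_dict:
        for suffix in GPTQ_PARAM_SUFFIXES:
            if key.endswith(suffix):
                targets.add(key[: -len(suffix)])
    return targets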
97b8e15 → 282d96e
282d96e → f45b4a1
Add tests for the same
f45b4a1 → ed95dfc
Assigning to myself for now, to tackle after the refactors for observers and lifecycle are completed, and ideally handle AutoAWQ as well, as filed in #153.
Fixes #153