Marlin24 Compressor #77
for name, value in tqdm(model_state.items(), desc="Compressing model"):
    if name.endswith(weight_suffix):
        prefix = name[: -(len(weight_suffix))]
prefix = name[: -(len(weight_suffix))]
prefix = name.replace(weight_suffix, "")
More readable :)
Yes, but the replace code runs the risk of replacing ".weight" if it exists elsewhere in the string. Not likely, but possible.
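To make the concern concrete, here is a minimal sketch comparing the two approaches from this thread. The parameter name `model.weight_norm.weight` is a hypothetical example chosen to trigger the problem, and `str.removesuffix` (Python 3.9+) is shown only as a possible readable alternative, not something the PR uses.

```python
weight_suffix = ".weight"

# Hypothetical parameter name in which ".weight" also appears mid-string.
name = "model.weight_norm.weight"


def prefix_by_slice(name: str, suffix: str) -> str:
    """Drop the trailing suffix by slicing; assumes name.endswith(suffix)."""
    return name[: -len(suffix)]


def prefix_by_replace(name: str, suffix: str) -> str:
    """Remove every occurrence of suffix, not just the trailing one."""
    return name.replace(suffix, "")


def prefix_by_removesuffix(name: str, suffix: str) -> str:
    """Remove the suffix only if it is at the end (Python 3.9+)."""
    return name.removesuffix(suffix)


if name.endswith(weight_suffix):
    print(prefix_by_slice(name, weight_suffix))         # model.weight_norm
    print(prefix_by_replace(name, weight_suffix))       # model_norm
    print(prefix_by_removesuffix(name, weight_suffix))  # model.weight_norm
```

Slicing and `removesuffix` only touch the end of the string, while `replace` also strips the inner `.weight`, which would yield a wrong prefix for a name like this one.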
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@bfineran can we have a license attached to this code if it's largely copied over from an open-source project (vllm), even though "our" Alex is the author?
Good call. I think this got added automatically when I ran `make style`. Is there a way to turn that off for a certain file?
You can skip it with `neuralmagic: no copyright`. It's an Apache license though, so I think usage should be pretty permissible.
Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
I like the design and how simple the implementation is. I didn't check the math and bit operations, but other than that, green from me.
Implements Marlin24 compression in SparseML. Note that this compressor does not have a decompress method; it is meant only for feeding directly into vLLM.
Example Usage
Since this is a one-way conversion for vLLM, we won't automatically convert to this format when `save_compressed` is set in SparseML. Instead the save will have to be done explicitly like this:
Testing
Tested with a 7B Llama 2 model in vLLM and confirmed that outputs for the `marlin-24` format match the outputs for the equivalent `pack-quantized` model. Also added some unit tests to compressed-tensors.