Error loading Llama-2-70b gptq weights from local directory #728
Comments
Yep, working as expected and getting coherent outputs.
You can just replace

```python
def _get_gptq_params(self) -> Tuple[int, int]:
    return self.gptq_bits, self.gptq_groupsize
```

because these attributes are set in advance by

```python
def _set_gptq_params(self, model_id):
    p = Path(model_id) / "quantize_config.json"
    try:
        if p.exists():
            data = json.loads(p.read_text())
        else:
            filename = hf_hub_download(model_id, filename="quantize_config.json")
            with open(filename, "r") as f:
                data = json.load(f)
        self.gptq_bits = data["bits"]
        self.gptq_groupsize = data["group_size"]
    except Exception:
        raise
```

The local-path check is needed because `hf_hub_download` doesn't work with local model IDs.
This should have fixed it: can you confirm?
Can confirm this is working with the latest docker image now.
Does your local model have it? Currently TGI expects:
OK, I will close this, and we can move the discussion over to #766.
How to get quantization_config.json?
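The file the loader actually looks for is `quantize_config.json`, and it is normally written out by the quantization tool itself (tools such as AutoGPTQ typically save one alongside the quantized weights). Based on the two keys the loader code in this thread reads, a minimal file would look like the sketch below; the values 4 and 128 are just typical examples, not requirements:

```json
{
  "bits": 4,
  "group_size": 128
}
```

Other keys may be present depending on the tool that produced the file, but `bits` and `group_size` are the ones TGI reads here.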
System Info
Docker deployment version 0.9.4
Hardware: AWS g5.12xlarge
Information
Tasks
Reproduction
Running using docker-compose with the following compose file:
and the following env variables in the tgi.env file:
Which gives the following error:
Expected behavior
Expect the model to load correctly.
I did a little digging into where the error was happening, and I can see it occurs when it tries to load the GPTQ config settings in the `_get_gptq_params` method in the `server/text_generation_server/utils/weights.py` file. I'm not entirely sure why it doesn't pick up these settings from the local dir, as the quantize_config.json file does exist there.

I modified the `_get_gptq_params` method to revert to getting these values from env variables if it errors (see below), as was the case before this last release. I rebuilt the image and this seems to successfully load the model.