
fix bug #170

Merged
merged 2 commits into main from bug-null-zp
Sep 26, 2024

Conversation

@horheynm (Member) commented Sep 26, 2024

Before:
Bug when running from llm-compressor:

from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor.transformers import SparseAutoModelForCausalLM


MODEL_ID = "nm-testing/Meta-Llama-3-8B-Instruct-fp8-hf_compat"
#MODEL_ID = "/home/dsikka/llm-compressor/examples/quantization_w4a16/new_quant_format"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="cuda")
"""
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="cuda",
)
"""


tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids.to("cuda"), max_new_tokens=100)
print(tokenizer.decode(output[0]))

Error:

Traceback (most recent call last):
  File "/home/dsikka/llm-compressor/examples/quantization_w4a16/run_script.py", line 19, in <module>
    output = model.generate(input_ids.to("cuda"), max_new_tokens=100)
  File "/home/dsikka/venv/hf_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/dsikka/venv/hf_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2048, in generate
    result = self._sample(
  File "/home/dsikka/venv/hf_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 3044, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

After:
The error is caused by not passing `force_zero_point`, which can leave the zero point as null. The fake-quantized values then become null as well, which produces the invalid probability tensor seen in the traceback.
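A minimal sketch of the failure mode (illustrative only — `fake_quant` here is a toy stand-in, not the llm-compressor implementation): a zero point that was never forced to a concrete value (modeled as NaN) poisons every fake-quantized element, and the NaNs then flow through softmax into the probability tensor that `torch.multinomial` rejects.

```python
import numpy as np

def fake_quant(x, scale, zero_point, qmin=-128, qmax=127):
    # Quantize, clamp to the integer range, then dequantize
    # (simulated / "fake" quantization).
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

x = np.array([0.5, -1.25, 2.0])
scale = 0.1

# With a concrete zero point the round-trip stays finite.
ok = fake_quant(x, scale, zero_point=0)

# With an unset zero point (modeled as NaN), every output is NaN...
bad = fake_quant(x, scale, zero_point=np.nan)

# ...and the NaNs propagate through softmax into the probabilities,
# matching the `inf`, `nan` or element < 0 error from torch.multinomial.
probs = np.exp(bad) / np.sum(np.exp(bad))
print(np.isfinite(ok).all(), np.isnan(bad).all(), np.isnan(probs).all())
```

This is why always passing `force_zero_point` (so the zero point is materialized rather than left null) avoids the crash.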

@dsikka (Contributor) left a comment:

awesome thank you!

@horheynm horheynm merged commit a852897 into main Sep 26, 2024
1 check passed
@horheynm horheynm deleted the bug-null-zp branch September 26, 2024 20:16