Perplexity (ppl) Calculation of Local Sparse Model: NaN issue #853

Open · HengJayWang opened this issue Oct 19, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

HengJayWang commented Oct 19, 2024

👋 Hello Neural Magic community developers,

I encountered an issue while calculating perplexity for a locally converted Llama3-8B sparse model produced with the llm-compressor library. I followed the sparse conversion example script, only changing the model to meta-llama/Meta-Llama-3-8B-Instruct myself; the sparse conversion took about 1.2 hours to finish.
Here’s a detailed breakdown:

Describe the bug
When computing WikiText2 perplexity for a Llama3-8B model that has been sparsified (loaded locally from disk), the resulting perplexity values always come out as NaN. I suspect that some configuration might not be set properly when using the custom SparseAutoModelForCausalLM class in combination with the compressed-tensors library.

Expected behavior
I expected the perplexity values to be reasonable and comparable to the official Hugging Face models. For example, when testing with the standard Llama-3.2-3B model from Hugging Face (without sparsification), I got a perplexity of around ~8.8 with the following parameters:

•	max_length=16K
•	stride=1, 2, 4, 8, 16K

I expected similar results for the sparse model, not NaN values; the loop I used is sketched below.
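The perplexity loop follows the Hugging Face sliding-window guide. A minimal sketch (assuming `model` and `tokenizer` are already loaded; the concrete max_length/stride values here are illustrative and may differ from my notebook):

```python
import torch
from datasets import load_dataset

# Sliding-window perplexity on WikiText2, following the HF perplexity guide.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 16384   # 16K window, as in my Llama-3.2-3B baseline
stride = 512         # in my tests I swept several stride values
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end_loc = 0
for begin_loc in range(0, seq_len, stride):
    end_loc = min(begin_loc + max_length, seq_len)
    trg_len = end_loc - prev_end_loc          # tokens actually scored in this window
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100           # mask the overlapping prefix

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)
        nlls.append(outputs.loss)

    prev_end_loc = end_loc
    if end_loc == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).mean())
print(f"Perplexity: {ppl.item():.2f}")        # NaN for the sparse model, ~8.8 for Llama-3.2-3B
```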

Environment
I use a RunPod online environment with 2× A100-80GB-SXM GPUs.

To Reproduce
Steps to reproduce the behavior:

1.	Convert the Llama3-8B model to a sparse version using llm-compressor.
2.	Load the sparse model using **_SparseAutoModelForCausalLM_** (same process as [here](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_24_sparse_w4a16)) and set up the environment to calculate perplexity; a load sketch follows this list.
3.	Run the perplexity calculation on the WikiText2 dataset following Hugging Face’s [official perplexity guide](https://huggingface.co/docs/transformers/perplexity), but using the custom sparse model.
4.	Observe the NaN perplexity values in the output.
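For reference, the load step (2) looks roughly like this. This is a minimal sketch: the local path and dtype are placeholders from my setup, and the import path is assumed to match the llm-compressor examples.

```python
import torch
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

# Illustrative local path from my RunPod run; adjust to your own output folder.
model_path = "./Meta-Llama-3-8B-Instruct-2of4-sparse"

model = SparseAutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # assumption: bf16 on the A100s
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```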

Errors
Here’s the output I receive when running the perplexity calculation (see the attached screenshots). The perplexity of the local Llama-8B model (loaded with the SparseAutoModelForCausalLM class) is always NaN, while testing with the Llama-3B model (loaded with the AutoModelForCausalLM class) successfully produces a perplexity value.

Sparse Llama 8B (loaded with the SparseAutoModelForCausalLM class): ppl is NaN

[Screenshot: LoadSparseLlama8BModel]

[Screenshot: PerplexityNaNOfSparseLlama8B]

Online Llama 3B (loaded with the AutoModelForCausalLM class): ppl computed successfully

[Screenshot: LoadLlama3BModel]

[Screenshot: PerplexityOfLlama3B]

Additional context
The same perplexity calculation process works perfectly when using the Hugging Face Llama-3.2-3B model without sparsification, giving a perplexity of ~8.8. I believe the issue lies either in the custom sparse model class or in the integration with compressed-tensors. Maybe I am missing some additional configuration/setting for the sparse model? 🧐
Any guidance on this would be appreciated! 🥰

Additional Question
How do I correctly load the final quantized model (i.e. the model saved in the stage_quantization folder)?
I am also interested in the perplexity of the final quantized model, but when I try to load it with SparseAutoModelForCausalLM it does not work 😢
It shows a message along the lines of "... ... class not support ...".
So how do I load the final quantized model correctly? Is there any documentation I can refer to? 🙏🏼

HengJayWang added the bug (Something isn't working) label on Oct 19, 2024
robertgshaw2-neuralmagic (Collaborator) commented Oct 21, 2024

Can you share the model and perhaps some text output from the model? Does the text look reasonable?

HengJayWang (Author) commented
Hi @robertgshaw2-neuralmagic Robert, you were right to question this. I retested the original llama-7B Sparse conversion example from llm-compressor today, along with a simple model.generate test to check the model's text output. It turns out the model doesn’t seem to generate any correct outputs, and as expected, I couldn’t calculate the model’s perplexity under these circumstances.
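The generate test was essentially a plain model.generate call plus a NaN probe on the logits. A rough sketch (the prompt and generation settings are illustrative; `model` and `tokenizer` refer to the locally loaded sparse checkpoint):

```python
import torch

# Quick sanity check: does the loaded sparse model produce sensible text,
# and are its logits already NaN before any perplexity math?
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    logits = model(**inputs).logits

print(tokenizer.decode(out[0], skip_special_tokens=True))
print("NaNs in logits:", torch.isnan(logits).any().item())
```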

Load local Sparse Llama-7B model

[Screenshot: Sparse-Llama2-7B-load]

Test Model Output (Ref)

[Screenshot: Sparse-Llama2-7B-text-output]

Calculating Perplexity

[Screenshot: Sparse-Llama2-7B-ppl-calculating]

NaN Result

[Screenshot: Sparse-Llama2-7B-ppl-result]

I think the issue is now clearer. I believe the problem lies in how I load the local Sparse Model & Tokenizer. Does llm-compressor have any examples or documentation I can refer to? Any suggestions would be appreciated, thank you! 🥰

Also, I apologize for not providing the exact sparse model I used. After running it in the online RunPod environment, I didn’t download the model. However, this process should be easy to replicate. Here are the steps I followed for testing:

Step 1: Execute the official llama-7B sparse conversion example from llm-compressor: run python llama7b_sparse_w4a16.py
Step 2: After about an hour, the sparse conversion finishes and the model is saved in three stages in the output folder output_llama7b_2:4_w4a16_channel, which I rename to output_llama7b_2_4_w4a16_channel for easier use.
Step 3: Load the sparse model and tokenizer from output_llama7b_2_4_w4a16_channel/stage_finetuning, and follow the Hugging Face process to calculate perplexity (a rough load sketch follows this list).
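The Step 3 load looks roughly like this (a sketch only; the dtype and device settings are my assumptions, and the import path is assumed to match the llm-compressor examples):

```python
import torch
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

# The multi-stage run leaves one checkpoint per stage; here I load the
# stage_finetuning checkpoint from the renamed output directory.
stage_path = "./output_llama7b_2_4_w4a16_channel/stage_finetuning"

model = SparseAutoModelForCausalLM.from_pretrained(
    stage_path,
    torch_dtype=torch.bfloat16,  # assumption
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(stage_path)
```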

The Success Case with the Llama3-3B online model

[Screenshot: Llama3-3B-load]

Test Model Output

[Screenshot: Llama3-3B-text-output]

Calculating Perplexity

[Screenshot: Llama3-3B-ppl-calculating]

Result

[Screenshot: Llama3-3B-ppl-result]

Summary

I want to correctly load the local sparse model and calculate its perplexity as an evaluation metric. However, it seems that I haven’t used the correct method to load the model (through the SparseAutoModelForCausalLM class) or the Tokenizer. If there are any documents or resources I can refer to, please let me know. Thanks! 🥰

My testing Jupyter notebook is attached:
Perplexity of model.zip
