Phi-3 had its config.json changed by llm-compressor #81

Closed
Lin-K76 opened this issue Aug 12, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Lin-K76
Collaborator

Lin-K76 commented Aug 12, 2024

Describe the bug

Using llm-compressor to quantize Phi-3-medium-128k-instruct modifies the model's config.json: under "rope_scaling", "type" is changed from "su" to "longrope". For some reason this is not a problem with Phi-3-mini-128k-instruct (even though its type is changed as well), but for Phi-3-medium it prevents evaluating the quantized model through lm-eval-harness.

To work around the issue, simply change "type" in config.json back to "su".
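For reference, a minimal sketch of that workaround (the output directory name is a placeholder, not from the original report):

import json

# Hypothetical path to the quantized model's config.json
config_path = "Phi-3-medium-128k-instruct-FP8-Dynamic/config.json"

with open(config_path) as f:
    config = json.load(f)

# llm-compressor rewrote this field to "longrope"; restore the original value
config["rope_scaling"]["type"] = "su"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)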

The recipe I use is below:

recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: channel
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: token
                        dynamic: true
                        symmetric: true
                    targets: ["Linear"]
"""

Expected behavior

Successful evaluation through lm eval harness.

Environment

  1. OS: Ubuntu 22.04.4
  2. Python version: 3.11.9
  3. LLM Compressor version or commit hash: v0.1.0
  4. ML framework version(s): torch 2.4.0, transformers 4.40.2
  5. Other Python package versions: lm_eval 0.4.3
  6. Other relevant environment information: CUDA 12.5

To Reproduce

Install lm-eval-harness, vllm, and llm-compressor. Use the recipe above to quantize Phi-3-medium-128k-instruct (https://huggingface.co/microsoft/Phi-3-medium-128k-instruct), following the big-model FP8 example (https://github.com/vllm-project/llm-compressor/blob/main/examples/big_model_offloading/big_model_fp8.py), then evaluate the quantized model with lm-eval-harness, as sketched below.
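For illustration, the evaluation step might look roughly like this through the lm-eval Python API (the task choice and model path are placeholders, not from the original report):

from lm_eval import simple_evaluate

# Evaluate the quantized checkpoint with the vLLM backend of lm-eval-harness.
# The pretrained path and task below are illustrative placeholders.
results = simple_evaluate(
    model="vllm",
    model_args="pretrained=Phi-3-medium-128k-instruct-FP8-Dynamic,trust_remote_code=True",
    tasks=["gsm8k"],
)
print(results["results"])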

Additional context

Tagging @robertgshaw2-neuralmagic

Lin-K76 added the bug label Aug 12, 2024
@robertgshaw2-neuralmagic
Collaborator

@horheynm can you take a look at this?

horheynm added a commit that referenced this issue Aug 20, 2024
* fix

* set default trust_remote_code to False

* compatible w recent changes

* update multi gpu code

* lint
@robertgshaw2-neuralmagic
Collaborator

@horheynm can this be closed?

markmc pushed a commit to markmc/llm-compressor that referenced this issue Nov 13, 2024
* fix serialization

* unit test fix