YAML parsing fails with a custom mapping provided to SmoothQuantModifier recipe #105

Open · aatkinson opened this issue Aug 22, 2024 · 8 comments
Labels: bug (Something isn't working)

aatkinson commented Aug 22, 2024

Using the released llmcompressor 0.1.0 on Python 3.11 on Ubuntu 20.04.

Phi-3-Small Instruct does not have the default weight names used in the mapping (q_proj, k_proj, v_proj), so I supplied my own mapping, and it failed with a YAML parsing error. I believe my mapping should parse. The same thing happens when I pass in the default mapping.

Usage:

```python
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        mappings=[[["re:.*mlp"], "re:.*input_layernorm"]],
    )
]
```

gives:

```
2024-08-22T18:42:12.884605-0400 | pre_initialize_structure | INFO - Compression lifecycle structure pre-initialized for 0 modifiers
2024-08-22T18:42:12.884699-0400 | pre_initialize_structure | INFO - Compression lifecycle structure pre-initialized for 0 modifiers
2024-08-22T18:42:12.889040-0400 | one_shot | INFO - *** One Shot ***
2024-08-22T18:42:12.891316-0400 | from_modifiers | INFO - Creating recipe from modifiers
2024-08-22T18:42:12.891912-0400 | create_instance | WARNING - Could not process input as a file path or zoo stub, attempting to process it as a string.
Traceback (most recent call last):
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 596, in _load_json_or_yaml_string
    return json.loads(content)            
           ^^^^^^^^^^^^^^^^^^^            
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/json/__init__.py", line 346, in loads                                                                                                                                                                         
    return _default_decoder.decode(s)              
           ^^^^^^^^^^^^^^^^^^^^^^^^^^              
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/json/decoder.py", line 337, in decode                                                                                                                                                                         
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/json/decoder.py", line 355, in raw_decode                                                                                                                                                                     
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/adatkins/dev/llm_compressor_tk/try_compressor.py", line 432, in <module>
    oneshot(
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 359, in main
    stage_runner.one_shot()
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/lifecycle.py", line 120, in initialize
    extras = self.recipe_container.update(**extras)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/container.py", line 75, in update
    recipe = Recipe.create_instance(recipe)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 114, in create_instance
    return cls.from_modifiers(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 71, in from_modifiers
    return cls.create_instance(path_or_modifiers=recipe_string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 126, in create_instance
    obj = _load_json_or_yaml_string(path_or_modifiers)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 602, in _load_json_or_yaml_string
    raise ValueError(f"Could not parse recipe from string {content}") from err
ValueError: Could not parse recipe from string DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
      - !!python/tuple
        - - re:.*mlp
        - re:.*input_layernorm
    GPTQModifier:
      targets: Linear
      scheme: W8A8
```
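For reference, the failure reduces to PyYAML itself: `yaml.dump` serializes Python tuples with a `!!python/tuple` tag, which `yaml.safe_load` refuses to construct (and the recipe loader uses `safe_load`, as a later traceback in this thread shows). A minimal standalone repro, outside llmcompressor:

```python
import yaml

# Dumping a tuple-valued mapping reproduces the tagged output seen above.
doc = yaml.dump({"mappings": [(["re:.*mlp"], "re:.*input_layernorm")]})
print(doc)
# mappings:
# - !!python/tuple
#   - - re:.*mlp
#   - re:.*input_layernorm

# safe_load has no constructor registered for the python/tuple tag:
yaml.safe_load(doc)
# yaml.constructor.ConstructorError: could not determine a constructor for
# the tag 'tag:yaml.org,2002:python/tuple'
```

(`yaml.full_load`, whose constructor set does include python/tuple, parses the same document fine.)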

I tried passing in the default mapping:

```python
import llmcompressor.modifiers.smoothquant.base as llmb
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

SmoothQuantModifier(smoothing_strength=0.8, mappings=llmb.DEFAULT_SMOOTHQUANT_MAPPINGS)
```

which gives a similar error:

  in "<unicode string>", line 6, column 9:
          - !!python/tuple
            ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/adatkins/dev/llm_compressor_tk/try_compressor.py", line 433, in <module>
    oneshot(
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 359, in main
    stage_runner.one_shot()
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/core/lifecycle.py", line 120, in initialize
    extras = self.recipe_container.update(**extras)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/container.py", line 75, in update
    recipe = Recipe.create_instance(recipe)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 114, in create_instance
    return cls.from_modifiers(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 71, in from_modifiers
    return cls.create_instance(path_or_modifiers=recipe_string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 126, in create_instance
    obj = _load_json_or_yaml_string(path_or_modifiers)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 602, in _load_json_or_yaml_string
    raise ValueError(f"Could not parse recipe from string {content}") from err
ValueError: Could not parse recipe from string DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
      - !!python/tuple
        - - re:.*q_proj
          - re:.*k_proj
          - re:.*v_proj
        - re:.*input_layernorm
      - !!python/tuple
        - - re:.*gate_proj
          - re:.*up_proj
        - re:.*post_attention_layernorm
    GPTQModifier:
      targets: Linear
      scheme: W8A8
```
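One stopgap that should let these recipes load (a hedged sketch, not an official llmcompressor fix — it monkey-patches PyYAML globally) is to register a constructor for the tuple tag on SafeLoader before calling oneshot():

```python
import yaml

def _construct_python_tuple(loader, node):
    # Build the tagged sequence as a plain tuple so safe_load accepts it.
    return tuple(loader.construct_sequence(node))

# Register globally on SafeLoader, which backs yaml.safe_load.
yaml.SafeLoader.add_constructor(
    "tag:yaml.org,2002:python/tuple", _construct_python_tuple
)
```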
aatkinson added the bug (Something isn't working) label on Aug 22, 2024
robertgshaw2-neuralmagic (Collaborator) commented

Thanks for raising this issue. We are working on improving the SmoothQuant layer-mapping UX; any suggestions you have for how to make it more intuitive would be appreciated.

For now, we have a sample of quantizing Phi-3 here: https://huggingface.co/neuralmagic/Phi-3-medium-128k-instruct-quantized.w8a8

Your recipe should look like this:

```python
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        mappings=[
            [["re:.*qkv_proj"], "re:.*input_layernorm"],
            [["re:.*gate_up_proj"], "re:.*post_attention_layernorm"],
        ],
    ),
    GPTQModifier(
        sequential=True,
        targets="Linear",
        scheme="W8A8",
        ignore=["lm_head"],
        dampening_frac=0.01,
        observer="mse",
    ),
]
```
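If the modifier objects keep serializing with tuple tags, another possible interim route (a sketch; it assumes oneshot() also accepts a raw YAML recipe string, which the create_instance string-parsing path in the traceback suggests) is to write the recipe as YAML text built from plain lists, sidestepping the Python-object dump entirely:

```python
# Hypothetical workaround: the recipe as a YAML string using plain lists,
# so no !!python/tuple tag can ever be emitted during serialization.
recipe = """
DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
      - - - re:.*qkv_proj
        - re:.*input_layernorm
      - - - re:.*gate_up_proj
        - re:.*post_attention_layernorm
    GPTQModifier:
      targets: Linear
      scheme: W8A8
      ignore:
      - lm_head
"""
```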

robertgshaw2-neuralmagic (Collaborator) commented

cc @Satrat @rahul-tuli for visibility

aatkinson commented Aug 26, 2024

Thank you @robertgshaw2-neuralmagic

I tried both your mapping and the DEFAULT_SMOOTHQUANT_MAPPINGS; same error. Maybe I should try main rather than 0.1.0 to see if the issue persists?

File "/home/adatkins/miniconda3/envs/quant-llmcompressor/lib/python3.11/site-packages/llmcompressor/recipe/recipe.py", line 602, in _load_json_or_yaml_string
    raise ValueError(f"Could not parse recipe from string {content}") from err
ValueError: Could not parse recipe from string DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
      - !!python/tuple
        - - re:.*qkv_proj
        - re:.*input_layernorm
      - !!python/tuple
        - - re:.*gate_up_proj
        - re:.*post_attention_layernorm
    GPTQModifier:
      targets: Linear
      dampening_frac: 0.01
      ignore:
      - lm_head
      scheme: W8A8
```
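Worth noting why lists come back out as tuples at all. A minimal sketch of the likely mechanism (an assumption: that SmoothQuantModifier is a pydantic model whose mappings field is Tuple-typed, which the !!python/tuple tags in the dump suggest) — pydantic silently coerces list input into tuples, so passing lists does not avoid the tag:

```python
from typing import List, Tuple

from pydantic import BaseModel


class DemoModifier(BaseModel):
    # Hypothetical stand-in for the real mappings field type.
    mappings: List[Tuple[List[str], str]]


demo = DemoModifier(mappings=[[["re:.*qkv_proj"], "re:.*input_layernorm"]])
print(demo.mappings)
# [(['re:.*qkv_proj'], 're:.*input_layernorm')]  <- list coerced to tuple
```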

robertgshaw2-neuralmagic (Collaborator) commented

> I tried both your mapping and the DEFAULT_SMOOTHQUANT_MAPPINGS; same error. Maybe I should try main rather than 0.1.0 to see if the issue persists? […]

Can you post your full script so I can try to reproduce?

caojinpei commented

Hi @robertgshaw2-neuralmagic, could you help me check these questions when you are available? Thanks a lot: #73

HelloCard commented

Same problem. I used a recipe like the one in https://huggingface.co/neuralmagic/Phi-3-medium-128k-instruct-quantized.w8a8/blob/main/recipe.yaml. Here is my script and the error:

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "/root/autodl-tmp/Phi-3-medium-4k-instruct"
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

from datasets import load_dataset

NUM_CALIBRATION_SAMPLES = 2048
MAX_SEQUENCE_LENGTH = 8192

# Load and preprocess the dataset
ds = load_dataset("/root/autodl-tmp/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))

def preprocess(example):
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}
ds = ds.map(preprocess)

def tokenize(sample):
    return tokenizer(sample["text"], padding=False, max_length=MAX_SEQUENCE_LENGTH, truncation=True, add_special_tokens=False)
ds = ds.map(tokenize, remove_columns=ds.column_names)

from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Configure the quantization algorithms
recipe = [
    SmoothQuantModifier(
    smoothing_strength=0.8,
    mappings=[
      [["re:.*qkv_proj"], "re:.*input_layernorm"],
      [["re:.*gate_up_proj"], "re:.*post_attention_layernorm"],
    ],
  ),
    GPTQModifier(
    sequential=True,
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head"],
    dampening_frac=0.01,
    observer="mse",
  )
]

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

# Save the compressed model
SAVE_DIR = MODEL_ID.split("/")[1] + "-W8A8-Dynamic-Per-Token"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

```
root@autodl-container-542540859a-97df04fc:~/autodl-tmp# python3 quant.py 
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:08<00:00,  1.42s/it]
2024-10-06T15:22:43.299228+0800 | main | WARNING - Process rank: 0, device: cuda:0, n_gpu: 2, distributed training: True, 16-bits training: False
2024-10-06T15:22:43.300035+0800 | main | INFO - Training/evaluation parameters TrainingArguments(
_n_gpu=2,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
clear_sparse_session=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_oneshot=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=IntervalStrategy.NO,
eval_use_gather_object=False,
evaluation_strategy=None,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/runs/Oct06_15-22-43_autodl-container-542540859a-97df04fc,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
oneshot_device=cuda:0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=./output,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
recipe=[SmoothQuantModifier(index=None, group=None, start=None, end=None, update=None, initialized_structure_=False, initialized_=False, finalized_=False, started_=False, ended_=False, smoothing_strength=0.8, mappings=[(['re:.*qkv_proj'], 're:.*input_layernorm'), (['re:.*gate_up_proj'], 're:.*post_attention_layernorm')], ignore=None, num_calibration_steps=None, calibration_function=None, hooks_=None, resolved_mappings_=None, scales_=None), GPTQModifier(index=None, group=None, start=None, end=None, update=None, initialized_structure_=False, initialized_=False, finalized_=False, started_=False, ended_=False, sequential_update=True, targets='Linear', sequential_targets=None, block_size=128, quantize=True, dampening_frac=0.01, config_groups=None, ignore=['lm_head'], disable_quantization_observer_epoch=None, num_calibration_steps=None, scheme='W8A8', model=None, layer_compressors_=None, compressible_layers_=None, quantization_modifier_=None)],
recipe_args=None,
remove_unused_columns=True,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=./output,
run_stages=False,
save_compressed=True,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
2024-10-06T15:22:43.384374+0800 | _check_create_state | INFO - State created for compression lifecycle
2024-10-06T15:22:43.386507+0800 | pre_initialize_structure | INFO - Compression lifecycle structure pre-initialized for 0 modifiers
2024-10-06T15:22:43.387711+0800 | pre_initialize_structure | INFO - Compression lifecycle structure pre-initialized for 0 modifiers
2024-10-06T15:22:43.399178+0800 | __init__ | WARNING - The max_seq_length passed (8192) is larger than the maximum length for the model (4096). Using max_seq_length=4096.
2024-10-06T15:22:43.408002+0800 | one_shot | INFO - *** One Shot ***
2024-10-06T15:22:43.412990+0800 | from_modifiers | INFO - Creating recipe from modifiers
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/recipe.py", line 596, in _load_json_or_yaml_string
    ret = json.loads(content)
          ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/recipe.py", line 599, in _load_json_or_yaml_string
    ret = yaml.safe_load(content)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/constructor.py", line 408, in construct_yaml_seq
    data.extend(self.construct_sequence(node))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/constructor.py", line 129, in construct_sequence
    return [self.construct_object(child, deep=deep)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
  in "<unicode string>", line 6, column 9:
          - !!python/tuple
            ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/autodl-tmp/quant.py", line 60, in <module>
    oneshot(
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 364, in main
    stage_runner.one_shot()
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/core/lifecycle.py", line 120, in initialize
    extras = self.recipe_container.update(**extras)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/container.py", line 75, in update
    recipe = Recipe.create_instance(recipe)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/recipe.py", line 114, in create_instance
    return cls.from_modifiers(
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/recipe.py", line 71, in from_modifiers
    return cls.create_instance(path_or_modifiers=recipe_string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/recipe.py", line 126, in create_instance
    obj = _load_json_or_yaml_string(path_or_modifiers)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/llmcompressor/recipe/recipe.py", line 601, in _load_json_or_yaml_string
    raise ValueError(f"Could not parse recipe from string {content}") from err
ValueError: Could not parse recipe from string DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
      - !!python/tuple
        - - re:.*qkv_proj
        - re:.*input_layernorm
      - !!python/tuple
        - - re:.*gate_up_proj
        - re:.*post_attention_layernorm
    GPTQModifier:
      targets: Linear
      dampening_frac: 0.01
      ignore:
      - lm_head
      scheme: W8A8
```

piamo commented Oct 15, 2024

Same problem here too.

markurtz (Collaborator) commented

@dsikka and @kylesayrs, can we take a look at this one alongside the latest fixes we're doing for the vision pipelines, and see whether it is handled there or is an easy fix to include?

@kylesayrs kylesayrs self-assigned this Oct 21, 2024
markmc pushed a commit to markmc/llm-compressor that referenced this issue Nov 13, 2024