
model eval error: NameError: name 'model_state_dict' is not defined #1776

Closed
elfisworking opened this issue Oct 9, 2024 · 3 comments · Fixed by #1777

Comments

@elfisworking

I use this command to run evaluation:

tune run eleuther_eval --config eleuther_evaluation \
    tasks="[hellaswag, wikitext]" \
    model._component_=torchtune.models.llama3.llama3_8b \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=128 \
    checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_dir="/QAT/output/llama3-8B" \
    checkpointer.output_dir="/QAT/output/llama3-8B" \
    checkpointer.checkpoint_files=[meta_model_2-8da4w.pt] \
    checkpointer.model_type=LLAMA3 \
    tokenizer._component_=torchtune.models.llama3.llama3_tokenizer \
    tokenizer.path=/QAT/Meta-Llama-3-8B/original/tokenizer.model

But I get this error:

2024-10-09:08:30:57,790 INFO     [_logging.py:101] Running EleutherEvalRecipe with resolved config:

batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelTorchTuneCheckpointer
  checkpoint_dir: /QAT/output/llama3-8B
  checkpoint_files:
  - meta_model_2-8da4w.pt
  model_type: LLAMA3
  output_dir: /QAT/output/llama3-8B
device: cuda
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.llama3.llama3_8b
quantizer:
  _component_: torchtune.training.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 128
seed: 1234
tasks:
- hellaswag
- wikitext
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /QAT/Meta-Llama-3-8B/original/tokenizer.model

Traceback (most recent call last):
  File "/usr/local/bin/tune", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/run.py", line 196, in _run_cmd
    self._run_single_device(args, is_builtin=is_builtin)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/run.py", line 102, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "/usr/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 576, in <module>
    sys.exit(recipe_main())
  File "/usr/local/lib/python3.10/dist-packages/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 571, in recipe_main
    recipe.setup(cfg=cfg)
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 494, in setup
    for k, v in model_state_dict.items():
NameError: name 'model_state_dict' is not defined

I read the code at https://github.com/pytorch/torchtune/blob/main/recipes/eleuther_eval.py, and I cannot find where model_state_dict is defined anywhere in the quantized path. Is this a bug?
I have also tried this config file, but I get the same error:
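For reference, the traceback is consistent with a plain scoping bug: one code path iterates over a name that was never bound in that path. A minimal sketch of that failure mode (hypothetical names, not torchtune's actual recipe code):

```python
def load_buggy(checkpoint_dict, quantize):
    """Sketch of the failure: the quantized branch iterates over a name
    that was never bound anywhere in this function."""
    if quantize:
        # Raises NameError: name 'model_state_dict' is not defined,
        # because only 'checkpoint_dict' exists here.
        return {k: v for k, v in model_state_dict.items()}
    return checkpoint_dict


def load_fixed(checkpoint_dict, quantize):
    """The fix is to reference the name that actually holds the weights."""
    if quantize:
        return {k: v for k, v in checkpoint_dict.items()}
    return checkpoint_dict
```

Calling `load_buggy(state_dict, quantize=True)` reproduces the same `NameError` as the traceback above, while the non-quantized path works fine, which matches how the recipe only fails when a quantizer is configured.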

model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.training.FullModelTorchTuneCheckpointer
  checkpoint_dir: /QAT/output/llama3-8B/
  checkpoint_files: [
    meta_model_2-8da4w.pt
  ]
  output_dir: /QAT/output/llama3-8B/
  model_type: LLAMA3

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /QAT/Meta-Llama-3-8B/original/tokenizer.model
  max_seq_len: null

# Environment
device: cuda
dtype: bf16
seed: 42 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed

# EleutherAI specific eval args
tasks: ["hellaswag"]
limit: null
max_seq_length: 8192
batch_size: 8

# Quantization specific args
quantizer:
  _component_: torchtune.training.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 256

Can anyone help? Thanks very much!

@SalmanMohammadi
Collaborator

Hey @elfisworking!

I think this is due to a bug (which I also found in #1763). I'll open a separate PR with the fix to unblock you : )

@elfisworking
Author

Thank you!

@SalmanMohammadi
Collaborator

Hey @elfisworking. Could you try this out on our next nightly release? This should be fixed.
