
model eval error: NameError: name 'model_state_dict' is not defined #1776

Closed
elfisworking opened this issue Oct 9, 2024 · 3 comments · Fixed by #1777

Comments

@elfisworking

I use this command to run evaluation:

tune run eleuther_eval --config eleuther_evaluation \
    tasks="[hellaswag, wikitext]" \
    model._component_=torchtune.models.llama3.llama3_8b \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=128 \
    checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_dir="/QAT/output/llama3-8B" \
    checkpointer.output_dir="/QAT/output/llama3-8B" \
    checkpointer.checkpoint_files=[meta_model_2-8da4w.pt] \
    checkpointer.model_type=LLAMA3 \
    tokenizer._component_=torchtune.models.llama3.llama3_tokenizer \
    tokenizer.path=/QAT/Meta-Llama-3-8B/original/tokenizer.model

But I get this error:

2024-10-09:08:30:57,790 INFO     [_logging.py:101] Running EleutherEvalRecipe with resolved config:

batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelTorchTuneCheckpointer
  checkpoint_dir: /QAT/output/llama3-8B
  checkpoint_files:
  - meta_model_2-8da4w.pt
  model_type: LLAMA3
  output_dir: /QAT/output/llama3-8B
device: cuda
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.llama3.llama3_8b
quantizer:
  _component_: torchtune.training.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 128
seed: 1234
tasks:
- hellaswag
- wikitext
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /QAT/Meta-Llama-3-8B/original/tokenizer.model

Traceback (most recent call last):
  File "/usr/local/bin/tune", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/run.py", line 196, in _run_cmd
    self._run_single_device(args, is_builtin=is_builtin)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/run.py", line 102, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "/usr/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 576, in <module>
    sys.exit(recipe_main())
  File "/usr/local/lib/python3.10/dist-packages/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 571, in recipe_main
    recipe.setup(cfg=cfg)
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 494, in setup
    for k, v in model_state_dict.items():
NameError: name 'model_state_dict' is not defined

I read the code at https://github.com/pytorch/torchtune/blob/main/recipes/eleuther_eval.py, and I cannot find where model_state_dict is defined anywhere in the quantized path. Is this a bug?
I have also tried this config file, but I get the same error:
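For reference, the traceback is consistent with a plain scoping bug: one code path iterates over a name that was never bound in that path. A minimal sketch of that failure mode (hypothetical names, not torchtune's actual recipe code):

```python
def load_buggy(checkpoint_dict, quantize):
    """Sketch of the failure: the quantized branch iterates over a name
    that was never bound anywhere in this function."""
    if quantize:
        # Raises NameError: name 'model_state_dict' is not defined,
        # because only 'checkpoint_dict' exists here.
        return {k: v for k, v in model_state_dict.items()}
    return checkpoint_dict


def load_fixed(checkpoint_dict, quantize):
    """The fix is to reference the name that actually holds the weights."""
    if quantize:
        return {k: v for k, v in checkpoint_dict.items()}
    return checkpoint_dict
```

Calling `load_buggy(state_dict, quantize=True)` reproduces the same `NameError` as the traceback above, while the non-quantized path works fine, which matches how the recipe only fails when a quantizer is configured.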

model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.training.FullModelTorchTuneCheckpointer
  checkpoint_dir: /QAT/output/llama3-8B/
  checkpoint_files: [
    meta_model_2-8da4w.pt
  ]
  output_dir: /QAT/output/llama3-8B/
  model_type: LLAMA3

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /QAT/Meta-Llama-3-8B/original/tokenizer.model
  max_seq_len: null

# Environment
device: cuda
dtype: bf16
seed: 42 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed

# EleutherAI specific eval args
tasks: ["hellaswag"]
limit: null
max_seq_length: 8192
batch_size: 8

# Quantization specific args
quantizer:
  _component_: torchtune.training.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 256

Can anyone help? Thanks very much!

@SalmanMohammadi
Collaborator

Hey @elfisworking!

I think this is due to a bug (which I also found in #1763). I'll open a separate PR with the fix to unblock you : )

@elfisworking
Author

Thank you!

@SalmanMohammadi
Collaborator

Hey @elfisworking. Could you try this out on our next nightly release? This should be fixed.
