
Cannot reproduce zephyr-7b-gemma-v0.1 #148

Closed · jasonyux opened this issue Apr 4, 2024 · 3 comments

jasonyux commented Apr 4, 2024

I tried to reproduce zephyr-7b-gemma-v0.1 using the exact code provided in this repository on 4xA100 GPUs. However, the resulting MT-bench score was much lower than reported: 6.63, versus the 7.81 reported on the Hugging Face model page.

I wonder if anyone else is encountering this issue?

Command run (the same as in the repo, but with gradient_accumulation_steps doubled since I am using only 4xA100 GPUs; see the effective-batch-size check after the command):

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
scripts/run_dpo.py recipes/zephyr-7b-gemma/dpo/config_full.yaml \
--output_dir=xxx/zephyr-7b-gemma-dpo-full_reprod \
--num_train_epochs=2 \
--gradient_accumulation_steps=16
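
As a sanity check on the doubled accumulation steps, here is a minimal sketch (the per-device batch size of 2 is an assumed illustration value, not taken from the recipe): halving the GPU count while doubling gradient_accumulation_steps keeps the effective batch size unchanged relative to the 8-GPU recipe.

# Hypothetical numbers; only the proportionality matters.
per_device_bs = 2                        # assumed per-device train batch size
effective_4gpu = 4 * per_device_bs * 16  # 4 GPUs, grad accum 16 -> 128
effective_8gpu = 8 * per_device_bs * 8   # 8 GPUs, grad accum 8  -> 128
assert effective_4gpu == effective_8gpu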

and when generating model answers for MT-bench I used the default command:

python gen_model_answer.py --model-path [MODEL-PATH] --model-id [MODEL-ID]

Related library versions I used:

  • Python 3.8.10 (I had to convert some source-code type annotations from "ClassA | ClassB" to "Union[ClassA, ClassB]"; see the sketch after this list)
  • torch==2.1.2+cu118, transformers==4.39.1, trl==0.8.1, flash-attn==2.5.6, fschat==0.2.36
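
A minimal sketch of the kind of change the Python 3.8.10 bullet refers to (ClassA/ClassB are placeholder names): the PEP 604 "X | Y" union syntax in annotations only works on Python 3.10+, so on 3.8 it has to fall back to typing.Union.

from typing import Union

class ClassA: ...
class ClassB: ...

# Python 3.10+ only:
# def load(model: ClassA | ClassB) -> None: ...

# Python 3.8-compatible equivalent:
def load(model: Union[ClassA, ClassB]) -> None: ...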

Training curves from wandb: [image omitted]

Eval reward curves: [image omitted]

jasonyux (Author) commented:
It seems that the issue is with the chat templates used by fastchat during evaluation. Registering the following template to test H4's gemma models recovers the reported performance:

# Conversation and SeparatorStyle were missing from the original snippet's imports.
from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template

# ChatML-style template matching the chat format H4's gemma models were trained with.
register_conv_template(
    Conversation(
        name="templ=h4_gemma_chatml",
        system_template="<bos><|im_start|>system\n{system_message}",
        system_message="You are an AI assistant.",
        roles=("<|im_start|>user", "<|im_start|>assistant"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
        stop_str=["<|im_end|>", "<|endoftext|>"],
    )
)

# other init code omitted
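
For reference, a small usage sketch (assuming a standard fschat install; the message text is made up) showing how to pull the registered template back out and inspect the exact prompt string it produces:

from fastchat.conversation import get_conv_template

conv = get_conv_template("templ=h4_gemma_chatml")
conv.append_message(conv.roles[0], "What is 2 + 2?")  # user turn (placeholder text)
conv.append_message(conv.roles[1], None)              # leave the assistant turn open
print(conv.get_prompt())                              # ChatML-formatted prompt string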

fanconic commented May 7, 2024

May I ask where this template originates from?

jasonyux (Author) commented:

This comes from how the model is trained by the run_dpo.py script. In that script, the chat data is first formatted using the tokenizer's chat template and then fed into the Trainer. Unless you use (perhaps) the latest version of fschat (which hardcodes its templates), fschat will not apply that same template during evaluation, which leads to the performance degradation. A sketch of the training-side formatting is below.
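
To illustrate (a minimal sketch; the model ID is the published checkpoint and the message content is made up), this is the tokenizer-side formatting that training relies on, and which the fastchat template above has to reproduce:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-gemma-v0.1")
messages = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
# Produces a ChatML-style string with <|im_start|>/<|im_end|> markers,
# matching the template registered above.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)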
