reward model: results from do_predict differ from direct API deployment #5967

Open · 1 task done
vxfla opened this issue Nov 8, 2024 · 0 comments

Labels
pending This problem is yet to be addressed

vxfla commented Nov 8, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.8.4.dev0
  • Platform: Linux-5.15.0-88-generic-x86_64-with-glibc2.35
  • Python version: 3.9.18
  • PyTorch version: 2.3.0 (GPU)
  • Transformers version: 4.41.2
  • Datasets version: 2.18.0
  • Accelerate version: 0.32.0
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100 80GB PCIe
  • DeepSpeed version: 0.15.0
  • vLLM version: 0.5.0

Reproduction

The following two methods produce inconsistent scores on the same batch of data:
Method 1:
Deploy a trained reward model locally:
API_PORT=8001 llamafactory-cli api --model_name_or_path xxx --template qwen --stage rm

Then obtain the score as follows:

    prompt = "You are a helpful assistant."
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": instruct},
        {"role": "assistant", "content": output}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return text

def get_score(instruct, output):
    text = make_text(instruct, output)
    data = {
                "model": "qwen2.5_3B_style_rm_3k",
                "messages": [
                    text
                ]
            }
    r = requests.post("http://127.0.0.1:8001/v1/score/evaluation", data=json.dumps(data))
    return json.loads(r.text)["scores"][0]```
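
For context, the 60% figure reported below was obtained by scoring each preference pair through this endpoint. A minimal sketch of that computation, assuming `pairs` is a hypothetical list of (instruction, chosen, rejected) tuples loaded from the eval set:

# Hypothetical sketch: `pairs` holds (instruction, chosen, rejected) tuples
# from the eval set; count how often the chosen response outscores the rejected one.
wins = sum(
    get_score(instr, chosen) > get_score(instr, rejected)
    for instr, chosen, rejected in pairs
)
print(f"chosen > rejected ratio: {wins / len(pairs):.1%}")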

Method 2:
llamafactory-cli train xxx.yaml

Contents of the YAML file:

model_name_or_path: xxx

stage: rm
do_train: false
do_eval: false
do_predict: true

eval_dataset: xxx
template: qwen
cutoff_len: 1024
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16

output_dir: xxx

per_device_eval_batch_size: 1
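
With do_predict, the scores are written into output_dir, and the 100% figure reported below comes from that output. A minimal sketch of the computation, assuming the predictions land in generated_predictions.jsonl with "chosen" and "rejected" score fields (the exact file name and keys may vary across llamafactory versions):

import json

# Assumption: do_predict wrote one JSON object per line with "chosen" and
# "rejected" reward scores; adjust the path and keys to your version's output.
with open("xxx/generated_predictions.jsonl") as f:
    records = [json.loads(line) for line in f]

wins = sum(r["chosen"] > r["rejected"] for r in records)
print(f"chosen > rejected ratio: {wins / len(records):.1%}")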


### Expected behavior

Method 1 gives relatively low scores, and chosen > rejected holds for only 60% of the pairs.
Method 2 gives higher scores, and chosen > rejected holds for 100% of the pairs.

I would like to know whether the problem lies in my deployment or in the evaluation.

### Others

_No response_
github-actions bot added the pending (This problem is yet to be addressed) label on Nov 8, 2024