Issue running single judgement with references #20

Open
sersoage opened this issue Dec 29, 2023 · 0 comments

Hi guys, first of all thank you for the great paper. I am trying the single-answer scenario, that is, where I have a question, a model-generated answer, and a reference answer. Looking at the code, I am using gen_model_judgement_single.py.
The first thing I did was to generate the answer dataset in the desired format:

```python
{
    "question_id": i,
    "question_body": question["question"],
    "decoding_method": "top_p_sampling",  # placeholder value
    "model": "alpaca-native",             # placeholder value
    "text": answer,
    "scores": {"logprobs": -7.0179795026779175},  # placeholder
}
```
I also generated the reference answer dataset like this:

```python
combined_entry = {
    "question_id": i,
    "question_body": question["question"],
    "decoding_method": "top_p_sampling",  # placeholder value
    "model": "alpaca-native",             # placeholder value
    "reference": {
        "text": answer  # you can update this with the correct reference text
    },
    "scores": {
        "logprobs": -7.0179795026779175  # placeholder
    },
}
```
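For completeness, this is roughly how I write the two files out as JSONL (a minimal sketch with toy placeholder data; the variable names and the first file name are just what I use locally, not anything from the repo):

```python
import json

# Toy placeholder inputs, only so the sketch is self-contained.
questions = [{"question": "What is the capital of France?"}]
model_answers = ["Paris is the capital of France."]
reference_answers = ["The capital of France is Paris."]

with open("combined_questions_answers.jsonl", "w") as f_ans, \
     open("combined_questions_answers_ref.jsonl", "w") as f_ref:
    for i, question in enumerate(questions):
        base = {
            "question_id": i,
            "question_body": question["question"],
            "decoding_method": "top_p_sampling",  # placeholder value
            "model": "alpaca-native",             # placeholder value
            "scores": {"logprobs": -7.0179795026779175},  # placeholder
        }
        # Answer file: the model-generated answer goes in "text".
        f_ans.write(json.dumps({**base, "text": model_answers[i]}) + "\n")
        # Reference file: the reference answer is nested under "reference".
        f_ref.write(json.dumps({**base, "reference": {"text": reference_answers[i]}}) + "\n")
```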
Then, as stated in the repo, I ran judgelm_preprocess.py, which generated a JSONL file with the following format:

```json
{"question_id": 0, "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}], "question_body": "question", "answer1_body": "generated answer", "answer2_body": "reference answer", "answer1_model_id": "alpaca-native", "answer2_model_id": "alpaca-native", "answer1_metadata": {"decoding_method": "top_p_sampling"}, "answer2_metadata": {"decoding_method": "top_p_sampling"}}
```
First question: is it OK for answer2_body to be the reference answer?
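To double-check what the preprocessing produced, I load the merged file and print the two answer fields (a minimal sketch; the path is a hypothetical local one, not a file from the repo):

```python
import json

# Hypothetical path to my preprocessed output.
path = "combined_questions_answers_preprocessed.jsonl"

with open(path) as f:
    record = json.loads(f.readline())

print(record["answer1_body"])  # model-generated answer
print(record["answer2_body"])  # this is where the reference answer ended up
```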

Then, having this dataset, I ran:
```bash
!python ./judgelm/llm_judge/gen_model_judgement_single.py \
    --model-path "BAAI/JudgeLM-7B-v1.0" \
    --model-id 7b-full-model \
    --question-file /root/JudgeLM/judgelm/data/judgelm-val-5k-judge-samples.jsonl \
    --answer-file /root/JudgeLM/judgelm/data/JudgeLM/output \
    --num-gpus-per-model 1 \
    --num-gpus-total 1 \
    --temperature 0 \
    --reference-file /root/JudgeLM/judgelm/data/JudgeLM/combined_questions_answers_ref.jsonl \
    --if-fast-eval 1
```
The first issue I ran into was that, since I was using references, the copy function of the conversation requires the number of answers, but this is a single one, so I had to change this line:

```python
conv = conv_judge_single.copy() if references is None else conv_judge_single_w_reference.copy()
```

to this line:

```python
conv = conv_judge_single.copy() if references is None else conv_judge_single_w_reference.copy(answer_num=answer_num_value)
```

passing 1 as answer_num_value. So I do not know if this is a bug, or if my change is OK?
After making this change, the code runs; however, I do not see any judgement in the output. Here is a sample output:

```json
{"question_id": 0, "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}], "question_body": "question", "answer1_body": "generated_answer", "answer2_body": "reference_answer", "answer1_model_id": "alpaca-native", "answer2_model_id": "alpaca-native", "answer1_metadata": {"decoding_method": "top_p_sampling"}, "answer2_metadata": {"decoding_method": "top_p_sampling"}, "pred_id": "ie5CkG9JTxcCYmAwt3pwrj", "pred_text": "10", "pred_model_id": "7b-full-model", "tstamp": 1703790064.0357897, "reference": "reference_answer"}
```
I was wondering if you could help me run this code properly and point out anything I am doing wrong.
Best,
Sergio
