Hi, first of all, thank you for the great paper. I am trying the single-answer scenario, i.e. where I have a question, a model-generated answer, and a reference answer. Looking at the code, I am using gen_model_judgement_single.py.
The first thing I did was generate the answer dataset in the desired format:
"question_id": i,
"question_body": question["question"],
"decoding_method": "top_p_sampling", # Placeholder value
"model": "alpaca-native", # Placeholder value
"text": answer,
"scores": {"logprobs": -7.0179795026779175} #placheholder
I also generated the reference-answer dataset like this:
combined_entry = {
    "question_id": i,
    "question_body": question["question"],
    "decoding_method": "top_p_sampling",  # placeholder value
    "model": "alpaca-native",  # placeholder value
    "reference": {
        "text": answer  # to be replaced with the correct reference text
    },
    "scores": {
        "logprobs": -7.0179795026779175  # placeholder
    },
}
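For completeness, this is roughly how I serialize these entries to a JSONL file (write_jsonl is just my own throwaway helper, not part of the repo; the sample questions/answers lists are illustrative):

```python
import json

def write_jsonl(path, entries):
    # JSONL format: one JSON object per line, as the repo's .jsonl inputs expect
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")

# Illustrative parallel lists of questions and reference answers
questions = [{"question": "What is 2+2?"}]
answers = ["4"]

entries = [
    {
        "question_id": i,
        "question_body": q["question"],
        "decoding_method": "top_p_sampling",  # placeholder value
        "model": "alpaca-native",  # placeholder value
        "reference": {"text": a},
        "scores": {"logprobs": -7.0179795026779175},  # placeholder
    }
    for i, (q, a) in enumerate(zip(questions, answers))
]

write_jsonl("combined_questions_answers_ref.jsonl", entries)
```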
Then, as stated in the repo, I ran judgelm_preprocess.py, which generated a JSONL file in the following format:
{"question_id": 0, "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}], "question_body": "question", "answer1_body": "generated answer", "answer2_body": "reference answer", "answer1_model_id": "alpaca-native", "answer2_model_id": "alpaca-native", "answer1_metadata": {"decoding_method": "top_p_sampling"}, "answer2_metadata": {"decoding_method": "top_p_sampling"}}
First question: is it OK for answer2_body to be the reference answer?
Then, with this dataset, I ran:
!python ./judgelm/llm_judge/gen_model_judgement_single.py \
    --model-path "BAAI/JudgeLM-7B-v1.0" \
    --model-id 7b-full-model \
    --question-file /root/JudgeLM/judgelm/data/judgelm-val-5k-judge-samples.jsonl \
    --answer-file /root/JudgeLM/judgelm/data/JudgeLM/output \
    --num-gpus-per-model 1 \
    --num-gpus-total 1 \
    --temperature 0 \
    --reference-file /root/JudgeLM/judgelm/data/JudgeLM/combined_questions_answers_ref.jsonl \
    --if-fast-eval 1
The first issue I ran into was that, since I was using references, the copy() method of the conversation template requires an answer_num argument, but this is a single-answer case, so I had to change this line:

conv = conv_judge_single.copy() if references is None else conv_judge_single_w_reference.copy()

to this:

conv = conv_judge_single.copy() if references is None else conv_judge_single_w_reference.copy(answer_num=answer_num_value)

passing 1 as answer_num_value.
So I do not know: is this a bug, or is my change OK?
After this change the code runs, but I do not see any judgment in the output. Here is a sample output line:
{"question_id": 0, "score": [{"logprobs": -7.0179795026779175}, {"logprobs": -7.0179795026779175}], "question_body": "question", "answer1_body": "generated_answer", "answer2_body": "reference_answer", "answer1_model_id": "alpaca-native", "answer2_model_id": "alpaca-native", "answer1_metadata": {"decoding_method": "top_p_sampling"}, "answer2_metadata": {"decoding_method": "top_p_sampling"}, "pred_id": "ie5CkG9JTxcCYmAwt3pwrj", "pred_text": "10", "pred_model_id": "7b-full-model", "tstamp": 1703790064.0357897, "reference": "reference_answer"}
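In case it helps to reproduce what I am seeing, this is roughly how I inspect the output lines (parse_pred is just my own throwaway helper; the sample line below is shortened from my actual output):

```python
import json

# One sample line from my output file (shortened for readability)
sample_line = (
    '{"question_id": 0, "pred_text": "10", '
    '"pred_model_id": "7b-full-model", "reference": "reference_answer"}'
)

def parse_pred(line):
    # Each output line is a standalone JSON object; the judge's verdict,
    # as far as I can tell, ends up in "pred_text"
    record = json.loads(line)
    return record["question_id"], record["pred_text"]

qid, grade = parse_pred(sample_line)
print(qid, grade)  # -> 0 10
```

So the only judgment-like field I can find is pred_text, which here is just "10" with no explanation attached.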
I was wondering if you could help me run this code properly and point out anything I am doing wrong.
Best,
Sergio