-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to evaluate? #3
Comments
Thank you for your question. Regarding the helpfulness dimension, we conduct tests on MTBench. For the honesty and harmlessness dimensions, our test data can be found at: src/CPSFT/cpsft/test_data_harmlessness and src/CPSFT/cpsft/test_data_honesty. The test code is located at: src/CPSFT/cpsft/test_harmlessness.py and src/CPSFT/cpsft/test_honesty.py. I hope this information is helpful to you. |
It seems that the code provided for evaluating honesty only output answers from LLMs. There is no output metric. Can you provide the code for this? |
To rate the responses generated by GPT-4 regarding the templates for "helpfulness" and "honesty" in the file preference_templates.py from the repository https://github.com/OpenBMB/UltraFeedback/blob/main/src/data_annotation/preference_templates.py |
I see the evaluate script, but the test_***.py python file is missing.
Want to ask if it's open source, or provide implementation details
The text was updated successfully, but these errors were encountered: