
The overall score does not match the principles #11

@ASC-Competition

Hi,
I found that in the evol_instruct.jsonl dataset, whose principle is 100% helpfulness, some answers with a higher overall_score have a lower helpfulness_score.

For example, the scores of the 9th sample in the evol_instruct.jsonl dataset are as follows:

| Model | Helpfulness | Honesty | Instruction following | Truthfulness | Overall score |
|---|---|---|---|---|---|
| gpt-3.5-turbo | 4 | 5 | 4 | 5 | 7 |
| llama-2-70b-chat | 4 | 4 | 5 | 5 | 7.5 |
| mpt-30b-chat | 3 | 4 | 3 | 5 | 6.5 |
| vicuna-33b | 5 | 4 | 4 | 5 | 6.5 |

The answer from vicuna-33b has the highest helpfulness score but is tied with mpt-30b-chat for the lowest overall score.

My question is: should I pick the answer with the highest overall score or the one with the highest helpfulness score as the preferred answer, or should I use the mean of the four principle scores?
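To make the three options concrete, here is a minimal sketch that applies each selection rule to the ratings of the 9th sample. The values are copied by hand from the table; the script does not parse evol_instruct.jsonl or assume anything about its field layout, and the helper name `best` is hypothetical. Ties are kept rather than broken arbitrarily.

```python
from statistics import mean

# Ratings copied by hand from the table (9th sample of evol_instruct.jsonl).
# Tuple order: helpfulness, honesty, instruction following, truthfulness, overall score.
RATINGS = {
    "gpt-3.5-turbo":    (4, 5, 4, 5, 7.0),
    "llama-2-70b-chat": (4, 4, 5, 5, 7.5),
    "mpt-30b-chat":     (3, 4, 3, 5, 6.5),
    "vicuna-33b":       (5, 4, 4, 5, 6.5),
}


def best(key):
    """Hypothetical helper: return every model whose score under `key` is maximal (ties kept)."""
    scores = {model: key(r) for model, r in RATINGS.items()}
    top = max(scores.values())
    return sorted(m for m, s in scores.items() if s == top)


# Option 1: highest overall score.
by_overall = best(lambda r: r[4])
# Option 2: highest helpfulness score.
by_helpfulness = best(lambda r: r[0])
# Option 3: highest mean of the four principle ratings.
by_mean = best(lambda r: mean(r[:4]))

print("overall:", by_overall)
print("helpfulness:", by_helpfulness)
print("mean of four principles:", by_mean)
```

With these numbers the three rules disagree: the overall score picks llama-2-70b-chat, helpfulness picks vicuna-33b, and the mean of the four principles is 4.5 for gpt-3.5-turbo, llama-2-70b-chat and vicuna-33b alike, so averaging alone would not settle this sample.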

Any suggestions would be appreciated, thanks.
