evol_instruct issues: prompts with missing data #12

Open
sanderland opened this issue Jan 5, 2024 · 3 comments

Comments

@sanderland

import datasets

ds = datasets.load_dataset('openbmb/ultrafeedback')
print(ds['train'][490]['instruction'])

Gives

Add a requirement for the given prompt that the hashtag must also include the top 3 countries with the highest sustainable energy consumption in 2020, based on their percentage of total energy consumption.

But there is no "given prompt" in the instruction. This seems to affect several of the evol_instruct prompts.
Also note that the completions for such samples contain wild hallucinations, yet the ratings evaluate them as free of hallucinations.

In addition, even evol_instruct prompts that do include the prompt to be modified are often problematic, with either the model or the evaluator misinterpreting them as a request to answer the original prompt.

@lifan-yuan
Collaborator

Hi,

Thanks for pointing this out! We will check these samples immediately and get back to you once they have been processed.

@sanderland
Author

These are some strings that are common in problematic prompts:

["Rewritten Prompt", "the given prompt and rewrite", "The Given Prompt"]

@lifan-yuan
Collaborator

Thanks for your assistance!

I've inspected all of these samples and found they are about prompt engineering. None of the models, including the GPT-4 judge, is able to follow the instructions. Since such challenging instructions are meaningful for examining models' instruction-following ability, we would rather manually rectify them than remove them from the dataset.

Currently, I am still trying to prompt the models, especially the GPT-4 judge, to understand these instructions, though little progress has been made so far. I'd appreciate it very much if anyone could help!
