evol_instruct issues: prompts with missing data #12

Open
sanderland opened this issue Jan 5, 2024 · 3 comments

Comments

@sanderland

import datasets

ds = datasets.load_dataset('openbmb/ultrafeedback')
print(ds['train'][490]['instruction'])

Gives

Add a requirement for the given prompt that the hashtag must also include the top 3 countries with the highest sustainable energy consumption in 2020, based on their percentage of total energy consumption.

But there is no "given prompt" in the instruction. This seems to affect several of the evol_instruct prompts.
Also note that the completions for such samples contain wild hallucinations, yet the ratings evaluate them as free of hallucinations.

In addition, even evol_instruct prompts that do include the prompt to be modified are often problematic, with either the model or the evaluator misinterpreting them as a request to answer the original prompt.

@lifan-yuan
Collaborator

Hi,

Thanks for pointing this out! We will check these samples immediately and get back to you once they have been processed.

@sanderland
Author

These are some strings that are common in problematic prompts:

["Rewritten Prompt", "the given prompt and rewrite", "The Given Prompt"]

@lifan-yuan
Collaborator

Thanks for your assistance!

I've inspected all of these samples and found they are about prompt engineering. None of the models, including the GPT-4 judge, is able to follow the instructions. Since such challenging instructions are meaningful for examining models' instruction-following ability, we would rather manually rectify them than remove them from the dataset.

Currently, I am still trying to prompt the models, especially the GPT-4 judge, to understand these instructions, though little progress has been made so far. I'd appreciate it very much if anyone could help!
