Both GPQA and MMLU do few-shot chain-of-thought in a similar way, so we should reuse the common infrastructure. We should coordinate to avoid duplicating work.
### Scenario

When constructing instances, set `instance.extra_data["chain_of_thought"]` to the chain of thought from the dataset instance.
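A minimal sketch of what this could look like in a scenario's instance construction, assuming `Instance` already exposes an `extra_data` field (if it does not, adding it is part of this work). The dataset column names (`question`, `choices`, `answer_index`, `explanation`) and the helper `row_to_instance` are hypothetical:

```python
from typing import Any, Dict, List

from helm.benchmark.scenarios.scenario import (
    CORRECT_TAG,
    TRAIN_SPLIT,
    Input,
    Instance,
    Output,
    Reference,
)


def row_to_instance(row: Dict[str, Any]) -> Instance:
    # Hypothetical dataset columns; rename to match the actual dataset schema.
    references: List[Reference] = [
        Reference(Output(text=choice), tags=[CORRECT_TAG] if i == row["answer_index"] else [])
        for i, choice in enumerate(row["choices"])
    ]
    return Instance(
        input=Input(text=row["question"]),
        references=references,
        split=TRAIN_SPLIT,
        # Stash the dataset's reasoning text so the adapter can use it
        # when rendering few-shot examples.
        extra_data={"chain_of_thought": row["explanation"]},
    )
```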
### Run spec function

Update the run spec function to take a boolean parameter `use_chain_of_thought`.
Update the function to use the new adapter and metric if and only if `use_chain_of_thought` is true.
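A hedged sketch for GPQA. The import paths follow a recent HELM layout, and `get_gpqa_spec`, the scenario class path, the `ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT` value, and the token limits are assumptions, not final names:

```python
from typing import List

from helm.benchmark.adaptation.adapter_spec import ADAPT_MULTIPLE_CHOICE_JOINT, AdapterSpec
from helm.benchmark.metrics.common_metric_specs import get_exact_match_metric_specs
from helm.benchmark.metrics.metric import MetricSpec
from helm.benchmark.run_spec import RunSpec, run_spec_function
from helm.benchmark.scenarios.scenario import ScenarioSpec

# Proposed new method value; does not exist in adapter_spec.py yet.
ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT = "multiple_choice_joint_chain_of_thought"


@run_spec_function("gpqa")
def get_gpqa_spec(subset: str, use_chain_of_thought: str = "False") -> RunSpec:
    # Run entry arguments arrive as strings, so parse the flag explicitly.
    use_cot: bool = use_chain_of_thought.lower() == "true"
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.gpqa_scenario.GPQAScenario",  # assumed path
        args={"subset": subset},
    )
    adapter_spec = AdapterSpec(
        method=ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT if use_cot else ADAPT_MULTIPLE_CHOICE_JOINT,
        max_train_instances=5,
        # Chain-of-thought answers need room for the reasoning text.
        max_tokens=1000 if use_cot else 5,
    )
    metric_specs: List[MetricSpec]
    if use_cot:
        metric_specs = [
            MetricSpec(
                class_name="helm.benchmark.metrics.chain_of_thought_metrics.ChainOfThoughtMetric",
                args={},
            )
        ]
    else:
        metric_specs = get_exact_match_metric_specs()
    return RunSpec(
        name=f"gpqa:subset={subset},use_chain_of_thought={use_chain_of_thought}",
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,
        metric_specs=metric_specs,
        groups=["gpqa"],
    )
```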
### Adapter

Create a subclass of `MultipleChoiceJointAdapter` that includes the chain of thought in the output.
Add a new enum value (similar to `ADAPT_MULTIPLE_CHOICE_JOINT`) for that subclass, and register both the enum value and the subclass in `AdapterFactory`.
Set the `AdapterSpec` to use this enum value when constructing run specs in the run spec functions.
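A sketch of the subclass, assuming `construct_example_prompt()` is the per-example formatting hook on `MultipleChoiceJointAdapter`; verify the hook name and exactly how the parent terminates the prompt when `include_output=False` before relying on this:

```python
from typing import Optional

from helm.benchmark.adaptation.adapters.multiple_choice_joint_adapter import (
    MultipleChoiceJointAdapter,
)
from helm.benchmark.scenarios.scenario import Instance

# In adapter_spec.py, add the new value alongside ADAPT_MULTIPLE_CHOICE_JOINT:
#   ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT = "multiple_choice_joint_chain_of_thought"
# and in AdapterFactory.get_adapter(), map that value to this subclass.


class MultipleChoiceJointChainOfThoughtAdapter(MultipleChoiceJointAdapter):
    """Multiple-choice joint adaptation whose in-context examples show the
    chain of thought from instance.extra_data before the answer letter."""

    def construct_example_prompt(
        self, instance: Instance, include_output: bool, reference_index: Optional[int]
    ) -> str:
        # Reuse the parent's question-and-choices block, then append the
        # chain of thought and the final answer ourselves.
        prompt: str = super().construct_example_prompt(
            instance, include_output=False, reference_index=reference_index
        )
        if include_output:
            chain_of_thought: str = (instance.extra_data or {}).get("chain_of_thought", "")
            # Derive the answer letter from the correct reference's position.
            correct_index: int = next(
                i for i, reference in enumerate(instance.references) if reference.is_correct
            )
            correct_letter: str = chr(ord("A") + correct_index)
            prompt += f"{chain_of_thought}\nThe correct answer is ({correct_letter}).\n"
        return prompt
```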
### Metric

Create a new subclass of `Metric` in a new file `chain_of_thought_metrics.py`.
Override `evaluate_generation()` to parse the model-generated output, look for something like `(A)`, and output a `Stat` named `chain_of_thought_exact_match`.
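A sketch of the new metric. The `evaluate_generation()` signature follows `Metric`; the regex, the letter range, and the convention that the gold letter comes from the correct reference's index are my assumptions:

```python
import re
from typing import List

from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.adaptation.request_state import RequestState
from helm.benchmark.metrics.metric import Metric
from helm.benchmark.metrics.metric_name import MetricName
from helm.benchmark.metrics.metric_service import MetricService
from helm.benchmark.metrics.statistic import Stat


class ChainOfThoughtMetric(Metric):
    """Extracts the final parenthesized answer letter, e.g. "(A)", from a
    chain-of-thought generation and compares it to the gold reference."""

    def evaluate_generation(
        self,
        adapter_spec: AdapterSpec,
        request_state: RequestState,
        metric_service: MetricService,
        eval_cache_path: str,
    ) -> List[Stat]:
        assert request_state.result is not None
        completion_text: str = request_state.result.completions[0].text

        # Use the last "(X)" in the output so letters mentioned mid-reasoning
        # are not mistaken for the final answer.
        matches = re.findall(r"\(([A-J])\)", completion_text)
        predicted_letter: str = matches[-1] if matches else ""

        # Derive the gold letter from the position of the correct reference,
        # matching the A, B, C, ... ordering the adapter used.
        correct_index: int = next(
            i for i, reference in enumerate(request_state.instance.references) if reference.is_correct
        )
        gold_letter: str = chr(ord("A") + correct_index)

        score: float = 1.0 if predicted_letter == gold_letter else 0.0
        return [Stat(MetricName("chain_of_thought_exact_match")).add(score)]
```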
Related: #3017 and #3018