Both GPQA and MMLU do few-shot chain-of-thought in a similar way, so we should reuse the common infrastructure. We should coordinate to avoid duplicating work.
### Scenario

When constructing instances, set `instance.extra_data["chain_of_thought"]` to the chain of thought from the dataset instance.
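A minimal sketch of what this could look like in a scenario's instance construction, assuming `Instance` already exposes an `extra_data` field (if it does not, adding it is part of this work). The dataset column names (`question`, `choices`, `answer_index`, `explanation`) and the helper `row_to_instance` are hypothetical:

```python
from typing import Any, Dict, List

from helm.benchmark.scenarios.scenario import (
    CORRECT_TAG,
    TRAIN_SPLIT,
    Input,
    Instance,
    Output,
    Reference,
)


def row_to_instance(row: Dict[str, Any]) -> Instance:
    # Hypothetical dataset columns; rename to match the actual dataset schema.
    references: List[Reference] = [
        Reference(Output(text=choice), tags=[CORRECT_TAG] if i == row["answer_index"] else [])
        for i, choice in enumerate(row["choices"])
    ]
    return Instance(
        input=Input(text=row["question"]),
        references=references,
        split=TRAIN_SPLIT,
        # Stash the dataset's reasoning text so the adapter can use it
        # when rendering few-shot examples.
        extra_data={"chain_of_thought": row["explanation"]},
    )
```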
### Run spec function

Update the run spec function to take a boolean parameter `use_chain_of_thought`.
Update the function to use the new adapter and metric if and only if `use_chain_of_thought` is true.
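A hedged sketch for GPQA. The import paths follow a recent HELM layout, and `get_gpqa_spec`, the scenario class path, the `ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT` value, and the token limits are assumptions, not final names:

```python
from typing import List

from helm.benchmark.adaptation.adapter_spec import ADAPT_MULTIPLE_CHOICE_JOINT, AdapterSpec
from helm.benchmark.metrics.common_metric_specs import get_exact_match_metric_specs
from helm.benchmark.metrics.metric import MetricSpec
from helm.benchmark.run_spec import RunSpec, run_spec_function
from helm.benchmark.scenarios.scenario import ScenarioSpec

# Proposed new method value; does not exist in adapter_spec.py yet.
ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT = "multiple_choice_joint_chain_of_thought"


@run_spec_function("gpqa")
def get_gpqa_spec(subset: str, use_chain_of_thought: str = "False") -> RunSpec:
    # Run entry arguments arrive as strings, so parse the flag explicitly.
    use_cot: bool = use_chain_of_thought.lower() == "true"
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.gpqa_scenario.GPQAScenario",  # assumed path
        args={"subset": subset},
    )
    adapter_spec = AdapterSpec(
        method=ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT if use_cot else ADAPT_MULTIPLE_CHOICE_JOINT,
        max_train_instances=5,
        # Chain-of-thought answers need room for the reasoning text.
        max_tokens=1000 if use_cot else 5,
    )
    metric_specs: List[MetricSpec]
    if use_cot:
        metric_specs = [
            MetricSpec(
                class_name="helm.benchmark.metrics.chain_of_thought_metrics.ChainOfThoughtMetric",
                args={},
            )
        ]
    else:
        metric_specs = get_exact_match_metric_specs()
    return RunSpec(
        name=f"gpqa:subset={subset},use_chain_of_thought={use_chain_of_thought}",
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,
        metric_specs=metric_specs,
        groups=["gpqa"],
    )
```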
### Adapter

Create a subclass of `MultipleChoiceJointAdapter` that includes the chain of thought in the output.
Add a new enum value (similar to `ADAPT_MULTIPLE_CHOICE_JOINT`) for that subclass, and register both the enum value and the subclass in `AdapterFactory`.
Set the `AdapterSpec` to use this enum value when constructing run specs in the run spec functions.
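A sketch of the subclass, assuming `construct_example_prompt()` is the per-example formatting hook on `MultipleChoiceJointAdapter`; verify the hook name and exactly how the parent terminates the prompt when `include_output=False` before relying on this:

```python
from typing import Optional

from helm.benchmark.adaptation.adapters.multiple_choice_joint_adapter import (
    MultipleChoiceJointAdapter,
)
from helm.benchmark.scenarios.scenario import Instance

# In adapter_spec.py, add the new value alongside ADAPT_MULTIPLE_CHOICE_JOINT:
#   ADAPT_MULTIPLE_CHOICE_JOINT_CHAIN_OF_THOUGHT = "multiple_choice_joint_chain_of_thought"
# and in AdapterFactory.get_adapter(), map that value to this subclass.


class MultipleChoiceJointChainOfThoughtAdapter(MultipleChoiceJointAdapter):
    """Multiple-choice joint adaptation whose in-context examples show the
    chain of thought from instance.extra_data before the answer letter."""

    def construct_example_prompt(
        self, instance: Instance, include_output: bool, reference_index: Optional[int]
    ) -> str:
        # Reuse the parent's question-and-choices block, then append the
        # chain of thought and the final answer ourselves.
        prompt: str = super().construct_example_prompt(
            instance, include_output=False, reference_index=reference_index
        )
        if include_output:
            chain_of_thought: str = (instance.extra_data or {}).get("chain_of_thought", "")
            # Derive the answer letter from the correct reference's position.
            correct_index: int = next(
                i for i, reference in enumerate(instance.references) if reference.is_correct
            )
            correct_letter: str = chr(ord("A") + correct_index)
            prompt += f"{chain_of_thought}\nThe correct answer is ({correct_letter}).\n"
        return prompt
```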
### Metric

Create a new subclass of `Metric` in a new file `chain_of_thought_metrics.py`.
Override `evaluate_generation()` to parse the model-generated output, look for something like `(A)`, and output a `Stat` named `chain_of_thought_exact_match`.
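A sketch of the new metric. The `evaluate_generation()` signature follows `Metric`; the regex, the letter range, and the convention that the gold letter comes from the correct reference's index are my assumptions:

```python
import re
from typing import List

from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.adaptation.request_state import RequestState
from helm.benchmark.metrics.metric import Metric
from helm.benchmark.metrics.metric_name import MetricName
from helm.benchmark.metrics.metric_service import MetricService
from helm.benchmark.metrics.statistic import Stat


class ChainOfThoughtMetric(Metric):
    """Extracts the final parenthesized answer letter, e.g. "(A)", from a
    chain-of-thought generation and compares it to the gold reference."""

    def evaluate_generation(
        self,
        adapter_spec: AdapterSpec,
        request_state: RequestState,
        metric_service: MetricService,
        eval_cache_path: str,
    ) -> List[Stat]:
        assert request_state.result is not None
        completion_text: str = request_state.result.completions[0].text

        # Use the last "(X)" in the output so letters mentioned mid-reasoning
        # are not mistaken for the final answer.
        matches = re.findall(r"\(([A-J])\)", completion_text)
        predicted_letter: str = matches[-1] if matches else ""

        # Derive the gold letter from the position of the correct reference,
        # matching the A, B, C, ... ordering the adapter used.
        correct_index: int = next(
            i for i, reference in enumerate(request_state.instance.references) if reference.is_correct
        )
        gold_letter: str = chr(ord("A") + correct_index)

        score: float = 1.0 if predicted_letter == gold_letter else 0.0
        return [Stat(MetricName("chain_of_thought_exact_match")).add(score)]
```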
Related: #3017 and #3018