GPQA scenario and tests added #3068

Merged
yifanmai merged 3 commits into main from jialiang/add_gpqa
Oct 21, 2024

liamjxu
Contributor

@liamjxu liamjxu commented Oct 17, 2024

No description provided.

@liamjxu liamjxu requested a review from yifanmai October 17, 2024 05:45
@liamjxu
Contributor Author

liamjxu commented Oct 17, 2024

Added both the scenario (GPQA, 3 subsets, loaded from Hugging Face datasets) and its tests. All tests pass.

@yifanmai Regarding the run spec function, should we create one in lite_run_specs.py or create a new separate file specifically for GPQA?

@liamjxu liamjxu linked an issue Oct 17, 2024 that may be closed by this pull request
@liamjxu
Contributor Author

liamjxu commented Oct 17, 2024

Currently seeing issues with the automatic checks; will look into it.

@yifanmai
Collaborator

FYI

The code formatting check failed. To fix the formatting, run:

	black src scripts
	git commit -a

instances = scenario.get_instances(tmpdir)
assert len(instances) == 448
assert instances[0].input == Input(
    text="A large gene has dozens of exons, of which the central ones code for folded triple helical repeats that connect the cytoskeleton with sarcolemma and extracellular space. Each exon usually codes for one folded triple alpha helix. The most common mutations of the gene are central exon deletions that create out-of-frame peptides and progressive degenerative organ waste. A solution is to deliver a Morpholino that recognizes the 5' end of the out-of-frame exon in pre-mRNA. The molecule prevents binding of the spliceosome and creates exon skipping and in-frame joining. Several missing exons are well tolerated by an organism. Which structure below is not involved in the proposed therapy?"
Collaborator

Avoid posting plain text from the dataset, as per the instructions from the dataset card for GPQA. Just check the string length instead.
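
A length-based check of the kind suggested here could look like the sketch below. The helper name `text_length_matches`, the stand-in string, and the expected length of 120 are all assumptions made for illustration; none of them come from the PR or from the dataset.

```python
def text_length_matches(text: str, expected_length: int) -> bool:
    """Return True when the text has exactly the expected number of characters."""
    return len(text) == expected_length


# Stand-in text (hypothetical, not taken from GPQA). A real test would pass
# the input text of an instance returned by scenario.get_instances(...).
placeholder_text = "x" * 120
length_ok = text_length_matches(placeholder_text, 120)
```

This keeps the test sensitive to accidental changes in the loaded data without reproducing any question text in the repository.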

instance = Instance(
    input=input,
    references=references,
    split=TRAIN_SPLIT
Collaborator

In the paper, the authors defined five fixed train split examples in this file. All other examples are test split.

You should also filter out the train examples from the test set. Unfortunately, the examples in the train split are also in the Hugging Face dataset, so you may have to do this filtering manually. You could do something like hardcoding the indexes of the train examples, but the indexes might be different depending on the subset.

Also see the discussion here:

idavidrein/gpqa#28
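
One way to do the manual filtering described above is a per-subset table of hardcoded indexes. This is only a sketch: the subset keys `subset_a` and `subset_b`, the index values, and the helper `filter_train_examples` are hypothetical placeholders, and the real indexes would have to be determined separately for each subset.

```python
from typing import Dict, List, Set

# Hypothetical placeholder indexes keyed by hypothetical subset names.
# The real values are not known here and may differ between subsets.
EXCLUDED_TRAIN_EXAMPLES: Dict[str, Set[int]] = {
    "subset_a": {0, 2},
    "subset_b": {1},
}


def filter_train_examples(rows: List[dict], subset: str) -> List[dict]:
    """Drop rows whose position is listed as a train example for this subset."""
    excluded = EXCLUDED_TRAIN_EXAMPLES.get(subset, set())
    return [row for idx, row in enumerate(rows) if idx not in excluded]


# Usage with stand-in rows rather than real dataset content.
rows = [{"id": i} for i in range(4)]
kept = filter_train_examples(rows, "subset_a")
```

Keeping the indexes in a dictionary keyed by subset makes the dependence on the subset explicit, at the cost of having to update the table if the row order of the hosted dataset ever changes.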

Comment on lines 55 to 58
Reference(Output(text=row["Correct Answer"].strip()), tags=[CORRECT_TAG]),
Reference(Output(text=row["Incorrect Answer 1"].strip()), tags=[]),
Reference(Output(text=row["Incorrect Answer 2"].strip()), tags=[]),
Reference(Output(text=row["Incorrect Answer 3"].strip()), tags=[])
Collaborator

This always sets the correct answer to A, which is not great. The authors shuffle the choices deterministically, and you should do something similar.
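
A deterministic shuffle could be sketched as follows. This is not the authors' procedure; it is a minimal illustration that assumes a fixed integer seed per example, and the helper name `shuffle_choices` and the sample answer strings are made up.

```python
import random
from typing import List, Tuple


def shuffle_choices(correct: str, incorrect: List[str], seed: int) -> Tuple[List[str], int]:
    """Shuffle the answer choices reproducibly.

    Returns the shuffled choices and the position of the correct answer.
    """
    all_choices = [correct] + list(incorrect)
    # Shuffle positions rather than strings so the correct answer can be
    # tracked even if two choices happen to have identical text.
    order = list(range(len(all_choices)))  # position 0 is the correct answer
    random.Random(seed).shuffle(order)
    shuffled = [all_choices[i] for i in order]
    return shuffled, order.index(0)


# Usage with made-up answers; the same seed gives the same order on every call.
choices, correct_index = shuffle_choices("right", ["wrong 1", "wrong 2", "wrong 3"], seed=7)
```

Using a dedicated `random.Random` instance avoids touching the global random state, so the order does not depend on what other code has already drawn from it.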

Comment on lines +57 to +58
if idx in EXCLUDED_TRAIN_EXAMPLES[self.subset]:
    continue
Collaborator

This has no train split examples currently; let's address this in a different PR.

@yifanmai yifanmai merged commit f5ca627 into main Oct 21, 2024
12 checks passed
@yifanmai yifanmai deleted the jialiang/add_gpqa branch October 21, 2024 17:49
Development

Successfully merging this pull request may close these issues.

Add GPQA scenario
2 participants