Added text classification datasets from Glue benchmark #2363

Merged
merged 5 commits into from
Aug 17, 2021

Conversation

lukasgarbas
Collaborator

The following datasets from GLUE were added:

  • Text classification: CoLA and SST2 (SST2 already exists as SENTEVAL_SST_BINARY, so I just checked that the train/dev/test splits are kept in their original form and not sampled)
  • Text-pair classification: MNLI, QNLI, QQP, MRPC, WNLI

The only missing dataset is Semantic Textual Similarity STS-B, which is a text-pair regression task and would probably require a TextPairRegressor model.

Three language inference (NLI) tasks: MNLI, QNLI, WNLI.
Two paraphrase identification tasks: MRPC, QQP.
MRPC already comes with an annotated test set, whereas the others have eval_datasets without labels.
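
For reference, the grouping described above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary names and the `is_pair_task` helper are hypothetical and are not identifiers from this PR.

```python
# Illustrative grouping of the GLUE tasks named in this description.
# All names below are hypothetical, not identifiers from the PR.
GLUE_TASKS = {
    "text_classification": ["CoLA", "SST2"],
    "text_pair_classification": ["MNLI", "QNLI", "QQP", "MRPC", "WNLI"],
}

# The text-pair tasks split into two kinds.
PAIR_TASK_KINDS = {
    "nli": ["MNLI", "QNLI", "WNLI"],
    "paraphrase": ["MRPC", "QQP"],
}

# Not added: a regression task rather than a classification task.
NOT_INCLUDED = {"STS-B": "text-pair regression"}


def is_pair_task(name: str) -> bool:
    """Return True if the task takes a pair of texts as input."""
    return name in GLUE_TASKS["text_pair_classification"]


for task in GLUE_TASKS["text_classification"] + GLUE_TASKS["text_pair_classification"]:
    print(task, "pair" if is_pair_task(task) else "single")
```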
@alanakbik alanakbik requested a review from marcelmmm August 5, 2021 13:56

matched_suffix = "matched" if evaluate_on_matched else "mismatched"

dev_dataset = "MNLI/dev_" + matched_suffix + ".tsv"
Collaborator

I think there is a problem with the path here. I get an assertion error when first initializing the corpus. Maybe one 'MNLI' too many.

Collaborator Author

Indeed! Thanks for pointing that out. I fixed the path for MNLI and trained the small electra model with it. MNLI seems to work now : )
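
A minimal sketch of how a doubled directory component like the one discussed here can arise. It assumes, purely for illustration, that the data folder already ends in `MNLI`; the `data_folder` value and the helper name `mnli_dev_file` are hypothetical, and this is not necessarily the exact change made in the PR.

```python
from pathlib import Path


def mnli_dev_file(evaluate_on_matched: bool = True) -> str:
    """Build the dev file name for the matched or mismatched MNLI split."""
    matched_suffix = "matched" if evaluate_on_matched else "mismatched"
    return "dev_" + matched_suffix + ".tsv"


# Hypothetical data folder that already ends in "MNLI" (an assumption).
data_folder = Path("datasets") / "glue" / "MNLI"

# Prefixing the file name with "MNLI/" again repeats the directory.
doubled = data_folder / ("MNLI/" + mnli_dev_file())

# Joining only the file name yields a single "MNLI" component.
single = data_folder / mnli_dev_file()

print(doubled.as_posix())
print(single.as_posix())
```

If the file only exists at the second location, any existence check on the first path fails, which would be consistent with an assertion error at corpus initialization.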

Collaborator

@marcelmmm marcelmmm left a comment

Looks good 👍

@alanakbik
Collaborator

@lukasgarbas thanks for adding this!

@alanakbik alanakbik merged commit ff7c8ab into flairNLP:master Aug 17, 2021
3 participants