This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Add glue hf datasets #3624

Merged
merged 7 commits into from
May 1, 2021

@meganung meganung commented Apr 27, 2021

Adding all glue datasets (https://huggingface.co/datasets/glue) and super glue datasets (https://huggingface.co/datasets/super_glue).

Added test.py for both glue and super_glue; all tests pass.

@stephenroller stephenroller left a comment

Wow, you move fast! Great to see.

  • Add tests for all of them, please
  • Add a MultiTeacher as the new default for glue and superglue. I'll show you an example shortly

Comment on lines 10 to 15
class AxTeacher(AbstractHuggingFaceTeacher):
    hf_path = 'glue'
    hf_name = 'ax'
    hf_text_fields = ['premise', 'hypothesis']
    hf_label_field = 'label'
    hf_splits_mapping = {'train': 'test', 'valid': 'test', 'test': 'test'}
Contributor Author

This is an evaluation dataset (see https://huggingface.co/datasets/glue) so it only has a test split.
An alternative is to have train and valid map to None.

Contributor

Good callout. I'm not sure what will happen if we make an empty teacher, but we can try.

@stephenroller
Contributor

class DecaNLPTeacher(MultiTaskTeacher):
    def __init__(self, opt, shared=None):
        decanlp_tasks = [
            'squad',
            'iwslt14',
            'cnn_dm',
            'multinli',
            'sst',
            'qasrl',
            'qazre',
            'woz',
            'wikisql',
            'mwsc',
        ]
        opt = deepcopy(opt)
        opt['task'] = ', '.join(decanlp_tasks)
        super().__init__(opt, shared)

Example of a MultiTeacher
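
As a rough sketch of how the same pattern might carry over to GLUE, the snippet below builds the comma-separated task string for a default multi-task setup without importing ParlAI. The helper name build_multitask_opt and the short subtask list are assumptions made for illustration, not code from this PR; the glue:<name> task format follows the display_data usage mentioned in this thread, and 'ax' is left out because it only has a test split.

```python
from copy import deepcopy

# Illustrative subset of GLUE configurations (assumption: not the full list
# used in the PR). 'ax' is omitted because it only has a test split.
GLUE_SUBTASKS = ['cola', 'mrpc', 'rte', 'sst2']


def build_multitask_opt(opt, prefix, subtasks):
    """Return a copy of opt whose 'task' names every subtask, comma-separated.

    Hypothetical helper: it mirrors what the DecaNLPTeacher constructor does
    before handing opt to its parent class.
    """
    new_opt = deepcopy(opt)  # do not mutate the caller's options
    new_opt['task'] = ','.join(f'{prefix}:{name}' for name in subtasks)
    return new_opt


opt = {'task': 'glue', 'datatype': 'train'}
multi_opt = build_multitask_opt(opt, 'glue', GLUE_SUBTASKS)
print(multi_opt['task'])
```

In a real teacher, the resulting options would presumably be passed on to the MultiTaskTeacher constructor, as in the DecaNLPTeacher example above.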

hf_name = 'ax'
hf_text_fields = ['premise', 'hypothesis']
hf_label_field = 'label'
hf_splits_mapping = {'train': None, 'valid': None, 'test': 'test'}
Contributor Author

For evaluation datasets, I changed the mapping of train and valid to None, e.g. hf_splits_mapping = {'train': None, 'valid': None, 'test': 'test'}, and removed these tasks from the tests and the MultiTaskTeacher, since they would otherwise produce errors. For example, display_data -t glue:ax now works only if you set the datatype to test.
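
To illustrate why a None entry restricts such a task to the test datatype, here is a minimal sketch of how a splits mapping could be resolved. The function resolve_hf_split is hypothetical and is not the logic in AbstractHuggingFaceTeacher; it only assumes that a datatype string starts with the fold name (train, valid or test), optionally followed by a colon-separated suffix.

```python
def resolve_hf_split(datatype, splits_mapping):
    """Map a datatype such as 'train:ordered' to a dataset split name.

    Hypothetical helper for illustration only. Raises ValueError when the
    mapping has no split for the requested fold.
    """
    fold = datatype.split(':')[0]  # 'train:ordered' -> 'train'
    split = splits_mapping.get(fold)
    if split is None:
        raise ValueError(f"no split available for datatype '{datatype}'")
    return split


# Mapping used for the evaluation-only 'ax' dataset in this thread.
AX_SPLITS = {'train': None, 'valid': None, 'test': 'test'}

print(resolve_hf_split('test', AX_SPLITS))

try:
    resolve_hf_split('train', AX_SPLITS)
except ValueError as err:
    print(err)
```

Failing early with a clear message is one possible design; returning an empty teacher, as floated earlier in the review, would be another.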

@meganung meganung requested a review from stephenroller April 30, 2021 17:48
@stephenroller stephenroller left a comment

Awesome!

@meganung meganung merged commit 0120e9e into master May 1, 2021
@meganung meganung deleted the add-glue-hf-datasets branch May 1, 2021 05:17