Add new run_swag example #9175

sgugger · 2020-12-17T20:53:46Z

What does this PR do?

This PR adds a new example for multiple-choice using Trainer and Datasets, and moves the older one to the legacy folder.

patrickvonplaten · 2020-12-18T08:22:15Z

examples/legacy/multiple_choice/utils_multiple_choice.py

+        contexts: list of str. The untokenized text of the first sequence (context of corresponding question).
+        endings: list of str. multiple choice's options. Its length must be equal to contexts' length.
+        label: (Optional) string. The label of the example. This should be
+        specified for train and dev examples, but not for test examples.


shouldn't there be a tab?

Suggested change

specified for train and dev examples, but not for test examples.

specified for train and dev examples, but not for test examples.

This was not written by me; I just copied it there. (We don't care since it's not rendered in the docs.)

patrickvonplaten · 2020-12-18T08:23:16Z

examples/legacy/multiple_choice/utils_multiple_choice.py

+
+            cached_features_file = os.path.join(
+                data_dir,
+                "cached_{}_{}_{}_{}".format(


no f-strings? ;-)

Again, not my file ;-)
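For reference, a minimal sketch of what the f-string version could look like. The arguments passed to `.format()` are cut off in the diff above, so the four variable names and values below are hypothetical placeholders, not the ones used in `utils_multiple_choice.py`.

```python
import os

# Hypothetical values: the real arguments to .format() are elided in the diff above.
data_dir = "data"
mode, tokenizer_name, max_seq_length, task = "train", "BertTokenizer", 128, "swag"

# Original style, using str.format
with_format = os.path.join(
    data_dir,
    "cached_{}_{}_{}_{}".format(mode, tokenizer_name, max_seq_length, task),
)

# Equivalent f-string
with_fstring = os.path.join(
    data_dir,
    f"cached_{mode}_{tokenizer_name}_{max_seq_length}_{task}",
)

# Both spellings build the same path
assert with_format == with_fstring
```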

patrickvonplaten · 2020-12-18T08:24:26Z

examples/legacy/multiple_choice/utils_multiple_choice.py

+
+    def get_train_examples(self, data_dir):
+        """See base class."""
+        logger.info("LOOKING AT {} train".format(data_dir))


why capitalized here?

Still not my file :-p

patrickvonplaten · 2020-12-18T08:25:42Z

examples/legacy/multiple_choice/utils_multiple_choice.py

+
+
+class RaceProcessor(DataProcessor):
+    """Processor for the RACE data set."""


If possible, I think it makes a lot of sense to link here to the dataset in datasets, if it is available.

patrickvonplaten · 2020-12-18T08:26:05Z

examples/legacy/multiple_choice/utils_multiple_choice.py

+
+
+class SwagProcessor(DataProcessor):
+    """Processor for the SWAG data set."""


link to dataset in datasets

patrickvonplaten · 2020-12-18T08:26:16Z

examples/legacy/multiple_choice/utils_multiple_choice.py

+
+
+class ArcProcessor(DataProcessor):
+    """Processor for the ARC data set (request from allennlp)."""


link to dataset if available

patrickvonplaten · 2020-12-18T08:29:23Z

examples/multiple-choice/run_swag.py

+    def __post_init__(self):
+        if self.train_file is not None:
+            extension = self.train_file.split(".")[-1]
+            assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."


There can also be files in JSON format that don't have the .json extension (I saw that quite a bit when porting datasets). Maybe it would be better to have a try/except here?

The script is not universal; I'd leave it to users to make the change if they have a JSON file without the .json extension. The bottom line is that if they did not prepare the file themselves, it's unlikely to work out of the box, since the data is expected to have the same format as SWAG on datasets.
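For users who do want the fallback discussed here, one possible sketch follows. It is not part of the PR: `guess_file_format` is a hypothetical helper, and it assumes extension-less JSON files are in JSON Lines form (one object per line), which may not hold for every dataset.

```python
import json


def guess_file_format(path):
    """Return "csv" or "json", trusting the extension first, then the content."""
    extension = path.split(".")[-1]
    if extension in ("csv", "json"):
        return extension

    # Assumption: a JSON file without the extension is JSON Lines, so its
    # first non-empty line should parse to a JSON object.
    with open(path, encoding="utf-8") as f:
        first_line = next((line for line in f if line.strip()), "")
    try:
        record = json.loads(first_line)
    except json.JSONDecodeError:
        record = None

    if not isinstance(record, dict):
        raise ValueError(f"Could not recognize {path} as a csv or a json file.")
    return "json"
```

The extension check stays the fast path, so the file is only opened when the extension is unknown. A pretty-printed JSON document would still be rejected by this sketch, which is one reason the simple assert in the script is a reasonable default.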

patrickvonplaten

Nice! Left some comments, mostly nits

LysandreJik

Yes, LGTM! Great!

LysandreJik · 2020-12-18T18:35:00Z

examples/README.md

-| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice)           | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
+| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice)           | SWAG, RACE, ARC | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)


This is a satisfying diff :)


LysandreJik · 2020-12-18T18:36:43Z

examples/multiple-choice/run_swag.py

@@ -0,0 +1,349 @@
+# coding=utf-8
+# Copyright The HuggingFace Team and The HuggingFace Inc. team. All rights reserved.


I think we usually put the year here (2020)?


LysandreJik · 2020-12-18T18:39:25Z

examples/multiple-choice/run_swag.py

+    # https://huggingface.co/docs/datasets/loading_datasets.html.
+
+    # Load pretrained model and tokenizer
+    #


Same octothorp-related question

LysandreJik · 2020-12-18T18:39:55Z

examples/multiple-choice/run_swag.py

+        # Flatten out
+        first_sentences = sum(first_sentences, [])
+        second_sentences = sum(second_sentences, [])
+
+        # Tokenize
+        tokenized_examples = tokenizer(
+            first_sentences,
+            second_sentences,
+            truncation=True,
+            max_length=data_args.max_seq_length,
+            padding="max_length" if data_args.pad_to_max_length else False,
+        )
+        # Un-flatten
+        return {k: [v[i : i + 4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}


This is nice, and understandable. Good job!
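To make the flatten / tokenize / un-flatten pattern concrete, here is a small self-contained sketch. `toy_tokenizer` is a hypothetical stand-in for the real tokenizer (it only records string lengths), and the sentences are made up; only the `sum(..., [])` flattening and the regrouping in chunks of 4 mirror the diff above.

```python
NUM_CHOICES = 4  # four candidate endings per example, as in the diff above


def toy_tokenizer(first, second):
    """Hypothetical stand-in for a tokenizer: one small list of 'ids' per sentence pair."""
    return {"input_ids": [[len(a), len(b)] for a, b in zip(first, second)]}


contexts = ["He opens the door", "She picks up the phone"]
endings = [
    ["and leaves", "and sleeps", "and sings", "and swims"],
    ["and dials", "and eats it", "and flies", "and melts"],
]

# One (context, ending) pair per choice, still grouped per example
first_sentences = [[context] * NUM_CHOICES for context in contexts]
second_sentences = endings

# Flatten out: 2 examples x 4 choices -> 8 sentence pairs
first_flat = sum(first_sentences, [])
second_flat = sum(second_sentences, [])

# Tokenize all pairs in a single call
tokenized = toy_tokenizer(first_flat, second_flat)

# Un-flatten: regroup every 4 consecutive entries into one example
grouped = {
    k: [v[i : i + NUM_CHOICES] for i in range(0, len(v), NUM_CHOICES)]
    for k, v in tokenized.items()
}
```

After regrouping, each key maps to one list per example, each holding 4 tokenized choices. Note that `sum(lists, [])` copies the accumulated list at every step, so `itertools.chain.from_iterable` may be preferable for very large batches.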

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

LysandreJik · 2020-12-18T19:12:03Z

examples/multiple-choice/run_swag.py

-    #
+


I'm satisfied.

* Add new run_swag example
* Add check
* Add sample
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Very important change to make Lysandre happy

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Add new run_swag example

69952ea

sgugger requested review from thomwolf, patrickvonplaten and LysandreJik December 17, 2020 20:53

sgugger added 2 commits December 17, 2020 15:54

Add check

5edbd68

Add sample

83e6a48

patrickvonplaten reviewed Dec 18, 2020


patrickvonplaten approved these changes Dec 18, 2020


LysandreJik approved these changes Dec 18, 2020


sgugger and others added 2 commits December 18, 2020 14:10

Apply suggestions from code review

7983d87

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Very important change to make Lysandre happy

296a22f

LysandreJik reviewed Dec 18, 2020


sgugger merged commit 9a25c5b into master Dec 18, 2020

sgugger deleted the run_swag branch December 18, 2020 19:19
