diff --git a/docs/source/index.rst b/docs/source/index.rst
index e069b997e8140a..43e0152973c781 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -385,6 +385,7 @@ TensorFlow and/or Flax.
 
     main_classes/callback
     main_classes/configuration
+    main_classes/data_collator
     main_classes/logging
     main_classes/model
     main_classes/optimizer_schedules
diff --git a/docs/source/main_classes/data_collator.rst b/docs/source/main_classes/data_collator.rst
new file mode 100644
index 00000000000000..8162f2d65f0245
--- /dev/null
+++ b/docs/source/main_classes/data_collator.rst
@@ -0,0 +1,83 @@
+
+
+DataCollator
+-----------------------------------------------------------------------------------------------------------------------
+
+DataCollators are objects that will form a batch by using a list of elements as input. These lists of elements are of
+the same type as the elements of :obj:`train_dataset` or :obj:`eval_dataset`.
+
+A data collator will default to :func:`transformers.data.data_collator.default_data_collator` if no `tokenizer` has
+been provided. This is a function that takes a list of samples from a Dataset as input and collates them into a batch
+of a dict-like object. The default collator performs special handling of potential keys:
+
+    - ``label``: handles a single value (int or float) per object
+    - ``label_ids``: handles a list of values per object
+
+This function does not perform any preprocessing. An example of use can be found in glue and ner.
+
+
+Default data collator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: transformers.data.data_collator.default_data_collator
+
+
+DataCollatorWithPadding
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorWithPadding
+    :special-members: __call__
+    :members:
+
+DataCollatorForTokenClassification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorForTokenClassification
+    :special-members: __call__
+    :members:
+
+DataCollatorForSeq2Seq
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorForSeq2Seq
+    :special-members: __call__
+    :members:
+
+DataCollatorForLanguageModeling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorForLanguageModeling
+    :special-members: __call__
+    :members: mask_tokens
+
+DataCollatorForWholeWordMask
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorForWholeWordMask
+    :special-members: __call__
+    :members: mask_tokens
+
+DataCollatorForSOP
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorForSOP
+    :special-members: __call__
+    :members: mask_tokens
+
+DataCollatorForPermutationLanguageModeling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.data.data_collator.DataCollatorForPermutationLanguageModeling
+    :special-members: __call__
+    :members: mask_tokens
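
Note on the default collator described in the DataCollator introduction of this patch: the special handling of the ``label`` and ``label_ids`` keys may be easier to follow with a small sketch. The function below is a hypothetical, list-based stand-in written only for illustration; the name `sketch_default_collate` is not part of the library, and it returns plain Python lists rather than tensors. It assumes every sample carries the same keys as the first one and that both label keys end up under a single ``labels`` entry in the batch.

```python
from typing import Any, Dict, List


def sketch_default_collate(features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Illustrative stand-in (not the library function): collate samples into one dict.

    Assumption: all samples share the keys of the first sample.
    """
    if not features:
        return {}

    first = features[0]
    batch: Dict[str, List[Any]] = {}

    # Special handling of label keys: one value per sample for ``label``,
    # one list of values per sample for ``label_ids``.
    if first.get("label") is not None:
        batch["labels"] = [f["label"] for f in features]
    elif first.get("label_ids") is not None:
        batch["labels"] = [list(f["label_ids"]) for f in features]

    # Every other key is stacked as-is; no padding or other preprocessing is done.
    for key, value in first.items():
        if key in ("label", "label_ids") or value is None or isinstance(value, str):
            continue
        batch[key] = [f[key] for f in features]

    return batch


samples = [
    {"input_ids": [1, 2], "label": 0},
    {"input_ids": [3, 4], "label": 1},
]
print(sketch_default_collate(samples))
# prints {'labels': [0, 1], 'input_ids': [[1, 2], [3, 4]]}
```

Because nothing is padded, samples of different lengths would not stack into a rectangular batch; that is the case the padding collators listed in this patch are meant to cover.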