Large model class refactoring: Introduce ...AdapterModel classes #289

Merged: 11 commits, Feb 23, 2022
Changes from 10 commits
2 changes: 1 addition & 1 deletion .github/workflows/tests_torch.yml
@@ -60,4 +60,4 @@ jobs:
pip install datasets
- name: Test
run: |
-        make test-reduced
+        make test-adapters
12 changes: 2 additions & 10 deletions Makefile
@@ -82,16 +82,8 @@ test:

# Run the adapter tests

-test-adapter:
-	python -m pytest -n auto --dist=loadfile -s -v\
-	-k test_adapter\
-	--ignore-glob='tests/test_tokenization*'\
-	--ignore-glob='tests/test_processor*'\
-	./tests/
-
-# Run a reduced test suite in the CI pipeline of adapter-transformers
-test-reduced:
-	python utils/run_tests.py
+test-adapters:
+	python -m pytest -n auto --dist=loadfile -s -v ./tests_adapters/

# Run tests for examples

3 changes: 3 additions & 0 deletions README.md
@@ -62,6 +62,9 @@ To get started with adapters, refer to these locations:
- **https://adapterhub.ml** to explore available pre-trained adapter modules and share your own adapters
- **[Examples folder](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples)** of this repository containing HuggingFace's example training scripts, many adapted for training adapters

## Supported Models

We currently support the PyTorch versions of all models listed on the **[Model Overview](https://docs.adapterhub.ml/model_overview.html) page** in our documentation.

## Citation

2 changes: 1 addition & 1 deletion adapter_docs/adapter_composition.md
@@ -175,7 +175,7 @@ In the following example, we load two adapters for semantic textual similarity (
We activate a parallel setup where the input is passed through both adapters and their respective prediction heads.

```python
-model = AutoModelWithHeads.from_pretrained("distilbert-base-uncased")
+model = AutoAdapterModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

adapter1 = model.load_adapter("sts/sts-b@ukp")
```
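The example is truncated at this point in the diff view; as a rough sketch of how the parallel setup is typically activated (the second adapter name and the `Parallel` composition block are assumptions for illustration, not part of this diff):

```python
from transformers.adapters.composition import Parallel

# Load a second task adapter together with its prediction head (name assumed).
adapter2 = model.load_adapter("sts/mrpc@ukp")

# Pass each input through both adapters and their respective heads in parallel.
model.active_adapters = Parallel(adapter1, adapter2)
```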
11 changes: 11 additions & 0 deletions adapter_docs/classes/models/auto.rst
@@ -0,0 +1,11 @@
Auto Classes
============

Similar to the ``AutoModel`` classes built into HuggingFace Transformers, adapter-transformers provides an ``AutoAdapterModel`` class.
As with other auto classes, the correct adapter model class is automatically instantiated based on the pre-trained model passed to the ``from_pretrained()`` method.

AutoAdapterModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.adapters.AutoAdapterModel
:members:
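For illustration, a minimal usage sketch (the checkpoint name is only an example; a BERT checkpoint is assumed, so the resolved class is ``BertAdapterModel``):

```python
from transformers.adapters import AutoAdapterModel

# The auto class reads the checkpoint's configuration and instantiates the
# matching adapter model class.
model = AutoAdapterModel.from_pretrained("bert-base-uncased")
print(type(model).__name__)  # BertAdapterModel
```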
51 changes: 2 additions & 49 deletions adapter_docs/classes/models/bart.rst
@@ -16,57 +16,10 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.

.. note::
This class is nearly identical to the PyTorch implementation of BART in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/bart.html>`_.


BartConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartConfig
:members:


BartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartTokenizer
:members:



BartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartModel
:members: forward


-BartModelWithHeads
+BartAdapterModel
~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.BartModelWithHeads
+.. autoclass:: transformers.adapters.BartAdapterModel
:members:
:inherited-members: BartPretrainedModel


BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartForConditionalGeneration
:members: forward


BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartForSequenceClassification
:members: forward


BartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartForQuestionAnswering
:members: forward
78 changes: 2 additions & 76 deletions adapter_docs/classes/models/bert.rst
@@ -5,84 +5,10 @@ The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transfo
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a bidirectional transformer
pre-trained using a combination of masked language modeling objective and next sentence prediction.

.. note::
This class is nearly identical to the PyTorch implementation of BERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/bert.html>`_.

BertConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertConfig
:members:


BertTokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


BertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertModel
:members:


-BertModelWithHeads
+BertAdapterModel
~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.BertModelWithHeads
+.. autoclass:: transformers.adapters.BertAdapterModel
:members:
:inherited-members: BertPreTrainedModel
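As a minimal sketch of how the renamed class is used with a flexible prediction head (the adapter and head names are invented for the example, and the flex-head methods are assumed to carry over unchanged from ``BertModelWithHeads``):

```python
from transformers.adapters import BertAdapterModel

model = BertAdapterModel.from_pretrained("bert-base-uncased")

# Add a task adapter and a matching classification head (names are examples).
model.add_adapter("sentiment")
model.add_classification_head("sentiment", num_labels=2)

# Freeze the base model weights and train only the adapter and its head.
model.train_adapter("sentiment")
model.set_active_adapters("sentiment")
```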


BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForPreTraining
:members:


BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMaskedLM
:members:


BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForNextSentencePrediction
:members:


BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForSequenceClassification
:members:


BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMultipleChoice
:members:


BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForTokenClassification
:members:


BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForQuestionAnswering
:members:
57 changes: 2 additions & 55 deletions adapter_docs/classes/models/distilbert.rst
@@ -8,63 +8,10 @@ DistilBERT is a small, fast, cheap and light Transformer model trained by distil
parameters than `bert-base-uncased`, runs 60% faster while preserving over 95% of Bert's performances as measured on
the GLUE language understanding benchmark.

.. note::
This class is nearly identical to the PyTorch implementation of DistilBERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/distilbert.html>`_.


DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertConfig
:members:


DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizer
:members:


DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizerFast
:members:


DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertModel
:members:


-DistilBertModelWithHeads
+DistilBertAdapterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.DistilBertModelWithHeads
+.. autoclass:: transformers.adapters.DistilBertAdapterModel
:members:
:inherited-members: DistilBertPreTrainedModel


DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForMaskedLM
:members:


DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForSequenceClassification
:members:


DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForQuestionAnswering
:members:
6 changes: 0 additions & 6 deletions adapter_docs/classes/models/encoderdecoder.rst
@@ -31,12 +31,6 @@ and decoder for a summarization model as was shown in: `Text Summarization with
This class is nearly identical to the PyTorch implementation of DistilBERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/distilbert.html>`_.

EncoderDecoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.EncoderDecoderConfig
:members:


EncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
83 changes: 2 additions & 81 deletions adapter_docs/classes/models/gpt2.rst
@@ -1,9 +1,6 @@
OpenAI GPT2
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_ by Alec
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
@@ -17,86 +14,10 @@ text. The diversity of the dataset causes this simple goal to contain naturally
across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than
10X the amount of data.*

Tips:

- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.

`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.

.. note::
This class is nearly identical to the PyTorch implementation of BERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/bert.html>`_.


-GPT2Config
+GPT2AdapterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2Config
:members:


GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2Tokenizer
:members: save_vocabulary


GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2TokenizerFast
:members:


GPT2 specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput
:members:


GPT2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2Model
:members: forward


GPT2ModelWithHeads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.GPT2ModelWithHeads
+.. autoclass:: transformers.adapters.GPT2AdapterModel
:members:
:inherited-members: GPT2PreTrainedModel
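Analogously, a short sketch for GPT-2; the adapter name is invented and the causal language-modeling head method is an assumption about the new flex-head API rather than something shown in this diff:

```python
from transformers.adapters import GPT2AdapterModel

model = GPT2AdapterModel.from_pretrained("gpt2")

# Add an adapter and a causal LM head so the adapted model can generate text
# (head method name assumed).
model.add_adapter("poetry")
model.add_causal_lm_head("poetry")
model.train_adapter("poetry")
```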


GPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2LMHeadModel
:members: forward


GPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2DoubleHeadsModel
:members: forward


GPT2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2ForSequenceClassification
:members: forward