Large model class refactoring: Introduce ...AdapterModel classes #289

Merged: 11 commits, Feb 23, 2022
Changes from 10 commits
2 changes: 1 addition & 1 deletion .github/workflows/tests_torch.yml
@@ -60,4 +60,4 @@ jobs:
pip install datasets
- name: Test
run: |
-        make test-reduced
+        make test-adapters
12 changes: 2 additions & 10 deletions Makefile
@@ -82,16 +82,8 @@ test:

# Run the adapter tests

-test-adapter:
-	python -m pytest -n auto --dist=loadfile -s -v\
-	-k test_adapter\
-	--ignore-glob='tests/test_tokenization*'\
-	--ignore-glob='tests/test_processor*'\
-	./tests/
-
-# Run a reduced test suite in the CI pipeline of adapter-transformers
-test-reduced:
-	python utils/run_tests.py
+test-adapters:
+	python -m pytest -n auto --dist=loadfile -s -v ./tests_adapters/

# Run tests for examples

3 changes: 3 additions & 0 deletions README.md
@@ -62,6 +62,9 @@ To get started with adapters, refer to these locations:
- **https://adapterhub.ml** to explore available pre-trained adapter modules and share your own adapters
- **[Examples folder](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples)** of this repository containing HuggingFace's example training scripts, many adapted for training adapters

## Supported Models

We currently support the PyTorch versions of all models listed on the **[Model Overview](https://docs.adapterhub.ml/model_overview.html) page** in our documentation.

## Citation

2 changes: 1 addition & 1 deletion adapter_docs/adapter_composition.md
@@ -175,7 +175,7 @@ In the following example, we load two adapters for semantic textual similarity (
We activate a parallel setup where the input is passed through both adapters and their respective prediction heads.

```python
-model = AutoModelWithHeads.from_pretrained("distilbert-base-uncased")
+model = AutoAdapterModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

adapter1 = model.load_adapter("sts/sts-b@ukp")
```
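The example is truncated at this point in the diff view; as a rough sketch of how the parallel setup is typically activated (the second adapter name and the `Parallel` composition block are assumptions for illustration, not part of this diff):

```python
from transformers.adapters.composition import Parallel

# Load a second task adapter together with its prediction head (name assumed).
adapter2 = model.load_adapter("sts/mrpc@ukp")

# Pass each input through both adapters and their respective heads in parallel.
model.active_adapters = Parallel(adapter1, adapter2)
```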
11 changes: 11 additions & 0 deletions adapter_docs/classes/models/auto.rst
@@ -0,0 +1,11 @@
Auto Classes
============

Similar to the ``AutoModel`` classes built into HuggingFace Transformers, adapter-transformers provides an ``AutoAdapterModel`` class.
As with other auto classes, the correct adapter model class is automatically instantiated based on the pre-trained model passed to the ``from_pretrained()`` method.

AutoAdapterModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.adapters.AutoAdapterModel
:members:
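For illustration, a minimal usage sketch (the checkpoint name is only an example; a BERT checkpoint is assumed, so the resolved class is ``BertAdapterModel``):

```python
from transformers.adapters import AutoAdapterModel

# The auto class reads the checkpoint's configuration and instantiates the
# matching adapter model class.
model = AutoAdapterModel.from_pretrained("bert-base-uncased")
print(type(model).__name__)  # BertAdapterModel
```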
51 changes: 2 additions & 49 deletions adapter_docs/classes/models/bart.rst
@@ -16,57 +16,10 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.

.. note::
This class is nearly identical to the PyTorch implementation of BART in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/bart.html>`_.


BartConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartConfig
:members:


BartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartTokenizer
:members:



BartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartModel
:members: forward


-BartModelWithHeads
+BartAdapterModel
~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.BartModelWithHeads
+.. autoclass:: transformers.adapters.BartAdapterModel
:members:
:inherited-members: BartPretrainedModel


BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartForConditionalGeneration
:members: forward


BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartForSequenceClassification
:members: forward


BartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartForQuestionAnswering
:members: forward
78 changes: 2 additions & 76 deletions adapter_docs/classes/models/bert.rst
@@ -5,84 +5,10 @@ The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transfo
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a bidirectional transformer
pre-trained using a combination of masked language modeling objective and next sentence prediction.

.. note::
This class is nearly identical to the PyTorch implementation of BERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/bert.html>`_.

BertConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertConfig
:members:


BertTokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


BertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertModel
:members:


-BertModelWithHeads
+BertAdapterModel
~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.BertModelWithHeads
+.. autoclass:: transformers.adapters.BertAdapterModel
:members:
:inherited-members: BertPreTrainedModel
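As a minimal sketch of how the renamed class is used with a flexible prediction head (the adapter and head names are invented for the example, and the flex-head methods are assumed to carry over unchanged from ``BertModelWithHeads``):

```python
from transformers.adapters import BertAdapterModel

model = BertAdapterModel.from_pretrained("bert-base-uncased")

# Add a task adapter and a matching classification head (names are examples).
model.add_adapter("sentiment")
model.add_classification_head("sentiment", num_labels=2)

# Freeze the base model weights and train only the adapter and its head.
model.train_adapter("sentiment")
model.set_active_adapters("sentiment")
```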


BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForPreTraining
:members:


BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMaskedLM
:members:


BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForNextSentencePrediction
:members:


BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForSequenceClassification
:members:


BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMultipleChoice
:members:


BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForTokenClassification
:members:


BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForQuestionAnswering
:members:
57 changes: 2 additions & 55 deletions adapter_docs/classes/models/distilbert.rst
@@ -8,63 +8,10 @@ DistilBERT is a small, fast, cheap and light Transformer model trained by distil
parameters than `bert-base-uncased`, runs 60% faster while preserving over 95% of Bert's performances as measured on
the GLUE language understanding benchmark.

.. note::
This class is nearly identical to the PyTorch implementation of DistilBERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/distilbert.html>`_.


DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertConfig
:members:


DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizer
:members:


DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizerFast
:members:


DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertModel
:members:


-DistilBertModelWithHeads
+DistilBertAdapterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.DistilBertModelWithHeads
+.. autoclass:: transformers.adapters.DistilBertAdapterModel
:members:
:inherited-members: DistilBertPreTrainedModel


DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForMaskedLM
:members:


DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForSequenceClassification
:members:


DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForQuestionAnswering
:members:
6 changes: 0 additions & 6 deletions adapter_docs/classes/models/encoderdecoder.rst
@@ -31,12 +31,6 @@ and decoder for a summarization model as was shown in: `Text Summarization with
This class is nearly identical to the PyTorch implementation of DistilBERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/distilbert.html>`_.

EncoderDecoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.EncoderDecoderConfig
:members:


EncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
83 changes: 2 additions & 81 deletions adapter_docs/classes/models/gpt2.rst
@@ -1,9 +1,6 @@
OpenAI GPT2
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_ by Alec
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
@@ -17,86 +14,10 @@ text. The diversity of the dataset causes this simple goal to contain naturally
across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than
10X the amount of data.*

Tips:

- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.

`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.

.. note::
This class is nearly identical to the PyTorch implementation of BERT in Huggingface Transformers.
For more information, visit `the corresponding section in their documentation <https://huggingface.co/transformers/model_doc/bert.html>`_.


-GPT2Config
+GPT2AdapterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2Config
:members:


GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2Tokenizer
:members: save_vocabulary


GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2TokenizerFast
:members:


GPT2 specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput
:members:


GPT2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2Model
:members: forward


GPT2ModelWithHeads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autoclass:: transformers.GPT2ModelWithHeads
+.. autoclass:: transformers.adapters.GPT2AdapterModel
:members:
:inherited-members: GPT2PreTrainedModel
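Analogously, a short sketch for GPT-2; the adapter name is invented and the causal language-modeling head method is an assumption about the new flex-head API rather than something shown in this diff:

```python
from transformers.adapters import GPT2AdapterModel

model = GPT2AdapterModel.from_pretrained("gpt2")

# Add an adapter and a causal LM head so the adapted model can generate text
# (head method name assumed).
model.add_adapter("poetry")
model.add_causal_lm_head("poetry")
model.train_adapter("poetry")
```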


GPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2LMHeadModel
:members: forward


GPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2DoubleHeadsModel
:members: forward


GPT2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2ForSequenceClassification
:members: forward