
Hotfixes for DistilBERT adapter & AdapterFusion implementations #102

Merged · 1 commit merged into adapter-hub:master on Dec 10, 2020

Conversation

@calpt (Member) commented on Dec 7, 2020

This PR fixes the following for DistilBERT:

  • AdapterFusion regularization: the fusion regularization loss implementation is moved into the model classes.
  • The layer norm in the Transformer block: unlike in the BERT implementation, the layer norm module is part of the TransformerBlock class, but it has to be accessed from the adapter module, which is a submodule of that block. The adapter therefore needs a reference to the block's layer norm; the previous implementation didn't work because it copied the layer norm weights instead (see the sketch below).
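
For illustration, here is a minimal PyTorch sketch of the difference between copying the layer norm weights and holding a reference to the block's layer norm module. The class and method names are hypothetical and are not taken from the actual adapter-transformers code.

```python
import torch.nn as nn

# Hypothetical illustration: the adapter submodule needs the parent
# TransformerBlock's layer norm. Copying its weights creates a detached
# duplicate; storing a reference keeps both modules in sync.

class ToyAdapter(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.down = nn.Linear(hidden_size, hidden_size // 2)
        self.up = nn.Linear(hidden_size // 2, hidden_size)
        self.layer_norm = None  # set by the parent block

    def set_layer_norm_by_copy(self, block_layer_norm: nn.LayerNorm):
        # Broken: the copy never sees later updates to the block's layer norm
        # (e.g. weights loaded from a checkpoint or changed during training).
        copied = nn.LayerNorm(block_layer_norm.normalized_shape)
        copied.load_state_dict(block_layer_norm.state_dict())
        self.layer_norm = copied

    def set_layer_norm_by_reference(self, block_layer_norm: nn.LayerNorm):
        # Fixed: both the block and the adapter point at the same module
        # object, so they always share identical parameters.
        self.layer_norm = block_layer_norm

    def forward(self, hidden_states, residual):
        hidden_states = self.up(self.down(hidden_states).relu())
        return self.layer_norm(hidden_states + residual)
```

Sharing one module between two parents is supported by PyTorch: the shared parameters simply appear under both names, which is what makes the reference approach work where the copy did not.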

@calpt added the bug label (Something isn't working) on Dec 7, 2020
@calpt marked this pull request as ready for review on Dec 7, 2020, 17:16
@calpt requested a review from arueckle on Dec 10, 2020, 09:27
@calpt merged commit 243ebda into adapter-hub:master on Dec 10, 2020
@calpt deleted the fix/distilbert_adapters branch on Dec 10, 2020, 09:38