A workaround for multilingual MT by mBART with multiple language pairs and DDP #432
Is it possible to do this?
EDIT: It is because `ModelAdaptersMixin.forward_context()` assigns `output_adapter_gating_scores` only if `active_adapters` is not None.
Hello, I am trying to do the same thing and I am getting the same error.
@coding-phoenix-12 Due to the long time that has passed, I cannot recall exactly how I resolved this, but from my previous comments you can see that the error was caused by `active_adapters` being None somewhere. I suspect that if you dive in, inspect, and set this property, it may resolve the issue.
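(Not part of the thread, just an illustration of what the two comments above point at: in `adapter-transformers`, `active_adapters` stays `None` until an adapter is activated, e.g. via `set_active_adapters`, so making sure one is activated before the forward pass may be what resolves the error. The tiny config and the adapter name "en" below are placeholders.)

```python
# Minimal illustration (assumes adapter-transformers is installed, which
# patches MBartModel with add_adapter / set_active_adapters).
from transformers import MBartConfig, MBartModel

config = MBartConfig(vocab_size=1000, d_model=64, encoder_layers=1, decoder_layers=1,
                     encoder_attention_heads=2, decoder_attention_heads=2,
                     encoder_ffn_dim=128, decoder_ffn_dim=128)
model = MBartModel(config)
model.add_adapter("en")
model.set_active_adapters("en")  # after this, model.active_adapters is no longer None
print(model.active_adapters)
```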
I want to implement an mBART model for multilingual MT, in which inputs are randomly sampled from multiple language pairs of parallel data. To achieve this, I need to activate two different pre-trained language adapters in mBART's encoder and decoder according to the data received: for example, the "en" adapter in the encoder and the "fr" adapter in the decoder when the data is an en-fr pair. But it seems `adapter-transformers` does not support activating different adapters in the encoder and decoder.

My workaround is listed below (a rough sketch follows the list):
1. Call `set_encoder_active_adapters` and `set_decoder_active_adapters` (stated below) according to the data received.
2. Add a `MultilingualMixin` that has `set_encoder_active_adapters` and `set_decoder_active_adapters` methods, which loop over `encoder.layers` or `decoder.layers` to find `EncoderAdapterLayer` or `DecoderAdapterLayer`.
3. `EncoderAdapterLayer` and `DecoderAdapterLayer` are used to replace `AdapterLayer` in the current `BartEncoderLayerAdaptersMixin` and `BartDecoderLayerAdaptersMixin`.
4. Create a `MultilingualMBartModel` class by inheriting from `MBartModel` and `MultilingualMixin`.
5. `EncoderAdapterLayer` and `DecoderAdapterLayer` inherit from `AdapterLayer` and override the `get_active_setup` method to return the LID (language ID). In this way, `adapter_layer_forward` should use the adapter corresponding to that language.
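A rough sketch of what this could look like, not a tested implementation. `AdapterLayer`, `get_active_setup`, `adapter_layer_forward`, and `MBartModel` are the names mentioned above; the import paths, the `get_active_setup` signature, the `_LangAdapterLayer` helper, and the use of `Stack(...)` as the returned composition are my assumptions and may need adjusting for your `adapter-transformers` version.

```python
from transformers import MBartModel
from transformers.adapters.composition import Stack   # assumed path
from transformers.adapters.layer import AdapterLayer  # assumed path


class _LangAdapterLayer(AdapterLayer):
    """AdapterLayer that activates the adapter named by a per-layer language ID."""

    lang_id = None  # set per batch by MultilingualMixin below

    def get_active_setup(self, *args, **kwargs):
        # If a language ID is set, ignore the global active_adapters and
        # activate only the adapter named after that language.
        if self.lang_id is not None:
            return Stack(self.lang_id)
        return super().get_active_setup(*args, **kwargs)


class EncoderAdapterLayer(_LangAdapterLayer):
    pass


class DecoderAdapterLayer(_LangAdapterLayer):
    pass


class MultilingualMixin:
    """Provides the set_encoder_/set_decoder_active_adapters methods from the list above."""

    def set_encoder_active_adapters(self, lang_id):
        # Loop encoder.layers and point every EncoderAdapterLayer at lang_id.
        for layer in self.encoder.layers:
            for module in layer.modules():
                if isinstance(module, EncoderAdapterLayer):
                    module.lang_id = lang_id

    def set_decoder_active_adapters(self, lang_id):
        # Same for the decoder side.
        for layer in self.decoder.layers:
            for module in layer.modules():
                if isinstance(module, DecoderAdapterLayer):
                    module.lang_id = lang_id


class MultilingualMBartModel(MultilingualMixin, MBartModel):
    # Step 3 of the list (building the encoder/decoder layers with
    # EncoderAdapterLayer / DecoderAdapterLayer instead of the stock
    # AdapterLayer, i.e. the Bart*LayerAdaptersMixin replacement) is omitted here.
    pass
```

With that in place, a training step for an en-fr batch would presumably call `model.set_encoder_active_adapters("en")` and `model.set_decoder_active_adapters("fr")` before the forward pass.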
However, I am not sure whether this works and whether it is compatible with DDP training, as the AdapterLayers on different GPUs will not be identical (their LID attributes will differ).
Can you give me an answer or some help?