
A workaround for multilingual MT by mBART with multiple language pairs and DDP #432

Closed
ZeguanXiao opened this issue Oct 7, 2022 · 4 comments
Labels
question Further information is requested

Comments

ZeguanXiao commented Oct 7, 2022

I want to implement an mBART model for multilingual MT, in which inputs are randomly sampled from multiple language pairs of parallel data. To achieve this, I need to activate two different pre-trained language adapters in mBART's encoder and decoder according to the data received; for example, the "en" adapter in the encoder and the "fr" adapter in the decoder when the data is an en-fr pair. But it seems adapter-transformers does not support activating different adapters in the encoder and the decoder.

My workaround is listed below:

  1. Get the Language ID (LID) for input_ids and decoder_input_ids, and call the set_encoder_active_adapters and set_decoder_active_adapters methods described below.
  2. Implement a MultilingualMixin that has set_encoder_active_adapters and set_decoder_active_adapters methods, which loop over encoder.layers or decoder.layers to find EncoderAdapterLayer or DecoderAdapterLayer instances.
  3. EncoderAdapterLayer and DecoderAdapterLayer replace AdapterLayer in the current BartEncoderLayerAdaptersMixin and BartDecoderLayerAdaptersMixin.
  4. Set the LID as an attribute of these adapter layer classes. Implement a MultilingualMBartModel class by inheriting from MBartModel and MultilingualMixin.
  5. EncoderAdapterLayer and DecoderAdapterLayer inherit from AdapterLayer and override the get_active_setup method to return the LID. This way, adapter_layer_forward should use the adapter corresponding to that language (a rough sketch follows this list).
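
Concretely, here is a rough, untested sketch of steps 2-5. The import paths, the get_active_setup signature, the LIDAdapterLayer helper class, and wrapping the LID in a Stack composition block are my assumptions about adapter-transformers internals, not confirmed API:

from transformers import MBartModel
from transformers.adapters.composition import Stack
from transformers.adapters.layer import AdapterLayer  # assumed import path


class LIDAdapterLayer(AdapterLayer):
    # Per-layer language ID (LID), set by MultilingualMixin below.
    lang = None

    def get_active_setup(self, *args, **kwargs):
        # When a LID is set, activate that language's adapter; otherwise fall
        # back to the library's normal resolution. Note this skips the
        # availability checks the base method performs.
        if self.lang is not None:
            return Stack(self.lang)
        return super().get_active_setup(*args, **kwargs)


class EncoderAdapterLayer(LIDAdapterLayer):
    pass


class DecoderAdapterLayer(LIDAdapterLayer):
    pass


class MultilingualMixin:
    # Assumes the encoder/decoder layer mixins have been patched to build
    # EncoderAdapterLayer / DecoderAdapterLayer instead of AdapterLayer (step 3).

    def set_encoder_active_adapters(self, lang):
        for layer in self.encoder.layers:
            for module in layer.modules():
                if isinstance(module, EncoderAdapterLayer):
                    module.lang = lang

    def set_decoder_active_adapters(self, lang):
        for layer in self.decoder.layers:
            for module in layer.modules():
                if isinstance(module, DecoderAdapterLayer):
                    module.lang = lang


class MultilingualMBartModel(MultilingualMixin, MBartModel):
    pass


# Per-batch usage (step 1): derive the LIDs from the batch and set them before
# the forward pass. "en"/"fr" are placeholder adapter names.
# model.set_encoder_active_adapters("en")
# model.set_decoder_active_adapters("fr")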

However, I am not sure whether this will work and whether it is compatible with DDP training, since the AdapterLayers on different GPUs will not be identical (their LID attributes may differ).

Can you give me an answer or some guidance?

@ZeguanXiao added the question (Further information is requested) label Oct 7, 2022
@ZeguanXiao changed the title from "Multilingual MT by mBART with multiple language pairs and DDP" to "A workaround for multilingual MT by mBART with multiple language pairs and DDP" Oct 7, 2022

ZeguanXiao commented Oct 7, 2022

I also noticed there are AdapterSetup and ForwardContext context managers (257 and 267), but I don't know how to use them in my use case.

ZeguanXiao commented Oct 7, 2022

Is it possible to do this?

if encoder_outputs is None:
    with AdapterSetup("en"):
        encoder_outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

with AdapterSetup("fr"):
    decoder_outputs = self.decoder(
                input_ids=decoder_input_ids,
                attention_mask=decoder_attention_mask,
                encoder_hidden_states=encoder_outputs[0],
                encoder_attention_mask=attention_mask,
                head_mask=decoder_head_mask,
                cross_attn_head_mask=cross_attn_head_mask,
                past_key_values=past_key_values,
                inputs_embeds=decoder_inputs_embeds,
                use_cache=use_cache,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )

EDIT:
When I try the above code in MBartModel, ForwardContext.get_context() loses some attributes in the call to adapter_stack() if I don't call model.set_active_adapters. But if I do call it, it seems to work fine.

EDIT:
Error message: AttributeError: 'ForwardContext' object has no attribute 'output_adapter_gating_scores'

This is because ModelAdaptersMixin.forward_context() assigns output_adapter_gating_scores only if active_adapters is not None.
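
For reference, here is a rough sketch of the workaround that seemed to make the error go away for me. The checkpoint and the adapter names are placeholders; the idea is just to activate some adapter once so that forward_context() populates the ForwardContext attributes, and let the AdapterSetup blocks above select the language per batch:

from transformers import MBartModel  # the adapter-transformers build

model = MBartModel.from_pretrained("facebook/mbart-large-cc25")
model.add_adapter("en")
model.add_adapter("fr")

# Without this call, active_adapters stays None, forward_context() never sets
# output_adapter_gating_scores, and adapter_stack() raises the AttributeError.
model.set_active_adapters("en")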

@coding-phoenix-12

Hello, I am trying to do the same thing and am getting the same error:
error message: AttributeError: 'ForwardContext' object has no attribute 'output_adapter_gating_scores'
Have you figured out a workaround for this?
Thanks!!

@ZeguanXiao

@coding-phoenix-12 It has been a long time, so I cannot recall exactly how I resolved this, but from my previous comments you can see that the error was caused by active_adapters being None somewhere. I suspect that if you dig in, inspect, and set this property, it may resolve the issue.
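
For example, something along these lines before the forward pass (untested; "en" is just a placeholder adapter name):

# If no adapter is active, activate one so forward_context() initializes the
# ForwardContext attributes (including output_adapter_gating_scores).
if model.active_adapters is None:
    model.set_active_adapters("en")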
