
A workaround for multilingual MT by mBART with multiple language pairs and DDP #432

Closed
ZeguanXiao opened this issue Oct 7, 2022 · 4 comments
Labels
question Further information is requested

Comments

ZeguanXiao commented Oct 7, 2022

I want to implement an mBART model for multilingual MT, in which inputs are randomly sampled from multiple language pairs of parallel data. To achieve this, I need to activate two different pre-trained language adapters in mBART's encoder and decoder according to the data received; for example, the "en" adapter in the encoder and the "fr" adapter in the decoder when the data is an en-fr pair. But it seems adapter-transformers does not support activating different adapters in the encoder and the decoder.

My workaround is listed below:

  1. Get the Language ID (LID) for input_ids and decoder_input_ids, and call the set_encoder_active_adapters and set_decoder_active_adapters methods described below.
  2. Implement a MultilingualMixin that has set_encoder_active_adapters and set_decoder_active_adapters methods, which loop over encoder.layers or decoder.layers to find EncoderAdapterLayer or DecoderAdapterLayer instances.
  3. EncoderAdapterLayer and DecoderAdapterLayer replace AdapterLayer in the current BartEncoderLayerAdaptersMixin and BartDecoderLayerAdaptersMixin.
  4. Set the LID as an attribute of these adapter layer classes. Implement a MultilingualMBartModel class by inheriting from MBartModel and MultilingualMixin.
  5. EncoderAdapterLayer and DecoderAdapterLayer inherit from AdapterLayer and override the get_active_setup method to return the LID. This way, adapter_layer_forward should use the adapter corresponding to that language (a rough sketch follows this list).
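
Concretely, here is a rough, untested sketch of steps 2-5. The import paths, the get_active_setup signature, the LIDAdapterLayer helper class, and wrapping the LID in a Stack composition block are my assumptions about adapter-transformers internals, not confirmed API:

from transformers import MBartModel
from transformers.adapters.composition import Stack
from transformers.adapters.layer import AdapterLayer  # assumed import path


class LIDAdapterLayer(AdapterLayer):
    # Per-layer language ID (LID), set by MultilingualMixin below.
    lang = None

    def get_active_setup(self, *args, **kwargs):
        # When a LID is set, activate that language's adapter; otherwise fall
        # back to the library's normal resolution. Note this skips the
        # availability checks the base method performs.
        if self.lang is not None:
            return Stack(self.lang)
        return super().get_active_setup(*args, **kwargs)


class EncoderAdapterLayer(LIDAdapterLayer):
    pass


class DecoderAdapterLayer(LIDAdapterLayer):
    pass


class MultilingualMixin:
    # Assumes the encoder/decoder layer mixins have been patched to build
    # EncoderAdapterLayer / DecoderAdapterLayer instead of AdapterLayer (step 3).

    def set_encoder_active_adapters(self, lang):
        for layer in self.encoder.layers:
            for module in layer.modules():
                if isinstance(module, EncoderAdapterLayer):
                    module.lang = lang

    def set_decoder_active_adapters(self, lang):
        for layer in self.decoder.layers:
            for module in layer.modules():
                if isinstance(module, DecoderAdapterLayer):
                    module.lang = lang


class MultilingualMBartModel(MultilingualMixin, MBartModel):
    pass


# Per-batch usage (step 1): derive the LIDs from the batch and set them before
# the forward pass. "en"/"fr" are placeholder adapter names.
# model.set_encoder_active_adapters("en")
# model.set_decoder_active_adapters("fr")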

However, I am not sure whether this will work and whether it is compatible with DDP training, since the AdapterLayers on different GPUs will not be identical (their LID attributes may differ).

Can you give me an answer or some guidance?

@ZeguanXiao added the question (Further information is requested) label Oct 7, 2022
@ZeguanXiao changed the title from "Multilingual MT by mBART with multiple language pairs and DDP" to "A workaround for multilingual MT by mBART with multiple language pairs and DDP" Oct 7, 2022

ZeguanXiao commented Oct 7, 2022

I also noticed there are AdapterSetup and ForwardContext context managers (257 and 267), but I don't know how to use them in my use case.

ZeguanXiao commented Oct 7, 2022

Is it possible to do this?

if encoder_outputs is None:
    with AdapterSetup("en"):
        encoder_outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

with AdapterSetup("fr"):
    decoder_outputs = self.decoder(
                input_ids=decoder_input_ids,
                attention_mask=decoder_attention_mask,
                encoder_hidden_states=encoder_outputs[0],
                encoder_attention_mask=attention_mask,
                head_mask=decoder_head_mask,
                cross_attn_head_mask=cross_attn_head_mask,
                past_key_values=past_key_values,
                inputs_embeds=decoder_inputs_embeds,
                use_cache=use_cache,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )

EDIT:
When I try the above code in MBartModel, ForwardContext.get_context() loses some attributes in the call to adapter_stack() if I don't call model.set_active_adapters. But if I do call it, it seems to work fine.

EDIT:
Error message: AttributeError: 'ForwardContext' object has no attribute 'output_adapter_gating_scores'

This is because ModelAdaptersMixin.forward_context() assigns output_adapter_gating_scores only if active_adapters is not None.
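
For reference, here is a rough sketch of the workaround that seemed to make the error go away for me. The checkpoint and the adapter names are placeholders; the idea is just to activate some adapter once so that forward_context() populates the ForwardContext attributes, and let the AdapterSetup blocks above select the language per batch:

from transformers import MBartModel  # the adapter-transformers build

model = MBartModel.from_pretrained("facebook/mbart-large-cc25")
model.add_adapter("en")
model.add_adapter("fr")

# Without this call, active_adapters stays None, forward_context() never sets
# output_adapter_gating_scores, and adapter_stack() raises the AttributeError.
model.set_active_adapters("en")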

@coding-phoenix-12

Hello, I am trying to do the same thing and am getting the same error:
error message: AttributeError: 'ForwardContext' object has no attribute 'output_adapter_gating_scores'
Have you figured out a workaround for this?
Thanks!!

@ZeguanXiao

@coding-phoenix-12 It has been a long time, so I cannot recall exactly how I resolved this, but from my previous comments you can see that the error was caused by active_adapters being None somewhere. I suspect that if you dig in, inspect, and set this property, it may resolve the issue.
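
For example, something along these lines before the forward pass (untested; "en" is just a placeholder adapter name):

# If no adapter is active, activate one so forward_context() initializes the
# ForwardContext attributes (including output_adapter_gating_scores).
if model.active_adapters is None:
    model.set_active_adapters("en")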
