Marian RNN conversion support #36651

Open
FricoRico opened this issue Mar 11, 2025 · 3 comments
Labels
Feature request Request for a new feature

Comments

@FricoRico

Feature request

Add support for converting Marian Transformer-RNN models.

Motivation

Firefox is doing great work training new models. Their teacher models can be converted to PyTorch models using the existing conversion tooling.

However, their student models, which are much smaller and more efficient, are structured as Transformer-RNN. This is currently not supported by the conversion tools. Would it even be possible to add support for this?

Your contribution

At this point I'm just a parrot seeking help and information on this topic. What this means exactly is far outside my current knowledge. Perhaps if someone could point me in the right direction, I could figure this out.

@Rocketknight1
Member

Hi @FricoRico, I don't know if the Marian modeling code in Transformers supports Transformer-RNN architectures either! This means that you'd need to either:

  1. Convert the models as "custom code" models: https://huggingface.co/docs/transformers/en/custom_models
  2. Write a full PR to add the Transformer-RNN architecture to Transformers
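
For the first option, a minimal sketch of what a "custom code" model skeleton could look like, following the pattern in the linked custom models guide. The class names, the `model_type` string, and the config fields are hypothetical, and the `nn.GRU` is only a stand-in for a recurrent decoder; it is not the actual Marian Transformer-RNN cell.

```python
import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel


class MarianRnnStudentConfig(PretrainedConfig):
    # Hypothetical identifier; not an existing Transformers model type.
    model_type = "marian_rnn_student"

    def __init__(self, vocab_size=32000, d_model=256, **kwargs):
        self.vocab_size = vocab_size
        self.d_model = d_model
        super().__init__(**kwargs)


class MarianRnnStudentModel(PreTrainedModel):
    # Ties the model to its config so it can be saved and reloaded together.
    config_class = MarianRnnStudentConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        # Placeholder recurrent layer (assumption): the real student decoder
        # would need its own cell implemented here.
        self.decoder_rnn = nn.GRU(config.d_model, config.d_model, batch_first=True)

    def forward(self, input_ids):
        hidden, _ = self.decoder_rnn(self.embed(input_ids))
        return hidden


# Mark both classes as custom code for the Auto* classes.
MarianRnnStudentConfig.register_for_auto_class()
MarianRnnStudentModel.register_for_auto_class("AutoModel")
```

With a skeleton like this, the remaining work would be implementing the real layers and writing a converter that fills them from the Marian checkpoint.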

@FricoRico
Author

@Rocketknight1 Yeah, you are right. Ideally the Marian tools would also need to be expanded to support Transformer-RNN for inference. But I guess step one would be to allow exporting the models in the first place. Perhaps I'm oversimplifying things, though.

@Rocketknight1
Member

In general, we need modeling code first in order to support conversion of model checkpoints, rather than the other way around! The model code provides the "architecture" that runs a particular set of weights.
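
To illustrate why the modeling code has to come first, here is a rough sketch of what a checkpoint converter does: it renames each stored tensor onto a parameter slot that the target architecture defines. All tensor and parameter names below are hypothetical and only for illustration; they are not taken from the real Marian or Transformers code.

```python
# Parameter slots defined by the (hypothetical) target architecture.
ARCHITECTURE_PARAMS = {
    "encoder.layers.0.self_attn.q_proj.weight",
    "decoder.layers.0.self_attn.q_proj.weight",
}

# Hypothetical mapping from checkpoint tensor names to architecture slots.
NAME_MAP = {
    "encoder_l1_self_Wq": "encoder.layers.0.self_attn.q_proj.weight",
    "decoder_l1_self_Wq": "decoder.layers.0.self_attn.q_proj.weight",
}


def convert(checkpoint):
    """Split a checkpoint into tensors the architecture can hold and the rest."""
    converted, unsupported = {}, []
    for name, tensor in checkpoint.items():
        target = NAME_MAP.get(name)
        if target in ARCHITECTURE_PARAMS:
            converted[target] = tensor
        else:
            # No slot in the architecture: these weights have nowhere to go.
            unsupported.append(name)
    return converted, unsupported


# A checkpoint with one attention weight and one recurrent-decoder weight.
converted, unsupported = convert(
    {"encoder_l1_self_Wq": [0.1], "decoder_l1_rnn_W": [0.2]}
)
print(unsupported)
```

The recurrent weight ends up in `unsupported` because nothing in the architecture can receive it, which is why a converter alone cannot add support for a new layer type.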


2 participants