
Convert Musicgen to MLX #206

Closed
akashicMarga opened this issue Dec 30, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@akashicMarga

I would like to express my gratitude for your hard work on this project! I am interested in converting Musicgen from the Meta team to MLX. To keep the model within my limited RAM, I am considering the smaller stereo variant (https://huggingface.co/facebook/musicgen-stereo-small).

Could you please suggest a good starting point for this conversion? From the Hugging Face implementation, I can see that it uses the T5 encoder for text encoding, which is already available in this repository, and Encodec from the Meta team for audio encoding.
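For orientation, here is a minimal sketch of inspecting those components, assuming the Hugging Face transformers MusicGen classes (attribute and class names may vary across versions):

```python
from transformers import MusicgenForConditionalGeneration

# Load the small stereo checkpoint and look at its three sub-models:
# a T5 text encoder, an Encodec audio codec, and the MusicGen decoder LM.
model = MusicgenForConditionalGeneration.from_pretrained(
    "facebook/musicgen-stereo-small"
)

print(type(model.text_encoder).__name__)   # T5 text encoder
print(type(model.audio_encoder).__name__)  # Encodec audio codec
print(type(model.decoder).__name__)        # MusicGen decoder LM
```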

Thank you for your assistance!

@awni
Member

awni commented Dec 31, 2023

That would be awesome! For conversions usually the best place to start is a reference PyTorch implementation. This is a slightly bigger project since it involves multiple models (encodec, the music generator and the LM).

We already have a T5 example as you mentioned.

It might make sense to have Encodec as a standalone example since it's useful for lots of downstream audio generation. For example, I was looking recently at converting a different TTS model which also uses it. Maybe that is a good place to start?
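For whoever picks this up, a rough way to grab reference outputs to check an MLX port against might look like the following. This is only a sketch assuming the Hugging Face transformers Encodec API (`EncodecModel`, `AutoProcessor`); exact signatures may differ between versions:

```python
import numpy as np
import torch
from transformers import AutoProcessor, EncodecModel

processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")
model = EncodecModel.from_pretrained("facebook/encodec_24khz")

# One second of dummy mono audio at 24 kHz; swap in real audio for testing.
raw_audio = np.zeros(24000, dtype=np.float32)
inputs = processor(raw_audio=raw_audio, sampling_rate=24000, return_tensors="pt")

with torch.no_grad():
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
    decoded = model.decode(
        encoded.audio_codes, encoded.audio_scales, inputs["padding_mask"]
    )[0]

# Compare these tensors against the MLX port's outputs.
print(encoded.audio_codes.shape, decoded.shape)
```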

@awni awni added the enhancement New feature or request label Dec 31, 2023
@akashicMarga
Author

Yes, I was thinking the same about separating Encodec into its own module, since it could be used individually and in many TTS systems like VALL-E and VITS.

@awni
Member

awni commented Jan 2, 2024

Is anyone working on a port of Encodec? If not I might take a stab myself as I'm interested in getting some audio generation up and running!

@akashicMarga
Author

@awni I just started yesterday night. There are modules used in Encodec, like LSTM and sequential layers, that are directly available in torch but not available in MLX yet. I started from the main Encodec repo as it was pretty simple. I will only get time over the weekends since I have my org work too, so I can't give a timeline, TBH. And I really want audio generation up and running.
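For the sequential-container part, a minimal stand-in is easy to write on top of `mlx.nn.Module` while the built-in layers land. This is a sketch, not the eventual mlx.nn API:

```python
import mlx.nn as nn

class Sequential(nn.Module):
    """Minimal stand-in for torch.nn.Sequential built on mlx.nn.Module."""

    def __init__(self, *modules):
        super().__init__()
        # MLX traverses lists of modules when collecting parameters.
        self.layers = list(modules)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Usage: block = Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
```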

@awni
Member

awni commented Jan 3, 2024

Got it. OK, let me know what's missing in terms of layers etc. that should be in mlx.nn, and we can prioritize getting them in. For example, there is a PR for RNNs/LSTMs out now that we can try to get merged sooner.

If you think it will take a while, you can always start a draft PR and we can collaborate on it!

@akashicMarga
Author

Yes, I checked that PR this morning; it has most of the things. I will go through the Encodec code and come back with more details, maybe by tomorrow. @awni, just a suggestion: could we use discussions instead of issues? Most of the issues reported here are really enhancements since MLX is still growing, and with regard to performance I haven't hit any issues so far.

@akashicMarga
Author

@awni

The modules below will be required in MLX; some of them are already covered by existing PRs and issues.

  1. Setting dilation and groups in the convolution layer - [Feature Request] Groups added to Conv2d mlx#100
  2. Addition of sequence layers like LSTM, RNN, GRU - Implement RNN, GRU, LSTM mlx#268
  3. torch has a utility for weight normalisation which will be helpful here - https://pytorch.org/docs/stable/_modules/torch/nn/utils/weight_norm.html#weight_norm

The rest of the items seem easily portable. For point 1, I have tried using the conv layers without the dilation and groups params, since the values set did not have a major impact when I went through the PyTorch code; they are mostly the defaults anyway (a rough way to emulate dilation is sketched below).
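For reference, one way to emulate dilation with the existing conv op is to spread the kernel taps with zeros before calling `mx.conv1d`. This is only a sketch with a hypothetical `dilate_kernel` helper, and it assumes MLX's `(out_channels, kernel_size, in_channels)` conv1d weight layout:

```python
import mlx.core as mx

def dilate_kernel(w, dilation):
    """Insert zeros between kernel taps so a plain conv1d behaves like a dilated one."""
    if dilation == 1:
        return w
    out_ch, k, in_ch = w.shape
    # Append (dilation - 1) zeros after every tap along the kernel axis ...
    zeros = mx.zeros((out_ch, k, dilation - 1, in_ch), dtype=w.dtype)
    w = mx.concatenate([w[:, :, None, :], zeros], axis=2)
    w = w.reshape(out_ch, k * dilation, in_ch)
    # ... then drop the padding that trails the last tap.
    return w[:, : (k - 1) * dilation + 1, :]
```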

@signalprime

signalprime commented Feb 25, 2024

I'm digging into the C++ for it: https://github.com/pytorch/pytorch/blob/834c7a1d3ea07878ad87d127ee28606fc140b552/aten/src/ATen/native/WeightNorm.cpp#L50

I'm fine with C++ but I'm not set up to build the MLX project... questioning my motivation on a Saturday night haha. Do we have a Discord? I'd like to speak with someone; maybe there is already an implementation or an obvious way to get it done. I'm brand new (a few hours) to MLX. Reading into https://github.com/ml-explore/mlx/blob/main/mlx/primitives.cpp

@awni
Member

awni commented Feb 25, 2024

We have a discord link here: ml-explore/mlx#733

You shouldn't need to implement weight norm in C++. That can all be done in Python using existing ops.
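For reference, a minimal weight-norm sketch using only existing MLX ops might look like this; the `weight_norm` helper is illustrative, not an MLX API, and it mirrors `torch.nn.utils.weight_norm` with `dim=0`:

```python
import mlx.core as mx

def weight_norm(v, g, axis=0):
    """Return w = g * v / ||v||, with the norm over all axes except `axis`."""
    norm_axes = tuple(i for i in range(v.ndim) if i != axis)
    norm = mx.sqrt(mx.sum(v * v, axis=norm_axes, keepdims=True))
    return g * v / norm
```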

@signalprime

That's great, and thanks a lot for the input @awni. Part of me was thinking that too.

@awni
Member

awni commented Nov 1, 2024

This was done. Check it out here.

@awni awni closed this as completed Nov 1, 2024