Convert Musicgen to MLX #206

akashicMarga · 2023-12-30T11:12:05Z

I would like to express my gratitude for your hard work on this project! I am interested in converting MusicGen from the Meta team to MLX. To fit the model in my limited RAM, I am considering the small variant (https://huggingface.co/facebook/musicgen-stereo-small).

Could you please suggest a good starting point for this conversion? From the Hugging Face implementation, I can see that it uses the T5 encoder for text encoding, which is already available in this repository, and Encodec from the Meta team for audio encoding.

Thank you for your assistance!

awni · 2023-12-31T05:59:28Z

That would be awesome! For conversions, the best place to start is usually a reference PyTorch implementation. This is a slightly bigger project since it involves multiple models (encodec, the music generator and the LM).

We already have a T5 example as you mentioned.

It might make sense to have Encodec as a standalone example since it's useful for lots of downstream audio generation. For example, I was looking recently at converting a different TTS model which also uses it. Maybe that is a good place to start?

akashicMarga · 2024-01-01T07:41:19Z

Yes, I was thinking the same: separating Encodec into its own module makes sense, as it can be used on its own and in many TTS systems like VALL-E and VITS.

awni · 2024-01-02T21:26:03Z

Is anyone working on a port of Encodec? If not I might take a stab myself as I'm interested in getting some audio generation up and running!

akashicMarga · 2024-01-03T05:00:24Z

@awni I just started last night. Some modules that Encodec uses, like LSTM and sequential layers, are directly available in torch but not in MLX. I started from the main Encodec repo as it was pretty simple. I will only have time on weekends, as I have work for my org too, so I can't give a timeline TBH. And I really want audio generation up and running.

awni · 2024-01-03T05:02:10Z

Got it. Ok let me know what's missing in terms of layers etc that should be in mlx.nn and we can prioritize getting them in. For example there is a PR for RNNs/LSTMs out now that we can try to get merged sooner.

If you think it will take a while, you can always start a draft PR and we can collaborate on it!

akashicMarga · 2024-01-03T05:08:22Z

Yes, I checked that PR this morning. It has most of the things. I will go through the Encodec code and come back with more details, maybe by tomorrow. @awni just a suggestion: can we use discussions instead of issues? Most of the issues reported here are only enhancements, since MLX is still growing, and with respect to performance I haven't had any issues so far.

akashicMarga · 2024-01-04T06:27:43Z

@awni

The modules below will be required in MLX; some are already covered by existing PRs and issues.

1. Setting dilation and groups in the convolution layer - [Feature Request] Groups added to Conv2d mlx#100
2. Addition of recurrent layers like LSTM, RNN, GRU - Implement RNN, GRU, LSTM mlx#268
3. torch has a function for weight normalisation which will be helpful here - https://pytorch.org/docs/stable/_modules/torch/nn/utils/weight_norm.html#weight_norm

The rest of the items seem easily portable. For point 1, I have tried using the convolution layers without the dilation and groups parameters; when I went through the PyTorch code, the values set there were just the defaults, so they should not have a major impact.
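
To make the point about defaults concrete, here is a minimal sketch of a 1D convolution with a `dilation` parameter, written in dependency-free Python. The function name `conv1d` and the list-based layout are illustrative assumptions, not the MLX or PyTorch API. With `dilation=1` it reduces to an ordinary convolution, which is why dropping the parameter is harmless only as long as the model keeps the default value; whether a given checkpoint does so should be checked against its config.

```python
def conv1d(x, w, dilation=1):
    """Valid (no padding) 1D cross-correlation of sequence x with kernel w.

    Hypothetical helper for illustration only. `dilation` is the spacing
    between kernel taps; dilation=1 is the ordinary, non-dilated case.
    """
    # Number of input samples covered by one output sample.
    span = (len(w) - 1) * dilation + 1
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1]

print(conv1d(x, w))              # [-2, -2, -2, -2]
print(conv1d(x, w, dilation=2))  # [-4, -4]
```

A non-default dilation changes both the output values and the output length, so a port that silently ignores it would not match the reference weights.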

signalprime · 2024-02-25T05:34:07Z

I'm digging into the C++ for it https://github.com/pytorch/pytorch/blob/834c7a1d3ea07878ad87d127ee28606fc140b552/aten/src/ATen/native/WeightNorm.cpp#L50

I'm fine with C++ but not set up to build the MLX project... questioning my motivation on a Saturday night, haha. Do we have a Discord? I'd like to speak with someone; maybe there is already an implementation or an obvious way to get it done. I'm brand new (a few hours) to MLX. Reading into https://github.com/ml-explore/mlx/blob/main/mlx/primitives.cpp

awni · 2024-02-25T06:24:13Z

We have a discord link here: ml-explore/mlx#733

You shouldn't need to implement weight norm in C++. That can all be done in Python using existing ops.
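
For reference, a minimal sketch of the reparameterization that weight norm computes, w = g * v / ||v||, with the norm taken separately for each output channel. It is written in dependency-free Python so it runs anywhere; the function name `weight_norm` and the list-of-rows layout are illustrative assumptions, not the MLX or PyTorch API. In MLX the same steps would be array operations (square, sum, square root, multiply) rather than loops.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for each output channel.

    Hypothetical helper for illustration only.
    v: list of rows, one flattened direction vector per output channel.
    g: list of scalar magnitudes, one per output channel.
    """
    w = []
    for row, gain in zip(v, g):
        norm = math.sqrt(sum(x * x for x in row))  # L2 norm of this channel
        scale = gain / norm
        w.append([scale * x for x in row])
    return w


v = [[3.0, 4.0], [0.0, 2.0]]
g = [10.0, 1.0]
print(weight_norm(v, g))  # [[6.0, 8.0], [0.0, 1.0]]
```

After the transform, each row of the result has L2 norm equal to the corresponding entry of `g`, so the direction `v` and the magnitude `g` can be stored and loaded as separate parameters.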

signalprime · 2024-02-25T06:32:05Z

That's great, and thanks a lot for the input @awni. Part of me was thinking that too.

awni · 2024-11-01T18:06:29Z

This was done. Check it out here.

awni added the enhancement New feature or request label Dec 31, 2023

signalprime mentioned this issue Feb 25, 2024

possibly use MLX for MacOS users with WhisperSpeech WhisperSpeech/WhisperSpeech#111

Open

awni closed this as completed Nov 1, 2024
