Convert Musicgen to MLX #206
Comments
That would be awesome! For conversions, the best place to start is usually a reference PyTorch implementation. This is a slightly bigger project since it involves multiple models (encodec, the music generator and the LM). We already have a T5 example, as you mentioned. It might make sense to have Encodec as a standalone example since it's useful for lots of downstream audio generation. For example, I was recently looking at converting a different TTS model which also uses it. Maybe that is a good place to start? |
Yes, I was thinking the same: separating Encodec into its own module makes sense, since it can be used on its own and in many TTS systems such as VALL-E and VITS. |
Is anyone working on a port of Encodec? If not I might take a stab myself as I'm interested in getting some audio generation up and running! |
@awni I just started last night. Some modules used in Encodec, such as LSTM and sequential layers, are directly available in torch but not in MLX. I started from the main Encodec repo as it was pretty simple. I will only have time on weekends since I also have work for my org, so I can't give a timeline TBH. And I really want audio generation up and running. |
Got it. Ok let me know what's missing in terms of layers etc that should be in If you think it will take a while, you can always start a draft PR and we can collaborate on it! |
Yes, I checked that PR this morning. It has most of the things. I will go through the Encodec code and come back with more details, maybe by tomorrow. @awni just a suggestion: could we use Discussions instead of Issues? Most of the issues reported here are enhancement requests, since MLX is still growing, and I haven't run into any performance issues so far. |
The modules below will be required in MLX; some of them are already covered by existing PRs and issues.
The rest of the items seem easily portable. For point 1, I tried using a CNN without the dilation and groups parameters, since the values set did not have a major impact when I went through the PyTorch code; they are just the defaults. |
I'm digging into the C++ for it: https://github.com/pytorch/pytorch/blob/834c7a1d3ea07878ad87d127ee28606fc140b552/aten/src/ATen/native/WeightNorm.cpp#L50 I'm fine with C++ but not set up to build the MLX project... questioning my motivation on a Saturday night, haha. Do we have a Discord? I'd like to speak with someone; maybe there is already an implementation or an obvious way to get it done. I'm brand new (a few hours) to MLX. Reading into https://github.com/ml-explore/mlx/blob/main/mlx/primitives.cpp |
We have a Discord link here: ml-explore/mlx#733. You shouldn't need to implement weight norm in C++. That can all be done in Python using existing ops. |
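To illustrate the point about existing ops, here is a minimal sketch of weight normalization written with elementwise array operations only. It is not the implementation that ended up in any repo. It uses NumPy so it runs anywhere; `mlx.core` provides `sqrt`, `sum` and `square` as well, so the same expression should translate, but that translation is an assumption and is untested here. The function name `weight_norm`, the argument names `v` and `g`, and the example shapes are hypothetical. The norm is taken over every axis except axis 0, following PyTorch's default `dim=0` convention.

```python
import numpy as np


def weight_norm(v, g):
    """Return g * v / ||v||, with the norm taken over every axis except axis 0."""
    # Reduce over all axes but the first (the output-channel axis).
    reduce_axes = tuple(range(1, v.ndim))
    norm = np.sqrt(np.sum(np.square(v), axis=reduce_axes, keepdims=True))
    # g rescales each unit-norm output channel to the desired magnitude.
    return g * (v / norm)


# Hypothetical conv weight: 2 output channels, 3 input channels, kernel width 2.
v = np.arange(1.0, 13.0).reshape(2, 3, 2)
# One learnable magnitude per output channel, shaped to broadcast against v.
g = np.full((2, 1, 1), 2.0)

w = weight_norm(v, g)
# After normalization, each output channel's norm equals its entry in g.
channel_norms = np.sqrt(np.sum(np.square(w), axis=(1, 2)))
```

In a layer, `v` and `g` would be the stored parameters and `w` would be recomputed from them before each convolution.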
That's great, and thanks a lot for the input @awni. Part of me was thinking that too. |
This was done. Check it out here. |
I would like to express my gratitude for your hard work on this project! I am interested in converting MusicGen from the Meta team to MLX. Because my RAM is limited, I am considering using the small version of the model (https://huggingface.co/facebook/musicgen-stereo-small).
Could you please suggest a good starting point for this conversion? From the Hugging Face implementation, I can see that it uses the T5 encoder for text encoding, which is already available in this repository, and Encodec from the Meta team for audio encoding.
Thank you for your assistance!