A port of Suno's Bark model to MLX, Apple's machine learning framework. Bark is a transformer-based text-to-audio model that can generate speech and other audio, such as background noise and music.
This repository is under active development, but the model is functional. The model currently relies on a few dependencies that have not yet been ported to MLX, such as Encodec and the tokenizer. I am working on ports of these dependencies and will update the repository as soon as I have a working solution.
Example prompt: "Hello World! My name is Bark and I'm running on Apple's new machine learning framework MLX"
Generated output: generation.mp4
Sorted by priority
- Add support for MLX-based Encodec
- Add support for MLX-based tokenizer
- Fix softmax and multinomial sampling issue
- Add support for the large model
- Add support for max_gen_duration and history prompts
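For readers unfamiliar with the sampling step named in the list above, here is a minimal reference sketch of temperature-scaled softmax followed by multinomial (categorical) sampling. It is written in pure Python for illustration only. It is not the repository's MLX code, it does not describe the specific issue being tracked, and the function names `softmax` and `sample_token` are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one index from the categorical distribution softmax(logits / T)."""
    probs = softmax(logits, temperature)
    # random.choices performs weighted sampling with replacement.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which can help when comparing sampled outputs between two implementations.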
First, install the dependencies:
pip install -r requirements.txt
To convert a model, first download the Bark PyTorch checkpoints. For example,
to download the checkpoints for the small
model, use:
huggingface-cli download suno/bark coarse.pt fine.pt text.pt
Then, convert the weights to the MLX format:
# For the large model, specify --model large instead of small
python convert.py --torch_weights_dir weights/ --model small
# Run the model
python model.py --path weights/ --model small --text "hello world my name is bark"
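The steps above can be sanity-checked from Python before running them. The sketch below assumes the three small-model checkpoints named in the download command have ended up in `weights/`; the README does not say where the download places them or what the large-model files are called. The helper names `missing_checkpoints` and `build_commands` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Checkpoint file names taken from the download command above (small model).
CHECKPOINTS = ("coarse.pt", "fine.pt", "text.pt")


def missing_checkpoints(weights_dir):
    """Return the checkpoint file names not present in weights_dir."""
    root = Path(weights_dir)
    return [name for name in CHECKPOINTS if not (root / name).is_file()]


def build_commands(weights_dir="weights/", model="small",
                   text="hello world my name is bark"):
    """Build the convert and run commands as argument lists.

    Nothing is executed here; the flags mirror the commands shown above.
    """
    convert = ["python", "convert.py",
               "--torch_weights_dir", weights_dir, "--model", model]
    run = ["python", "model.py",
           "--path", weights_dir, "--model", model, "--text", text]
    return convert, run
```

Once `missing_checkpoints("weights/")` returns an empty list, each command list can be passed to `subprocess.run(cmd, check=True)` in order.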
Listed in requirements.txt
- Python 3.8 or later
- mlx
- transformers
- Hugging Face CLI
- tqdm
- numpy
- torch
- encodec
- scipy
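A quick way to confirm the environment matches this list is a small check script. This is a sketch under two assumptions: each package's import name equals the name listed above, and the Hugging Face CLI is exposed as the `huggingface-cli` command used earlier. The function name `check_environment` is hypothetical.

```python
import importlib.util
import shutil
import sys

# Import names assumed to match the package names listed above.
REQUIRED_MODULES = ("mlx", "transformers", "tqdm", "numpy",
                    "torch", "encodec", "scipy")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or later is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    if shutil.which("huggingface-cli") is None:
        problems.append("huggingface-cli not found on PATH")
    return problems
```

Calling `check_environment()` and printing each returned string gives a short report of anything still missing after `pip install -r requirements.txt`.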
Thanks to Suno for the original model, weights and training code repository. Also thanks to the MLX team for the MLX framework and examples.