MLX Community Projects #654

awni · 2024-02-08T16:11:11Z

awni
Feb 8, 2024
Maintainer

Let's collect some cool MLX integrations and community lead projects here for visibility!

If you have a project you would like to feature, leave a comment, and we will add it. If the project is build with MLX Swift, add it to the MLX Swift Community Project page.

Text Generation

mlx-ui: A simple UI / Web / Frontend for MLX mlx-lm using Streamlit.
mlx-moe: Scripts to create your own moe models using mlx
mlx-rag: Explore a simple example of utilizing MLX for RAG application running locally on your Apple Silicon device.
mlx-rag-gguf Minimal, clean code implementation of RAG with mlx using gguf model weights
mlx-llm: LLM applications running on Apple Silicon thanks to mlx from Apple
outlines-mlx: A fast minimalistic implementation of guided generation on Apple Silicon using Outlines and MLX
mlx-whatsapp: An mlx project to train a base model on your whatsapp chats using (Q)Lora finetuning
iClone: Clone your friends with iMessage and MLX
autogram: Grammar checker with a keyboard shortcut for Ollama and Apple MLX with Automator on macOS.
mamba.py: A simple Mamba implementation in PyTorch and MLX.
nanoGPT_mlx: Port of Andrej Karpathy's nanoGPT to Apple MLX framework.
mlx-tuning-fork: Very basic framework for parameterized large language model (Q)LoRa fine-tuning using mlx, mlx_lm, and OgbujiPT. Architecture for systematic running of easily parameterized fine-tunes
mlx-moe-models: A lightweight package for extending the mlx-lm to support custom moe models
chat-with-mlx: Chat with your data natively on Apple Silicon using MLX Framework.
transformerlab-app: A research platform to run, train, RAG, and evaluate LLMs through a GUI.
mlx-transformers: Model implementations in MLX with a similar interface as Hugging Face Transformers.
plpxsk/bert-qa: Fine tune BERT model for Q&A on MacBook.
VimLM A local LLM-powered Copilot for Vim—offline, private, and fully integrated into your workflow.

Vision

ml-aim: AIM: Autoregressive Image Models
mimm: MLX Image Models
mlx-image: MLX image models for Apple Silicon machines
aggressor: A simplest possible implementation of Autoregressive Image Generation without Vector Quantization in Apple MLX.
DINO_DETR_MLX: A port of the DINO DETR model for object detection in MLX.

Speech and Audio

mlx_bark: Port of Suno's Bark TTS transformer in Apple's MLX Framework

Multi-modal

voice-assistant: A simple toy demo of a local voice assistant with whisper and large language model.
Video_summarization_mlx: Transcribe and summarize youtube video using mlx
MLX-VLM: Run Vision LLMs locally on your Mac using MLX.
e2tts-mlx: A single-file implementation of Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS model in MLX.
whisper-turbo-mlx: A blazing fast single-file implementation of the OpenAI's Whisper Turbo (all in less than 250 lines of code).
NotebookMLX: A port of NotebookLlama to MLX. Generate podcasts fully on device.

Misc

mlx-omni-server: MLX Omni Server is a local inference server powered by Apple's MLX.
mlx-optimizers: Seamlessly experiment with and adopt new optimization algorithms into your MLX workflow!
mlx-graphs: Graph Neural Network library made for Apple Silicon
flower: Flower: A Friendly Federated Learning Framework
samplex: Package of useful sampling algorithms written in MLX.
rlx: A reinforcement learning framework based on MLX.
mlx3D: A library for deep learning with 3D data using MLX.
mlx-ctrc: CTC loss in MLX on the CPU and GPU.
mlx-hub: A command-line tool to search, download & manage MLX AI models on macOS.

Educational

Deep-Dive-Into-AI-With-MLX-PyTorch: "Deep Dive into AI with MLX and PyTorch" is an educational initiative designed to help anyone interested in AI, specifically in machine learning and deep learning, using Apple's MLX and Meta's PyTorch frameworks.
cv-ml-lecture-notebooks: Computer Vision and Machine Learning Jupyter Notebooks for Educational Purposes
mlx-micrograd: MLX port of micrograd - a tiny scalar-valued autograd engine with a small PyTorch-like neural network library on top.

pikaGPT: A tiny implementation of a GPT, accelerated for Apple Silicon, built on picoGPT.

chimezie · 2024-02-08T16:52:47Z

chimezie
Feb 8, 2024

Text generation: mlx-tuning-fork

0 replies

mzbac · 2024-02-08T17:21:20Z

mzbac
Feb 8, 2024

text generation: https://github.com/mzbac/mlx-moe-models
A lightweight package for extending the mlx-lm to support custom moe models

0 replies

noahfarr · 2024-02-09T14:44:13Z

noahfarr
Feb 9, 2024

An implementation of Reinforcement Learning algorithms in MLX based in the Implementations from CleanRL. Still WIP because it’s missing a benchmark and some other minor things, but the implementations work correctly.
https://github.com/noahfarr/rlx

0 replies

RahulBhalley · 2024-02-10T08:12:10Z

RahulBhalley
Feb 10, 2024

mlx-models. Currently supporting vision models by loading/converting from PyTorch checkpoints. Will later add support for text and audio models as well.

1 reply

awni Mar 1, 2024
Maintainer Author

Could you share a bit more about what it does?

qnguyen3 · 2024-03-01T08:41:43Z

qnguyen3
Mar 1, 2024

Hi I would love to add chat-with-mlx. It is a Chat UI + RAG Implementation on MLX. I wIll add more features later on (more advanced RAG pipeline + multimodal)

0 replies

adhulipa · 2024-03-11T08:40:15Z

adhulipa
Mar 11, 2024

I have an example of training a simple language model using BitLinear instead of nn.Linear. It's a port of Karpathy's minGPT to MLX along with a custom implementation of a BitLinear module. https://github.com/adhulipa/mlx-mingpt

I noticed this collection already has the far more meatier nanoGPT version ported to mlx, which is awesome! MinGPT, OTOH, is super simple, easy to follow and can serve as a reference example for folks looking to compare mlx equivalent operations from the original PyTorch implementation.

0 replies

aliasaria · 2024-03-27T06:01:07Z

aliasaria
Mar 27, 2024

Transformer Lab https://github.com/transformerlab/transformerlab-app is an LLM research platform that allows you to run, train, perform RAG, and evaluate LLMs through a GUI.

0 replies

Jaykef · 2024-03-29T13:41:18Z

Jaykef
Mar 29, 2024

MLX RAG with GGUF Models: https://github.com/Jaykef/mlx-rag-gguf

The code here builds on https://github.com/vegaluisjose/mlx-rag, it has been optimized to support RAG-based inferencing for .gguf models. I am using BAAI/bge-small-en for the embedding model, TinyLlama-1.1B-Chat-v1.0-GGUF as base model and the custom vector database script for indexing texts in a pdf file. Inference speeds can go up to ~413 tokens/sec for prompts and ~36 tokens/sec for generation on my 8G M2 Air.

0 replies

lin72h · 2024-03-30T07:13:52Z

lin72h
Mar 30, 2024

@Jaykef Very cool, thanks for sharing

0 replies

amirhossein-razlighi · 2024-03-31T10:47:04Z

amirhossein-razlighi
Mar 31, 2024

Vision: MLX3D A library for deep learning with 3D data using mlx.

3 replies

amirhossein-razlighi Apr 1, 2024

@awni
Can you please add this to the list? 🙏🏻

lin72h Apr 1, 2024

very cool! thanks for working on the 3D support

awni Apr 2, 2024
Maintainer Author

Done! Cool library btw!

dc-dc-dc · 2024-04-08T18:25:36Z

dc-dc-dc
Apr 8, 2024

mlx-lite & mlx-onnx

0 replies

otriscon · 2024-04-08T22:46:14Z

otriscon
Apr 8, 2024

JSON schema decoding (allowing function calling, including an OpenAI-compatible server with tools) using MLX: https://github.com/otriscon/llm-structured-output

0 replies

Extremys · 2024-05-01T08:16:22Z

Extremys
May 1, 2024

Hello for text generation part, I'm happy to share with you that I've proposed and contributed to the integration of MLX with LibreChat.ai. So now you can use your local LLM powered by MLX through a fancy interface privately, enjoy! :D

See danny-avila/LibreChat#2580

If in the future the community proposes an API servers supporting also multimodality, transcription, image generation for example, I will add them into LibreChat ;) It could be great also to have and LLM API supporting /models endpoint and multiple models simultaneously :D

1 reply

awni May 3, 2024
Maintainer Author

Awesome!!

NicoNico6 · 2024-05-02T13:29:49Z

NicoNico6
May 2, 2024

Hello, mlx community, we are happy to share with you that we have contributed the first strong sub-4 bit LLM model zoo for MLX community.

HF collections: https://huggingface.co/collections/GreenBitAI/greenbitai-mlx-llm-6614eb6ceb8da657c2b4ed58

The modern LLM families include Llama3/2, Phi-3, Mistral, 01-Yi, and Qwen. A mlx-style inference toolkit is also shared for the local web chatting.

gbx-lm here: https://github.com/GreenBitAI/gbx-lm.

We are an active team here, supporting the better low-bit community on the local platform. Enjoy!

3 replies

awni May 3, 2024
Maintainer Author

This looks really interesting! I'm curious if there is somewhere one can learn more about how you reduce the precision of the models?

NicoNico6 May 6, 2024

Thanks for your interest!

The MLX models are a subset of our main work green-bit-llm.

We construct the lower-bit models by combining neural architecture search and post-training quantization technique. All models are built from a mix precision of 4/2 bit group-wise min-max quantization only (for better large-scale deployment, e.g., MLX only supports 4/2 bit min-max quantization currently). The MLX models in HF collection is the layer-mix version in the model zoo.

Except better but not harder inference, we are also interested in the low-cost fine-tuning. We released Bitorch Engine for low-bit quantized neural network operations. Our release supports full parameter fine-tuning directly in quantized space, even under extremely constrained GPU resource conditions.

We are preparing a blog in huggingface for a better understanding of everything. Please stay tuned and star us in git if you like our project QvQ.

BuildBackBuehler May 29, 2024

Thanks for your interest!

The MLX models are a subset of our main work green-bit-llm.

We construct the lower-bit models by combining neural architecture search and post-training quantization technique. All models are built from a mix precision of 4/2 bit group-wise min-max quantization only (for better large-scale deployment, e.g., MLX only supports 4/2 bit min-max quantization currently). The MLX models in HF collection is the layer-mix version in the model zoo.

Except better but not harder inference, we are also interested in the low-cost fine-tuning. We released Bitorch Engine for low-bit quantized neural network operations. Our release supports full parameter fine-tuning directly in quantized space, even under extremely constrained GPU resource conditions.

We are preparing a blog in huggingface for a better understanding of everything. Please stay tuned and star us in git if you like our project QvQ.

Heck yeah, this is truly awesome! I've been banging my head trying to get a 3-bit Omniquant model to work but this'll be so much better. I'd love to see a low-bit precision Wizard LM2-8x22B (2.2, 2.5, 3.0 range)!

It'd also be great to see some metrics versus exllama/QUIP/AQLM as far as speed/accuracy. I'm (trying) to run an AQLM model right now and I believe it is still considered SotA but its about as much degradation as I'm willing to deal with. IIRC -10 to -15 (so say 78-80% Llama3 -> 65-68% accuracy).

In any case, I'll be giving your project an updoot and following it

Jaykef · 2024-05-05T15:32:55Z

Jaykef
May 5, 2024

mlx_micrograd - mlx port of Karpathy's micrograd - a tiny scalar-valued autograd engine with a small PyTorch-like neural network library on top.

Installation

pip install mlx_micrograd

Example usage

Example showing a number of possible supported operations:

from mlx_micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data}') # prints array(24.7041, dtype=float32), the outcome of this forward pass
g.backward()
print(f'{a.grad}') # prints array(138.834, dtype=float32), i.e. the numerical value of dg/da
print(f'{b.grad}') # prints array(645.577, dtype=float32), i.e. the numerical value of dg/db

1 reply

Jaykef May 11, 2024

@awni this should be under Educational.

nkasmanoff · 2024-05-11T18:14:57Z

nkasmanoff
May 11, 2024

This one is a little stale, but I've taken the approach used for adding LoRA to LLMs and applied it to LlaVA in mlx-examples
https://github.com/nkasmanoff/mlx-llava-finetuning

Can use this as a starting point for fine tuning VLMs as datasets get more popular, like https://huggingface.co/datasets/HuggingFaceM4/the_cauldron

0 replies

plpxsk · 2024-07-17T17:52:17Z

plpxsk
Jul 17, 2024

plpxsk/pikaGPT: pikaGPT: A tiny implementation of a GPT, accelerated for Apple Silicon, built on picoGPT

0 replies

g-aggarwal · 2024-08-21T16:57:43Z

g-aggarwal
Aug 21, 2024

Hi,

I wanted to share my project here - MLX Hub
Its a command-line tool & library for searching, downloading and managing MLX AI models from Hugging Face Hub, on Apple Silicon Devices. Available for download from PyPI.

0 replies

g-aggarwal · 2024-08-23T21:35:48Z

g-aggarwal
Aug 23, 2024

I noticed the project description I provided in my previous comment looks a bit too verbose.

Do you mind updating it to this one -

mlx-hub - "A command-line tool to search, download & manage MLX AI models on macOS."

Thanks!

0 replies

johnmai-dev · 2024-09-09T03:22:06Z

johnmai-dev
Sep 9, 2024

🤖✨ChatMLX is a modern, open-source, high-performance chat application for MacOS based on large language models.
https://github.com/maiqingqiang/ChatMLX

Features 🚀

Multilingual: Supports English, Simplified Chinese, Traditional Chinese, Japanese, and Korean.
Multiple Models: Provides multiple models, including Llama, OpenELM, Phi, Qwen, Starcoder, Cohere, Gemma.
High Performance: Based on the powerful performance of MLX and Apple silicon.
Privacy and Security: Run LLM locally to ensure user privacy and security.
Open Source: Open source, welcome to contribute.

output.mp4

0 replies

JosefAlbers · 2024-09-13T11:15:02Z

JosefAlbers
Sep 13, 2024

aggressor: A simplest possible implementation of Autoregressive Image Generation without Vector Quantization in Apple MLX.

0 replies

sachinraja13 · 2024-09-30T12:08:41Z

sachinraja13
Sep 30, 2024

DINO_DETR_MLX : Port of the DINO DETR model for object detection in MLX. API to load pre-trained PyTorch model weights, training/fine-tuning and evaluation using COCO API. This implementation uses Data Loader from torchvision.datasets and also provides a simple custom data loader. Also added a synthetic dataset to run profiler for time/memory cost analysis without the need to download COCO dataset.

Please feel free to open an issue / pull request or start a discussion.

0 replies

JosefAlbers · 2024-10-06T04:16:03Z

JosefAlbers
Oct 6, 2024

e2tts-mlx: A single-file implementation of Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS model in MLX.

0 replies

JosefAlbers · 2024-10-17T09:50:04Z

JosefAlbers
Oct 17, 2024

whisper-turbo-mlx: A blazing fast single-file implementation of the openai's Whisper Turbo (all in less than 250 lines of code)

2 replies

awni Oct 17, 2024
Maintainer Author

This is so rad!

JosefAlbers Oct 18, 2024

Thanks so much! I really appreciate it!

johnmai-dev · 2024-10-30T03:57:41Z

johnmai-dev
Oct 30, 2024

NotebookMLX: A port of NotebookLlama to MLX! Generate podcasts fully on device 🚀🚀🚀

NotebookMLX Output Audio

_podcast.mp4

0 replies

plpxsk · 2024-11-05T18:53:28Z

plpxsk
Nov 5, 2024

plpxsk/bert-qa: Fine tune BERT model for Q&A on MacBook

Category: Text/NLP

Obtains comparable performance to original BERT on squad 1.1 dataset
 Builds off BERT implementation in ml-explore/mlx-examples

0 replies

stockeh · 2024-11-11T17:26:46Z

stockeh
Nov 11, 2024

mlx-optimizers: Seamlessly experiment with and adopt new optimization algorithms into your MLX workflow!

0 replies

madroidmaq · 2024-12-02T17:59:32Z

madroidmaq
Dec 2, 2024

MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.

The server implements OpenAI-compatible endpoints:

Chat completions: /v1/chat/completions
- ✅ Chat
- ✅ Tools, Function Calling
- ✅ LogProbs
- 🚧 Vision
Audio
- ✅ /v1/audio/speech - Text-to-Speech
- ✅ /v1/audio/transcriptions - Speech-to-Text
Models
- ✅ /v1/models - List models
- ✅ /v1/models/{model} - Retrieve or Delete model
Images
- ✅ /v1/images/generations - Image generation

Quick Start

from openai import OpenAI

# Configure client to use local server
client = OpenAI(
    base_url="http://localhost:10240/v1",  # Point to local server
    api_key="not-needed"  # API key is not required for local server
)

# Text-to-Speech Example
response = client.audio.speech.create(
    model="lucasnewman/f5-tts-mlx",
    input="Hello, welcome to MLX Omni Server!"
)

# Speech-to-Text Example
audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="mlx-community/whisper-large-v3-turbo",
    file=audio_file
)

# Chat Completion Example
chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "What can you do?"}
    ]
)

# Image Generation Example
image_response = client.images.generate(
    model="argmaxinc/mlx-FLUX.1-schnell",
    prompt="A serene landscape with mountains and a lake",
    n=1,
    size="512x512"
)

1 reply

awni Dec 3, 2024
Maintainer Author

Very cool!!

JosefAlbers · 2025-02-09T05:29:21Z

JosefAlbers
Feb 9, 2025

VimLM A local LLM-powered Copilot for Vim—offline, private, and fully integrated into your workflow.

0 replies

MLX Community Projects #654

awni Feb 8, 2024 Maintainer

Text Generation

Vision

Speech and Audio

Multi-modal

Misc

Educational

Replies: 29 comments · 12 replies

awni Mar 1, 2024 Maintainer Author

awni Apr 2, 2024 Maintainer Author

awni May 3, 2024 Maintainer Author

awni May 3, 2024 Maintainer Author

Installation

Example usage

Features 🚀

awni Oct 17, 2024 Maintainer Author

NotebookMLX Output Audio

awni Dec 3, 2024 Maintainer Author

awni
Feb 8, 2024
Maintainer

Replies: 29 comments 12 replies

awni Mar 1, 2024
Maintainer Author

awni Apr 2, 2024
Maintainer Author

awni May 3, 2024
Maintainer Author

awni May 3, 2024
Maintainer Author

awni Oct 17, 2024
Maintainer Author

awni Dec 3, 2024
Maintainer Author