MLX Transformers

MLX Transformers is a library that provides model implementations in MLX. It uses a model interface similar to that of HuggingFace Transformers and provides a way to load and use models on Apple Silicon devices. Implemented models have the same modules and module keys as the original implementations in transformers.

MLX Transformers is currently only available for inference on Apple Silicon devices. Training support will be added in the future.

Installation

This library is available on PyPI and can be installed using pip:

pip install mlx-transformers
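
Because the library targets Apple Silicon, it can help to confirm the platform and the installation before loading a model. The sketch below is one way to do that using only the standard library. The helper names `is_apple_silicon` and `is_installed` are illustrative and not part of MLX Transformers; the import name `mlx_transformers` is taken from the examples in this README.

```python
import importlib.util
import platform


def is_apple_silicon() -> bool:
    """Return True on macOS running natively on an arm64 (Apple Silicon) CPU."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def is_installed(module_name: str = "mlx_transformers") -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
    print("mlx_transformers installed:", is_installed())
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so this check only passes for a native arm64 interpreter.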

It is also recommended to install asitop, which is useful for monitoring GPU and CPU usage on Apple Silicon devices.

Models Supported

Phi Family of Models (Phi3, Phi2, Phi)
Llama
Fuyu and Persimmon
Machine Translation Models (NLLB, M2M-100)
Encoder Models (Bert, RoBERTa, XLMRoberta, Sentence Transformers)

Chat Interface

MLX Transformers provides a Streamlit chat interface that can be used to interact with the models. The template was adapted from https://github.com/da-z/mlx-ui.

The chat interface is available in the mlx_transformers/chat module and can be used as follows:

- cd chat
- bash start.sh

Quick Tour

A list of the available models can be found in the mlx_transformers.models module; they are also listed in the Models Supported section above. The following example shows how to load a model with MLX Transformers and run inference in a few lines of code:

import mlx.core as mx
from transformers import BertConfig, BertTokenizer
from mlx_transformers.models import BertForMaskedLM as MLXBertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig.from_pretrained("bert-base-uncased")

model = MLXBertForMaskedLM(config)
model.from_pretrained("bert-base-uncased")

sample_input = "Hello, world!"
inputs = tokenizer(sample_input, return_tensors="np")
inputs = {key: mx.array(v) for key, v in inputs.items()}

outputs = model(**inputs)
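
To read predictions out of a masked-LM output, one usually ranks the vocabulary scores at a given position. The sketch below does this in plain Python on made-up numbers. It assumes the returned object exposes a `logits` array shaped (batch, sequence, vocabulary), as the HuggingFace masked-LM output does; this README does not confirm that. The helpers `softmax` and `top_k` are illustrative, not library functions.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: list[float], k: int = 3) -> list[tuple[int, float]]:
    """Return the k (token_id, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy logits for a 5-token vocabulary at one sequence position (made-up values).
position_logits = [0.1, 2.0, -1.0, 0.5, 1.5]
for token_id, prob in top_k(position_logits, k=2):
    print(token_id, round(prob, 3))
```

With real output, a row of scores could be obtained with something like `outputs.logits[0, position].tolist()` (the `logits` attribute is the assumption here), and the ids mapped back to strings with the tokenizer's `convert_ids_to_tokens`.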

Sentence Transformer Example

import mlx.core as mx
import numpy as np

from transformers import AutoConfig, AutoTokenizer
from mlx_transformers.models import BertModel as MLXBertModel


def _mean_pooling(last_hidden_state: mx.array, attention_mask: mx.array):
    token_embeddings = last_hidden_state
    input_mask_expanded = mx.expand_dims(attention_mask, -1)
    input_mask_expanded = mx.broadcast_to(input_mask_expanded, token_embeddings.shape).astype(mx.float32)
    sum_embeddings = mx.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = mx.clip(input_mask_expanded.sum(axis=1), 1e-9, None)
    return sum_embeddings / sum_mask

sentences = ['This is an example sentence', 'Each sentence is converted']

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
config = AutoConfig.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

model = MLXBertModel(config)
model.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

inputs = tokenizer(sentences, return_tensors="np", padding=True, truncation=True)
inputs = {key: mx.array(v) for key, v in inputs.items()}

outputs = model(**inputs)

# inputs is a plain dict, so the mask is read by key rather than by attribute
sentence_embeddings = _mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])
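
To make the effect of the attention mask concrete, here is a small plain-Python sketch of the same masked mean pooling for a single sentence, followed by a cosine similarity between two pooled vectors. The numbers are made up, and `mean_pool` and `cosine_similarity` are illustrative helpers rather than library functions.

```python
import math


def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        for i in range(dim):
            summed[i] += vector[i] * keep
    # Lower bound on the divisor, so an all-padding row does not divide by zero.
    count = max(sum(attention_mask), 1e-9)
    return [value / count for value in summed]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two real tokens followed by one padding token (mask 0); values are made up.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]: the padding vector does not contribute

# Parallel vectors give a similarity close to 1.0 (up to floating-point rounding).
print(cosine_similarity(pooled, [4.0, 6.0]))
```

Embeddings produced by the model could be compared the same way after converting them to Python lists or NumPy arrays.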

Other Examples

The examples directory contains a few examples that demonstrate how to use the models in MLX Transformers.

Llama Example

python3 examples/llama_generation.py --model-name "meta-llama/Llama-2-7b-hf"

NLLB Translation Example

python3 examples/translation/nllb_translation.py --model_name facebook/nllb-200-distilled-600M --source_language English --target_language Yoruba --text_to_translate "Let us translate text to Yoruba"

Output:==> ['Ẹ jẹ́ ká tú àwọn ẹsẹ Bíbélì sí èdè Yoruba']

Phi Generation Example

python3 examples/text_generation/phi3_generation.py --temp 1.0
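
The `--temp` flag presumably sets the sampling temperature. The script's own sampling code is not shown in this README, so the following is a generic sketch of temperature sampling, not a description of what `phi3_generation.py` does. Logits are divided by the temperature before the softmax: values below 1.0 concentrate probability on the highest-scoring tokens, and values above 1.0 flatten the distribution. The function names are illustrative.

```python
import math
import random


def temperature_probs(logits: list[float], temp: float) -> list[float]:
    """Softmax of logits / temp; a lower temp sharpens the distribution."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temp: float, rng: random.Random) -> int:
    """Draw one token id from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temp)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # made-up scores for a 3-token vocabulary
rng = random.Random(0)    # seeded so repeated runs draw the same token

print(temperature_probs(logits, 1.0))  # plain softmax
print(temperature_probs(logits, 0.5))  # more mass on the top token
print(sample_token(logits, 1.0, rng))
```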

Benchmarks

Coming soon...

Contributions

Contributions to MLX Transformers are welcome. We would like to have as many model implementations as possible. See the contributing documentation for instructions on setting up a development environment.
