
The Minish Lab

Solving big problems with small models

Hello, we're minish!

We're a two-person (@pringled and @stephantul) open-source company, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

  • Ingest the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: make tiny models that are still really really good.
  • potion: the best small model in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate, approximate deduplication for your text datasets.

You can also find us on: 🤗 huggingface 👽 LinkedIn

Pinned

  1. model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python · 1k stars · 45 forks

  2. semhash Public

    Fast Semantic Text Deduplication

    Python · 497 stars · 21 forks

  3. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python · 236 stars · 6 forks

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python · 44 stars · 3 forks

Repositories

  • minishlab.github.io
    SCSS 0 MIT 0 0 0 Updated Feb 5, 2025
  • model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python 1,010 MIT 45 2 1 Updated Feb 5, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python 44 MIT 3 1 1 Updated Jan 29, 2025
  • semhash Public

    Fast Semantic Text Deduplication

    Python 497 MIT 21 1 2 Updated Jan 28, 2025
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 236 MIT 6 1 1 Updated Jan 28, 2025
  • .github Public

    Readme

    0 0 0 0 Updated Jan 5, 2025
  • korok Public

    Lightweight Hybrid Search and Reranking

    Python 7 MIT 1 0 0 Updated Dec 26, 2024
  • watertemplate Public template

    Template

    Makefile 1 MIT 1 0 0 Updated Dec 9, 2024
  • evaluation Public

    Code to evaluate performance for embeddings

    Python 9 MIT 0 0 0 Updated Sep 25, 2024
