@MinishLab

The Minish Lab

Solving big problems with small models

Hi there 🥬

We are the Minish Lab 🍄! Welcome to our GitHub page. We're a two-person (@pringled and @stephantul) open-source research lab focused on Natural Language Processing. Our goal is to provide usable tools that make working with language data easy and fun.

  • We like fast things, so we focus on small models.
  • We like "classical" NLP and machine learning, so no LLM interfaces here.
  • We like CPU-bound work: not everyone has access to GPUs or wants to pay big tech companies to use them.
  • We try to be as multilingual as possible: NLP work tends to focus purely on English, to the detriment of other languages.
  • We write in Python, and use a pretty opinionated stack (uv, everything fully typed, everything fully documented, no exceptions).
  • We try to be inclusive: if you'd like to help out, please let us know 🤗.

Main goals

We aim to make software that is:

  • Easy to use ⛓️
  • Fun to use 🥳
  • Opinionated 🤔
  • Open for integration 🧲
  • Original (does not re-invent the wheel) 🤸
  • Fast 🚴

In short, we make software packages that each do one thing well and that you can integrate into a use case of your choosing. We're not going to tell you what to do; we'll just show you what you can do, and hope you have fun doing it.

You can also find us on: 🤗 Hugging Face, 👽 LinkedIn

Pinned

  1. model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python 519 21

  2. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 150 5

  3. tokenlearn Public

    Pre-train Static Word Embeddings

    Python 24

Repositories

Showing 8 of 8 repositories
  • model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python 519 MIT 21 0 1 Updated Dec 22, 2024
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 150 MIT 5 0 1 Updated Dec 21, 2024
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python 24 MIT 0 1 0 Updated Dec 14, 2024
  • korok Public

    Lightweight Hybrid Search and Reranking

    Python 2 MIT 1 0 2 Updated Dec 10, 2024
  • watertemplate Public template

    Template

    Makefile 0 MIT 0 0 0 Updated Dec 9, 2024
  • minishlab.github.io
    SCSS 0 MIT 0 0 1 Updated Oct 30, 2024
  • .github Public

    Readme

    0 0 0 0 Updated Oct 14, 2024
  • evaluation Public

    Code to evaluate performance for embeddings

    Python 9 MIT 0 0 0 Updated Sep 25, 2024