Examples of code that use Outlines to enable structured text generation for LLMs running on Modal.
Is your computer only capable of running small, quantized LLMs? Would you love to have an Nvidia H100 at home? Keep reading, because we're about to make your dream come true.
The basic idea: We deploy a function to Modal, an easy-to-use cloud platform, then we call that function locally from a Jupyter notebook.
The unbelievable part: The function will run on a super powerful GPU in the cloud. Plus, we get to use free processing credits ($30 monthly).
Gated repo: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
Yeah, I know Mistral is good, but what if I want to test different models?
Coming soon.
How to serve an inference endpoint for structured text generation. GPU-powered. Serverless. Automatically scales. Pay as you go.
conda create --name modal-outlines-examples -c conda-forge python=3.11
conda activate modal-outlines-examples
pip install -r requirements.txt
python -m modal setup