TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.


TensorZero

TensorZero makes LLMs improve through experience.

  1. Integrate our model gateway
  2. Send metrics or feedback
  3. Optimize prompts, models, and inference-time strategies
  4. Watch your LLMs get smarter, cheaper, and faster over time
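
In practice, steps 1 and 2 come down to two HTTP calls against a running gateway: `POST /inference` and `POST /feedback`. The sketch below only builds the request bodies; the function name `generate_haiku`, the metric name `thumbs_up`, and the exact payload shapes are illustrative assumptions — check the API Reference for the authoritative schema.

```python
import json

GATEWAY_URL = "http://localhost:3000"  # assumption: gateway running locally

def inference_request(function_name, user_text):
    """Build the JSON body for POST /inference (shape assumed per the API Reference)."""
    return {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": user_text}]},
    }

def feedback_request(metric_name, inference_id, value):
    """Build the JSON body for POST /feedback, tying a metric value to a past inference."""
    return {
        "metric_name": metric_name,
        "inference_id": inference_id,
        "value": value,
    }

# Send these with any HTTP client, e.g.:
#   requests.post(f"{GATEWAY_URL}/inference", json=body)
body = inference_request("generate_haiku", "Write a haiku about autumn.")
print(json.dumps(body, indent=2))
```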

It provides a data & learning flywheel for LLMs by unifying:

  • Inference: one API for all LLMs, with <1ms P99 overhead
  • Observability: inference & feedback → your database
  • Optimization: from prompts to fine-tuning and RL (& even 🍓? )
  • Experimentation: built-in A/B testing, routing, fallbacks

Website · Docs · Twitter · Slack · Discord

Quick Start (5min) · Comprehensive Tutorial · Deployment Guide · API Reference · Configuration Reference

Overview


TensorZero Flywheel


  1. The TensorZero Gateway is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-provider integration and fallbacks.
  2. It handles structured schema-based inference with <1ms P99 latency overhead (see Benchmarks) and built-in observability, experimentation, and inference-time optimizations.
  3. It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
  4. Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics.
  5. Over time, TensorZero Recipes leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
  6. Finally, the gateway's experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.
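
To make step 6 concrete, here is a toy Python sketch (not the gateway's actual Rust implementation) of the core mechanism behind built-in A/B testing and fallbacks: sample a variant proportionally to its configured weight, and fall back to the remaining variants if the call fails.

```python
import random

def sample_variant(variants, rng=random.random):
    """Pick a variant name proportionally to its weight (toy A/B routing).

    `variants` is a list of (name, weight) pairs.
    """
    total = sum(weight for _, weight in variants)
    r = rng() * total
    upto = 0.0
    for name, weight in variants:
        upto += weight
        if r < upto:
            return name
    return variants[-1][0]  # guard against floating-point edge cases

def infer_with_fallback(variants, call):
    """Try the sampled variant first, then fall back to the others in order."""
    first = sample_variant(variants)
    order = [first] + [name for name, _ in variants if name != first]
    for name in order:
        try:
            return name, call(name)
        except Exception:
            continue  # this variant failed; try the next one
    raise RuntimeError("all variants failed")
```

The real gateway layers retries, provider routing, and observability on top of this idea, but weighted sampling plus ordered fallback is the essence of the experimentation loop.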

Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience. Read more about our Vision & Roadmap.

Get Started

Next steps? The Quick Start shows how easy it is to set up an LLM application with TensorZero. If you want to dive deeper, the Tutorial teaches you how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.

Questions? Ask us on Slack or Discord.

Using TensorZero at work? Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).

Examples

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress as you fine-tune the LLM multiple times.

Improving Data Extraction (NER) by Fine-Tuning a Llama 3 Model

This example shows that an optimized Llama 3.1 8B model can be trained to outperform GPT-4o on a Named Entity Recognition (NER) task using a small amount of training data, and served by Fireworks at a fraction of the cost and latency.

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
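
The general technique is simple: generate N candidates and keep the one an evaluator scores highest. A minimal sketch, with toy stand-ins for the generator and the scorer (in the actual example, both are LLM calls routed through the gateway):

```python
def best_of_n(generate, score, n=5):
    """Generate n candidates and keep the highest-scoring one."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: candidate chess moves with hand-assigned scores,
# instead of real LLM generation and evaluation.
moves = {0: "e2e4", 1: "d2d4", 2: "g1f3", 3: "a2a3", 4: "e2e4"}
scores = {"e2e4": 0.6, "d2d4": 0.55, "g1f3": 0.5, "a2a3": 0.1}
best = best_of_n(lambda i: moves[i], lambda m: scores[m], n=5)
print(best)  # prints "e2e4", the highest-scoring candidate
```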

Improving Data Extraction (NER) with Dynamic In-Context Learning

This example demonstrates how Dynamic In-Context Learning (DICL) can enhance Named Entity Recognition (NER) performance by leveraging relevant historical examples to improve data extraction accuracy and consistency without having to fine-tune a model.
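
The idea behind DICL can be sketched in a few lines: embed the incoming query, retrieve the k most similar historical examples, and prepend them to the prompt as few-shot demonstrations. The sketch below uses a toy bag-of-words embedding for self-containment; the actual recipe uses a real embedding model over inferences stored in ClickHouse.

```python
from collections import Counter
import math

def embed(text):
    """Toy embedding: bag-of-words counts (a real recipe would use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dicl_prompt(query, history, k=2):
    """Prepend the k most similar historical examples to the query as few-shot demos."""
    q = embed(query)
    ranked = sorted(history, key=lambda ex: cosine(q, embed(ex["input"])), reverse=True)
    demos = "\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in ranked[:k])
    return f"{demos}\nInput: {query}\nOutput:"

# Hypothetical NER history drawn from past inferences:
history = [
    {"input": "Acme Corp hired Jane Doe", "output": '{"ORG": ["Acme Corp"], "PER": ["Jane Doe"]}'},
    {"input": "Paris is lovely in spring", "output": '{"LOC": ["Paris"]}'},
]
print(dicl_prompt("Jane Doe left Acme Corp", history, k=1))
```

Because retrieval happens at inference time, the prompt improves as more labeled examples accumulate — no fine-tuning run required.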

Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)

TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows. But you can also easily create your own recipes and workflows! This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.

& many more on the way!