feat: add mixedbread ai integration (run-llama#953)
juliuslipp authored Jul 9, 2024
1 parent a0f424e commit 2774681
Showing 9 changed files with 11,631 additions and 14,941 deletions.
6 changes: 6 additions & 0 deletions .changeset/healthy-elephants-remain.md
@@ -0,0 +1,6 @@
---
"llamaindex": patch
"docs": patch
---

Add mixedbread's embeddings and reranking API
100 changes: 100 additions & 0 deletions apps/docs/docs/modules/embeddings/available_embeddings/mixedbreadai.md
@@ -0,0 +1,100 @@
# MixedbreadAI

Welcome to the mixedbread embeddings guide! This guide will help you use mixedbread ai's API to generate embeddings for your text documents, so that retrieval surfaces the most relevant information, just like picking the freshest bread from the bakery.

To find out more about the latest features, updates, and available models, visit [mixedbread.ai](https://mixedbread.ai/).

## Table of Contents

1. [Setup](#setup)
2. [Usage with LlamaIndex](#usage-with-llamaindex)
3. [Embeddings with Custom Parameters](#embeddings-with-custom-parameters)

## Setup

First, you will need to install the `llamaindex` package.

```bash
pnpm install llamaindex
```

Next, sign up for an API key at [mixedbread.ai](https://mixedbread.ai/). Once you have your API key, you can import the necessary modules and create a new instance of the `MixedbreadAIEmbeddings` class.

```ts
import {
  MixedbreadAIEmbeddings,
  Document,
  Settings,
  VectorStoreIndex,
} from "llamaindex";
```

## Usage with LlamaIndex

This section will guide you through integrating mixedbread embeddings with LlamaIndex for more advanced usage.

### Step 1: Load and Index Documents

For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index, like a variety of breads in a bakery.

```ts
Settings.embedModel = new MixedbreadAIEmbeddings({
  apiKey: "<MIXEDBREAD_API_KEY>",
  model: "mixedbread-ai/mxbai-embed-large-v1",
});

const document = new Document({
  text: "The true source of happiness.",
  id_: "bread",
});

const index = await VectorStoreIndex.fromDocuments([document]);
```

### Step 2: Create a Query Engine

Combine the retriever and the embed model to create a query engine. This setup ensures that your queries are processed to provide the best results, like arranging the bread in the order of freshness and quality.

Some models require a prompt to generate embeddings for queries. For the `mixedbread-ai/mxbai-embed-large-v1` model, the prompt is `Represent this sentence for searching relevant passages:`, and it is prepended to the query text.

```ts
const queryEngine = index.asQueryEngine();

const query =
  "Represent this sentence for searching relevant passages: What is bread?";

// Log the response
const results = await queryEngine.query(query);
console.log(results); // Serving up the freshest, most relevant results.
```
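
To avoid typing the prompt by hand for every query, it can be added by a small helper. The following is a minimal sketch: `withQueryPrompt` and `QUERY_PROMPT` are hypothetical names introduced here for illustration and are not part of `llamaindex`; only the prompt string itself comes from the guide above.

```ts
// Prompt quoted in this guide for mixedbread-ai/mxbai-embed-large-v1.
const QUERY_PROMPT = "Represent this sentence for searching relevant passages:";

// Hypothetical helper (not a llamaindex API): prefixes a query with the
// prompt, unless the query already starts with it.
function withQueryPrompt(query: string, prompt: string = QUERY_PROMPT): string {
  const trimmed = query.trim();
  return trimmed.startsWith(prompt) ? trimmed : `${prompt} ${trimmed}`;
}

console.log(withQueryPrompt("What is bread?"));
```

The result can be passed to `queryEngine.query(...)` in place of the hand-written string. Only queries get the prompt; the documents are indexed as they are.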

## Embeddings with Custom Parameters

This section will guide you through generating embeddings with custom parameters, for example to use matryoshka (reduced-dimension) and binary embeddings.

### Step 1: Create an Instance of MixedbreadAIEmbeddings

Create a new instance of the `MixedbreadAIEmbeddings` class with custom parameters. For example, to use the `mixedbread-ai/mxbai-embed-large-v1` model with a batch size of 64, normalized embeddings, 512 dimensions, and the binary encoding format:

```ts
// Assumes `MixedbreadAI` is imported from "@mixedbread-ai/sdk".
const embeddings = new MixedbreadAIEmbeddings({
  apiKey: "<MIXEDBREAD_API_KEY>",
  model: "mixedbread-ai/mxbai-embed-large-v1",
  batchSize: 64,
  normalized: true,
  dimensions: 512,
  encodingFormat: MixedbreadAI.EncodingFormat.Binary,
});
```
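
To give an intuition for what the `dimensions` and binary encoding options trade off, here is a local sketch of the two ideas: keeping only the leading components of a vector (the matryoshka idea) and reducing each component to a single bit. This is an illustration under assumptions, not the API's implementation. The function names are hypothetical, and the byte layout the service actually returns for binary embeddings may differ.

```ts
// Matryoshka-style truncation: keep the first `dims` components and
// rescale the result to unit length.
function truncateAndNormalize(vector: number[], dims: number): number[] {
  const head = vector.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Binary quantization: one bit per component (1 if positive, else 0),
// packed most-significant-bit first, 8 components per byte.
// The packing order is an assumption made for this sketch.
function toBinary(vector: number[]): Uint8Array {
  const bytes = new Uint8Array(Math.ceil(vector.length / 8));
  vector.forEach((x, i) => {
    if (x > 0) {
      bytes[i >> 3] = (bytes[i >> 3] ?? 0) | (1 << (7 - (i % 8)));
    }
  });
  return bytes;
}

const full = [0.6, -0.8, 0.1, 0.3, -0.2, 0.5, -0.4, 0.7];
console.log(truncateAndNormalize(full, 2)); // 2 components instead of 8
console.log(toBinary(full)); // 1 byte instead of 8 floats
```

Both techniques shrink storage and speed up comparison at some cost in retrieval quality, which is why they are exposed as options rather than defaults.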

### Step 2: Define Texts

Define the texts you want to generate embeddings for.

```ts
const texts = ["Bread is life", "Bread is love"];
```

### Step 3: Generate Embeddings

Use the `embedDocuments` method to generate embeddings for the texts.

```ts
const result = await embeddings.embedDocuments(texts);
console.log(result); // Perfectly customized embeddings, ready to serve.
```
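
The `batchSize` option from Step 1 suggests that a long list of texts is sent in groups rather than in one request. As a rough mental model only, and assuming that is how the option behaves, the grouping could look like the sketch below. `toBatches` is a hypothetical helper, not part of `llamaindex` or the mixedbread SDK.

```ts
// Hypothetical helper: split `items` into consecutive groups of at most
// `batchSize` elements, preserving order.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const manyTexts = Array.from({ length: 130 }, (_, i) => `Loaf number ${i}`);
console.log(toBatches(manyTexts, 64).map((batch) => batch.length));
```

With 130 texts and a batch size of 64, this yields three groups, the last one holding the two leftover texts.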
164 changes: 164 additions & 0 deletions apps/docs/docs/modules/node_postprocessors/mixedbreadiai_reranker.md
@@ -0,0 +1,164 @@
# MixedbreadAI

Welcome to the mixedbread ai reranker guide! This guide will help you use mixedbread ai's API to rerank search query results, ensuring you get the most relevant information, just like picking the freshest bread from the bakery.

To find out more about the latest features and updates, visit [mixedbread.ai](https://mixedbread.ai/).

## Table of Contents

1. [Setup](#setup)
2. [Usage with LlamaIndex](#usage-with-llamaindex)
3. [Simple Reranking Guide](#simple-reranking-guide)
4. [Reranking with Objects](#reranking-with-objects)

## Setup

First, you will need to install the `llamaindex` package.

```bash
pnpm install llamaindex
```

Next, sign up for an API key at [mixedbread.ai](https://mixedbread.ai/). Once you have your API key, you can import the necessary modules and create a new instance of the `MixedbreadAIReranker` class.

```ts
import {
  MixedbreadAIReranker,
  Document,
  OpenAI,
  TextNode,
  VectorStoreIndex,
  Settings,
} from "llamaindex";
```

## Usage with LlamaIndex

This section will guide you through integrating mixedbread's reranker with LlamaIndex.

### Step 1: Load and Index Documents

For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index, like a variety of breads in a bakery.

```ts
const document = new Document({
  text: "This is a sample document.",
  id_: "sampleDoc",
});

Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0.1 });

const index = await VectorStoreIndex.fromDocuments([document]);
```

### Step 2: Increase Similarity TopK

The default value for `similarityTopK` is 2, which means only the two most similar documents will be returned. To give the reranker more candidates to choose from, like picking from a variety of fresh breads, you can increase the value of `similarityTopK`.

```ts
const retriever = index.asRetriever();
retriever.similarityTopK = 5;
```
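
Conceptually, a top-k retriever scores every stored embedding against the query embedding and keeps the `k` best. The sketch below shows that idea with cosine similarity on toy vectors. It is a simplified illustration, not LlamaIndex's retriever code, and `cosineSimilarity` and `topKBySimilarity` are hypothetical names.

```ts
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the `k` items whose embeddings are most similar to the query.
function topKBySimilarity<T extends { embedding: number[] }>(
  query: number[],
  items: T[],
  k: number,
): T[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map(({ item }) => item);
}

const loaves = [
  { id: "rye", embedding: [1, 0] },
  { id: "brioche", embedding: [0, 1] },
  { id: "sourdough", embedding: [0.9, 0.1] },
];
console.log(topKBySimilarity([1, 0], loaves, 2).map((loaf) => loaf.id));
```

A larger `k` costs more work downstream but gives the reranker a wider pool, which matters because the reranker can only reorder what the retriever hands it.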

### Step 3: Create a MixedbreadAIReranker Instance

Create a new instance of the `MixedbreadAIReranker` class.

```ts
const nodePostprocessor = new MixedbreadAIReranker({
  apiKey: "<MIXEDBREAD_API_KEY>",
  topK: 4,
});
```

### Step 4: Create a Query Engine

Combine the retriever and node postprocessor to create a query engine. This setup ensures that your queries are processed and reranked to provide the best results, like arranging the bread in the order of freshness and quality.

```ts
const queryEngine = index.asQueryEngine({
  retriever,
  nodePostprocessors: [nodePostprocessor],
});

// Log the response
const response = await queryEngine.query("Where did the author grow up?");
console.log(response);
```

With mixedbread's Reranker, you're all set to serve up the most relevant and well-ordered results, just like a skilled baker arranging their best breads for eager customers. Enjoy the perfect blend of technology and culinary delight!

## Simple Reranking Guide

This section will guide you through a simple reranking process using mixedbread ai.

### Step 1: Create an Instance of MixedbreadAIReranker

Create a new instance of the `MixedbreadAIReranker` class, passing in your API key and the number of results you want to return. It's like setting up your bakery to offer a specific number of freshly baked items.

```ts
const reranker = new MixedbreadAIReranker({
  apiKey: "<MIXEDBREAD_API_KEY>",
  topK: 4,
});
```

### Step 2: Define Nodes and Query

Define the nodes (documents) you want to rerank and the query.

```ts
// Assumes `TextNode` is imported from "llamaindex".
const nodes = [
  { node: new TextNode({ text: "To bake bread you need flour" }) },
  { node: new TextNode({ text: "To bake bread you need yeast" }) },
];
const query = "What do you need to bake bread?";

### Step 3: Perform Reranking

Use the `postprocessNodes` method to rerank the nodes based on the query.

```ts
const result = await reranker.postprocessNodes(nodes, query);
console.log(result); // Like pulling freshly baked nodes out of the oven.
```
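
The shape of a reranking step is: score each candidate against the query, sort by score, and keep the best few. The sketch below reproduces that shape locally with a deliberately naive word-overlap scorer standing in for the reranking model. Everything in it (`overlapScore`, `rerankTexts`, `ScoredText`) is hypothetical and only meant to show the flow; the real scores come from the mixedbread API.

```ts
interface ScoredText {
  text: string;
  score: number;
}

// Toy stand-in for a reranking model: the fraction of distinct query words
// that also occur in the text.
function overlapScore(query: string, text: string): number {
  const words = (s: string) =>
    new Set<string>(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const queryWords = words(query);
  const textWords = words(text);
  if (queryWords.size === 0) return 0;
  let hits = 0;
  queryWords.forEach((word) => {
    if (textWords.has(word)) hits++;
  });
  return hits / queryWords.size;
}

// Score every text, sort by descending score, and keep the top `topK`.
function rerankTexts(texts: string[], query: string, topK: number): ScoredText[] {
  return texts
    .map((text) => ({ text, score: overlapScore(query, text) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, topK);
}

const candidates = [
  "Bread needs flour",
  "The weather is nice",
  "You need yeast to bake bread",
];
console.log(rerankTexts(candidates, "What do you need to bake bread?", 2));
```

A learned reranker replaces `overlapScore` with a model that reads the query and the text together, which is what lets it outperform embedding similarity alone on the final ordering.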

## Reranking with Objects

This section will guide you through reranking when working with objects.

### Step 1: Create an Instance of MixedbreadAIReranker

Create a new instance of the `MixedbreadAIReranker` class, just like before.

```ts
const reranker = new MixedbreadAIReranker({
  apiKey: "<MIXEDBREAD_API_KEY>",
  model: "mixedbread-ai/mxbai-rerank-large-v1",
  topK: 5,
  rankFields: ["title", "content"],
  returnInput: true,
  maxRetries: 5,
});
```

### Step 2: Define Documents and Query

Define the documents (objects) you want to rerank and the query.

```ts
const documents = [
  { title: "Bread Recipe", content: "To bake bread you need flour" },
  { title: "Bread Recipe", content: "To bake bread you need yeast" },
];
const query = "What do you need to bake bread?";
```

### Step 3: Perform Reranking

Use the `rerank` method to reorder the documents based on the query.

```ts
const result = await reranker.rerank(documents, query);
console.log(result); // Perfectly customized results, ready to serve.
```
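
One way to think about `rankFields` is that it names the object properties whose text should be considered when ranking. The exact behaviour is defined by the mixedbread API and is not spelled out here, so the following is only a mental model under that assumption. `textForRanking` is a hypothetical helper, not an SDK function.

```ts
type RankableObject = Record<string, unknown>;

// Hypothetical helper: collect the string values of the selected fields,
// in the order given, and join them into one text. Missing or non-string
// fields are skipped.
function textForRanking(doc: RankableObject, rankFields: string[]): string {
  return rankFields
    .map((field) => doc[field])
    .filter((value): value is string => typeof value === "string")
    .join("\n");
}

const recipe = {
  title: "Bread Recipe",
  content: "To bake bread you need flour",
  internalId: 42,
};
console.log(textForRanking(recipe, ["title", "content"]));
```

Under this model, fields that are not listed (such as `internalId` above) would not influence the ranking, while `returnInput: true` would still hand the original objects back alongside their scores.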
1 change: 1 addition & 0 deletions packages/llamaindex/package.json
@@ -45,6 +45,7 @@
"chromadb": "1.8.1",
"cohere-ai": "7.9.5",
"groq-sdk": "^0.5.0",
"@mixedbread-ai/sdk": "^2.2.11",
"js-tiktoken": "^1.0.12",
"lodash": "^4.17.21",
"magic-bytes.js": "^1.10.0",