Commit 0a18c1a

feat: anthropic batching code (#1064)

jxnl authored Oct 20, 2024
1 parent a6d0a39 commit 0a18c1a
Showing 4 changed files with 306 additions and 157 deletions.
137 changes: 92 additions & 45 deletions docs/cli/batch.md
@@ -3,23 +3,32 @@ title: Managing Batch Jobs with OpenAI CLI
description: Learn how to create, list, and cancel batch jobs using the OpenAI Command Line Interface (CLI) for efficient job management.
---

# Using the Command Line Interface for Batch Jobs

The instructor CLI provides functionalities for managing batch jobs on both OpenAI and Anthropic platforms. This dual support allows users to leverage the strengths of both providers for their batch processing needs.

## Supported Providers

- **OpenAI**: Utilizes OpenAI's robust batch processing capabilities.
- **Anthropic**: Leverages Anthropic's advanced language models for batch operations.

To switch between providers, use the `--use-anthropic` flag in the relevant commands.

```bash
$ instructor batch --help

Usage: instructor batch [OPTIONS] COMMAND [ARGS]...

Manage OpenAI and Anthropic Batch jobs

╭─ Options ───────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────╮
│ cancel Cancel a batch job │
│ create-from-file Create a batch job from a file │
│ download-file Download the file associated with a batch job │
│ list See all existing batch jobs │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```
@@ -43,14 +52,16 @@ $ instructor batch list --help
│ [default: 10] │
│ --screen --no-screen Enable or disable screen output │
│ [default: no-screen] │
│ --use-anthropic Use Anthropic API instead of OpenAI │
│ [default: False] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```

This returns a list of jobs as seen below:

```bash
$ instructor batch list --limit 5

OpenAI Batch Jobs
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
@@ -61,58 +72,50 @@ $ instructor batch list --limit 9
│ batch_zsTSsWVLgpEan… │ 2024-06-19 15:06:05 │ completed │ 0 │ 15 │ 15 │
│ batch_igaa2j9VBVw2Z… │ 2024-06-19 15:01:59 │ completed │ 0 │ 300 │ 300 │
│ batch_HcjI2wG46Y1LY… │ 2024-06-12 15:45:37 │ completed │ 0 │ 3 │ 3 │
│ batch_YiRKLAmKBhwxM… │ 2024-06-12 15:09:44 │ completed │ 0 │ 3 │ 3 │
│ batch_hS0XGlXzTVS7S… │ 2024-06-12 15:05:59 │ completed │ 0 │ 3 │ 3 │
│ batch_6s4FmcaV7woam… │ 2024-06-12 14:26:34 │ completed │ 0 │ 3 │ 3 │
└──────────────────────┴─────────────────────┴───────────┴────────┴───────────┴───────┘
```

### Create From File

You'll need to supply a valid .jsonl file to create a Batch job. Here's how you can create one using Instructor:

```python
import json
from typing import Literal

from instructor.batch import BatchJob
from pydantic import BaseModel, Field


class Classification(BaseModel):
    label: Literal["SPAM", "NOT_SPAM"] = Field(
        ..., description="Whether the email is spam or not"
    )


emails = [
    "Hello there I'm a Nigerian prince and I want to give you money",
    "Meeting with Thomas has been set at Friday next week",
    "Here are some weekly product updates from our marketing team",
]

messages = [
    [
        {
            "role": "user",
            "content": f"Classify the following email {email}",
        }
    ]
    for email in emails
]

with open("output.jsonl", "w") as f:
    for line in BatchJob.create_from_messages(
        messages,
        model="gpt-3.5-turbo",
        response_model=Classification,
        max_tokens=100,
    ):
        f.write(json.dumps(line) + "\n")
```

You can then import the .jsonl file using the `instructor batch create-from-file` command.
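Before importing, it can help to sanity-check the file's shape. The sketch below shows what a single line of the generated `output.jsonl` might look like, based on the `BatchModel` defined in `instructor/batch.py` in this commit (a `custom_id` plus a `params` request body); the UUID and message content here are illustrative, not taken from a real run.

```python
import json

# Illustrative single line of output.jsonl, mirroring this commit's BatchModel
# (custom_id + params). Values below are made up for the example.
line = {
    "custom_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "params": {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user", "content": "Classify the following email ..."}
        ],
        "max_tokens": 100,
        "temperature": 1.0,
    },
}
# Each record is serialized as one JSON object per line (JSONL).
print(json.dumps(line))
```

A quick `json.loads` over each line of the real file is a cheap way to catch truncated records before submitting a batch.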

```bash
$ instructor batch create-from-file --help
@@ -124,13 +127,21 @@ Usage: instructor batch create-from-file [OPTIONS]
╭─ Options ───────────────────────────────────────────────────────────────────────────╮
* --file-path TEXT File containing the batch job requests [default: None] │
│ [required] │
│ --use-anthropic Use Anthropic API instead of OpenAI │
│ [default: False] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```

Example usage:

```bash
$ instructor batch create-from-file --file-path output.jsonl
```

### Cancelling a Batch Job

You can cancel an outstanding batch job using the `cancel` command:

```bash
$ instructor batch cancel --help
@@ -141,6 +152,42 @@ $ instructor batch cancel --help

╭─ Options ───────────────────────────────────────────────────────────────────────────╮
* --batch-id TEXT Batch job ID to cancel [default: None] [required] │
│ --use-anthropic Use Anthropic API instead of OpenAI │
│ [default: False] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```

Example usage:

```bash
$ instructor batch cancel --batch-id batch_BSMSiMMy8on2D
```

### Downloading Batch Job Results

To download the results of a completed batch job:

```bash
$ instructor batch download-file --help

Usage: instructor batch download-file [OPTIONS]

Download the file associated with a batch job

╭─ Options ───────────────────────────────────────────────────────────────────────────╮
* --batch-id TEXT Batch job ID to download [default: None] [required] │
* --download-file-path TEXT Path to download file to [default: None] [required] │
│ --use-anthropic Use Anthropic API instead of OpenAI │
│ [default: False] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```

Example usage:

```bash
$ instructor batch download-file --batch-id batch_pD5dqHmqjWYF5 --download-file-path results.jsonl
```

This comprehensive set of commands allows you to manage batch jobs efficiently, whether you're using OpenAI or Anthropic as your provider.
129 changes: 75 additions & 54 deletions instructor/batch.py
@@ -1,31 +1,13 @@
from typing import Any, Union, TypeVar, Optional
from collections.abc import Iterable
from pydantic import BaseModel, Field
from instructor.process_response import handle_response_model
import instructor
import uuid
import json

T = TypeVar("T", bound=BaseModel)



class Function(BaseModel):
name: str
@@ -39,19 +21,17 @@ class Tool(BaseModel):


class RequestBody(BaseModel):
    model: str
    messages: list[dict[str, Any]]
    max_tokens: Optional[int] = Field(default=1000)
    temperature: Optional[float] = Field(default=1.0)
    tools: Optional[list[Tool]]
    tool_choice: Optional[dict[str, Any]]


class BatchModel(BaseModel):
    custom_id: str
    params: RequestBody
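A runnable sketch of the serialization these models produce is below. The classes mirror this commit's `RequestBody`/`BatchModel`, except that `tools` and `tool_choice` are given `None` defaults here so the sketch runs standalone (the commit's version leaves them required, which is exactly what the Pyright annotation in this commit flags).

```python
from typing import Any, Optional

from pydantic import BaseModel, Field


# Mirrors this commit's RequestBody, with None defaults added (an assumption
# made so the sketch is self-contained; the commit requires tools/tool_choice).
class SketchRequestBody(BaseModel):
    model: str
    messages: list[dict[str, Any]]
    max_tokens: Optional[int] = Field(default=1000)
    temperature: Optional[float] = Field(default=1.0)
    tools: Optional[list[Any]] = None
    tool_choice: Optional[dict[str, Any]] = None


# Mirrors this commit's BatchModel: a custom_id plus the request params.
class SketchBatchModel(BaseModel):
    custom_id: str
    params: SketchRequestBody


row = SketchBatchModel(
    custom_id="example-id",
    params=SketchRequestBody(
        model="claude-3-haiku-20240307",
        messages=[{"role": "user", "content": "hi"}],
    ),
)
# One JSONL record of the batch file is just this model serialized to JSON.
print(row.model_dump_json())
```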


class BatchJob:
@@ -65,15 +45,29 @@ def parse_from_file(
        for line in file:
            data = json.loads(line)
            try:
                if (
                    "tool_calls"
                    in data["response"]["body"]["choices"][0]["message"]
                ):
                    # OpenAI format
                    res.append(
                        response_model(
                            **json.loads(
                                data["response"]["body"]["choices"][0]["message"][
                                    "tool_calls"
                                ][0]["function"]["arguments"]
                            )
                        )
                    )
                else:
                    # Anthropic format
                    res.append(
                        response_model(
                            **json.loads(
                                data["result"]["message"]["content"][0]["text"]
                            )
                        )
                    )
            except Exception:
                error_objs.append(data)
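The branch above distinguishes two result-line shapes by probing for a `tool_calls` key. The sketch below reduces each shape to just the access paths `parse_from_file` uses; it is an illustration of the parsing logic, not the providers' full batch-result schemas.

```python
import json

# OpenAI-style result line: arguments live in the first tool call.
openai_line = {
    "response": {"body": {"choices": [{"message": {
        "tool_calls": [{"function": {"arguments": json.dumps({"label": "SPAM"})}}]
    }}]}}
}

# Anthropic-style result line: JSON payload lives in the first content block's text.
anthropic_line = {
    "result": {"message": {"content": [{"text": json.dumps({"label": "NOT_SPAM"})}]}}
}

# Same probe as in parse_from_file: presence of "tool_calls" selects the path.
message = openai_line["response"]["body"]["choices"][0]["message"]
assert "tool_calls" in message
print(json.loads(message["tool_calls"][0]["function"]["arguments"]))
print(json.loads(anthropic_line["result"]["message"]["content"][0]["text"]))
```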

@@ -89,15 +83,26 @@ def parse_from_string(
        for line in lines:
            data = json.loads(line)
            try:
                if "tool_calls" in data["response"]["body"]["choices"][0]["message"]:
                    # OpenAI format
                    res.append(
                        response_model(
                            **json.loads(
                                data["response"]["body"]["choices"][0]["message"][
                                    "tool_calls"
                                ][0]["function"]["arguments"]
                            )
                        )
                    )
                else:
                    # Anthropic format
                    res.append(
                        response_model(
                            **json.loads(
                                data["result"]["message"]["content"][0]["text"]
                            )
                        )
                    )
            except Exception:
                error_objs.append(data)

@@ -109,28 +114,44 @@ def create_from_messages(
        messages_batch: Union[
            list[list[dict[str, Any]]], Iterable[list[dict[str, Any]]]
        ],
        model: str,
        response_model: type[BaseModel],
        file_path: str,
        max_tokens: Optional[int] = 1000,
        temperature: Optional[float] = 1.0,
    ):
        use_anthropic = "claude" in model.lower()

        if use_anthropic:
            _, kwargs = handle_response_model(
                response_model=response_model, mode=instructor.Mode.ANTHROPIC_JSON
            )
        else:
            _, kwargs = handle_response_model(
                response_model=response_model, mode=instructor.Mode.TOOLS
            )

        with open(file_path, "w") as file:
            for messages in messages_batch:
                if use_anthropic:
                    batch_model = BatchModel(
                        custom_id=str(uuid.uuid4()),
                        params=RequestBody(
                            model=model,
                            messages=messages,
                            max_tokens=max_tokens,
                            temperature=temperature,
                        ),
                    )
                else:
                    batch_model = BatchModel(
                        custom_id=str(uuid.uuid4()),
                        params=RequestBody(
                            model=model,
                            messages=messages,
                            max_tokens=max_tokens,
                            temperature=temperature,
                            **kwargs,
                        ),
                    )
                file.write(batch_model.model_dump_json() + "\n")

Check failure on line 139 in instructor/batch.py — GitHub Actions / Pyright (ubuntu-latest, 3.9, 3.10, 3.11): Arguments missing for parameters "tools", "tool_choice" (reportCallIssue)
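The provider routing in `create_from_messages` keys purely on the model name: any model whose name contains "claude" takes the Anthropic path, everything else the OpenAI tools path. That rule can be sketched in isolation as:

```python
# Same substring check used by create_from_messages above.
def uses_anthropic(model: str) -> bool:
    return "claude" in model.lower()


print(uses_anthropic("claude-3-5-sonnet-20240620"))  # True
print(uses_anthropic("gpt-3.5-turbo"))  # False
```

One consequence of this design is that there is no explicit provider flag at the library level; callers steer batching solely through the model string (the CLI's `--use-anthropic` flag is a separate, explicit switch).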