Track usage for OpenAI models even when streaming #591

simonw · 2024-10-29T00:35:27Z

OpenAI's streaming responses used to omit usage information, but they can now include it via stream_options:

https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options
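A rough sketch of what that option looks like in a request body. The helper name build_request_body is hypothetical and is not code from this project; the field names match the request body logged further down in this thread.

```python
import json


def build_request_body(model, prompt, stream=True, max_tokens=256):
    """Hypothetical helper: build a completions request body.

    stream_options is only added when streaming, since the API
    documents it as an option for streamed requests.
    """
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    if stream:
        body["stream"] = True
        # Ask the API to send one extra chunk containing token usage.
        body["stream_options"] = {"include_usage": True}
    return body


if __name__ == "__main__":
    body = build_request_body("gpt-3.5-turbo-instruct", "capital of france is ")
    print(json.dumps(body, indent=2))
```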

simonw · 2024-10-29T00:39:25Z

Testing my implementation manually:

llm -m gpt-4o-mini hi

llm logs -c --json

[
  {
    "id": "01jbav6h7k7gg9f486p9n0nw58",
    "model": "gpt-4o-mini",
    "prompt": "hi",
    "system": null,
    "prompt_json": {
      "messages": [
        {
          "role": "user",
          "content": "hi"
        }
      ]
    },
    "options_json": {},
    "response": "Hello! How can I assist you today?",
    "response_json": {
      "content": "Hello! How can I assist you today?",
      "finish_reason": "stop",
      "usage": {
        "completion_tokens": 9,
        "prompt_tokens": 8,
        "total_tokens": 17,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0
        }
      },
      "id": "chatcmpl-ANUV2zARjPvOzBEJoeJAHMZztmyMA",
      "object": "chat.completion.chunk",
      "model": "gpt-4o-mini-2024-07-18",
      "created": 1730162148
    },
    "conversation_id": "01jbav6h7h5ek2fyykq19kb56y",
    "duration_ms": 1120,
    "datetime_utc": "2024-10-29T00:35:47.471634",
    "conversation_name": "hi",
    "conversation_model": "gpt-4o-mini",
    "attachments": []
  }
]

With --no-stream:

llm -m gpt-4o-mini hi --no-stream

[
  {
    "id": "01jbav9cg9ecktc06rtvmdm0gx",
    "model": "gpt-4o-mini",
    "prompt": "hi",
    "system": null,
    "prompt_json": {
      "messages": [
        {
          "role": "user",
          "content": "hi"
        }
      ]
    },
    "options_json": {},
    "response": "Hello! How can I assist you today?",
    "response_json": {
      "id": "chatcmpl-ANUWXkIeaElMNKblDP83pT9qk9t0m",
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "message": {
            "content": "Hello! How can I assist you today?",
            "role": "assistant"
          }
        }
      ],
      "created": 1730162241,
      "model": "gpt-4o-mini-2024-07-18",
      "object": "chat.completion",
      "system_fingerprint": "fp_f59a81427f",
      "usage": {
        "completion_tokens": 9,
        "prompt_tokens": 8,
        "total_tokens": 17,
        "prompt_tokens_details": {
          "cached_tokens": 0
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0
        }
      }
    },
    "conversation_id": "01jbav9cg7qntkka2hvca72yhd",
    "duration_ms": 603,
    "datetime_utc": "2024-10-29T00:37:21.450264",
    "conversation_name": "hi",
    "conversation_model": "gpt-4o-mini",
    "attachments": []
  }
]

For a completion model:

llm -m gpt-3.5-turbo-instruct 'capital of france is '

[
  {
    "id": "01jbavbrwsrtwgt5snjz6t7fgd",
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "capital of france is ",
    "system": null,
    "prompt_json": {
      "messages": [
        "capital of france is "
      ]
    },
    "options_json": {},
    "response": " Paris\n\n",
    "response_json": {
      "content": " Paris\n\n",
      "usage": {
        "completion_tokens": 2,
        "prompt_tokens": 5,
        "total_tokens": 7
      },
      "id": "cmpl-ANUXorptC5yuMOMLF1Z4JL2pCz2DF",
      "object": "text_completion",
      "model": "gpt-3.5-turbo-instruct",
      "created": 1730162320
    },
    "conversation_id": "01jbavbrwqfp0b0t22x9tk71at",
    "duration_ms": 633,
    "datetime_utc": "2024-10-29T00:38:39.644058",
    "conversation_name": "capital of france is ",
    "conversation_model": "gpt-3.5-turbo-instruct",
    "attachments": []
  }
]

And the same prompt with --no-stream:

[
  {
    "id": "01jbavcgc4f40zy4ynqnvbmv24",
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "capital of france is ",
    "system": null,
    "prompt_json": {
      "messages": [
        "capital of france is "
      ]
    },
    "options_json": {},
    "response": " paris\n\n\nThe capital of France is indeed Paris. It is located in the northern part of the country and is known for its historic landmarks, art, fashion, and cuisine. It is also a major global city, with a population of over 2 million people. The Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum are some of its most famous attractions. Paris is also known as the \"City of Love\" and is a popular tourist destination.",
    "response_json": {
      "id": "cmpl-ANUYBosUNuPgLZJZF20dxAHKUhN8b",
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "text": " paris\n\n\nThe capital of France is indeed Paris. It is located in the northern part of the country and is known for its historic landmarks, art, fashion, and cuisine. It is also a major global city, with a population of over 2 million people. The Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum are some of its most famous attractions. Paris is also known as the \"City of Love\" and is a popular tourist destination."
        }
      ],
      "created": 1730162343,
      "model": "gpt-3.5-turbo-instruct",
      "object": "text_completion",
      "usage": {
        "completion_tokens": 96,
        "prompt_tokens": 5,
        "total_tokens": 101
      }
    },
    "conversation_id": "01jbavcgc2y0vm77r63q0pz9x1",
    "duration_ms": 1643,
    "datetime_utc": "2024-10-29T00:39:02.677491",
    "conversation_name": "capital of france is ",
    "conversation_model": "gpt-3.5-turbo-instruct",
    "attachments": []
  }
]
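All four records above carry a usage object directly inside response_json, whether or not the response was streamed, so the counts can be read from the logs output the same way in each case. A minimal sketch, assuming the JSON shape shown above; usage_from_logs and the trimmed SAMPLE are illustrative and not part of llm.

```python
import json

# Trimmed copies of two records from the output above (illustrative only).
SAMPLE = """
[
  {"model": "gpt-4o-mini",
   "response_json": {"usage": {"completion_tokens": 9,
                               "prompt_tokens": 8,
                               "total_tokens": 17}}},
  {"model": "gpt-3.5-turbo-instruct",
   "response_json": {"usage": {"completion_tokens": 2,
                               "prompt_tokens": 5,
                               "total_tokens": 7}}}
]
"""


def usage_from_logs(logs_json):
    """Return a list of (model, usage) pairs from logs JSON text.

    usage is None when a record has no response_json or no usage key.
    """
    pairs = []
    for row in json.loads(logs_json):
        response_json = row.get("response_json") or {}
        pairs.append((row.get("model"), response_json.get("usage")))
    return pairs


if __name__ == "__main__":
    for model, usage in usage_from_logs(SAMPLE):
        print(model, usage)
```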

simonw · 2024-10-29T00:40:16Z

The LLM_OPENAI_SHOW_RESPONSES=1 option was useful here too:

LLM_OPENAI_SHOW_RESPONSES=1 llm -m gpt-3.5-turbo-instruct 'capital of france is '
Request: POST https://api.openai.com/v1/completions
  Headers:
    host: api.openai.com
    connection: keep-alive
    accept: application/json
    content-type: application/json
    user-agent: OpenAI/Python 1.37.0
    x-stainless-lang: python
    x-stainless-package-version: 1.37.0
    x-stainless-os: MacOS
    x-stainless-arch: arm64
    x-stainless-runtime: CPython
    x-stainless-runtime-version: 3.10.4
    authorization: [...]
    x-stainless-async: false
    content-length: 148
  Body:
    {
      "model": "gpt-3.5-turbo-instruct",
      "prompt": "capital of france is ",
      "max_tokens": 256,
      "stream": true,
      "stream_options": {
        "include_usage": true
      }
    }
Response: status_code=200
  Headers:
    date: Tue, 29 Oct 2024 00:39:39 GMT
    content-type: text/event-stream
    transfer-encoding: chunked
    connection: keep-alive
    access-control-allow-origin: *
    access-control-expose-headers: X-Request-ID
    cache-control: no-cache, must-revalidate
    openai-model: gpt-3.5-turbo-instruct
    openai-organization: user-r3e61fpak04cbaokp5buoae4
    openai-processing-ms: 281
    openai-version: 2020-10-01
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    x-ratelimit-limit-requests: 3500
    x-ratelimit-limit-tokens: 90000
    x-ratelimit-remaining-requests: 3499
    x-ratelimit-remaining-tokens: 89739
    x-ratelimit-reset-requests: 17ms
    x-ratelimit-reset-tokens: 174ms
    x-request-id: req_8765615cd073205516a763fe3c4ffc0f
    cf-cache-status: DYNAMIC
    set-cookie: __cf_bm=...
    x-content-type-options: nosniff
    server: cloudflare
    cf-ray: 8d9f1c1639c8cf4d-SJC
    alt-svc: h3=":443"; ma=86400
  Body:
data: {"id":"cmpl-ANUYlvpi4kFs3nOhNDuyO14WST2ww","object":"text_completion","created":1730162379,"choices":[{"text":"","index":0,"logprobs":null,"finish_reason":"stop"}],"model":"gpt-3.5-turbo-instruct","usage":null}

data: {"id":"cmpl-ANUYlvpi4kFs3nOhNDuyO14WST2ww","object":"text_completion","created":1730162379,"model":"gpt-3.5-turbo-instruct","usage":{"prompt_tokens":5,"total_tokens":5},"choices":[]}

data: [DONE]
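In the stream above, the usage arrives in a separate chunk with an empty choices list, just before [DONE]. A minimal sketch of pulling it out of the raw event lines, assuming the "data: ..." framing shown above; usage_from_sse is an illustrative name, not the implementation used in llm. Note that this particular usage chunk has no completion_tokens key, so callers should read it with a default.

```python
import json

# The two data chunks and the terminator from the response body above.
SAMPLE_STREAM = [
    'data: {"id":"cmpl-ANUYlvpi4kFs3nOhNDuyO14WST2ww","object":"text_completion","created":1730162379,"choices":[{"text":"","index":0,"logprobs":null,"finish_reason":"stop"}],"model":"gpt-3.5-turbo-instruct","usage":null}',
    "",
    'data: {"id":"cmpl-ANUYlvpi4kFs3nOhNDuyO14WST2ww","object":"text_completion","created":1730162379,"model":"gpt-3.5-turbo-instruct","usage":{"prompt_tokens":5,"total_tokens":5},"choices":[]}',
    "",
    "data: [DONE]",
]


def usage_from_sse(lines):
    """Return the last non-null usage object seen in the stream, or None."""
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


if __name__ == "__main__":
    usage = usage_from_sse(SAMPLE_STREAM)
    # completion_tokens is absent in this chunk, so fall back to 0.
    print(usage["prompt_tokens"], usage.get("completion_tokens", 0))
```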

Refs #587, #590, #591

simonw added the enhancement and openai labels Oct 29, 2024

simonw closed this as completed in 389acdf Oct 29, 2024

simonw added a commit that referenced this issue Oct 29, 2024

Release 0.17

a44ba49

Refs #587, #590, #591
