Semantic Conventions for OpenAI Metrics

Status: Experimental

This document defines semantic conventions for OpenAI client metrics.

Chat completions

Metric: openai.chat_completions.tokens

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.chat_completions.tokens | Counter | token | Number of tokens used in prompt and completions. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Required |
| llm.usage.token_type | string | The type of token. | prompt | Recommended |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |
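
The fallback rule above can be sketched as a small lookup. This is a minimal illustration, not part of the convention: the `KNOWN_ERROR_TYPES` mapping, the `connection_error` value, and the `error_type_for` helper are hypothetical names chosen for the example. Only `timeout` (from the examples column) and `_OTHER` come from this document.

```python
# Hypothetical documented set of error.type values for one instrumentation.
# Keeping the set small and fixed keeps the attribute's cardinality low.
KNOWN_ERROR_TYPES = {
    TimeoutError: "timeout",
    ConnectionError: "connection_error",  # assumed custom value
}


def error_type_for(exc: BaseException) -> str:
    """Map an exception to a low-cardinality error.type value."""
    for exc_class, value in KNOWN_ERROR_TYPES.items():
        if isinstance(exc, exc_class):
            return value
    # No custom value is defined for this exception: use the well-known fallback.
    return "_OTHER"
```

Because the lookup uses `isinstance`, subclasses such as `ConnectionResetError` share the value of their parent class instead of adding new attribute values.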

llm.usage.token_type MUST be one of the following:

| Value | Description |
| --- | --- |
| prompt | prompt |
| completion | completion |
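
As an illustration of how one response could translate into measurements for this counter, the sketch below splits a usage record into one measurement per token type. It assumes the usage data is a dict with `prompt_tokens` and `completion_tokens` keys, as in OpenAI API responses. The `token_measurements` helper and its return shape are hypothetical.

```python
from typing import Dict, List, Tuple

# The only values this convention allows for llm.usage.token_type.
ALLOWED_TOKEN_TYPES = ("prompt", "completion")


def token_measurements(
    usage: Dict[str, int], model: str, server_address: str
) -> List[Tuple[int, Dict[str, str]]]:
    """Return (value, attributes) pairs, one per token type present in usage."""
    base = {"llm.response.model": model, "server.address": server_address}
    measurements = []
    for token_type in ALLOWED_TOKEN_TYPES:
        count = usage.get(f"{token_type}_tokens")
        if count is None:
            continue  # e.g. no completion tokens were reported
        measurements.append((count, {**base, "llm.usage.token_type": token_type}))
    return measurements
```

Each pair would then be passed to the counter's add operation in whichever metrics API the instrumentation uses.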

Metric: openai.chat_completions.choices

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.chat_completions.choices | Counter | choice | Number of choices returned by a chat completions call. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.finish_reason | string | The reason the model stopped generating tokens. | stop | Recommended |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Required |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |
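
Since llm.response.finish_reason is an attribute of this counter, one way to record a response is to group its choices by finish reason. The sketch below assumes a response dict with a `model` field and a `choices` list whose items carry `finish_reason`, as in OpenAI chat completion responses. The `choice_measurements` helper is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def choice_measurements(
    response: dict, server_address: str
) -> List[Tuple[int, Dict[str, str]]]:
    """Return one (count, attributes) pair per distinct finish reason."""
    counts = Counter(
        choice.get("finish_reason") for choice in response.get("choices", [])
    )
    measurements = []
    for finish_reason, count in counts.items():
        attributes = {
            "llm.response.model": response["model"],
            "server.address": server_address,
        }
        # finish_reason is only Recommended, so omit it when it is unknown.
        if finish_reason is not None:
            attributes["llm.response.finish_reason"] = finish_reason
        measurements.append((count, attributes))
    return measurements
```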

Metric: openai.chat_completions.duration

Status: Experimental

This metric is required.

This metric SHOULD be specified with ExplicitBucketBoundaries of [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.chat_completions.duration | Histogram | s | Duration of the chat completion operation. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.finish_reason | string | The reason the model stopped generating tokens. | stop | Recommended |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Required |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |
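
To make the recommended bucket boundaries concrete, the sketch below shows which bucket a duration in seconds would fall into. It assumes upper-inclusive buckets, that is (-inf, 0], (0, 0.005], ..., (10, +inf), which is how OpenTelemetry explicit-bucket histograms are commonly defined. The `bucket_index` helper is hypothetical.

```python
from bisect import bisect_left

# Boundaries recommended for this metric, in seconds.
DURATION_BOUNDARIES = [
    0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10,
]


def bucket_index(duration_s: float) -> int:
    """Return the index of the bucket that contains duration_s.

    Assumes each boundary is the inclusive upper bound of its bucket, so
    15 boundaries yield 16 buckets (index 15 is the overflow bucket).
    """
    return bisect_left(DURATION_BOUNDARIES, duration_s)
```

With these boundaries, every duration above 10 seconds lands in the final overflow bucket, so long-running calls are not distinguished from one another.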

Embeddings

Metric: openai.embeddings.tokens

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.embeddings.tokens | Counter | token | Number of tokens used in prompt and completions. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Required |
| llm.usage.token_type | string | The type of token. | prompt | Recommended |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |

llm.usage.token_type MUST be one of the following:

| Value | Description |
| --- | --- |
| prompt | prompt |
| completion | completion |

Metric: openai.embeddings.vector_size

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.embeddings.vector_size | Counter | element | The size of the returned vector. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Required |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |
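
A possible way to derive the measured value for this metric is to take the length of each returned embedding. The sketch assumes a response dict with a `model` field and a `data` list whose items hold an `embedding` list of floats, as in OpenAI embeddings responses. The `vector_size_measurements` helper is hypothetical, and recording one measurement per returned vector is a choice of this example, not something the convention states.

```python
from typing import Dict, List, Tuple


def vector_size_measurements(
    response: dict, server_address: str
) -> List[Tuple[int, Dict[str, str]]]:
    """Return one (vector length, attributes) pair per returned embedding."""
    attributes = {
        "llm.response.model": response["model"],
        "server.address": server_address,
    }
    return [
        # Copy the attributes so callers can modify one pair independently.
        (len(item["embedding"]), dict(attributes))
        for item in response.get("data", [])
    ]
```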

Metric: openai.embeddings.duration

Status: Experimental

This metric is required.

This metric SHOULD be specified with ExplicitBucketBoundaries of [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.embeddings.duration | Histogram | s | Duration of the embeddings operation. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Required |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |
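
server.address is required on every metric in this document and should be obtained without a reverse DNS lookup. One minimal way to do that, assuming the client is configured with a base URL, is to parse the host out of that URL. The `server_address_from_url` helper is hypothetical and does not handle Unix domain sockets.

```python
from urllib.parse import urlsplit


def server_address_from_url(base_url: str) -> str:
    """Extract the host part of a client base URL without any DNS lookup."""
    host = urlsplit(base_url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {base_url!r}")
    # hostname excludes the port and is lower-cased by urlsplit.
    return host
```

This only yields the address the client connects to. When requests go through a proxy, the address of the server behind it has to come from other configuration, if it is available at all.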

Image generation

Metric: openai.image_generations.duration

Status: Experimental

This metric is required.

This metric SHOULD be specified with ExplicitBucketBoundaries of [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| llm.openai.image_generations.duration | Histogram | s | Duration of the image generations operation. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| error.type | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 | Conditionally Required: if the operation ended in error |
| llm.response.model | string | The name of the LLM that generated the response. | gpt-4-0613 | Recommended |
| server.address | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | example.com; 10.1.2.80; /tmp/my.sock | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

  • Use a domain-specific attribute
  • Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. |
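
The duration metrics and the error.type rules can be combined in a small timing wrapper. The sketch below is one possible shape, not a prescribed implementation: `timed_call` and its `record` callback are hypothetical, and using the exception class name as the error.type value is an assumption of this example.

```python
import time
from typing import Any, Callable, Dict


def timed_call(
    operation: Callable[[], Any],
    base_attributes: Dict[str, str],
    record: Callable[[float, Dict[str, str]], None],
) -> Any:
    """Run operation, then report its duration in seconds via record."""
    attributes = dict(base_attributes)  # do not mutate the caller's dict
    start = time.perf_counter()
    try:
        return operation()
    except Exception as exc:
        # error.type is set only when the operation ended in error.
        attributes["error.type"] = type(exc).__qualname__
        raise
    finally:
        # Runs on both success and failure, so every call is measured.
        record(time.perf_counter() - start, attributes)
```

On success the recorded attributes contain no error.type key at all, which matches the rule that instrumentations should not set it for successful operations.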