Semantic Conventions for OpenAI Metrics

This document defines semantic conventions for OpenAI client metrics.

- Chat completions
- Embeddings
- Image generation
  - Metric: `openai.image_generations.duration`

Chat completions

Metric: `openai.chat_completions.tokens`

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.chat_completions.tokens` | Counter | `token` | Number of tokens used in prompt and completions. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `llm.usage.token_type` | string | The type of token. | `prompt` | Recommended |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.
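As an illustration of note [2], the sketch below derives a `server.address` value from the endpoint the client is configured to call rather than from a proxy address. The function name `server_address_from_url` and the proxy address are hypothetical, and the approach assumes that the configured endpoint is the server behind any intermediary.

```python
from typing import Optional
from urllib.parse import urlsplit


def server_address_from_url(endpoint_url: str) -> Optional[str]:
    """Return the host of the endpoint (without port), or None if there is none."""
    return urlsplit(endpoint_url).hostname


# The request may be routed through a proxy (hypothetical address), but the
# attribute is taken from the endpoint the client is logically talking to.
proxy_url = "http://10.1.2.80:3128"
endpoint_url = "https://example.com/v1"
attributes = {"server.address": server_address_from_url(endpoint_url)}
```

If the client only knows the proxy address, the note's "if it's available" clause applies and the proxy address may be the best value on hand.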

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |
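The rules for `error.type` can be sketched as a small classification helper. This is a minimal illustration, not a prescribed implementation: the mapping `KNOWN_ERROR_TYPES`, the custom value `connection_error`, and the function names are hypothetical, and `http.response.status_code` is used here as an assumed domain-specific attribute taken from the OpenTelemetry HTTP conventions.

```python
from typing import Dict, Optional, Union

# Hypothetical, instrumentation-defined list of reported errors. Keeping it
# small and fixed keeps the cardinality of error.type low.
KNOWN_ERROR_TYPES = {TimeoutError: "timeout", ConnectionError: "connection_error"}
FALLBACK_ERROR_TYPE = "_OTHER"  # well-known fallback value


def classify_error(exc: Optional[BaseException] = None,
                   status_code: Optional[int] = None) -> Optional[str]:
    """Return an error.type value, or None when the operation succeeded."""
    if status_code is not None and status_code >= 400:
        return str(status_code)
    if exc is None:
        return None  # success: error.type is not set
    for exc_class, value in KNOWN_ERROR_TYPES.items():
        if isinstance(exc, exc_class):
            return value
    return FALLBACK_ERROR_TYPE


def error_attributes(exc: Optional[BaseException] = None,
                     status_code: Optional[int] = None) -> Dict[str, Union[str, int]]:
    """Set the domain-specific attribute and error.type side by side."""
    attrs: Dict[str, Union[str, int]] = {}
    if status_code is not None:
        attrs["http.response.status_code"] = status_code
    error_type = classify_error(exc, status_code)
    if error_type is not None:
        attrs["error.type"] = error_type
    return attrs
```

Under these assumptions, `error_attributes(status_code=500)` carries both attributes, while a successful call with status 200 carries only the status code.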

llm.usage.token_type MUST be one of the following:

| Value | Description |
| --- | --- |
| `prompt` | Tokens in the prompt. |
| `completion` | Tokens in the completion. |
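One way to read the token metric is as one counter measurement per token type. The sketch below builds those measurements from a usage mapping; the helper name `token_measurements` and the `prompt_tokens` / `completion_tokens` keys are assumptions made for illustration, and the resulting pairs would be passed to whatever counter the instrumentation uses.

```python
from typing import Dict, List, Mapping, Tuple

TOKEN_TYPES = ("prompt", "completion")  # the only allowed llm.usage.token_type values


def token_measurements(usage: Mapping[str, int], model: str,
                       server_address: str) -> List[Tuple[int, Dict[str, str]]]:
    """Return (value, attributes) pairs, one per token type present in `usage`."""
    measurements = []
    for token_type in TOKEN_TYPES:
        count = usage.get(f"{token_type}_tokens")  # assumed key naming
        if count is None:
            continue
        measurements.append((count, {
            "llm.response.model": model,
            "llm.usage.token_type": token_type,
            "server.address": server_address,
        }))
    return measurements


measurements = token_measurements(
    {"prompt_tokens": 12, "completion_tokens": 30}, "gpt-4-0613", "example.com")
```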

Metric: `openai.chat_completions.choices`

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.chat_completions.choices` | Counter | `choice` | Number of choices returned by the chat completions call. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |
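Because `llm.response.finish_reason` is an attribute of the choices counter, a response with several choices can produce one measurement per finish reason. The following sketch assumes each choice is a mapping with an optional `finish_reason` key; the helper name `choice_measurements` and the `length` value are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple


def choice_measurements(choices: Iterable[Mapping[str, str]], model: str,
                        server_address: str) -> List[Tuple[int, Dict[str, str]]]:
    """Group choices by finish reason and return (count, attributes) pairs."""
    by_reason = Counter(choice.get("finish_reason") for choice in choices)
    measurements = []
    for reason, count in by_reason.items():
        attrs = {"llm.response.model": model, "server.address": server_address}
        if reason is not None:  # the attribute is Recommended, so omit it when unknown
            attrs["llm.response.finish_reason"] = reason
        measurements.append((count, attrs))
    return measurements


measurements = choice_measurements(
    [{"finish_reason": "stop"}, {"finish_reason": "stop"}, {"finish_reason": "length"}],
    "gpt-4-0613", "example.com")
```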

Metric: `openai.chat_completions.duration`

Status: Experimental

This metric is required.

This metric SHOULD be specified with ExplicitBucketBoundaries of [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].
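To make the recommended boundaries concrete, the sketch below shows which bucket a duration in seconds would fall into. It assumes the usual OpenTelemetry explicit-bucket semantics, where each boundary is an inclusive upper bound and one extra bucket catches everything above the last boundary. `bucket_index` and `bucket_counts` are hypothetical helper names; a real SDK performs this aggregation internally.

```python
from bisect import bisect_left
from typing import Iterable, List

BOUNDARIES = [0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1,
              0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]


def bucket_index(duration_s: float) -> int:
    """Index of the bucket whose inclusive upper bound is the first one >= duration_s."""
    return bisect_left(BOUNDARIES, duration_s)


def bucket_counts(durations_s: Iterable[float]) -> List[int]:
    """Histogram counts: one bucket per boundary plus an overflow bucket."""
    counts = [0] * (len(BOUNDARIES) + 1)
    for duration in durations_s:
        counts[bucket_index(duration)] += 1
    return counts


counts = bucket_counts([0.3, 0.3, 12.0])
```

With these boundaries, any call slower than 10 seconds lands in the single overflow bucket, so long-running completions are not distinguished from one another.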

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.chat_completions.duration` | Histogram | `s` | Duration of the chat completion operation. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |
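A duration measurement in seconds, with `error.type` set only on failure, might be produced by a wrapper like the one below. This is a sketch under stated assumptions: `timed_call` and the `record` callback are hypothetical stand-ins for the instrumentation's histogram, and using the exception class name as the `error.type` value is one possible choice, not a requirement of this document.

```python
import time
from typing import Any, Callable, Dict


def timed_call(operation: Callable[[], Any],
               record: Callable[[float, Dict[str, str]], None],
               attributes: Dict[str, str]) -> Any:
    """Run `operation`, then report its duration in seconds via `record`."""
    attrs = dict(attributes)  # copy so the caller's mapping is not mutated
    start = time.perf_counter()
    try:
        return operation()
    except Exception as exc:
        attrs["error.type"] = type(exc).__qualname__  # set only when the call failed
        raise
    finally:
        record(time.perf_counter() - start, attrs)


recorded = []
base = {"llm.response.model": "gpt-4-0613", "server.address": "example.com"}
result = timed_call(lambda: "ok", lambda value, attrs: recorded.append((value, attrs)), base)
```

The measurement is recorded in the `finally` clause so that failed calls contribute to the histogram as well as successful ones.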

Embeddings

Metric: `openai.embeddings.tokens`

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.embeddings.tokens` | Counter | `token` | Number of tokens used in prompt and completions. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `llm.usage.token_type` | string | The type of token. | `prompt` | Recommended |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |

llm.usage.token_type MUST be one of the following:

| Value | Description |
| --- | --- |
| `prompt` | Tokens in the prompt. |
| `completion` | Tokens in the completion. |

Metric: `openai.embeddings.vector_size`

Status: Experimental

This metric is required.

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.embeddings.vector_size` | Counter | `element` | The size of the returned vector. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |

Metric: `openai.embeddings.duration`

Status: Experimental

This metric is required.

This metric SHOULD be specified with ExplicitBucketBoundaries of [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.embeddings.duration` | Histogram | `s` | Duration of the embeddings operation. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |

Image generation

Metric: `openai.image_generations.duration`

Status: Experimental

This metric is required.

This metric SHOULD be specified with ExplicitBucketBoundaries of [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].

| Name | Instrument Type | Unit (UCUM) | Description |
| --- | --- | --- | --- |
| `llm.openai.image_generations.duration` | Histogram | `s` | Duration of the image generation operation. |

| Attribute | Type | Description | Examples | Requirement Level |
| --- | --- | --- | --- | --- |
| `error.type` | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error |
| `llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` | Required |
| `server.address` | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required |

[1]: The error.type SHOULD be predictable and SHOULD have low cardinality. Instrumentations SHOULD document the list of errors they report.

The cardinality of error.type within one instrumentation library SHOULD be low. Telemetry consumers that aggregate data from multiple instrumentation libraries and applications should be prepared for error.type to have high cardinality at query time when no additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set error.type.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), it's RECOMMENDED to:

- Use a domain-specific attribute
- Set error.type to capture all errors, regardless of whether they are defined within the domain-specific set or not.

[2]: When observed from the client side, and when communicating through an intermediary, server.address SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
| --- | --- |
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |
