Attribute names: unicode on OTLP, only [a-z0-9._] in OTel semcov #1124

Closed
lmolkova opened this issue Jun 5, 2024 · 2 comments · Fixed by #1302

lmolkova commented Jun 5, 2024

Attribute names can be any Unicode sequence:

“Every name MUST be a valid Unicode sequence.”

This makes sense for user applications that use the OTel API and OTLP, but our tooling (build-tools) does not accept such names:

ID_RE = re.compile("([a-z](\\.?[a-z0-9_-]+)+)")
"""Identifiers must start with a lowercase ASCII letter and
contain only lowercase, digits 0-9, underscore, dash (not recommended) and dots.
Each dot must be followed by at least one allowed non-dot character."""
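
To make the effect of this pattern concrete, here is a small sketch that runs the quoted regex over a few sample names. It assumes the tooling requires the whole identifier to match (the call site is not quoted here, so `fullmatch` is an assumption), and the helper `is_valid_id` and the sample names are made up for illustration.

```python
import re

# Pattern quoted above from build-tools.
ID_RE = re.compile("([a-z](\\.?[a-z0-9_-]+)+)")


def is_valid_id(name: str) -> bool:
    # Assumption: the whole identifier must match, not just a prefix.
    return ID_RE.fullmatch(name) is not None


# Illustrative names only; they are not all real semantic conventions.
samples = [
    "http.request.method",  # lowercase segments separated by dots
    "http-method",          # dash is accepted (but not recommended)
    "http_",                # trailing underscore is accepted
    "Http.method",          # uppercase letter is rejected
    "http..method",         # a dot must be followed by a non-dot character
    "http.method.",         # trailing dot is rejected
    "héllo.wörld",          # non-ASCII letters are rejected
]

for name in samples:
    print(f"{name!r}: {is_valid_id(name)}")
```

Under that assumption, the first three names pass and the last four fail, so the current pattern is narrower than "any Unicode sequence" but still accepts dashes and a trailing underscore.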

We should document and enforce the rules that we have for semantic convention definitions in this repo:

  • only a-z, 0-9, . and _ are accepted
  • starts with a letter
  • ends with a letter or number
  • no dashes (no existing attributes use them)
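
The rules above could be sketched as a single pattern. This only illustrates the proposal and is not the check the tooling implements; `STRICT_ID_RE` and `is_strict_id` are hypothetical names. It encodes just the listed bullets, so it does not, for example, reject consecutive dots.

```python
import re

# Hypothetical pattern for the proposed rules:
#   - the first character is a lowercase ASCII letter
#   - only a-z, 0-9, '.' and '_' may appear after it
#   - the last character is a letter or a digit
STRICT_ID_RE = re.compile(r"[a-z](?:[a-z0-9._]*[a-z0-9])?")


def is_strict_id(name: str) -> bool:
    # The whole name must match the pattern.
    return STRICT_ID_RE.fullmatch(name) is not None


# Sample names are illustrative only.
for name in ["http.request.method", "k8s.pod.name", "http-method", "http_", "1http"]:
    print(f"{name!r}: {is_strict_id(name)}")
```

Compared with the existing regex, this sketch rejects dashes and trailing underscores while still accepting ordinary dotted names.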

These rules are necessary for code generation. They should also apply to metric names, units, event names, event payload fields, and any other properties that are likely to be represented in code.

We can expand the list of allowed characters if we can find a way to support code generation for them.

lmolkova changed the title from "Attribute naming: unicode on OTLP, [a-z0-9._] in OTel semcov" to "Attribute naming: unicode on OTLP, only [a-z0-9._] in OTel semcov" on Jun 5, 2024
lmolkova changed the title from "Attribute naming: unicode on OTLP, only [a-z0-9._] in OTel semcov" to "Attribute names: unicode on OTLP, only [a-z0-9._] in OTel semcov" on Jun 5, 2024
jsuereth moved this to "Migrate to weaver" in Semantic Conventions Tooling on Jul 3, 2024
tigrannajaryan commented

The use of [a-z0-9._] is currently only a recommendation, not a strict restriction. I think it is fine to rely on that recommendation and make it the default behavior for our tools, but the tools may need to be able to deal with exceptions.

I think use cases like this show that strictly prohibiting it may create problems with interoperability with other standards.
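
One way to read this suggestion is to keep the recommended character set as the default check while letting the tool accept an explicit list of exceptions. The sketch below is hypothetical: `check_name`, the `allowed_exceptions` parameter, and the sample name are invented for illustration and do not describe an existing build-tools or weaver option.

```python
import re
from typing import FrozenSet

# Default check based on the recommended [a-z0-9._] character set.
RECOMMENDED_RE = re.compile(r"[a-z](?:[a-z0-9._]*[a-z0-9])?")


def check_name(name: str, allowed_exceptions: FrozenSet[str] = frozenset()) -> bool:
    """Return True if the name follows the recommendation or is explicitly exempted."""
    if name in allowed_exceptions:
        # Hypothetical escape hatch for names dictated by another standard.
        return True
    return RECOMMENDED_RE.fullmatch(name) is not None


# "x-vendor.Custom-Header" is a made-up name standing in for an external standard.
exceptions = frozenset({"x-vendor.Custom-Header"})
print(check_name("http.request.method"))
print(check_name("x-vendor.Custom-Header"))
print(check_name("x-vendor.Custom-Header", exceptions))
```

With this shape, the strict rule stays the default and each exception has to be listed deliberately rather than slipping through a looser pattern.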

lmolkova commented Jul 11, 2024

Agreed. Our existing tooling imposes these limitations: CI checks would flag such a name and fail.
One reason for the limitation is to be able to translate attribute, metric, and other names into constant names in code.

To support other characters, we'd need some mechanism to define a code-friendly name for such identifiers. We might need a similar mechanism for #1118 (comment) (phase 2).
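
As a rough illustration of both points, the sketch below derives a constant name from a restricted identifier and falls back to an explicitly supplied code-friendly name for anything else. The function `to_constant_name` and its `code_name` parameter are hypothetical and do not reflect how build-tools or weaver actually generate code.

```python
import re
from typing import Optional

# Identifiers that can be turned into constant names mechanically.
SAFE_ID_RE = re.compile(r"[a-z](?:[a-z0-9._]*[a-z0-9])?")


def to_constant_name(name: str, code_name: Optional[str] = None) -> str:
    """Derive a constant name such as HTTP_REQUEST_METHOD from an identifier."""
    if code_name is not None:
        # Hypothetical mechanism: an explicit code-friendly name always wins.
        return code_name
    if SAFE_ID_RE.fullmatch(name) is None:
        # No safe mechanical translation exists for arbitrary Unicode names.
        raise ValueError(f"{name!r} needs an explicit code-friendly name")
    return name.replace(".", "_").upper()


print(to_constant_name("http.request.method"))
# "caché.hit-rate" is a made-up name used only to exercise the override.
print(to_constant_name("caché.hit-rate", code_name="CACHE_HIT_RATE"))
```

Note that even within the restricted character set this naive mapping can collide, since `a.b_c` and `a_b.c` both become `A_B_C`; a real generator would need to detect or prevent that.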
