process.command_line accepts multiple value types #831

anuraaga · 2020-08-19T09:39:38Z

I noticed process.command_line accepts different types of values. I don't think we generally do this but was this intended? I would just use a string for it in all cases.

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/resource/semantic_conventions/process.md#process

The text was updated successfully, but these errors were encountered:

Oberon00 · 2020-08-19T09:46:03Z

This was intentional and was discussed at #635 (comment). In some environments (Windows) a single string (space separated or however it was typed in the shell) is the raw form of this. In others (Linux) a NUL-separated string that is easily translated to a list is the raw form. Additionally, some runtimes support only obtaining the commandline as list of arguments, which would need to be escaped (according to which shell's syntax?) if you want them as single string.
If we only support a single string, we would additionally need an attribute that tells us the format the string is in.

Oberon00 · 2020-08-19T09:49:15Z

Actually, a much nicer summary of the pros & cons is at #635 (comment) by @james-bebbington :

There is also concern around using two different data types to represent the same attribute value: #635 (comment). It might create additional complexity on back end systems to have to manage potentially different data types for the same attribute, though I'm not sure how much of a concern this would be? I guess parsing null-characters on the back end would likely be equivalently complex.

Note this isn't strictly ambiguous btw. If that was a command with no arguments, and you were looking at the Unix representation, it would have a null-character as the last character in the string. On Windows it would not. Admittedly that is not a great representational difference, and could be difficult to distinguish in some programming languages.

After thinking about this some more, I think I do prefer the list of strings approach, but I'm not sure I'm aware of all the pros/cons. For now I have changed it back to that, and updated the example for each OS accordingly. But for the record here's the three options considered and pros/cons:

Join arguments with a string:
Advantages: Most consistent representation across OSs.
Disadvantages: Potentially lossy/non-trivial conversion on Unix - do we need to enquote parameters with spaces in them (and escape internal quotes), etc?

Pass through string returned from OS as is (null-delimited on Unix):
Advantages: No processing needed by code producing telemetry, can just provide what OS returns.
Disadvantages: Back end needs to parse null-characters.

Convert to list of strings (for Unix only):
Advantages: Clean ("properly typed") representation.
Disadvantages: Inconsistent data type for the same attribute between OSs, and back end needs to handle this differently.

anuraaga · 2020-08-19T09:53:15Z

Yeah I can definitely see the motivation for this. At the same time, instrumentation and backend does get quite a bit more complex because of the dual-type so I don't know if this will be easy to implement - generating code from the semantic spec yaml seems especially problematic.

@thisthat Can you check if your semantic spans work be able to cover this case? #571

Oberon00 · 2020-08-19T09:55:15Z

At the same time, instrumentation and backend does get quite a bit more complex because of the dual-type

Instrumentation gets simpler because it has a choice, and can do whatever is easier. Or am I missing something?

Oberon00 · 2020-08-19T09:56:58Z

@thisthat Can you check if your semantic spans work be able to cover this case? #571

Currently the tool doesn't support this, but it can be extended. In fact we had the possibility of such union types in mind from the beginning.

anuraaga · 2020-08-19T09:59:15Z

I see - currently in Java we use an abstraction like StringAttributeSetter for operating on attributes. We don't yet use it for resource attributes but I think we should and probably will eventually. So we would need to introduce a clunky StringOrArrayAttributeSetter - it's not the end of the world, but this is what I meant by instrumentation getting more complex (probably only applies to typesafe language).

Oberon00 · 2020-08-19T10:02:07Z

You can also have two constants for this semantic attribute, e.g. a StringAttributeSetter PROCESS_COMMANDLINE_STR and PROCESS_COMMANDLINE_LIST. Though I begin to see your point. We could instead have two alternative semantic attributes with two different names in the semantic conventions. It would be a bit clunky in the spec then though.

anuraaga · 2020-08-19T10:04:00Z

Anyways it's a very small point - I was surprised to see it but since it was already discussed it's fine with me. Thanks for clarifying it.

anuraaga · 2020-08-19T10:17:13Z

Actually let me keep this open. The spec text says either primitive or array

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/common/common.md#attributes

So I think we need to tweak that to more formalize the possibility of union types.

Oberon00 · 2020-08-19T10:23:30Z

The actual Attribute will not have an union type, it will have either one or the other type. Only the semantic convention has the union type.

anuraaga · 2020-08-19T10:49:26Z

It took me a minute to parse that but I understand what you mean :) It's true, but I don't think it hurts to clarify this example explicitly. I think dual type convention is going to cause confusion when a newcomer reads and tries to understand so we can probably help. That newcomer this time was me, hence the issue filed ;)

thisthat · 2020-08-19T11:04:20Z

I might miss something but, what is the reason for not using array as the only available type and in windows put it as a single element array?

jmacd · 2020-08-19T18:02:42Z

@thisthat I think the opposite argument could also be made. Why not convert the array representation into a string in the backend, if it's so important to have a consistent type?

@Oberon00 wrote:

Instrumentation gets simpler because it has a choice, and can do whatever is easier. Or am I missing something?

and I agree. I don't think OTel should be in the business of saying what is and is not a valid data representation for these semantic conventions. The conventions are describing semantics. There is no loss of semantics by allowing multiple representations, but there are well-documented benefits to the instrumentation modules that can simply observe their inputs as opposed to transforming their inputs to match a specified data representation.

anuraaga · 2020-08-20T06:36:27Z

I don't think OTel should be in the business of saying what is and is not a valid data representation for these semantic conventions.

I'm not against having conventions with multiple types like this if we clearly lay it out. I'm not sure I'd go so far as saying we aren't in the business of describing the data representation though. If we are ok with numeric values being reported as strings, for example, all processors of data whether collector or backend would have to check both types or risk missing the data, which is unnecessary complexity.

Oberon00 · 2020-09-24T08:59:55Z

I might miss something but, what is the reason for not using array as the only available type and in windows put it as a single element array?

Because the meaning of a single array element currently is: After splitting the argument list, this was the only argument.

E.g. when you have a value of ["myapp myarg"] that would actually mean that the original commandline invocation was something like $ myapp\ myarg (note the escaped space).

Oberon00 · 2020-10-23T14:52:39Z

@open-telemetry/technical-committee I think this issue should be allowed-for-ga. EDIT: I made a PR #1137 that would fix this.

andrewhsu · 2020-10-27T15:10:33Z

from the spec sig mtg today, re-triaged for allowed-for-ga

anuraaga added the spec:resource Related to the specification/resource directory label Aug 19, 2020

Oberon00 added area:semantic-conventions Related to semantic conventions question Question for discussion labels Aug 19, 2020

anuraaga closed this as completed Aug 19, 2020

anuraaga reopened this Aug 19, 2020

andrewhsu added the release:after-ga Not required before GA release, and not going to work on before GA label Aug 21, 2020

Oberon00 mentioned this issue Oct 23, 2020

Overhaul process command-line resource conventions. #1137

Merged

Oberon00 removed the release:after-ga Not required before GA release, and not going to work on before GA label Oct 23, 2020

pmm-sumo mentioned this issue Oct 25, 2020

Include attribute name pluralization guidelines (#1115) #1140

Merged

andrewhsu added release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs priority:p3 Lowest priority level labels Oct 27, 2020

arminru closed this as completed in #1137 Nov 3, 2020

Oberon00 mentioned this issue Aug 30, 2021

Capture all request and response headers #1061

Closed

Oberon00 mentioned this issue Sep 17, 2021

Capture HTTP request/response headers as span attributes #1898

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

process.command_line accepts multiple value types #831

process.command_line accepts multiple value types #831

anuraaga commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

anuraaga commented Aug 19, 2020 •

edited

Loading

Oberon00 commented Aug 19, 2020

Oberon00 commented Aug 19, 2020 •

edited

Loading

anuraaga commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

anuraaga commented Aug 19, 2020

anuraaga commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

anuraaga commented Aug 19, 2020

thisthat commented Aug 19, 2020

jmacd commented Aug 19, 2020 •

edited

Loading

anuraaga commented Aug 20, 2020

Oberon00 commented Sep 24, 2020 •

edited

Loading

Oberon00 commented Oct 23, 2020 •

edited

Loading

andrewhsu commented Oct 27, 2020

process.command_line accepts multiple value types #831

process.command_line accepts multiple value types #831

Comments

anuraaga commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

anuraaga commented Aug 19, 2020 • edited Loading

Oberon00 commented Aug 19, 2020

Oberon00 commented Aug 19, 2020 • edited Loading

anuraaga commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

anuraaga commented Aug 19, 2020

anuraaga commented Aug 19, 2020

Oberon00 commented Aug 19, 2020

anuraaga commented Aug 19, 2020

thisthat commented Aug 19, 2020

jmacd commented Aug 19, 2020 • edited Loading

anuraaga commented Aug 20, 2020

Oberon00 commented Sep 24, 2020 • edited Loading

Oberon00 commented Oct 23, 2020 • edited Loading

andrewhsu commented Oct 27, 2020

anuraaga commented Aug 19, 2020 •

edited

Loading

Oberon00 commented Aug 19, 2020 •

edited

Loading

jmacd commented Aug 19, 2020 •

edited

Loading

Oberon00 commented Sep 24, 2020 •

edited

Loading

Oberon00 commented Oct 23, 2020 •

edited

Loading