Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Decode function for decoding []byte based on a configured text encoding #32493

Closed
jsirianni opened this issue Apr 17, 2024 · 13 comments
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium

Comments

@jsirianni
Copy link
Member

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Log receivers can output raw bytes (See Azure Event Hub). OTTL does not have the ability to parse these bytes into something human readable.

Describe the solution you'd like

A functionality similar to pkg/stanza/decode, which can take string or []byte input, and output it in a human readable format using a configured encoding (ascii, utf-8, etc).

Describe alternatives you've considered

Additional context

No response

@jsirianni jsirianni added enhancement New feature or request needs triage New item requiring triage labels Apr 17, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Copy link
Member

We opted via #31543 to not make a generic Decode function as it would get bloated and there'd be edge cases. Which kind of decoding do you need to do?

@TylerHelmuth TylerHelmuth added discussion needed Community discussion needed and removed needs triage New item requiring triage labels Apr 17, 2024
@jsirianni
Copy link
Member Author

We opted via #31543 to not make a generic Decode function as it would get bloated and there'd be edge cases. Which kind of decoding do you need to do?

The problem I am trying to solve starts with the Azure receiver outputting bytes. If you decode the output, you get human readable json that is perfectly usable. The problem is that OTEL does not offer the ability to decode []byte into anything useful.

Previously I asked to have the Azure Event Hub receiver output something human readable #32382, however it was explained that any receiver could output bytes, therefore the decode ability should live somewhere else.

@djaglowski
Copy link
Member

We opted via #31543 to not make a generic Decode function as it would get bloated and there'd be edge cases. Which kind of decoding do you need to do?

Typically I'd agree on having targeted functions but I think in this case you're just basically moving a parameter into the function name and therefore causing bloat in the function set as well as friction while trying to add support for each encoding.

In pkg/stanza, the decode functionality has been pretty straightforward and stable. I don't see any discussion on #31543 regarding how bloat or edge cases would be problematic. Can you elaborate on this?

@TylerHelmuth
Copy link
Member

My fear was that we'd end up needing to decode something that would take more parameters than just the encoding name. Maybe that fear is unwarranted? I see that Stanza supports these options, all as single parameters, which makes me feel better about adding a general Decode converter.

@djaglowski is there a reason stanza doesn't support base64 decoding in that list?

If my fear of edge-case parameters is off-base I'd support deprecating Base64Decode in favor of a generic function. The Converter has not existed for very long, so now is the time to do it.

/cc @evan-bradley

@djaglowski
Copy link
Member

Thanks for considering @TylerHelmuth.

is there a reason stanza doesn't support base64 decoding in that list?

Not that I know of, just seems to have never been requested. I'm surprised it isn't listed in the golang.org/x/text/encoding/ianaindex list, but it would simple enough to add by just importing encoding/base64 and adding one more entry to the encodings map, "base64": base64.Encoding.

@TylerHelmuth
Copy link
Member

I see that Go's implementation for base64 encode includes base64.StdEncoding, base64.URLEncoding, base64.RawStdEncoding, and base64.RawURLEncoding. The Base64Decode hardcodes base64.StdEncoding today, but could be made to specify which type via an optional parameter if that need ever came up.

We could also build that into a generic Decode function via base64, base64/url and so on.

@djaglowski
Copy link
Member

We could also build that into a generic Decode function via base64, base64/url and so on.

Yeah, exactly. It's just a matter of articulating the strings and corresponding encoding.Encoding implementations which users will use to refer to each.

If my fear of edge-case parameters is off-base I'd support deprecating Base64Decode in favor of a generic function. The Converter has not existed for very long, so now is the time to do it.

I can't promise there won't ever be edge cases but the encoding.Encoding interface is very simple and robust for []byte => []byte and string => string conversions. I think in most cases if we find an unsupported encoding that we want to pick up, it'll just be a matter of adding it to the map.

@TylerHelmuth
Copy link
Member

If @evan-bradley agrees, I am ok moving forward with this direction.

@evan-bradley
Copy link
Contributor

I was mostly worried about adding complexity through per-encoding optional parameters. If we're reasonably confident we can keep Decode to a single parameter for the foreseeable future, I'm also okay with this.

@evan-bradley evan-bradley added priority:p2 Medium and removed discussion needed Community discussion needed labels Apr 18, 2024
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bacherfl
Copy link
Contributor

bacherfl commented Jul 4, 2024

hi, I would like to look into implementing this. CC @evan-bradley

@watercraft
Copy link

My ideal would be a filelog configuration that detects the Windows UTF16 header and converts it to UTF8 on the fly to "normalize" with other log file data. This header is detected by Emacs and displays in a human readable display:SQL EXPRESS $ xxd ERRORLOG | head 00000000: fffe 3200 3000 3200 3400 2d00 3000 3500 ..2.0.2.4.-.0.5.

TylerHelmuth pushed a commit that referenced this issue Sep 5, 2024
**Description:** This PR adds a `Decode` function that accepts a string
or byte array, and a encoding name as an input and transforms it to
UTF-8

**Link to tracking Issue:** #32493

**Testing:** Added unit and e2e tests

**Documentation:** Added an entry in the readme to describe the new
function

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
**Description:** This PR adds a `Decode` function that accepts a string
or byte array, and a encoding name as an input and transforms it to
UTF-8

**Link to tracking Issue:** open-telemetry#32493

**Testing:** Added unit and e2e tests

**Documentation:** Added an entry in the readme to describe the new
function

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium
Projects
None yet
Development

No branches or pull requests

6 participants