Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ottl truncate editor #23880

Closed
wants to merge 6 commits into from

Conversation

jack-berg
Copy link
Member

Description:

Add ottl truncate editor for truncating strings. E.g.:

    log_statements:
      - context: log
        statements:
        - truncate(body, 10)

Link to tracking Issue: Resolves #23847.

Testing: Unit tests, local verification with running collector.

Documentation: Added documentation to pkg/ottl/ottlfuncs/README.md.

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an Enhancement changelog entry for pkg/ottl.

pkg/ottl/ottlfuncs/func_truncate.go Outdated Show resolved Hide resolved
pkg/ottl/ottlfuncs/func_truncate.go Show resolved Hide resolved
.chloggen/add-ottl-truncate-editor.yaml Outdated Show resolved Hide resolved
@kentquirk
Copy link
Member

Hmm, I know I'm swooping in late to this, but string editing is a bit more general than just truncate. Should we consider a general slice(start, end) function instead?

@jack-berg
Copy link
Member Author

Hmm, I know I'm swooping in late to this, but string editing is a bit more general than just truncate. Should we consider a general slice(start, end) function instead?

Sounds like competing concerns between consistency with existing truncate_all function and the increased flexibility of a more generic function. Could always start with truncate and add slice later. Adds a bit of bloat but maintaining these functions doesn't seem like too much overhead.

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
@jack-berg
Copy link
Member Author

Bumping this. Wondering if any other @open-telemetry/collector-contrib-maintainer can review this and / or have opinions about a truncate vs. a more generic slice.

@TylerHelmuth
Copy link
Member

I think @kentquirk brings up a good point that truncate might be too generic of a name for string-only truncation. It is feasible that other types such as slices or time.Time may need truncated and if we reserve truncate for strings only that will make discoverability of the functions more difficult.

The easiest solution is to name it truncate_str so that the type is clear and we keep consistency with the word truncate and the parameter pattern used by truncate_all (which should've been called truncate_all_strs).

@kentquirk @evan-bradley what do you think?

@evan-bradley
Copy link
Contributor

I think @kentquirk brings up a good point that truncate might be too generic of a name for string-only truncation. It is feasible that other types such as slices or time.Time may need truncated and if we reserve truncate for strings only that will make discoverability of the functions more difficult.

I understood Kent's point a little differently: that truncate only trims the end of a string, as where we may also want to trim the beginning, which we currently support with the Substring function. What you said is also true though and I think is important to keep in mind.

Looking at the original issue, I think having a Len Converter that outputs the length of a value would be the best immediate solution. I'm hesitant to add a truncate editor since we can already accomplish the same with set(body, Substring(body, 0, 10) where IsString(body). Adding a truncate editor would make this case just a little bit easier by making it truncate(body, 10) where IsString(body), but I think I'd prefer for OTTL to generally have one right way of doing something.

As another way of solving the issue, should Substring return an error when the length exceeds the string length? Looking at a few other languages, JavaScript and Python both just return the full string, and Go only returns an error because it treats the string as a slice type. What is actionable by the user if they hit this case?

@jack-berg
Copy link
Member Author

set(body, Substring(body, 0, 10) where IsString(body)

Just to confirm, that's not a solution for the problem today without a Len function since Substring will error when the length is shorter than 10 (in this case).

As another way of solving the issue, should Substring return an error when the length exceeds the string length?

If Len was added it could be actionable since the user would have a way of bounding the argument to always produce correct results. Its not currently actionable when you want to use it to perform truncation.

but I think I'd prefer for OTTL to generally have one right way of doing something.

👍

If folks agree on adding Len, I'm happy to open an alternative PR with that implementation.

@evan-bradley
Copy link
Contributor

Just to confirm, that's not a solution for the problem today without a Len function since Substring will error when the length is shorter than 10 (in this case).

Yes, that's a good point. The full correct statement would be set(body, Substring(body, 0, 10) where IsString(body) and Len(body) >= 10.

If Len was added it could be actionable since the user would have a way of bounding the argument to always produce correct results. Its not currently actionable when you want to use it to perform truncation.

Could you elaborate on this? If Substring("OpenTelemetry", 0, 100) == "OpenTelemetry" and Substring doesn't throw any errors, I think it would be valid to say that "OpenTelemetry" has been truncated to be under 100 characters. For the more general case of string manipulation, it's not clear to me whether Substring("OpenTelemetry", 4, 100) == "Telemetry" is invalid output.

@TylerHelmuth
Copy link
Member

I agree that Substring("OpenTelemetry", 4, 100) == "Telemetry" feels a little weird. Java would error in that situation like Go does. Allowing that situation does feel generally more user-friendly, but hurts users looking for more explicit errors.

With Converters, OTTL prefers explicit errors, so I'm leaning towards Len based on the recent discussion. Len has been a requested function in the past, handles this situation explicitly, and keeps the door open to update Substring to be more generous in the future if we hear user feedback that Len is too cumbersome.

I felt ok about truncate since it mimics the pattern we have of duplicating functionality between a singular function and a multi-function (delete_key, delete_all_keys, replace_pattern, replace_all_patterns, etc.), but @evan-bradley you've convinced me that it is too similar to Substring and the reduction of a single int param doesn't justify a new function.

@evan-bradley @kentquirk @jack-berg if you all agree lets handle this issue via a Len Converter.

@jack-berg
Copy link
Member Author

Closing in favor of #24420.

@jack-berg jack-berg closed this Jul 20, 2023
TylerHelmuth added a commit that referenced this pull request Jul 27, 2023
**Description:** 

Add ottl `Len` converter to make `Substring` useful for truncating.
E.g.:
```
    log_statements:
      - context: log
        statements:
        - set(body, Substring(body, 0, 10)) where Len(body) >= 10
```

**Link to tracking Issue:**  Resolves #23847.

**Testing:** Unit tests, local verification with running collector.

**Documentation:** Added documentation to
`pkg/ottl/ottlfuncs/README.md`.

This PR is the result of
[conversation](#23880 (comment))
on #23880.

---------

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[processor/transform] not enough preconditions to guard against warnings
5 participants