Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safely truncate over-length string attributes #3156

Merged
merged 6 commits into from
Sep 12, 2022

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Sep 9, 2022

This fixes invalid UTF-8 introduced by our truncation logic.
This does not address the problem of invalid UTF-8 being passed in by users, but it does at least ensure the SDK is not responsible for invalid UTF-8 itself.

Fixes #3021

@jmacd
Copy link
Contributor Author

jmacd commented Sep 9, 2022

This needs at least one more test. ;)

@open-telemetry open-telemetry deleted a comment from codecov bot Sep 9, 2022
@codecov
Copy link

codecov bot commented Sep 9, 2022

Codecov Report

Merging #3156 (41af575) into main (7aba25d) will increase coverage by 0.0%.
The diff coverage is 94.7%.

Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #3156   +/-   ##
=====================================
  Coverage   76.3%   76.3%           
=====================================
  Files        180     180           
  Lines      11992   12009   +17     
=====================================
+ Hits        9153    9169   +16     
- Misses      2597    2598    +1     
  Partials     242     242           
Impacted Files Coverage Δ
sdk/trace/span.go 88.2% <94.7%> (+0.2%) ⬆️

@jmacd
Copy link
Contributor Author

jmacd commented Sep 9, 2022

PTAL. I had to rewrite the fallback case; when there is invalid UTF-8 it calls strings.ToValidUTF8 and then re-applies the same (safe) truncation logic.

@MadVikingGod
Copy link
Contributor

The costly operation in this call is the ToValidUTF8, because it has to iterate over the entire string. So this means how you have it now we iterate over the string once, try to apply the limit, and iterate a second time.

I think an alternative but still valid interpretation would be to apply the limit, and then filter bad unicode. This would mean that if there bad points in the beginning of the string they would count toward the limit, even if they aren't in the output string.

@jmacd
Copy link
Contributor Author

jmacd commented Sep 9, 2022

The way the code is organized, strings.ToValidUTF8() (which copies the data) will only be called when the user has supplied invalid code points. Scanning the input only happens when the input is over-limit, but it won't cause an allocation unless the attribute is over-limit and also contains invalid points. I believe the allocation is the expensive part of this, not the scanning part. This seems like a good compromise, although I think OTel may have to address the problem of users supplying invalid UTF8 anyway.

sdk/trace/span.go Show resolved Hide resolved
sdk/trace/span.go Show resolved Hide resolved
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
@MadVikingGod MadVikingGod mentioned this pull request Sep 9, 2022
@MadVikingGod MadVikingGod added this to the Release v1.10.0 milestone Sep 9, 2022
@jmacd
Copy link
Contributor Author

jmacd commented Sep 9, 2022

Personally, I consider this an urgent matter and I would be willing to waive our ordinary wait period for PRs to merge after approvals. I suggest this because it's been about 4 weeks since the last release, and if we don't include this fix there will be immediate pressure to make another release. @open-telemetry/go-maintainers what do you think?

@MrAlias
Copy link
Contributor

MrAlias commented Sep 9, 2022

Personally, I consider this an urgent matter and I would be willing to waive our ordinary wait period for PRs to merge after approvals. I suggest this because it's been about 4 weeks since the last release, and if we don't include this fix there will be immediate pressure to make another release. @open-telemetry/go-maintainers what do you think?

I'm fine merging this early. That said, @MadVikingGod has decided to hold off the release until Monday to make sure this is included.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid UTF-8 causes export failure while marshaling
7 participants