-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better handle different encodings #316
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good. Couple of minor comments, and would suggest adding some unit tests with a variety of inputs for #utf8_encode
.
lib/ddtrace/utils.rb
Outdated
elsif options[:binary] | ||
# This option is useful for "gracefully" displaying binary data that | ||
# often contains text such as marshalled objects | ||
str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking comment.
I think as its written, this is readable and acceptable.
However, I would suggest that perhaps, given the different combinations of encodings and options, the responsibilites of #utf8_encode
is actually broader than it looks, and that binary encoding itself might warrant its own function to simplify #utf8_encode
's responsibilities.
Breaking binary encoding out into its own function would provide the benefit of directly testing binary encoding itself via unit tests. I do realize the proposed function would be one line, and likely not contribute anything to the readability of this already satisfactory code, so I defer to your judgment whether you think this separation of responsibility is warranted by its benefits or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reasons we're experiencing encoding issues is that we often resort to different approaches on how to solve this. I don't think we should have more than one method in the public API doing that.
Considering that, I don't understand what's the gain of having another method for testing purposes as opposed to assertions with binary: true
options. Both can provide the same amount of test granularity.
lib/ddtrace/utils.rb
Outdated
|
||
reset! | ||
STRING_PLACEHOLDER = ''.freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but I think maybe this could be defined at the top of the file? (As is more conventional.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
df3b6e3
to
d8385df
Compare
505301b
to
ffc8613
Compare
rescue ::Encoding::CompatibilityError | ||
"#{operation} BLOB (OMITTED)" | ||
rescue => e | ||
Tracer.log.error("Error sanitizing Dalli operation: #{e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you set this to debug
? otherwise we may hit a critical path with a lot of log entries.
lib/ddtrace/utils.rb
Outdated
str.encode(::Encoding::UTF_8) | ||
end | ||
rescue => e | ||
Tracer.log.error("Error encoding string in UTF-8: #{e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
ffc8613
to
cda0d3d
Compare
7684d73
to
76bfea9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! We're going to merge this one as a part of our 0.11.0 release!
value.encode(::Encoding::UTF_8) | ||
@type = Utils.utf8_encode(type) | ||
@message = Utils.utf8_encode(message) | ||
@backtrace = Utils.utf8_encode(backtrace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
assert_equal(::Encoding::UTF_8, Datadog::Utils.utf8_encode(str).encoding) | ||
|
||
# we don't allocate new objects when a valid UTF-8 string is provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test! this ensures that we don't introduce any performance slowdown for common cases.
This PR provides
Datadog::Utils.utf8_encode
utility method.We're also refactoring parts of the code that used to perform encoding operations.