-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dogstatsd] handle properly utf8 packets #1279
Conversation
e5d0121
to
1d35512
Compare
# network socket, but if submit_packets is used | ||
# programatically and packets is unicode already | ||
# then do not decode! | ||
if not isinstance(packets, unicode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use:
if not type(packets) == unicode:
instead ?
It's slightly faster:
topdog@i-1b871af7(prod):~$ python -m timeit 'not isinstance("sdfsdfsd" , unicode)'
10000000 loops, best of 3: 0.197 usec per loop
topdog@i-1b871af7(prod):~$ python -m timeit 'not type("sdfsdfsd") == unicode'
10000000 loops, best of 3: 0.151 usec per loop
and this code is run in a pretty performance sensitive loop.
Looks great besides the comment! |
what's the perf impact? |
Looks like the extra Benchmark utf8 handling (only the code in this branch, raises on the master branch)
Benchmark ascii text with this branch
Benchmark ascii text with master branch
|
Also @remh I tried your suggestion but looks like it results in slower code on my laptop:
|
30% is a lot. Does the statsd protocol mention anything about accepting UTF-8 in the payloads ? |
@remh Using The performance impact is only 7.1 % now. About the statsd protocol, didn't find much in the protocol, however I found a few
|
18e2104
to
a7ab7fc
Compare
So 7% is still pretty high i think. It might be worth making that optional ? @clutchski any additional thought ? |
a7ab7fc
to
0ee4f16
Compare
0ee4f16
to
d610cc4
Compare
@remh this one is ready to go too, however I merged the histogram min branch in it before because it contained a refactor of aggregator.py, thus that one should go out first. |
Fixes #1256. We should always consider that dogstatsd receives a utf-8 encoded string through its socket, though it comes with a non-negligeable performance overhead over ~7% in simple benchmarks. The default behavior of the server is thus considering packets as ASCII-only content, flipping the utf8_decoding flag in the config will allow you to parse such packets correctly. Most (dog)statsd clients should already encode their packets in utf-8 when sending data.
d610cc4
to
ebf3625
Compare
[dogstatsd] handle properly utf8 packets
Fixes #1256. We should always consider that dogstatsd
receives a utf-8 encoded string through its socket,
but still support unicode python strings in case we
submit things programatically (e.g. useful for tests)