proposal: encoding/hex: use SIMD instructions for large blobs #68188
Comments
Looked through past comments and found that there was an attempt in the past to upstream it, but @bradfitz shot it down: https://go-review.googlesource.com/c/go/+/110195. Wondering whether there's still such opposition against it? I mean, I would be happy if the compiler was smart enough, but obviously it will not be :P |
The current policy for accepting assembly is https://go.dev/wiki/AssemblyPolicy |
I wonder what has changed since Brad offered his rationale (in 2018):
If anything, the objection is stronger today as Go supports even more machine targets now, and production machines are increasingly using arm64 where amd64 used to be the norm. The third-party go-hex package seems like a reasonable solution for the (I hope) vanishingly small number of applications where encoding large blobs of data as hex is a significant bottleneck. Has anyone measured a portable pure-Go implementation that uses a 64KiB lookup table? |
The problem with adding assembly code to the standard library is maintenance. Is encoding/hex a significant enough package that there is a benefit that is worth this cost? I understand that for your particular program there is a big benefit. But for your particular program you can presumably use your own optimized encoding/hex packages. Are there other real programs out there where the performance of encoding/hex matters? |
Ugh, turns out my issue is even messier. Hex encoding is half my problem, but the forced JSON escaper is the other part of the issue. I can make my hex strings generate/parse fast, but Go's |
For what it's worth, Go's JSON string encoding overhead is 110x compared to simply copying the data into the output buffer. Opened a new issue for that #68203 |
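For readers wondering what "simply copying the data into the output buffer" might look like, here is a minimal sketch. It relies on the assumption that the payload is a hex string containing only the characters 0-9 and a-f, so no JSON escaping is needed and quoting reduces to a copy. The helper name quoteHex is hypothetical and not part of any library; this is not the benchmark the comment refers to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quoteHex appends s to dst as a JSON string by plain copying.
// Assumption: s contains only [0-9a-f], so nothing needs escaping.
// For arbitrary strings this would produce invalid JSON.
func quoteHex(dst []byte, s string) []byte {
	dst = append(dst, '"')
	dst = append(dst, s...)
	return append(dst, '"')
}

func main() {
	s := "deadbeef"

	// Reference result from the standard library encoder.
	viaJSON, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}

	// Copy-only baseline; for hex-only input both outputs should match.
	viaCopy := quoteHex(nil, s)
	fmt.Println(string(viaJSON) == string(viaCopy))
}
```

Comparing the two paths with a benchmark on a large input would be the way to check the overhead figure on a given machine.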
I've rolled one yesterday, initialized a string with 256x256x2 bytes to use as a lookup and then tried to iterate my source byte slice two bytes at a time and look up the encodings. Myeah, 2x slower than stdlib :) Haven't spent time further on it since it was so far underperforming. Just a memo. |
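A rough sketch of what such a pair-at-a-time lookup encoder could look like follows. The exact layout used in the comment above is not shown, so everything here is an assumption: this version uses a byte slice rather than a string, and each 16-bit index maps to four hex characters, which makes the table 256 KiB. The names pairTable, buildPairTable, and encodePairs are hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

const hexDigits = "0123456789abcdef"

// pairTable maps every 16-bit value (two input bytes) to its four
// lowercase hex digits. Size: 65536 entries * 4 bytes = 256 KiB.
var pairTable = buildPairTable()

func buildPairTable() []byte {
	t := make([]byte, 65536*4)
	for i := 0; i < 65536; i++ {
		hi, lo := byte(i>>8), byte(i)
		t[i*4+0] = hexDigits[hi>>4]
		t[i*4+1] = hexDigits[hi&0x0f]
		t[i*4+2] = hexDigits[lo>>4]
		t[i*4+3] = hexDigits[lo&0x0f]
	}
	return t
}

// encodePairs hex-encodes src into dst two input bytes per lookup and
// returns the number of bytes written. dst must hold 2*len(src) bytes.
func encodePairs(dst, src []byte) int {
	i, j := 0, 0
	for ; i+1 < len(src); i, j = i+2, j+4 {
		idx := (int(src[i])<<8 | int(src[i+1])) * 4
		copy(dst[j:j+4], pairTable[idx:idx+4])
	}
	if i < len(src) { // odd trailing byte falls back to per-nibble lookup
		dst[j] = hexDigits[src[i]>>4]
		dst[j+1] = hexDigits[src[i]&0x0f]
		j += 2
	}
	return j
}

func main() {
	src := []byte{0xde, 0xad, 0xbe, 0xef, 0x01}
	dst := make([]byte, hex.EncodedLen(len(src)))
	n := encodePairs(dst, src)
	// Cross-check against the standard library.
	fmt.Println(string(dst[:n]) == hex.EncodeToString(src))
}
```

A table this large competes with the input and output for cache space, which may be one reason an approach like this ends up slower than the 16-byte table in the standard library; that is a guess, not a measurement.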
After investigating our issues a lot more, it seems it's a combination of hex being slow(-er than we'd like) and json also doing a lot of post-processing on strings, the combo of the two hitting us out of latency requirements. I've opened a bunch of issues across the board on various json packages, but decided that there are too many philosophical roadblocks in place to meaningfully reduce the latencies for us, so we'll bite the bullet and roll out a binary protocol across 9 orgs/teams. |
Before you switch to a binary protocol, you should evaluate using a different JSON encoder/decoder. A number of alternative ones exist, some motivated by greater control over the encoding (for example, a hundred details such as whether empty slices should be emitted as |
Proposal Details
Currently the hex package uses a rather trivial implementation for encoding and decoding hex strings (well, ok, it's not exactly a complex thing). However, due to processing the input byte-by-byte, it can only run so fast. If the caller only wants to convert a few bytes to hex, then there's not much to improve. However, if the caller has big blobs of hex data (e.g. KBs to MBs), then using SIMD instructions could speed things up by at least an order of magnitude.
(Yes, I'm fully aware that having megabytes of hex text feels like solving the wrong problem in the first place. Unfortunately that's how a certain API encodes all its binary data in JSON, so I try to work with what I have.)
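To make the byte-by-byte processing concrete, here is a minimal sketch of a scalar encoder of the kind described. It is an illustration in the spirit of encoding/hex, not a copy of the standard library source; the names hexDigits and encodeScalar are hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

const hexDigits = "0123456789abcdef"

// encodeScalar writes the hex form of src into dst one input byte at a
// time: each byte costs two table lookups and two stores. It returns
// the number of bytes written. dst must hold 2*len(src) bytes.
func encodeScalar(dst, src []byte) int {
	j := 0
	for _, b := range src {
		dst[j] = hexDigits[b>>4]     // high nibble
		dst[j+1] = hexDigits[b&0x0f] // low nibble
		j += 2
	}
	return j
}

func main() {
	src := []byte{0xde, 0xad, 0xbe, 0xef}
	dst := make([]byte, hex.EncodedLen(len(src)))
	n := encodeScalar(dst, src)
	// Cross-check against the standard library.
	fmt.Println(string(dst[:n]) == hex.EncodeToString(src))
}
```

A SIMD version would perform the same nibble split and digit lookup on many bytes per instruction (16 or more, depending on the vector width), which is where the hoped-for speedup on large inputs would come from.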
I have found some older projects doing similar things (e.g. go-hex), and was wondering whether there would be an appetite to upstream a newer SIMD implementation into mainline Go if we were to contribute it? We'd probably focus on amd64 and arm64.
(Again, yes, we could just use that library or something we write ourselves and leave Go stdlib out of it, but it seems like an improvement to the standard libs that doesn't really have any relevant drawbacks.)