
proposal: encoding/hex: use SIMD instructions for large blobs #68188

Closed
karalabe opened this issue Jun 26, 2024 · 10 comments

Comments

@karalabe
Contributor

Proposal Details

Currently the hex package uses a rather trivial implementation for encoding and decoding hex strings (well, ok, it's not exactly a complex thing). However, due to processing the input byte-by-byte, it can only run so fast. If the caller only wants to convert a few bytes to hex, then there's not much to improve. However, if the caller has big blobs of hex data (e.g. KBs to MBs), then using SIMD instructions could speed things up by at least an order of magnitude.
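For context, the encoder's hot loop today is essentially a per-byte nibble lookup; a simplified sketch (illustrative, not the exact stdlib source) looks like this:

```go
// Simplified sketch of the byte-by-byte approach used by encoding/hex today
// (illustrative, not the exact stdlib source): each input byte is split into
// two nibbles, and each nibble indexes a 16-character table.
const hextable = "0123456789abcdef"

func encodeScalar(dst, src []byte) int {
	j := 0
	for _, v := range src {
		dst[j] = hextable[v>>4]
		dst[j+1] = hextable[v&0x0f]
		j += 2
	}
	return len(src) * 2
}
```

A SIMD path would instead load 16 or 32 input bytes per iteration, split the nibbles and map them to ASCII with vector shuffles, and fall back to a loop like the above for short or trailing input.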

(Yes, I'm fully aware that having megabytes of hex text feels like solving the wrong problem in the first place. Unfortunately that's how a certain API encodes all its binary data in JSON, so I try to work with what I have.)

I have found some older projects doing similar things (e.g. go-hex), and was wondering whether there would be an appetite to upstream a newer SIMD implementation into mainline Go if we were to contribute it? We'd probably focus on amd64 and arm64.

(Again, yes, we could just use that library or something we write ourselves and leave Go stdlib out of it, but it seems like an improvement to the standard libs that doesn't really have any relevant drawbacks.)

@gopherbot gopherbot added this to the Proposal milestone Jun 26, 2024
@karalabe karalabe changed the title proposal: encoding/hex: add SIMD and similar instructions proposal: encoding/hex: use SIMD instructions for large blobs Jun 26, 2024
@karalabe
Contributor Author

Looked through past discussions and found that there was an earlier attempt to upstream this, but @bradfitz shot it down: https://go-review.googlesource.com/c/go/+/110195. Wondering whether there's still such opposition to it?

I mean, I would be happy if the compiler was smart enough, but obviously it will not be :P

@seankhliao
Member

The current policy for accepting assembly is https://go.dev/wiki/AssemblyPolicy

@adonovan
Member

adonovan commented Jun 26, 2024

I wonder what has changed since Brad offered his rationale (in 2018):

Sorry, this is a nice speed-up, but this isn't a package where we want assembly.
We reserve assembly for the deep, gross, dark corners of Go.
A package like encoding/hex is supposed to be a high-level and readable package.
If these speed-ups matter for real applications, we need to achieve this somewhere else: moving some optimized functions into a gross package and using them from encoding/hex, or probably modifying the compiler to recognize more patterns and generate better code.
But this package should remain pure Go with no unsafe.

If anything, the objection is stronger today as Go supports even more machine targets now, and production machines are increasingly using arm64 where amd64 used to be the norm.

The third-party go-hex package seems like a reasonable solution for the (I hope) vanishingly small number of applications where encoding large blobs of data as hex is a significant bottleneck.

Has anyone measured a portable pure-Go implementation that uses a 64KiB lookup table?

@karalabe
Contributor Author

karalabe commented Jun 26, 2024

I cannot say if anything changed, but the reason I brought it up is that in 2018 there was (if I remember correctly) a very strong push on making the Go assembler super smart. My SSE work on SHA3 was pending for 2 years because of it. However, after the optimizations landed, it turned out that the Go assembler isn't as good as people hoped, and hand-written assembly still outperformed it by so much that my SHA3 got merged.

In light of those experiences, I'm wondering whether there would be more willingness to accept such optimizations into the standard library (the answer might be no, and that's ok, just figured I'd ask).

Case in point: hex encoding a fairly large blob (say 5MB) takes immensely longer with the stdlib than with the SIMD implementation.

  • On the top chart, RLP is the binary size and JSON is the hex-encoded size (weird enveloping, bear with me).
  • The middle and bottom charts show encoding and decoding speed, where the green part (RLP) is included in all cases; on top of that, yellow is stdlib hex encoding and blue is SIMD hex encoding (i.e. you want the difference between color X and green).

The difference is immense.
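For reference, a minimal benchmark sketch along these lines, assuming a roughly 5MB blob as mentioned above; the SIMD side is left as a comment because the import path and package name depend on which implementation is being compared:

```go
package hexbench

import (
	"crypto/rand"
	"encoding/hex"
	"testing"
)

// ~5 MiB of random input, roughly matching the blob size mentioned above.
var blob = func() []byte {
	b := make([]byte, 5<<20)
	rand.Read(b)
	return b
}()

func BenchmarkStdlibEncode(b *testing.B) {
	dst := make([]byte, hex.EncodedLen(len(blob)))
	b.SetBytes(int64(len(blob)))
	for i := 0; i < b.N; i++ {
		hex.Encode(dst, blob)
	}
}

// A second benchmark would call the SIMD implementation with the same
// signature, e.g. simdhex.Encode(dst, blob), where "simdhex" is a
// placeholder for whatever package is being evaluated, not a real import.
```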

> If anything, the objection is stronger today as Go supports even more machine targets now, and production machines are increasingly using arm64 where amd64 used to be the norm.

My point exactly, that's why I'd like something upstream that can do both amd and arm.

> The third-party go-hex package seems like a reasonable solution for the (I hope) vanishingly small number of applications where encoding large blobs of data as hex is a significant bottleneck.

My counter-question would be: why reject an obvious, immense performance improvement from the standard library? My reasoning (tongue in cheek, no offence meant) would be that if Go ships hex, it should be as fast as possible. If Go doesn't want to optimize its hex library, then it should be removed from the stdlib because it's not production-ready any more.

> Has anyone measured a portable pure-Go implementation that uses a 64KiB lookup table?

That's actually a nice idea, will try it.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Jun 26, 2024
@ianlancetaylor
Contributor

The problem with adding assembly code to the standard library is maintenance. Is encoding/hex a significant enough package that the benefit is worth this cost? I understand that for your particular program there is a big benefit. But for your particular program you can presumably use your own optimized encoding/hex package. Are there other real programs out there where the performance of encoding/hex matters?

@karalabe
Contributor Author

Ugh, it turns out my issue is even messier. Hex encoding is half my problem; the forced JSON escaper is the other half. I can make my hex strings generate/parse fast, but Go's json package will still iterate every hex character one by one to see whether it needs escaping. Diverging here a bit, but isn't there a way to avoid doing that (or... ha... doing it via SIMD to make it meaningfully fast)?
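For the escaping side specifically, one common workaround is to implement json.Marshaler and return an already-quoted hex string: encoding/json still scans and validates the returned bytes, but does not run its per-character Go string escaper over them. A minimal sketch, assuming the blob is marshaled through a wrapper type (the type name here is illustrative, not part of any existing API):

```go
package hexjson

import "encoding/hex"

// HexBlob is an illustrative wrapper type (an assumption, not an existing
// API): its MarshalJSON emits a pre-quoted hex string, so encoding/json
// treats the output as already-valid JSON instead of escaping it as a
// Go string.
type HexBlob []byte

func (b HexBlob) MarshalJSON() ([]byte, error) {
	out := make([]byte, hex.EncodedLen(len(b))+2)
	out[0] = '"'
	hex.Encode(out[1:len(out)-1], b)
	out[len(out)-1] = '"'
	return out, nil
}
```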

@karalabe
Contributor Author

For what it's worth, Go's JSON string encoding overhead is 110x compared to simply copying the data into the output buffer. Opened a new issue for that #68203

@karalabe
Contributor Author

> Has anyone measured a portable pure-Go implementation that uses a 64KiB lookup table?

I rolled one yesterday: initialized a string of 256x256x2 bytes to use as a lookup table, then iterated my source byte slice two bytes at a time and looked up the encodings. Myeah, 2x slower than the stdlib :) Haven't spent further time on it since it underperformed by that much. Just a memo.
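For completeness, here is roughly what such a pairwise lookup-table encoder could look like. This is an illustrative guess at the approach, not the code that was actually benchmarked; the layout below uses 256x256 entries of 4 hex characters each (a 256 KiB table), which may differ from the table shape described above.

```go
package hexlut

const hexdigits = "0123456789abcdef"

// pairTable maps every possible 2-byte input pair to its 4-character hex
// encoding. Illustrative layout only; the dimensions mentioned in the
// comment above may correspond to a different table shape.
var pairTable = func() []byte {
	t := make([]byte, 256*256*4)
	for i := 0; i < 256*256; i++ {
		hi, lo := byte(i>>8), byte(i)
		t[i*4+0] = hexdigits[hi>>4]
		t[i*4+1] = hexdigits[hi&0x0f]
		t[i*4+2] = hexdigits[lo>>4]
		t[i*4+3] = hexdigits[lo&0x0f]
	}
	return t
}()

// encodePairs hex-encodes src into dst (dst must be at least 2*len(src)
// bytes), consuming two input bytes per table lookup and handling a
// trailing odd byte with the usual per-nibble lookup.
func encodePairs(dst, src []byte) {
	i := 0
	for ; i+1 < len(src); i += 2 {
		idx := int(src[i])<<8 | int(src[i+1])
		copy(dst[i*2:], pairTable[idx*4:idx*4+4])
	}
	if i < len(src) {
		v := src[i]
		dst[i*2] = hexdigits[v>>4]
		dst[i*2+1] = hexdigits[v&0x0f]
	}
}
```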

@karalabe
Contributor Author

karalabe commented Jul 1, 2024

After investigating our issues a lot more, it seems it's a combination of hex being slow(-er than we'd like) and json doing a lot of post-processing on strings, with the combination of the two pushing us outside our latency requirements. I've opened a bunch of issues across the board on various json packages, but decided that there are too many philosophical roadblocks in place to meaningfully reduce the latencies for us, so we'll bite the bullet and roll out a binary protocol across 9 orgs/teams.

@adonovan
Member

adonovan commented Jul 5, 2024

Before you switch to a binary protocol, you should evaluate using a different JSON encoder/decoder. A number of alternatives exist: some are motivated by greater control over the encoding (for example, a hundred details such as whether empty slices should be emitted as null or []), others strictly by performance (for example, whether maps should be iterated in key order). We are considering a large collection of such concerns together as the basis for a future v2 implementation of encoding/json. I don't expect that process to conclude any time soon, but in the meantime you may find that one of the existing codecs implements the standard semantics with sufficiently improved performance that it could be a viable drop-in replacement for your needs.
