proposal: encoding/hex: use SIMD instructions for large blobs #68188

karalabe · 2024-06-26T08:46:40Z

Proposal Details

Currently the hex package uses a rather trivial implementation for encoding and decoding hex strings (well, ok, it's not exactly a complex thing). However, due to processing the input byte-by-byte, it can only run so fast. If the caller only wants to convert a few bytes to hex, then there's not much to improve. However, if the caller has big blobs of hex data (e.g. KBs to MBs), then using SIMD instructions could speed things up by at least an order of magnitude.

(Yes, I'm fully aware that having megabytes of hex text feels like solving the wrong problem in the first place. Unfortunately that's how a certain API encodes all its binary data in JSON, so I try to work with what I have.)

I have found some older projects doing similar things (e.g. go-hex), and was wondering whether there would be an appetite to upstream a newer SIMD implementation into mainline Go if we were to contribute it? We'd probably focus on amd64 and arm64.

(Again, yes, we could just use that library or something we write ourselves and leave Go stdlib out of it, but it seems like an improvement to the standard libs that doesn't really have any relevant drawbacks.)

The text was updated successfully, but these errors were encountered:

karalabe · 2024-06-26T08:51:48Z

Looked through past comments and found that there was an attempt in the past to upstream it, but @bradfitz shot it down: https://go-review.googlesource.com/c/go/+/110195. Wondering whether there's still such opposition against it?

I mean, I would be happy if the compiler was smart enough, but obviously it will not be :P

seankhliao · 2024-06-26T08:54:00Z

The current policy for accepting assembly is https://go.dev/wiki/AssemblyPolicy

adonovan · 2024-06-26T14:23:39Z

I wonder what has changed since Brad offered his rationale (in 2018):

Sorry, this is a nice speed-up, but this isn't a package where we want assembly.
We reserve assembly for the deep, gross, dark corners of Go.
A package like encoding/hex is supposed to be a high-level and readable package.
If these speed-ups matter for read applications, we need to achieve this somewhere else: moving some optimized functions into a gross package and using them from encoding/hex, or probably modifying the compiler to recognize more patterns and generate better code.
But this package should remain pure Go with no unsafe.

If anything, the objection is stronger today as Go supports even more machine targets now, and production machines are increasingly using arm64 where amd64 used to be the norm.

The third-party go-hex package seems like a reasonable solution for the (I hope) vanishingly small number of applications where encoding large blobs of data as hex is a significant bottleneck.

Has anyone measured a portable pure-Go implementation that uses as 64KiB lookup table?

karalabe · 2024-06-26T15:04:55Z

I cannot say if anything changed, but the reason I brought it up is because in 2018 there was a very strong push (if I remember correctly) on making the Go assembler super smart. My SSE work on SHA3 was pending for 2 years because of it. However, after the optimizations landed, it turned out that the Go assembler isn't as good as people hoped and hand assembly still outperformed it so far that my SHA3 got merged.

Due to that, I'm wondering if in the light of those experiences, there would be more willingness to accept optimizations into the standard lib (the answer might be no, and that's ok, just figured I'd ask).

Point in case, hex encoding a fairly large blob (5MB say) takes immensely more with the stdlib than the SIMD one.

On the top chart, RLP is the binary size and JSON is the hex encoded size (weird enveloping, bear with me)
On the middle and bottom charts are the encoding and decoding speed, where the green part (RLP) is contained in all cases, on top of which yellow is stdlib hex encoding and blue is simd hex encoding (i.e. you want the diff between color X and green).

The difference is immense.

If anything, the objection is stronger today as Go supports even more machine targets now, and production machines are increasingly using arm64 where amd64 used to be the norm.

My point exactly, that's why I'd like something upstream that can do both amd and arm.

The third-party go-hex package seems like a reasonable solution for the (I hope) vanishingly small number of applications where encoding large blobs of data as hex is a significant bottleneck.

My counter question would be why reject an obvious immense performance improvement from the standard lib? My reasoning (tongue in cheek, no offence meant) would be that if Go ships hex, it should be as fast as possible. If Go doens't want to optimize it's hex library, then it should be removed from the stdlib because it's not production ready any more.

Has anyone measured a portable pure-Go implementation that uses as 64KiB lookup table?

That's actually a nice idea, will try it.

ianlancetaylor · 2024-06-26T16:00:30Z

The problem with adding assembly code to the standard library is maintenance. Is encoding/hex a significant enough package that there is a benefit that is worth this cost? I understand that for your particular program there is a big benefit. But for your particular program you can presumably use your own optimized encoding/hex packages. Are there are other real programs out there where the performance of encoding/hex matters?

karalabe · 2024-06-26T18:44:54Z

Ugh, turns out my issue is even messier. Hex encoding is half my problem, but the forced JSON escaper is the other part of the issue. I can make my hex strings generate/parse fast, but Go's json package will still iterate every hex character one by one, trying to see if it needs escaping or not. Diverging here a bit, but isn't there a way to avoid doing that (or... ha... doing it via SIMD to make it meaningfully fast)?

karalabe · 2024-06-26T19:32:47Z

For what it's worth, Go's JSON string encoding overhead is 110x compared to simply copying the data into the output buffer. Opened a new issue for that #68203

karalabe · 2024-06-27T07:55:49Z

Has anyone measured a portable pure-Go implementation that uses as 64KiB lookup table?

I've rolled one yesterday, initialized a string with 256x256x2 bytes to use as a lookup and then tried to iterate my source byte slice two bytes at a time and look up the encodings. Myeah, 2x slower than stdlib :) Haven't spent time further on it since it was so far underperforming. Just a memo.

karalabe · 2024-07-01T07:47:19Z

After investigating our issues a lot more, it seems it's a combination of hex being slow(-er than we'd like) and json also doing a lot of post-processing on strings, the combo of the two hitting us out of latency requirements. I've opened a bunch of issues across the board on various json packages, but decided that there are too many philosophical roadblocks in place to meaningfully reduce the latencies for us, so we'll bite the bullet and roll out a binary protocol across 9 orgs/teams.

adonovan · 2024-07-05T08:29:16Z

Before you switch to a binary protocol, you should evaluate using a different JSON encoder/decoder. A number of alternative ones exist, some motivated by greater control over the encoding (for example, a hundred details such as whether empty slices should be emitted as null or []) and others strictly by performance, including, for example, whether maps should be iterated in key order. We are considering a large collection of such concerns together as the basis for a future v2 implementation of encoding/json; I don't expect that process to conclude any time soon, but in the meantime you may find that one of the existing codecs implements the standard semantics with sufficiently improved performance that it could be a viable drop-in replacement for your needs.

karelbilek · 2025-04-03T11:35:45Z

@karalabe I am WAY late into the discussion but have you tried bytedance's sonic for the JSON stuff? would that help?

https://github.com/bytedance/sonic

karalabe added the Proposal label Jun 26, 2024

gopherbot added this to the Proposal milestone Jun 26, 2024

karalabe changed the title ~~proposal: encoding/hex: add SIMD and similar instructions~~ proposal: encoding/hex: use SIMD instructions for large blobs Jun 26, 2024

seankhliao added the Performance label Jun 26, 2024

earthboundkid mentioned this issue Jun 26, 2024

proposal: simd: new package for intrinsics #67520

Open

ianlancetaylor added this to Proposals Jun 26, 2024

ianlancetaylor moved this to Incoming in Proposals Jun 26, 2024

karalabe mentioned this issue Jun 26, 2024

common/hexutil: use assembly hex encoder on amd64 ethereum/go-ethereum#30075

Closed

karalabe closed this as completed Jul 1, 2024

ianlancetaylor removed this from Proposals Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: encoding/hex: use SIMD instructions for large blobs #68188

proposal: encoding/hex: use SIMD instructions for large blobs #68188

karalabe commented Jun 26, 2024

karalabe commented Jun 26, 2024

seankhliao commented Jun 26, 2024

adonovan commented Jun 26, 2024 •

edited

Loading

karalabe commented Jun 26, 2024 •

edited

Loading

ianlancetaylor commented Jun 26, 2024

karalabe commented Jun 26, 2024

karalabe commented Jun 26, 2024

karalabe commented Jun 27, 2024

karalabe commented Jul 1, 2024

adonovan commented Jul 5, 2024

karelbilek commented Apr 3, 2025

proposal: encoding/hex: use SIMD instructions for large blobs #68188

proposal: encoding/hex: use SIMD instructions for large blobs #68188

Comments

karalabe commented Jun 26, 2024

Proposal Details

karalabe commented Jun 26, 2024

seankhliao commented Jun 26, 2024

adonovan commented Jun 26, 2024 • edited Loading

karalabe commented Jun 26, 2024 • edited Loading

ianlancetaylor commented Jun 26, 2024

karalabe commented Jun 26, 2024

karalabe commented Jun 26, 2024

karalabe commented Jun 27, 2024

karalabe commented Jul 1, 2024

adonovan commented Jul 5, 2024

karelbilek commented Apr 3, 2025

adonovan commented Jun 26, 2024 •

edited

Loading

karalabe commented Jun 26, 2024 •

edited

Loading