cmd/compile: use hashing for switch statements #34381

mdempsky · 2019-09-18T22:23:19Z

Currently we implement switch statements with binary search. This works well for values that fit into a register, but for strings it might be better to compute a hash function.

Some thoughts:

Even in the simple case of using a full hash function, exactly two passes over the switched value is probably better than O(log N) possible passes over it.
Since we have the expected keys statically, we can use a simple, cheap universal hash (e.g., FNV? djb2?) and keep trying different seeds until we get one without any collisions.
Instead of hashing all of the input bytes, we can hash just enough of them to avoid collisions.
We can try using a minimal (or near minimal) perfect hash to emit a jump table instead of binary search. (This might even be useful for sparse integer values too.)

If someone can help figure out code that can take a []string and figures out an appropriate and cheap hash function to use, I can help wire it into swt.go.

The text was updated successfully, but these errors were encountered:

josharian · 2019-09-19T03:27:05Z

Drive-by comment from phone: It wouldn’t surprise me if “use the first word of the string data” turns out to in practice to be a decent first-cut hash, and it is certainly cheap (can be generated inline, should be just a few instructions).

pierrec · 2019-09-19T08:32:28Z

After reading minimal perfect hash function which does have an implementation go-mph, maybe that can just be used as is?

lranjbar · 2019-11-03T21:30:48Z

@mdempsky @josharian @pierrec @toothrot Has anyone started on this issue? It seems interesting.

mdempsky · 2019-11-03T21:37:55Z

@lranjbar I don't think anyone has started on it yet. Happy to answer any quick questions to get you started. (I probably won't be able to give any more serious guidance/feedback for another week though.)

lranjbar · 2019-11-05T04:53:15Z

@mdempsky I okay with waiting a little bit for more guidance. (I do actually have a pretty busy week myself.) I can still dig into it a bit more on my own, I was just curious before I started. :)

jupj · 2020-09-04T17:28:33Z

About the jump table, we probably prefer a near minimal perfect hash function. A strict minimal perfect hash function would always lead to a collision for the default case of the switch. One option is to extend the size of the jump table from minimal to the next power of 2, and jump to the default case for indexes that don't collide with a pre-computed hash. At the same time, that would allow us to use cheaper bitwise operators instead of the more expensive % operator when calculating the hash/index for the jump table.

A question is how to evaluate the performance of binary search vs jump table. Constructing the near minimal hash probably requires some additional cycles, compared to a cheap hash (or the even cheaper "use the first word of the string data", as mentioned by @josharian). I suppose that the performance benefits of the jump table comes with large n. For small n, the difference might be less clear?

jupj · 2020-09-19T16:56:07Z

I've implemented a hash function using fnv, based on string length and minimal unique prefix (@lranjbar I hope you don't mind). Also have a working implementation of a near minimal perfect hash function for a jump table.

This finds a unique hash reasonably fast (typically within the first 1-2 seeds) and hash/jumptable benchmarks indicate performance close to maphash on linux/amd64. (I've used all string-typed switch cases from the golang/go repo as testdata.)

@mdempsky how/where should I submit this code?

mdempsky · 2020-09-19T17:48:08Z

@jupj Great! Thanks for working on this. Do you have the code available somewhere that I can take a look?

What would probably be best is if you can make a standalone Go package that we can include in the standard repo as something like "cmd/compile/internal/phash" (or ".../perfect hash" or whatever; open to suggestions).

With that done, I can help wire it into gc/swt.go. Or give you pointers on doing this yourself if you'd like.

jupj · 2020-09-20T10:53:25Z

@mdempsky please find the code here: https://github.com/jupj/go-issue-34381 (findhash.go contains the hash functions)
I'll modify this to a standalone package that could be included in the standard repo.

I'm new to compiler work, so I appreciate any help and/or pointers to wire it into gc/swt.go!

Edit: added pointer to findhash.go

lranjbar · 2020-09-20T16:58:05Z

@jupj No worries. It's been almost a year and clearly I didn't have time outside of work to focus on this. 😅 Good luck with your submission.

gopherbot · 2020-09-23T20:01:34Z

Change https://golang.org/cl/256898 mentions this issue: cmd/compile: add internal package phash

josharian · 2020-10-02T17:49:32Z

I think we're on a good path for this, but one thing to double-check as we integrate is that if a giant string arrives for a switch statement only composed of small strings, we don't end up doing O(giant) work to calculate a hash. This protection could have many forms (hash function design, checking max length, checking exact length before hashing, etc.), but we should make sure it's there.

jupj · 2020-10-03T13:56:57Z

I think we're on a good path for this, but one thing to double-check as we integrate is that if a giant string arrives for a switch statement only composed of small strings, we don't end up doing O(giant) work to calculate a hash. This protection could have many forms (hash function design, checking max length, checking exact length before hashing, etc.), but we should make sure it's there.

Thanks, I updated the CL mentioned above, re-added the check for the minimum unique prefix length. This limits the work to O(prefixLen), with the maximum case string length in the switch statement as the upper bound of prefixLen. Is this sufficient to protect from giant strings?

Any tips for how to wire the perfect hashes into swt.go? I think this could be the next step, to validate the perfect hash functionality. I have two options in mind:

For each switch statement, generate an AST that implements the ad-hoc hash as an anonymous function. So switch s { will be rewritten to switch func(s string) uint32 { /* generated AST */ }(s) {, where the AST implements the selected perfect hash function with given seed and prefix length.
Generate functions (go source code) for all different combinations of cheap hashes and include them in the runtime library as runtime.phash_Len, runtime.phash_First, runtime.phash_LenFirst, etc. The compiler will rewrite the switch statements to use these, eg. statement switch s { will be rewritten to switch runtime.phash_LenFirst(s) {. The compiler will include seed and prefix parameters to the hash function where needed.

I don't have any experience generating AST, so I feel the latter option would be more approachable for me. There are 31 different combinations (fnv fallback included), so I think it would make sense to generate the functions into the runtime library, at least for programs with more than 31 switch cases.

Comments?

josharian · 2020-10-03T14:55:57Z

We should generate AST.

Or leave it alone in swt.go and generate SSA in gc/ssa.go. I find generating SSA easier than generating AST, but since this would be the first switch-to-SSA direct lowering there's no sample code to work from and you're more likely to hit unexpected stumbling blocks.

gopherbot · 2020-10-07T21:37:00Z

Change https://golang.org/cl/260597 mentions this issue: cmd/compile: add -d=swthash flag for string switch hashing

mdempsky · 2020-10-07T21:49:48Z

As an update here: On Sunday, I played with integrating phash into the Go compiler and got it mostly working. CL is 260597 above. (If you're really curious, I streamed the compiler development process on Twitch at https://www.twitch.tv/videos/761008858; string hashing stuff starts at 37m20s and runs to the end of the stream. Beware Twitch only stores videos for 14 days.)

You can build targets with -gcflags=all=-d=swthash to enable the optimization. I haven't done any performance/size benchmarking yet.

jupj · 2020-10-10T10:16:55Z

Thanks for the stream, very insightful to watch!

jupj · 2021-04-05T19:16:43Z

Update: I've updated phash to generate ir for computing the hash (https://golang.org/cl/256898), and updated mdempsky's proof-of-concept to use the new ir nodes in https://golang.org/cl/307191

Benchmarked with compilecmp:

compilecmp master -> HEAD
master (c40dc677be): go/doc: avoid panic on references to functions with no body
HEAD (49958c0094): swthash always on

benchstat -geomean  /tmp/951139350 /tmp/784215165

completed 100 of 100, estimated time remaining 0s (ETA 8:37PM)      
name                      old time/op       new time/op       delta
Template                        327ms ± 4%        326ms ± 3%    ~     (p=0.957 n=98+100)
Unicode                         141ms ± 9%        141ms ± 9%    ~     (p=0.474 n=98+100)
GoTypes                         1.78s ± 2%        1.78s ± 2%    ~     (p=0.329 n=99+100)
Compiler                        153ms ± 9%        152ms ± 8%    ~     (p=0.114 n=97+95)
SSA                             12.9s ± 1%        12.8s ± 2%  -0.27%  (p=0.002 n=100+100)
Flate                           205ms ± 5%        205ms ± 5%    ~     (p=0.366 n=100+98)
GoParser                        306ms ± 4%        305ms ± 5%    ~     (p=0.207 n=92+99)
Reflect                         723ms ± 2%        723ms ± 2%    ~     (p=0.428 n=99+97)
Tar                             285ms ± 5%        285ms ± 6%    ~     (p=0.876 n=99+100)
XML                             368ms ± 3%        369ms ± 4%    ~     (p=0.179 n=98+98)
LinkCompiler                    628ms ± 3%        630ms ± 3%    ~     (p=0.124 n=97+95)
ExternalLinkCompiler            1.72s ± 2%        1.72s ± 2%    ~     (p=0.733 n=94+96)
LinkWithoutDebugCompiler        394ms ± 4%        395ms ± 4%    ~     (p=0.592 n=99+100)
[Geo mean]                      539ms             539ms       -0.02%

name                      old user-time/op  new user-time/op  delta
Template                        409ms ± 6%        408ms ± 8%    ~     (p=0.515 n=99+100)
Unicode                         186ms ±14%        185ms ±12%    ~     (p=0.752 n=99+100)
GoTypes                         2.19s ± 5%        2.19s ± 5%    ~     (p=0.415 n=100+100)
Compiler                        186ms ±12%        184ms ±14%    ~     (p=0.220 n=100+100)
SSA                             17.4s ± 4%        17.2s ± 5%  -0.90%  (p=0.002 n=100+100)
Flate                           249ms ± 9%        247ms ±10%    ~     (p=0.380 n=98+99)
GoParser                        359ms ± 9%        358ms ± 9%    ~     (p=0.200 n=96+99)
Reflect                         914ms ± 4%        914ms ± 4%    ~     (p=0.948 n=97+99)
Tar                             345ms ± 9%        347ms ±11%    ~     (p=0.430 n=99+100)
XML                             449ms ± 7%        449ms ± 6%    ~     (p=0.904 n=100+100)
LinkCompiler                    1.09s ± 5%        1.10s ± 4%  +0.63%  (p=0.032 n=100+100)
ExternalLinkCompiler            2.09s ± 3%        2.09s ± 4%    ~     (p=0.220 n=100+100)
LinkWithoutDebugCompiler        511ms ± 8%        515ms ± 6%  +0.92%  (p=0.043 n=99+97)
[Geo mean]                      689ms             689ms       -0.06%

name                      old text-bytes    new text-bytes    delta
HelloSize                       814kB ± 0%        813kB ± 0%  -0.09%  (p=0.000 n=100+100)

name                      old data-bytes    new data-bytes    delta
HelloSize                      13.1kB ± 0%       13.1kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       206kB ± 0%        206kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.21MB ± 0%       1.21MB ± 0%  -0.03%  (p=0.000 n=100+100)



file      before    after     Δ       %       
addr2line 3696741   3696773   +32     +0.001% 
api       5166358   5170862   +4504   +0.087% 
asm       4734867   4738515   +3648   +0.077% 
buildid   2445407   2445631   +224    +0.009% 
cgo       4453043   4453171   +128    +0.003% 
compile   23377375  23428593  +51218  +0.219% 
cover     4565624   4566136   +512    +0.011% 
dist      3288992   3295920   +6928   +0.211% 
doc       3742131   3745099   +2968   +0.079% 
fix       3154314   3159106   +4792   +0.152% 
link      6561029   6562917   +1888   +0.029% 
nm        3666294   3669582   +3288   +0.090% 
objdump   4093865   4095577   +1712   +0.042% 
pack      2218057   2217913   -144    -0.006% 
pprof     13600084  13606156  +6072   +0.045% 
test2json 2521324   2521068   -256    -0.010% 
trace     10223351  10227271  +3920   +0.038% 
vet       7033010   7040450   +7440   +0.106% 
total     108541866 108640740 +98874  +0.091%

gopherbot · 2021-04-05T19:20:43Z

Change https://golang.org/cl/307191 mentions this issue: cmd/compile: hash strings in walk/switch.go

shaneHowearth · 2021-06-21T00:16:05Z

I streamed the compiler development process on Twitch at https://www.twitch.tv/videos/761008858; string hashing stuff starts at 37m20s and runs to the end of the stream. Beware Twitch only stores videos for 14 days.)

Could this content have been stored on another service (eg. Youtube/Vimeo/etc) because, 8 months later and I'm reading this issue gaining some fantastic learnings, and am not able to see that stream, which has obviously been helpful to the creation of the code

gopherbot · 2021-10-20T16:22:41Z

Change https://golang.org/cl/357330 mentions this issue: cmd/compile: implement jump tables

gopherbot · 2022-03-24T20:39:48Z

Change https://go.dev/cl/395714 mentions this issue: cmd/compile: modify switches of strings to use jump table for lengths

Performance is kind of hard to exactly quantify. One big difference between jump tables and the old binary search scheme is that there's only 1 branch statement instead of O(n) of them. That can be both a blessing and a curse, and can make evaluating jump tables very hard to do. The single branch can become a choke point for the hardware branch predictor. A branch table jump must fit all of its state in a single branch predictor entry (technically, a branch target predictor entry). With binary search that predictor state can be spread among lots of entries. In cases where the case selection is repetitive and thus predictable, binary search can perform better. The big win for a jump table is that it doesn't consume so much of the branch predictor's resources. But that benefit is essentially never observed in microbenchmarks, because the branch predictor can easily keep state for all the binary search branches in a microbenchmark. So that benefit is really hard to measure. So predictable switch microbenchmarks are ~useless - they will almost always favor the binary search scheme. Fully unpredictable switch microbenchmarks are better, as they aren't lying to us quite so much. In a perfectly unpredictable situation, a jump table will expect to incur 1-1/N branch mispredicts, where a binary search would incur lg(N)/2 of them. That makes the crossover point at about N=4. But of course switches in real programs are seldom fully unpredictable, so we'll use a higher crossover point. Beyond the branch predictor, jump tables tend to execute more instructions per switch but have no additional instructions per case, which also argues for a larger crossover. As far as code size goes, with this CL cmd/go has a slightly smaller code segment and a slightly larger overall size (from the jump tables themselves which live in the data segment). This is a case where some FDO (feedback-directed optimization) would be really nice to have. #28262 Some large-program benchmarks might help make the case for this CL. Especially if we can turn on branch mispredict counters so we can see how much using jump tables can free up branch prediction resources that can be gainfully used elsewhere in the program. name old time/op new time/op delta Switch8Predictable 1.89ns ± 2% 1.27ns ± 3% -32.58% (p=0.000 n=9+10) Switch8Unpredictable 9.33ns ± 1% 7.50ns ± 1% -19.60% (p=0.000 n=10+9) Switch32Predictable 2.20ns ± 2% 1.64ns ± 1% -25.39% (p=0.000 n=10+9) Switch32Unpredictable 10.0ns ± 2% 7.6ns ± 2% -24.04% (p=0.000 n=10+10) Fixes #5496 Update #34381 Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f Reviewed-on: https://go-review.googlesource.com/c/go/+/357330 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>

Reorganize the way we rewrite expression switches on strings, so that jump tables are naturally used for the outer switch on the string length. The changes to the prove pass in this CL are required so as to not repeat the test for string length in each case. name old time/op new time/op delta SwitchStringPredictable 2.28ns ± 9% 2.08ns ± 5% -9.04% (p=0.000 n=10+10) SwitchStringUnpredictable 10.5ns ± 1% 9.5ns ± 1% -9.08% (p=0.000 n=9+10) Update #5496 Update #34381 Change-Id: Ie6846b1dd27f3e472f7c30dfcc598c68d440b997 Reviewed-on: https://go-review.googlesource.com/c/go/+/395714 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>

mdempsky added the help wanted label Sep 18, 2019

mdempsky added this to the Unplanned milestone Sep 18, 2019

toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 19, 2019

mdempsky mentioned this issue Jun 30, 2021

cmd/compile: constant indexing into map literals should be folded #10848

Closed

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

mknyszek added this to Go Compiler / Runtime Jul 13, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: use hashing for switch statements #34381

cmd/compile: use hashing for switch statements #34381

mdempsky commented Sep 18, 2019

josharian commented Sep 19, 2019

pierrec commented Sep 19, 2019

lranjbar commented Nov 3, 2019

mdempsky commented Nov 3, 2019

lranjbar commented Nov 5, 2019

jupj commented Sep 4, 2020

jupj commented Sep 19, 2020

mdempsky commented Sep 19, 2020

jupj commented Sep 20, 2020 •

edited

Loading

lranjbar commented Sep 20, 2020

gopherbot commented Sep 23, 2020

josharian commented Oct 2, 2020

jupj commented Oct 3, 2020

josharian commented Oct 3, 2020

gopherbot commented Oct 7, 2020

mdempsky commented Oct 7, 2020

jupj commented Oct 10, 2020

jupj commented Apr 5, 2021

gopherbot commented Apr 5, 2021

shaneHowearth commented Jun 21, 2021

gopherbot commented Oct 20, 2021

gopherbot commented Mar 24, 2022

cmd/compile: use hashing for switch statements #34381

cmd/compile: use hashing for switch statements #34381

Comments

mdempsky commented Sep 18, 2019

josharian commented Sep 19, 2019

pierrec commented Sep 19, 2019

lranjbar commented Nov 3, 2019

mdempsky commented Nov 3, 2019

lranjbar commented Nov 5, 2019

jupj commented Sep 4, 2020

jupj commented Sep 19, 2020

mdempsky commented Sep 19, 2020

jupj commented Sep 20, 2020 • edited Loading

lranjbar commented Sep 20, 2020

gopherbot commented Sep 23, 2020

josharian commented Oct 2, 2020

jupj commented Oct 3, 2020

josharian commented Oct 3, 2020

gopherbot commented Oct 7, 2020

mdempsky commented Oct 7, 2020

jupj commented Oct 10, 2020

jupj commented Apr 5, 2021

gopherbot commented Apr 5, 2021

shaneHowearth commented Jun 21, 2021

gopherbot commented Oct 20, 2021

gopherbot commented Mar 24, 2022

jupj commented Sep 20, 2020 •

edited

Loading