Use a slice for InstMap instead of std.HashMap #11977

Hejsil · 2022-07-01T14:17:09Z

The sema.inst_map datastructure is very often accessed. All instructions that reference the result of other instructions does a lookup into this field. Because of this, a significant amount of time, is spent in std.HashMap.get.

This commit replaces the HashMap with a simpler data structure that simply uses the zir indexes to index into a slice for the result. See the data structure doc comment for more info.

Performance results:

> $(which time) -v perf stat -r 10 stage2-release/bin/zig build-exe \
      ../zig2/src/main.zig --pkg-begin build_options options.zig \
      --pkg-end -lc -fno-emit-bin

                master           new-inst-map
instructions  15,565,458,749   14,235,238,027   ~8.5% less
    branches   2,547,101,394    2,294,805,250   ~9.9% less
        time        2.40819s         2.25372s   ~6.4% less
                                               (~6.8% faster)

This might not be exactly how we want to do this. Someone might want to explore making HashMap faster (though it is hard to imagine it beating the performance of map.items[key-map.start]). Maybe this is a more general data structure (IntIndexMap) that we can put into the standard library.

I'll just put this PR here for discussion

Hejsil · 2022-07-01T14:20:48Z

src/Module.zig

+    for (fn_info.param_body) |inst|
+        try sema.inst_map.ensureSpaceForKey(gpa, inst);


ensureSpaceForKey in a loop here and calling putAssumeCapacity* in the loop below ends up producing faster and better code. Probably because it makes each of the two loops "tighter" which is probably both easier to optimize and for the CPU to predict

Hejsil · 2022-07-01T14:43:25Z

Also, if there is some way to determine the smallest and biggest zir index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to put could not fail.

InKryption · 2022-07-01T16:40:28Z

Could this not be solved by just using a more optimized version of std.HashMap, in place of std.AutoHashMap?

Hejsil · 2022-07-01T16:48:17Z

Could this not be solved by just using a more optimized version of std.HashMap, in place of std.AutoHashMap?

Maybe, but I'm unsure how this would be done. In the end, HashMap.get does some none trivial work, and I have a hard time imagining any tuning that is going to beat:

if (key < map.start)
    return null;

const index = key - map.start;
if (map.items.len <= index or map.items[index] == .none)
    return null;
return map.items[index];

No loop, no hashing, no collision, no nothing. Three checks, one subtraction and one access at some offset.

Edit: Edited the code to actually be correct

InKryption · 2022-07-01T17:13:42Z

the hash function needn't actually hash, if all you're doing is using the integers as actual indexes, you just have the function return the integer. But anyway, this looks good enough to me.

Hejsil · 2022-07-01T17:19:53Z

the hash function needn't actually hash, if all you're doing is using the integers as actual indexes, you just have the function return the integer.

Well, we don't really want to allocate a map for the entire zir.instructions, so the current implementation keeps track of a start index. This is updated as the map is resized with new input.

If we could allocate all the space we need upfront, maybe. See:

Also, if there is some way to determine the smallest and biggest zir index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to put could not fail.

But at that point, why use hashmap, and make the optimizers job more complicated, when you now don't even have to check if you are in bounds (you always would be). This data structure when then just be:

const InstMap = struct {
    items: []Air.Inst.Ref,
    start: Zir.Inst.Index,

    fn get(map: InstMap, key: Zir.Inst.Index) ?Air.Inst.Ref {
        if (map.items[key - map.start] == .none) return null;
        return map.items[key - map.start];
    }

    fn put(map: InstMap, key: Zir.Inst.Index, value: Air.Inst.Ref) void {
        map.items[key - map.start] = value;
    }
};

The `sema.inst_map` datastructure is very often accessed. All instructions that reference the result of other instructions does a lookup into this field. Because of thisi, a significant amount of time, is spent in `std.HashMap.get`. This commit replaces the `HashMap` with a simpler data structure that uses the zir indexes to index into a slice for the result. See the data structure doc comment for more info. Performance results: > $(which time) -v perf stat -r 10 stage2-release/bin/zig build-exe \ ../zig2/src/main.zig --pkg-begin build_options options.zig \ --pkg-end -lc -fno-emit-bin master new-inst-map instructions 15,565,458,749 13,987,614,034 ~10.1% less branches 2,547,101,394 2,255,015,269 ~11.5% less time 2.40819s 2.19619s ~8.8% less (~9.7% faster)

Hejsil · 2022-07-03T10:14:31Z

Also, if there is some way to determine the smallest and biggest zir index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to put could not fail.

Figured out a way. New numbers:

                master           new-inst-map
  instructions  15,565,458,749   13,987,614,034  ~10.1% less
      branches   2,547,101,394    2,255,015,269  ~11.5% less
          time        2.40819s         2.19619s   ~8.8% less
                                                 (~9.7% faster)

ifreund · 2022-07-03T11:24:46Z

Got any numbers comparing the (peak) memory usage with this change?

Hejsil · 2022-07-03T14:34:15Z

Got any numbers comparing the (peak) memory usage with this change?

I did run the time -v to measure this, but didn't write it down because it seemed like an insignificant difference. Probably should have done so, though. I'll get some numbers tomorrow when I figure out why cicd failed for these new changes.

andrewrk · 2022-07-03T18:45:39Z

A degenerate case for this would look something like:

fn foo() void {
    const S = struct {
        // ... a hundred thousand lines of code
    };
    // the function body of foo
}

Seems like a pretty uncommon situation, however, so perhaps it would make sense to do this optimization in order to enhance the more common case. I did have it this way originally btw, and begrudgingly changed from a slice to a hash map as part of #8554.

One wild idea I've been kicking around in my head would be to introduce a ZIR pass that rewrites ZIR to have optimized memory layout for Sema purposes. This might have payoffs since such a pass would be embarrassingly parallel, and Sema is much trickier to parallelize - so any work that can be moved from Sema into AstGen is a win. Furthermore, ZIR is cached per file, so usually any memory layout optimization would be done once and then benefits reaped multiple times. Among things such as removing "holes" that appear from patch-ups, removing unreferenced instructions, it could also re-number the ZIR indexes so that every function body instruction was sequential.

Hejsil · 2022-07-04T14:22:19Z

Got any numbers comparing the (peak) memory usage with this change?

Average over 20 runs of each (in kbytes)

master  new-inst-map
348621  348726

Basically insignificant

Also, unsure why cicd fails. The stack trace points to out of date line numbers for the lines printed. Unsure what it is doing in the pass where it fails

andrewrk · 2022-09-01T02:13:53Z

I have attempted to rebase this against master branch on your behalf in order to re-run the failing CI checks. However there are conflicts. If you would like to continue working on this, please open a new PR against master branch.

Hejsil commented Jul 1, 2022

View reviewed changes

Hejsil force-pushed the faster-inst-map branch from 250e512 to cbf724c Compare July 1, 2022 16:31

Hejsil force-pushed the faster-inst-map branch from cbf724c to 7011dc9 Compare July 3, 2022 10:13

andrewrk closed this Sep 1, 2022

Hejsil mentioned this pull request Nov 25, 2022

Use a slice for InstMap instead of std.HashMap #13650

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Use a slice for InstMap instead of std.HashMap #11977

Use a slice for InstMap instead of std.HashMap #11977

Uh oh!

Hejsil commented Jul 1, 2022

Uh oh!

Hejsil Jul 1, 2022

Uh oh!

Hejsil commented Jul 1, 2022

Uh oh!

InKryption commented Jul 1, 2022

Uh oh!

Hejsil commented Jul 1, 2022 •

edited

Loading

Uh oh!

InKryption commented Jul 1, 2022 •

edited

Loading

Uh oh!

Hejsil commented Jul 1, 2022

Uh oh!

Hejsil commented Jul 3, 2022

Uh oh!

ifreund commented Jul 3, 2022

Uh oh!

Hejsil commented Jul 3, 2022

Uh oh!

andrewrk commented Jul 3, 2022 •

edited

Loading

Uh oh!

Hejsil commented Jul 4, 2022

Uh oh!

andrewrk commented Sep 1, 2022

Uh oh!

Uh oh!

		for (fn_info.param_body) \|inst\|
		try sema.inst_map.ensureSpaceForKey(gpa, inst);

Uh oh!

Use a slice for InstMap instead of std.HashMap #11977

Use a slice for InstMap instead of std.HashMap #11977

Uh oh!

Conversation

Hejsil commented Jul 1, 2022

Uh oh!

Hejsil Jul 1, 2022

Choose a reason for hiding this comment

Uh oh!

Hejsil commented Jul 1, 2022

Uh oh!

InKryption commented Jul 1, 2022

Uh oh!

Hejsil commented Jul 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

InKryption commented Jul 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Hejsil commented Jul 1, 2022

Uh oh!

Hejsil commented Jul 3, 2022

Uh oh!

ifreund commented Jul 3, 2022

Uh oh!

Hejsil commented Jul 3, 2022

Uh oh!

andrewrk commented Jul 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Hejsil commented Jul 4, 2022

Uh oh!

andrewrk commented Sep 1, 2022

Uh oh!

Uh oh!

Hejsil commented Jul 1, 2022 •

edited

Loading

InKryption commented Jul 1, 2022 •

edited

Loading

andrewrk commented Jul 3, 2022 •

edited

Loading