Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a slice for InstMap instead of std.HashMap #11977

Closed
wants to merge 1 commit into from

Conversation

Hejsil
Copy link
Contributor

@Hejsil Hejsil commented Jul 1, 2022

The sema.inst_map datastructure is very often accessed. All instructions that reference the result of other instructions does a lookup into this field. Because of this, a significant amount of time, is spent in std.HashMap.get.

This commit replaces the HashMap with a simpler data structure that simply uses the zir indexes to index into a slice for the result. See the data structure doc comment for more info.

Performance results:

> $(which time) -v perf stat -r 10 stage2-release/bin/zig build-exe \
      ../zig2/src/main.zig --pkg-begin build_options options.zig \
      --pkg-end -lc -fno-emit-bin

                master           new-inst-map
instructions  15,565,458,749   14,235,238,027   ~8.5% less
    branches   2,547,101,394    2,294,805,250   ~9.9% less
        time        2.40819s         2.25372s   ~6.4% less
                                               (~6.8% faster)

This might not be exactly how we want to do this. Someone might want to explore making HashMap faster (though it is hard to imagine it beating the performance of map.items[key-map.start]). Maybe this is a more general data structure (IntIndexMap) that we can put into the standard library.

I'll just put this PR here for discussion

src/Module.zig Outdated
Comment on lines 4978 to 4979
for (fn_info.param_body) |inst|
try sema.inst_map.ensureSpaceForKey(gpa, inst);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureSpaceForKey in a loop here and calling putAssumeCapacity* in the loop below ends up producing faster and better code. Probably because it makes each of the two loops "tighter" which is probably both easier to optimize and for the CPU to predict

@Hejsil
Copy link
Contributor Author

Hejsil commented Jul 1, 2022

Also, if there is some way to determine the smallest and biggest zir index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to put could not fail.

@Hejsil Hejsil force-pushed the faster-inst-map branch from 250e512 to cbf724c Compare July 1, 2022 16:31
@InKryption
Copy link
Contributor

Could this not be solved by just using a more optimized version of std.HashMap, in place of std.AutoHashMap?

@Hejsil
Copy link
Contributor Author

Hejsil commented Jul 1, 2022

Could this not be solved by just using a more optimized version of std.HashMap, in place of std.AutoHashMap?

Maybe, but I'm unsure how this would be done. In the end, HashMap.get does some none trivial work, and I have a hard time imagining any tuning that is going to beat:

if (key < map.start)
    return null;

const index = key - map.start;
if (map.items.len <= index or map.items[index] == .none)
    return null;
return map.items[index];

No loop, no hashing, no collision, no nothing. Three checks, one subtraction and one access at some offset.

Edit: Edited the code to actually be correct

@InKryption
Copy link
Contributor

InKryption commented Jul 1, 2022

the hash function needn't actually hash, if all you're doing is using the integers as actual indexes, you just have the function return the integer. But anyway, this looks good enough to me.

@Hejsil
Copy link
Contributor Author

Hejsil commented Jul 1, 2022

the hash function needn't actually hash, if all you're doing is using the integers as actual indexes, you just have the function return the integer.

Well, we don't really want to allocate a map for the entire zir.instructions, so the current implementation keeps track of a start index. This is updated as the map is resized with new input.

If we could allocate all the space we need upfront, maybe. See:

Also, if there is some way to determine the smallest and biggest zir index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to put could not fail.

But at that point, why use hashmap, and make the optimizers job more complicated, when you now don't even have to check if you are in bounds (you always would be). This data structure when then just be:

const InstMap = struct {
    items: []Air.Inst.Ref,
    start: Zir.Inst.Index,

    fn get(map: InstMap, key: Zir.Inst.Index) ?Air.Inst.Ref {
        if (map.items[key - map.start] == .none) return null;
        return map.items[key - map.start];
    }

    fn put(map: InstMap, key: Zir.Inst.Index, value: Air.Inst.Ref) void {
        map.items[key - map.start] = value;
    }
};

The `sema.inst_map` datastructure is very often accessed. All
instructions that reference the result of other instructions does a
lookup into this field. Because of thisi, a significant amount of time,
is spent in `std.HashMap.get`.

This commit replaces the `HashMap` with a simpler data structure that
uses the zir indexes to index into a slice for the result. See the data
structure doc comment for more info.

Performance results:

  > $(which time) -v perf stat -r 10 stage2-release/bin/zig build-exe \
        ../zig2/src/main.zig --pkg-begin build_options options.zig \
        --pkg-end -lc -fno-emit-bin

                master           new-inst-map
  instructions  15,565,458,749   13,987,614,034  ~10.1% less
      branches   2,547,101,394    2,255,015,269  ~11.5% less
          time        2.40819s         2.19619s   ~8.8% less
                                                 (~9.7% faster)
@Hejsil Hejsil force-pushed the faster-inst-map branch from cbf724c to 7011dc9 Compare July 3, 2022 10:13
@Hejsil
Copy link
Contributor Author

Hejsil commented Jul 3, 2022

Also, if there is some way to determine the smallest and biggest zir index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to put could not fail.

Figured out a way. New numbers:

                master           new-inst-map
  instructions  15,565,458,749   13,987,614,034  ~10.1% less
      branches   2,547,101,394    2,255,015,269  ~11.5% less
          time        2.40819s         2.19619s   ~8.8% less
                                                 (~9.7% faster)

@ifreund
Copy link
Member

ifreund commented Jul 3, 2022

Got any numbers comparing the (peak) memory usage with this change?

@Hejsil
Copy link
Contributor Author

Hejsil commented Jul 3, 2022

Got any numbers comparing the (peak) memory usage with this change?

I did run the time -v to measure this, but didn't write it down because it seemed like an insignificant difference. Probably should have done so, though. I'll get some numbers tomorrow when I figure out why cicd failed for these new changes.

@andrewrk
Copy link
Member

andrewrk commented Jul 3, 2022

A degenerate case for this would look something like:

fn foo() void {
    const S = struct {
        // ... a hundred thousand lines of code
    };
    // the function body of foo
}

Seems like a pretty uncommon situation, however, so perhaps it would make sense to do this optimization in order to enhance the more common case. I did have it this way originally btw, and begrudgingly changed from a slice to a hash map as part of #8554.

One wild idea I've been kicking around in my head would be to introduce a ZIR pass that rewrites ZIR to have optimized memory layout for Sema purposes. This might have payoffs since such a pass would be embarrassingly parallel, and Sema is much trickier to parallelize - so any work that can be moved from Sema into AstGen is a win. Furthermore, ZIR is cached per file, so usually any memory layout optimization would be done once and then benefits reaped multiple times. Among things such as removing "holes" that appear from patch-ups, removing unreferenced instructions, it could also re-number the ZIR indexes so that every function body instruction was sequential.

@Hejsil
Copy link
Contributor Author

Hejsil commented Jul 4, 2022

Got any numbers comparing the (peak) memory usage with this change?

Average over 20 runs of each (in kbytes)

master  new-inst-map
348621  348726

Basically insignificant

Also, unsure why cicd fails. The stack trace points to out of date line numbers for the lines printed. Unsure what it is doing in the pass where it fails

@andrewrk
Copy link
Member

andrewrk commented Sep 1, 2022

I have attempted to rebase this against master branch on your behalf in order to re-run the failing CI checks. However there are conflicts. If you would like to continue working on this, please open a new PR against master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants