Use a slice for InstMap instead of std.HashMap #11977
Conversation
src/Module.zig
Outdated
```zig
for (fn_info.param_body) |inst|
    try sema.inst_map.ensureSpaceForKey(gpa, inst);
```
Calling `ensureSpaceForKey` in a loop here and calling `putAssumeCapacity*` in the loop below ends up producing faster and better code, probably because it makes each of the two loops "tighter", which is likely both easier to optimize and easier for the CPU to predict.

Also, if there is some way to determine the smallest and biggest ZIR index before doing any analysis, then we could simplify the structure even further. We could allocate the entire map ahead of time, making it so all calls to
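Concretely, the two-loop split described above might look like the following sketch (hedged: `fn_info`, `gpa`, and `resolveParam` are stand-ins drawn from the surrounding discussion, not the PR's exact code):

```zig
// Loop 1: only grows the map. No insertions happen here, so the loop
// body stays small: a capacity check plus an occasional reallocation.
for (fn_info.param_body) |inst|
    try sema.inst_map.ensureSpaceForKey(gpa, inst);

// Loop 2: only inserts. Capacity is already guaranteed, so there is no
// error path and no allocator call -- just a store per iteration.
for (fn_info.param_body) |inst|
    sema.inst_map.putAssumeCapacityNoClobber(inst, try sema.resolveParam(inst));
```

Keeping allocation out of the second loop removes the error-return branch from its body, which is one plausible reason the two tight loops optimize and predict better than a single fused loop.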
Could this not be solved by just using a more optimized version of `std.HashMap`?
Maybe, but I'm unsure how this would be done. In the end:

```zig
if (key < map.start)
    return null;
const index = key - map.start;
if (map.items.len <= index or map.items[index] == .none)
    return null;
return map.items[index];
```

No loop, no hashing, no collisions, nothing. Three checks, one subtraction, and one access at some offset.

Edit: edited the code to actually be correct.
The hash function needn't actually hash; if all you're doing is using the integers as actual indexes, you just have the function return the integer. But anyway, this looks good enough to me.
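For reference, the "hash function that just returns the integer" could be expressed through `std.HashMap`'s context mechanism, roughly like this sketch (the key/value types here are assumptions based on the thread):

```zig
const std = @import("std");

const IdentityContext = struct {
    pub fn hash(_: IdentityContext, key: u32) u64 {
        // No hashing: ZIR indexes are already dense small integers.
        return key;
    }
    pub fn eql(_: IdentityContext, a: u32, b: u32) bool {
        return a == b;
    }
};

// A map from ZIR index to AIR ref that skips hashing entirely.
const InstHashMap = std.HashMapUnmanaged(
    u32,
    Air.Inst.Ref,
    IdentityContext,
    std.hash_map.default_max_load_percentage,
);
```

Even with an identity hash, lookups still pay for probing and empty/tombstone metadata checks, which is exactly what the slice-based design avoids.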
Well, we don't really want to allocate a map for the entire ZIR. If we could allocate all the space we need upfront, maybe.
But at that point, why use a hashmap and make the optimizer's job more complicated, when you now don't even have to check if you are in bounds (you always would be)? This data structure would then just be:

```zig
const InstMap = struct {
    items: []Air.Inst.Ref,
    start: Zir.Inst.Index,

    fn get(map: InstMap, key: Zir.Inst.Index) ?Air.Inst.Ref {
        if (map.items[key - map.start] == .none) return null;
        return map.items[key - map.start];
    }

    fn put(map: InstMap, key: Zir.Inst.Index, value: Air.Inst.Ref) void {
        map.items[key - map.start] = value;
    }
};
```
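The struct sketched above assumes every key is already in range. A speculative `ensureSpaceForKey` for this slice-based design (not the PR's actual implementation) could widen the window on demand:

```zig
const std = @import("std");

fn ensureSpaceForKey(map: *InstMap, gpa: std.mem.Allocator, key: Zir.Inst.Index) !void {
    if (map.items.len == 0) {
        // First key: open a small window anchored at `key`.
        map.items = try gpa.alloc(Air.Inst.Ref, 8);
        @memset(map.items, .none);
        map.start = key;
        return;
    }
    if (key >= map.start and key - map.start < map.items.len)
        return; // Already covered; get/put stay branch-free on the hot path.

    // Grow the window so it spans both the old range and `key`.
    const old_len: Zir.Inst.Index = @intCast(map.items.len);
    const new_start = @min(map.start, key);
    const new_end = @max(map.start + old_len, key + 1);
    const new_items = try gpa.alloc(Air.Inst.Ref, new_end - new_start);
    @memset(new_items, .none);
    @memcpy(new_items[map.start - new_start ..][0..map.items.len], map.items);
    gpa.free(map.items);
    map.items = new_items;
    map.start = new_start;
}
```

If the smallest and biggest ZIR index for a body were known up front, as suggested earlier in the thread, this function would reallocate at most once per function body.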
The `sema.inst_map` data structure is very often accessed. All instructions that reference the result of other instructions do a lookup into this field. Because of this, a significant amount of time is spent in `std.HashMap.get`. This commit replaces the `HashMap` with a simpler data structure that uses the ZIR indexes to index into a slice for the result. See the data structure doc comment for more info.

Performance results:

```
$(which time) -v perf stat -r 10 stage2-release/bin/zig build-exe \
    ../zig2/src/main.zig --pkg-begin build_options options.zig \
    --pkg-end -lc -fno-emit-bin
```

|              | master         | new-inst-map   |                           |
| ------------ | -------------- | -------------- | ------------------------- |
| instructions | 15,565,458,749 | 13,987,614,034 | ~10.1% less               |
| branches     | 2,547,101,394  | 2,255,015,269  | ~11.5% less               |
| time         | 2.40819s       | 2.19619s       | ~8.8% less (~9.7% faster) |
Figured out a way. New numbers:
Got any numbers comparing the (peak) memory usage with this change?
I did run the
A degenerate case for this would look something like:

```zig
fn foo() void {
    const S = struct {
        // ... a hundred thousand lines of code
    };
    // the function body of foo
}
```

Seems like a pretty uncommon situation, however, so perhaps it would make sense to do this optimization in order to enhance the more common case. I did have it this way originally, btw, and begrudgingly changed from a slice to a hash map as part of #8554.

One wild idea I've been kicking around in my head would be to introduce a ZIR pass that rewrites ZIR to have an optimized memory layout for Sema purposes. This might have payoffs since such a pass would be embarrassingly parallel, and Sema is much trickier to parallelize, so any work that can be moved from Sema into AstGen is a win. Furthermore, ZIR is cached per file, so usually any memory layout optimization would be done once and then the benefits reaped multiple times. Among other things, such as removing "holes" that appear from patch-ups and removing unreferenced instructions, it could also re-number the ZIR indexes so that every function body instruction is sequential.
Average over 20 runs of each (in kbytes):

Basically insignificant.

Also, unsure why the CI fails. The stack trace points to out-of-date line numbers for the lines printed. Unsure what it is doing in the pass where it fails.
I have attempted to rebase this against the master branch on your behalf in order to re-run the failing CI checks. However, there are conflicts. If you would like to continue working on this, please open a new PR against the master branch.
This might not be exactly how we want to do this. Someone might want to explore making `HashMap` faster (though it is hard to imagine it beating the performance of `map.items[key - map.start]`). Maybe this is a more general data structure (`IntIndexMap`) that we can put into the standard library.

I'll just put this PR here for discussion.