
std.ArrayHashMap: base linear_scan_max on cache line size #22861

Merged (1 commit) on Feb 12, 2025


andrewrk (Member):

I have a hunch that this will be a better default. Will post perf data points shortly.

Comment on lines 608 to 611
const linear_scan_max = @as(comptime_int, @max(1, @as(comptime_int, @min(
    std.atomic.cache_line / @as(comptime_int, @max(1, @sizeOf(Hash))),
    std.atomic.cache_line / @as(comptime_int, @max(1, @sizeOf(K))),
    std.atomic.cache_line / @as(comptime_int, @max(1, @sizeOf(V))),
))));
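Written out as a formula, the expression above computes the following. This is an illustrative sketch: the symbols are introduced here and do not appear in the source. C stands for std.atomic.cache_line; s_H, s_K and s_V are the byte sizes of Hash, K and V; and integer division truncates. The worked line assumes a 64-byte cache line and example type sizes, chosen only to show how the largest of the three types dominates the minimum.

```math
\begin{aligned}
\text{linear\_scan\_max} &= \max\left(1,\ \min\left(\left\lfloor \tfrac{C}{\max(1,\,s_H)} \right\rfloor,\ \left\lfloor \tfrac{C}{\max(1,\,s_K)} \right\rfloor,\ \left\lfloor \tfrac{C}{\max(1,\,s_V)} \right\rfloor\right)\right) \\
C = 64,\ s_H = 4,\ s_K = 8,\ s_V = 32 \;&\Rightarrow\; \max\left(1,\ \min(16,\ 8,\ 2)\right) = 2
\end{aligned}
```

With those assumed sizes, the 32-byte value type sets the bound, even though a linear scan only iterates over hashes and keys.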
Contributor:

Why not this? It seems to produce an identical result.

Suggested change, replacing
const linear_scan_max = @as(comptime_int, @max(1, @as(comptime_int, @min(
    std.atomic.cache_line / @as(comptime_int, @max(1, @sizeOf(Hash))),
    std.atomic.cache_line / @as(comptime_int, @max(1, @sizeOf(K))),
    std.atomic.cache_line / @as(comptime_int, @max(1, @sizeOf(V))),
))));
with
const linear_scan_max: comptime_int = @max(1, @min(
    std.atomic.cache_line / @max(1, @sizeOf(Hash)),
    std.atomic.cache_line / @max(1, @sizeOf(K)),
    std.atomic.cache_line / @max(1, @sizeOf(V)),
));

Member (author):

That's a compile error, because @max and @min (incorrectly) return a sized integer even when the value is comptime-known.

Member:

comptime_int is an awkward type that is fundamentally at odds with the range refinement we want @min/@max to do; our saviour will hopefully be #3806.

Member (author):

I disagree; comptime_int is basically a ranged integer whose min and max are the same value. I strongly believe that @max and @min should return comptime_int when the value is comptime-known, given that #3806 is not implemented.

mlugg (Member), Feb 12, 2025:

That would lead to a major behavioral inconsistency between runtime and comptime.

EDIT: oh, to be fair, I guess @min/@max already have this issue, so disregard that point. TBH, their range refinement is necessarily kind of weird in the absence of #3806.

andrewrk (Member, author):

Pushed an update to consider only the hash and key sizes, since those are what get iterated over during linear scans. The performance difference compared to the status quo appears insignificant.

build zig with itself:

Benchmark 1 (149 runs): master/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.5ms ± 3.91ms    24.3ms … 44.5ms          0 ( 0%)        0%
  peak_rss           94.5MB ±  745KB    93.0MB … 96.6MB          6 ( 4%)        0%
  cpu_cycles         51.4M  ±  888K     48.8M  … 53.6M           3 ( 2%)        0%
  instructions       69.4M  ± 7.09K     69.3M  … 69.4M           1 ( 1%)        0%
  cache_references   3.62M  ± 31.2K     3.54M  … 3.71M           2 ( 1%)        0%
  cache_misses        537K  ± 17.2K      499K  …  581K           0 ( 0%)        0%
  branch_misses       355K  ± 2.75K      347K  …  365K           1 ( 1%)        0%
Benchmark 2 (153 runs): branch/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          32.7ms ± 3.85ms    24.9ms … 43.4ms          0 ( 0%)          -  2.2% ±  2.6%
  peak_rss           94.4MB ±  670KB    92.7MB … 96.7MB          3 ( 2%)          -  0.1% ±  0.2%
  cpu_cycles         51.5M  ±  968K     49.5M  … 55.3M           7 ( 5%)          +  0.3% ±  0.4%
  instructions       69.3M  ± 7.11K     69.3M  … 69.4M           1 ( 1%)          -  0.1% ±  0.0%
  cache_references   3.61M  ± 33.7K     3.52M  … 3.73M           1 ( 1%)          -  0.4% ±  0.2%
  cache_misses        532K  ± 17.7K      495K  …  583K           1 ( 1%)          -  0.8% ±  0.7%
  branch_misses       357K  ± 3.67K      349K  …  379K           2 ( 1%)          +  0.7% ±  0.2%

Wasm linker hello world (the wasm linker uses array hash maps heavily):

Benchmark 1 (148 runs): master/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.7ms ± 4.61ms    22.3ms … 43.4ms          0 ( 0%)        0%
  peak_rss           94.4MB ±  676KB    92.9MB … 96.2MB          3 ( 2%)        0%
  cpu_cycles         51.2M  ±  985K     48.7M  … 53.9M           6 ( 4%)        0%
  instructions       69.4M  ± 6.37K     69.3M  … 69.4M           1 ( 1%)        0%
  cache_references   3.63M  ± 30.4K     3.57M  … 3.73M           3 ( 2%)        0%
  cache_misses        537K  ± 15.1K      505K  …  589K           2 ( 1%)        0%
  branch_misses       355K  ± 2.80K      350K  …  365K           2 ( 1%)        0%
Benchmark 2 (153 runs): branch/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          32.8ms ± 4.13ms    22.2ms … 44.9ms          0 ( 0%)          -  2.8% ±  2.9%
  peak_rss           94.4MB ±  689KB    92.7MB … 96.1MB          2 ( 1%)          -  0.0% ±  0.2%
  cpu_cycles         51.5M  ±  972K     49.2M  … 55.8M           2 ( 1%)          +  0.5% ±  0.4%
  instructions       69.3M  ± 6.05K     69.3M  … 69.3M           0 ( 0%)          -  0.1% ±  0.0%
  cache_references   3.61M  ± 30.1K     3.54M  … 3.71M           2 ( 1%)          -  0.5% ±  0.2%
  cache_misses        532K  ± 17.7K      504K  …  582K           1 ( 1%)          -  0.9% ±  0.7%
  branch_misses       357K  ± 3.02K      350K  …  367K           2 ( 1%)          +  0.6% ±  0.2%

I'm going to keep the change because it feels good 🌈
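A minimal sketch of the updated bound, assuming it keeps the shape of the original expression with the V term dropped (the final merged code is not quoted in this thread). The function name `linearScanMax` and its `cache_line` parameter are hypothetical, introduced only so the arithmetic can be checked against an explicit line size; in std the value would come from `std.atomic.cache_line` at comptime.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std: mirrors the updated PR idea of
/// bounding the linear scan by how many hashes, and how many keys, fit in
/// one cache line. `cache_line` is a parameter here (an assumption for
/// illustration) instead of std.atomic.cache_line.
fn linearScanMax(cache_line: usize, comptime Hash: type, comptime K: type) usize {
    // Zero-sized types (e.g. void keys) count as 1 byte to avoid dividing by zero.
    const hashes_per_line = cache_line / @max(1, @sizeOf(Hash));
    const keys_per_line = cache_line / @max(1, @sizeOf(K));
    // The smaller count wins, but the bound never drops below 1.
    return @max(1, @min(hashes_per_line, keys_per_line));
}
```

For example, with an assumed 64-byte line and a 4-byte hash, a 32-byte key gives a bound of 2, a key larger than the line still gets a bound of 1, and a zero-sized key is divided as if it were 1 byte, which leaves the hash term (16) in charge.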

andrewrk enabled auto-merge (rebase) on February 12, 2025 00:12
andrewrk merged commit 53216d2 into master on Feb 12, 2025
9 checks passed
andrewrk deleted the linear-scan-max branch on February 12, 2025 09:55

3 participants