@mateusz834
Contributor

See #25810

Member

@squeek502 squeek502 left a comment

My bad, I jumped the gun on merging #25810 for a number of reasons.

The @memsets should also be added to shrinkRetainingCapacity and clearRetainingCapacity in Aligned too; right now they are only in AlignedManaged.

@mateusz834 mateusz834 force-pushed the actually-memset-to-undefiined branch from c406505 to 75e4034 Compare November 6, 2025 07:57
@mateusz834
Contributor Author

@squeek502 updated

@mateusz834 mateusz834 requested a review from squeek502 November 6, 2025 07:58
@squeek502
Member

Double-checked that it behaves as expected with this code, plus some temporary printing in shrinkRetainingCapacity/clearRetainingCapacity to show how many bytes are being memset:

const std = @import("std");

pub fn main() !void {
    var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena_state.deinit();
    const arena = arena_state.allocator();

    // a few more bytes than we're going to populate so there's
    // some undefined bytes remaining after the initial append
    var list: std.ArrayList(u8) = try .initCapacity(arena, 30);

    try list.appendSlice(arena, "abcdefghijklmnopqrstuvwxyz");
    std.debug.dumpHex(list.allocatedSlice());

    list.shrinkRetainingCapacity(8);
    std.debug.dumpHex(list.allocatedSlice());

    list.clearRetainingCapacity();
    std.debug.dumpHex(list.allocatedSlice());
}
000073d0d6724010  61 62 63 64 65 66 67 68  69 6A 6B 6C 6D 6E 6F 70  abcdefghijklmnop
000073d0d6724020  71 72 73 74 75 76 77 78  79 7A AA AA AA AA        qrstuvwxyz....
18 bytes memset
000073d0d6724010  61 62 63 64 65 66 67 68  AA AA AA AA AA AA AA AA  abcdefgh........
000073d0d6724020  AA AA AA AA AA AA AA AA  AA AA AA AA AA AA        ..............
8 bytes memset
000073d0d6724010  AA AA AA AA AA AA AA AA  AA AA AA AA AA AA AA AA  ................
000073d0d6724020  AA AA AA AA AA AA AA AA  AA AA AA AA AA AA        ..............
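The byte counts in the output follow directly from the lengths involved: shrinking from 26 items to 8 poisons 18 bytes, and clearing the remaining 8 items poisons 8 more. A minimal sketch of that logic, using a hypothetical `TinyList` type rather than the real `std.ArrayList` code (the returned count stands in for the temporary printing, which is not shown in the thread):

```zig
const std = @import("std");

/// Hypothetical miniature list used only for illustration; this is an
/// assumption-laden sketch, not the actual std.ArrayList implementation.
const TinyList = struct {
    items: []u8,
    capacity: usize,

    /// Shrinks to `new_len` and returns how many bytes were set to undefined.
    fn shrinkRetainingCapacity(self: *TinyList, new_len: usize) usize {
        std.debug.assert(new_len <= self.items.len);
        const dropped = self.items[new_len..];
        // In Debug builds, writing `undefined` fills the memory with 0xAA.
        @memset(dropped, undefined);
        self.items.len = new_len;
        return dropped.len;
    }

    /// Drops every item; same poisoning, applied to the whole used range.
    fn clearRetainingCapacity(self: *TinyList) usize {
        return self.shrinkRetainingCapacity(0);
    }
};

pub fn main() void {
    // 30 bytes of capacity, 26 of them populated, as in the test program above.
    var buf: [30]u8 = undefined;
    const text = "abcdefghijklmnopqrstuvwxyz";
    @memcpy(buf[0..text.len], text);
    var list: TinyList = .{ .items = buf[0..text.len], .capacity = buf.len };

    std.debug.print("{d} bytes memset\n", .{list.shrinkRetainingCapacity(8)});
    std.debug.print("{d} bytes memset\n", .{list.clearRetainingCapacity()});
}
```

Only the bytes between the new length and the old length are touched; the unused capacity past the old length is already undefined and is left alone.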

Performance data points:

Compiler built in Debug mode with the self-hosted x86_64 backend compiling hello world:

Benchmark 1 (3 runs): ./stage4/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.6s  ± 88.4ms    10.5s  … 10.7s           0 ( 0%)        0%
  peak_rss            253MB ± 1.61MB     252MB …  255MB          0 ( 0%)        0%
  cpu_cycles         44.8G  ± 39.8M     44.8G  … 44.8G           0 ( 0%)        0%
  instructions       40.9G  ± 5.83M     40.9G  … 41.0G           0 ( 0%)        0%
  cache_references   3.60G  ± 2.12M     3.60G  … 3.60G           0 ( 0%)        0%
  cache_misses       1.00G  ± 3.84M      997M  … 1.00G           0 ( 0%)        0%
  branch_misses      96.0M  ±  272K     95.7M  … 96.2M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./stage4-memset/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.8s  ± 69.8ms    10.7s  … 10.9s           0 ( 0%)          +  1.7% ±  1.7%
  peak_rss            253MB ±  523KB     252MB …  253MB          0 ( 0%)          -  0.2% ±  1.1%
  cpu_cycles         45.2G  ±  334M     44.8G  … 45.5G           0 ( 0%)          +  0.8% ±  1.2%
  instructions       41.3G  ± 2.80M     41.3G  … 41.3G           0 ( 0%)          +  0.7% ±  0.0%
  cache_references   3.62G  ± 20.2M     3.60G  … 3.63G           0 ( 0%)          +  0.5% ±  0.9%
  cache_misses       1.00G  ± 5.75M      995M  … 1.01G           0 ( 0%)          +  0.1% ±  1.1%
  branch_misses      98.2M  ± 1.14M     96.9M  … 99.0M           0 ( 0%)          +  2.3% ±  2.0%

Compiler built in Debug mode with the LLVM backend compiling hello world:

Benchmark 1 (3 runs): ./stage4-llvm/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.3s  ±  110ms    10.2s  … 10.4s           0 ( 0%)        0%
  peak_rss            231MB ±  734KB     231MB …  232MB          0 ( 0%)        0%
  cpu_cycles         41.9G  ± 56.6M     41.8G  … 41.9G           0 ( 0%)        0%
  instructions       49.8G  ± 4.93M     49.8G  … 49.8G           0 ( 0%)        0%
  cache_references   3.15G  ± 3.54M     3.15G  … 3.15G           0 ( 0%)        0%
  cache_misses        829M  ± 2.55M      826M  …  831M           0 ( 0%)        0%
  branch_misses       102M  ±  246K      101M  …  102M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./stage4-llvm-memset/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.8s  ± 32.1ms    10.8s  … 10.8s           0 ( 0%)        💩+  4.6% ±  1.8%
  peak_rss            230MB ± 1.21MB     228MB …  231MB          0 ( 0%)          -  0.6% ±  1.0%
  cpu_cycles         41.3G  ±  285M     41.1G  … 41.6G           0 ( 0%)          -  1.4% ±  1.1%
  instructions       49.6G  ± 8.72M     49.6G  … 49.6G           0 ( 0%)          -  0.3% ±  0.0%
  cache_references   3.08G  ± 14.2M     3.07G  … 3.10G           0 ( 0%)        ⚡-  2.1% ±  0.7%
  cache_misses        813M  ± 5.34M      810M  …  819M           0 ( 0%)          -  1.8% ±  1.1%
  branch_misses      99.9M  ± 1.01M     99.2M  …  101M           0 ( 0%)          -  1.8% ±  1.6%

Standard library tests built in Debug mode with the self-hosted x86_64 backend:

Benchmark 1 (3 runs): ./test-std
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          18.0s  ± 26.7ms    18.0s  … 18.0s           0 ( 0%)        0%
  peak_rss           95.1MB ± 63.6KB    95.0MB … 95.1MB          0 ( 0%)        0%
  cpu_cycles         84.8G  ± 56.1M     84.7G  … 84.9G           0 ( 0%)        0%
  instructions        119G  ± 3.72M      119G  …  119G           0 ( 0%)        0%
  cache_references   3.25G  ± 5.27M     3.24G  … 3.25G           0 ( 0%)        0%
  cache_misses       40.6M  ± 1.13M     39.9M  … 41.9M           0 ( 0%)        0%
  branch_misses      30.2M  ±  125K     30.1M  … 30.3M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./test-std-memset
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          18.1s  ± 44.5ms    18.0s  … 18.1s           0 ( 0%)          +  0.4% ±  0.5%
  peak_rss           95.2MB ± 64.8KB    95.1MB … 95.3MB          0 ( 0%)          +  0.1% ±  0.2%
  cpu_cycles         85.0G  ±  236M     84.7G  … 85.1G           0 ( 0%)          +  0.2% ±  0.5%
  instructions        119G  ± 2.77M      119G  …  119G           0 ( 0%)          -  0.0% ±  0.0%
  cache_references   3.04G  ± 3.06M     3.04G  … 3.05G           0 ( 0%)        ⚡-  6.4% ±  0.3%
  cache_misses       41.4M  ± 1.23M     40.1M  … 42.5M           0 ( 0%)          +  2.1% ±  6.6%
  branch_misses      30.1M  ±  104K     30.0M  … 30.2M           0 ( 0%)          -  0.1% ±  0.9%

Standard library tests built in Debug mode with the LLVM backend:

Benchmark 1 (3 runs): ./test-std-llvm
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          11.2s  ± 3.37ms    11.2s  … 11.2s           0 ( 0%)        0%
  peak_rss           91.4MB ±  109KB    91.3MB … 91.5MB          0 ( 0%)        0%
  cpu_cycles         50.2G  ± 92.2M     50.1G  … 50.3G           0 ( 0%)        0%
  instructions       89.8G  ± 16.9M     89.8G  … 89.8G           0 ( 0%)        0%
  cache_references   2.77G  ± 16.2M     2.76G  … 2.79G           0 ( 0%)        0%
  cache_misses       32.9M  ±  800K     32.3M  … 33.8M           0 ( 0%)        0%
  branch_misses      37.2M  ±  293K     37.0M  … 37.6M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./test-std-llvm-memset
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          11.1s  ± 58.1ms    11.0s  … 11.2s           0 ( 0%)          -  0.7% ±  0.8%
  peak_rss           91.4MB ± 91.1KB    91.3MB … 91.5MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles         49.9G  ±  136M     49.8G  … 50.1G           0 ( 0%)          -  0.5% ±  0.5%
  instructions       89.8G  ± 3.95M     89.8G  … 89.8G           0 ( 0%)          -  0.0% ±  0.0%
  cache_references   2.75G  ± 6.09M     2.74G  … 2.75G           0 ( 0%)          -  0.8% ±  1.0%
  cache_misses       34.0M  ±  676K     33.4M  … 34.7M           0 ( 0%)          +  3.4% ±  5.1%
  branch_misses      31.6M  ±  129K     31.5M  … 31.7M           0 ( 0%)        ⚡- 15.0% ±  1.4%

@squeek502 squeek502 enabled auto-merge (rebase) November 6, 2025 08:28
@squeek502 squeek502 merged commit b2895f3 into ziglang:master Nov 6, 2025
9 checks passed
mateusz834 added a commit to mateusz834/zig that referenced this pull request Nov 8, 2025
…ist.

After ziglang#25832 it has safety features (memset to undefined); we can make
use of that here, since scratch is temporary storage.
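As a rough illustration of the scratch-storage idea in that commit message, the sketch below stages temporary values at the end of a shared list and rolls the list back afterwards. The helper name `sumViaScratch` and the summing task are hypothetical, and the `std.ArrayList` calls assume the same Zig version as the test program earlier in this thread:

```zig
const std = @import("std");

/// Hypothetical helper: stages `values` at the end of `scratch`, sums them,
/// then restores `scratch` to the length it had on entry.
fn sumViaScratch(
    gpa: std.mem.Allocator,
    scratch: *std.ArrayList(u32),
    values: []const u32,
) !u32 {
    const top = scratch.items.len;
    // Roll back on every exit path. With the memset change in place, this is
    // expected to also mark the temporary region as undefined in safe builds.
    defer scratch.shrinkRetainingCapacity(top);

    try scratch.appendSlice(gpa, values);
    var total: u32 = 0;
    for (scratch.items[top..]) |v| total += v;
    return total;
}

pub fn main() !void {
    var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena_state.deinit();
    const arena = arena_state.allocator();

    var scratch: std.ArrayList(u32) = try .initCapacity(arena, 8);
    try scratch.appendSlice(arena, &.{ 1, 2, 3 });

    const total = try sumViaScratch(arena, &scratch, &.{ 10, 20 });
    std.debug.print("total={d} len={d}\n", .{ total, scratch.items.len });
}
```

Any code that keeps reading the temporary region after the rollback would then see poisoned memory in Debug builds instead of stale but plausible values, which makes such bugs easier to notice.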