Add zero-compromise directory iteration #457

SUPERCILEX · 2022-11-21T01:48:25Z

Closes #451

Background reading:

Notes

Based on the above background reading, the following assumptions appear to be sound:

The kernel will align returned dirents and they will be contiguous.
Seeking into garbage offsets is safe as the kernel will not return partial dirents. In fact, I've discovered that the d_off is actually a cookie that you can use to seek to the next dirent. It has no relation to byte offsets.

Benchmarks

Benchmark 1: ./nix-raw /tmp/ftzz-test
  Time (mean ± σ):     197.4 ms ±   5.2 ms    [User: 5.8 ms, System: 190.1 ms]
  Range (min … max):   191.2 ms … 208.3 ms    15 runs
 
Benchmark 2: ./rustix-raw /tmp/ftzz-test
  Time (mean ± σ):     190.9 ms ±   4.1 ms    [User: 2.4 ms, System: 188.0 ms]
  Range (min … max):   186.7 ms … 202.1 ms    15 runs
 
Benchmark 3: ./nix /tmp/ftzz-test
  Time (mean ± σ):     228.7 ms ±   7.2 ms    [User: 39.9 ms, System: 188.7 ms]
  Range (min … max):   223.0 ms … 250.2 ms    13 runs
 
Benchmark 4: ./rustix /tmp/ftzz-test
  Time (mean ± σ):     246.3 ms ±   6.9 ms    [User: 40.9 ms, System: 201.8 ms]
  Range (min … max):   235.5 ms … 256.5 ms    11 runs
 
Benchmark 5: ./stdlib /tmp/ftzz-test
  Time (mean ± σ):     237.6 ms ±   5.2 ms    [User: 56.0 ms, System: 180.0 ms]
  Range (min … max):   232.0 ms … 246.8 ms    12 runs
 
Summary
  './rustix-raw /tmp/ftzz-test' ran
    1.03 ± 0.04 times faster than './nix-raw /tmp/ftzz-test'
    1.20 ± 0.05 times faster than './nix /tmp/ftzz-test'
    1.24 ± 0.04 times faster than './stdlib /tmp/ftzz-test'
    1.29 ± 0.05 times faster than './rustix /tmp/ftzz-test'

Optimal buf size analysis

I set up benchmarks that used power-of-two sized buffers. 2^13 seemed optimal and it's also what the stdlib uses in its BufWriter and BufReader implementations.

Benchmark 1: ./rustix5 /tmp/ftzz-1
  Time (mean ± σ):     345.7 ms ±   2.8 ms    [User: 35.5 ms, System: 309.9 ms]
  Range (min … max):   343.4 ms … 353.1 ms    10 runs
 
Benchmark 2: ./rustix6 /tmp/ftzz-1
  Time (mean ± σ):     258.5 ms ±   1.9 ms    [User: 24.9 ms, System: 233.4 ms]
  Range (min … max):   255.3 ms … 260.8 ms    11 runs
 
Benchmark 3: ./rustix7 /tmp/ftzz-1
  Time (mean ± σ):     217.8 ms ±   3.4 ms    [User: 17.4 ms, System: 200.2 ms]
  Range (min … max):   214.6 ms … 228.4 ms    13 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 4: ./rustix8 /tmp/ftzz-1
  Time (mean ± σ):     194.1 ms ±   0.9 ms    [User: 15.9 ms, System: 178.1 ms]
  Range (min … max):   192.9 ms … 196.1 ms    15 runs
 
Benchmark 5: ./rustix9 /tmp/ftzz-1
  Time (mean ± σ):     181.3 ms ±   0.8 ms    [User: 13.9 ms, System: 167.2 ms]
  Range (min … max):   179.0 ms … 182.7 ms    16 runs
 
Benchmark 6: ./rustix10 /tmp/ftzz-1
  Time (mean ± σ):     175.1 ms ±   1.6 ms    [User: 12.4 ms, System: 162.6 ms]
  Range (min … max):   171.0 ms … 177.6 ms    17 runs
 
Benchmark 7: ./rustix11 /tmp/ftzz-1
  Time (mean ± σ):     170.3 ms ±   2.2 ms    [User: 12.8 ms, System: 157.3 ms]
  Range (min … max):   167.6 ms … 173.1 ms    17 runs
 
Benchmark 8: ./rustix12 /tmp/ftzz-1
  Time (mean ± σ):     170.4 ms ±   0.6 ms    [User: 9.1 ms, System: 161.3 ms]
  Range (min … max):   170.0 ms … 172.1 ms    17 runs
 
Benchmark 9: ./rustix13 /tmp/ftzz-1
  Time (mean ± σ):     169.2 ms ±   1.8 ms    [User: 8.8 ms, System: 160.2 ms]
  Range (min … max):   165.4 ms … 173.1 ms    17 runs
 
Benchmark 10: ./rustix15 /tmp/ftzz-1
  Time (mean ± σ):     167.9 ms ±   2.2 ms    [User: 11.2 ms, System: 156.7 ms]
  Range (min … max):   164.7 ms … 172.5 ms    17 runs
 
Benchmark 11: ./rustix20 /tmp/ftzz-1
  Time (mean ± σ):     168.4 ms ±   1.0 ms    [User: 9.5 ms, System: 158.8 ms]
  Range (min … max):   167.7 ms … 171.8 ms    17 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  './rustix15 /tmp/ftzz-1' ran
    1.00 ± 0.01 times faster than './rustix20 /tmp/ftzz-1'
    1.01 ± 0.02 times faster than './rustix13 /tmp/ftzz-1'
    1.01 ± 0.02 times faster than './rustix11 /tmp/ftzz-1'
    1.01 ± 0.01 times faster than './rustix12 /tmp/ftzz-1'
    1.04 ± 0.02 times faster than './rustix10 /tmp/ftzz-1'
    1.08 ± 0.01 times faster than './rustix9 /tmp/ftzz-1'
    1.16 ± 0.02 times faster than './rustix8 /tmp/ftzz-1'
    1.30 ± 0.03 times faster than './rustix7 /tmp/ftzz-1'
    1.54 ± 0.02 times faster than './rustix6 /tmp/ftzz-1'
    2.06 ± 0.03 times faster than './rustix5 /tmp/ftzz-1'

sunfishcode

Thanks for working on this! I still need to read through the main implementation code another time, but here are a few initial review comments.