-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On a machine with many cores, snapshotting large repos is very CPU-intensive #4508
Comments
Interesting. Can't look now, but it's possible we serialize all tree updates into a single channel and it's contending when there are more threads. It would be in the working copy snapshot code somewhere.
To fix your immediate issue, you can also try enabling the Watchman fsmonitor. |
I heard (iirc when I was working on Mercurial) that it's sometimes faster to scan directory entries sequentially than splitting jobs to worker processes, which tends to lead to random access. I don't know this is the case, though. |
Not even a little bit urgent -- I was mostly bewildered at what I could possibly have broken on the big machine to make (watchman is currently broken in MacPorts, which is how I ended up here) |
I can't reproduce this on my 32 core Zen 1 machine (Linux 6.10) with I suspect two things:
I don't have a Studio but I do have a M2 Air, which coincidentally dual boots Fedora. So, if I get a chance I can see how it all shakes out on both systems, Linux vs macOS, but it's only 4P+4E, so it's not going to be as big a deal I suspect. If it turns out that some other core configuration gives big improvements we can probably make a change to the scheduling policy somehow before we use Rayon so Note that I couldn't reliably clone
|
Yes, this is a 16P+8E Mac Studio. I noticed while testing this that OS caches seem to get evicted pretty quickly; after not that many seconds, a re-run is noticeably slower. I don't understand why, but thought it was interesting. I've not figured out how to control QoS to the degree you describe, but
|
I'm seeing the same issue on my M3 Pro mac The repo I'm testing with is https://github.com/NixOS/nixpkgs/
|
This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue jj-vcs#4508, but it's slightly faster on my Linux machine (except for the "git" repo which doesn't have many directories to parallelize jobs.) The implementation is also simpler. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. This change will allow us to build a set of deleted files per directory, which will reduce the overhead to send "present" files over the channel. Benchmark: - jj-0: original - jj-1: this patch gecko-dev (large): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R ~/mirrors/gecko-dev debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 707.4 ms ± 12.3 ms [User: 2880.2 ms, System: 1939.1 ms] Range (min … max): 684.4 ms … 721.7 ms 10 runs Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 659.6 ms ± 11.8 ms [User: 2673.2 ms, System: 1882.1 ms] Range (min … max): 649.2 ms … 690.1 ms 10 runs Relative speed comparison 1.61 ± 0.05 target/release-with-debug/jj-0 -R ~/mirrors/gecko-dev debug snapshot 1.50 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot ``` linux (mid-size): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R ~/mirrors/linux debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 256.3 ms ± 6.2 ms [User: 573.0 ms, System: 459.2 ms] Range (min … max): 248.5 ms … 266.0 ms 10 runs Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 225.5 ms ± 4.2 ms [User: 563.3 ms, System: 475.8 ms] Range (min … max): 218.7 ms … 231.4 ms 10 runs Relative speed comparison 1.35 ± 0.04 target/release-with-debug/jj-0 -R ~/mirrors/linux debug snapshot 1.19 ± 0.03 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot ``` git (small, flat): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 100 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R ~/mirrors/git debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.6 ms [User: 38.8 ms, System: 40.6 ms] Range (min … max): 28.1 ms … 43.8 ms 100 runs Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 34.0 ms ± 1.2 ms [User: 33.5 ms, System: 47.5 ms] Range (min … max): 31.5 ms … 39.2 ms 100 runs Relative speed comparison 1.00 target/release-with-debug/jj-0 -R ~/mirrors/git debug snapshot 1.11 ± 0.07 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot ``` jj (small, structured): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 100 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R . debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R . debug snapshot Time (mean ± σ): 30.9 ms ± 1.6 ms [User: 16.3 ms, System: 28.0 ms] Range (min … max): 28.9 ms … 41.8 ms 100 runs Benchmark 2: target/release-with-debug/jj-1 -R . debug snapshot Time (mean ± σ): 31.2 ms ± 1.9 ms [User: 15.3 ms, System: 27.0 ms] Range (min … max): 29.0 ms … 43.3 ms 100 runs Relative speed comparison 1.00 target/release-with-debug/jj-0 -R . debug snapshot 1.01 ± 0.08 target/release-with-debug/jj-1 -R . debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
…threshold (WIP) This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue jj-vcs#4508, but it's slightly faster on my Linux machine (except for the "git" repo which doesn't have many directories to parallelize jobs.) The implementation is also simpler. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. This change will allow us to build a set of deleted files per directory, which will reduce the overhead to send "present" files over the channel. TODO: update benchmark Benchmark: - jj-0: original - jj-1: this patch gecko-dev (large): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R ~/mirrors/gecko-dev debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 707.4 ms ± 12.3 ms [User: 2880.2 ms, System: 1939.1 ms] Range (min … max): 684.4 ms … 721.7 ms 10 runs Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 659.6 ms ± 11.8 ms [User: 2673.2 ms, System: 1882.1 ms] Range (min … max): 649.2 ms … 690.1 ms 10 runs Relative speed comparison 1.61 ± 0.05 target/release-with-debug/jj-0 -R ~/mirrors/gecko-dev debug snapshot 1.50 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot ``` linux (mid-size): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R ~/mirrors/linux debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 256.3 ms ± 6.2 ms [User: 573.0 ms, System: 459.2 ms] Range (min … max): 248.5 ms … 266.0 ms 10 runs Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 225.5 ms ± 4.2 ms [User: 563.3 ms, System: 475.8 ms] Range (min … max): 218.7 ms … 231.4 ms 10 runs Relative speed comparison 1.35 ± 0.04 target/release-with-debug/jj-0 -R ~/mirrors/linux debug snapshot 1.19 ± 0.03 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot ``` git (small, flat): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 100 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R ~/mirrors/git debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.6 ms [User: 38.8 ms, System: 40.6 ms] Range (min … max): 28.1 ms … 43.8 ms 100 runs Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 34.0 ms ± 1.2 ms [User: 33.5 ms, System: 47.5 ms] Range (min … max): 31.5 ms … 39.2 ms 100 runs Relative speed comparison 1.00 target/release-with-debug/jj-0 -R ~/mirrors/git debug snapshot 1.11 ± 0.07 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot ``` jj (small, structured): ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 100 -L bin jj-0,jj-1,jj-2 \ 'target/release-with-debug/{bin} -R . debug snapshot' Benchmark 1: target/release-with-debug/jj-0 -R . debug snapshot Time (mean ± σ): 30.9 ms ± 1.6 ms [User: 16.3 ms, System: 28.0 ms] Range (min … max): 28.9 ms … 41.8 ms 100 runs Benchmark 2: target/release-with-debug/jj-1 -R . debug snapshot Time (mean ± σ): 31.2 ms ± 1.9 ms [User: 15.3 ms, System: 27.0 ms] Range (min … max): 29.0 ms … 43.3 ms 100 runs Relative speed comparison 1.00 target/release-with-debug/jj-0 -R . debug snapshot 1.01 ± 0.08 target/release-with-debug/jj-1 -R . debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
…threshold This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue jj-vcs#4508, but it's slightly faster on my Linux machine. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. Benchmark: 1. original 2. per-directory spawn (this patch) 3. per-directory deleted files (omitted) 4. shorter path comparison (omitted) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 764.9 ms ± 16.7 ms [User: 3274.7 ms, System: 2183.3 ms] Range (min … max): 731.9 ms … 814.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 710.7 ms ± 9.1 ms [User: 3070.7 ms, System: 2142.6 ms] Range (min … max): 695.9 ms … 740.1 ms 30 runs Relative speed comparison 1.89 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot 1.76 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 268.2 ms ± 11.3 ms [User: 636.6 ms, System: 518.5 ms] Range (min … max): 247.5 ms … 295.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 242.3 ms ± 3.3 ms [User: 656.8 ms, System: 538.0 ms] Range (min … max): 236.9 ms … 252.3 ms 30 runs Relative speed comparison 1.40 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot 1.27 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 201.0 ms ± 8.5 ms [User: 929.3 ms, System: 917.6 ms] Range (min … max): 170.3 ms … 218.5 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 190.7 ms ± 4.1 ms [User: 859.3 ms, System: 881.1 ms] Range (min … max): 184.6 ms … 202.4 ms 30 runs Relative speed comparison 1.24 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot 1.18 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.3 ms ± 1.1 ms [User: 40.5 ms, System: 39.4 ms] Range (min … max): 28.3 ms … 35.7 ms 50 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.1 ms [User: 33.8 ms, System: 39.0 ms] Range (min … max): 29.0 ms … 35.0 ms 50 runs Relative speed comparison 1.05 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot 1.06 ± 0.05 target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
…threshold This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue jj-vcs#4508, but it's slightly faster on my Linux machine. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. Benchmark: 1. original 2. per-directory spawn (this patch) 3. per-directory deleted files (omitted) 4. shorter path comparison (omitted) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 764.9 ms ± 16.7 ms [User: 3274.7 ms, System: 2183.3 ms] Range (min … max): 731.9 ms … 814.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 710.7 ms ± 9.1 ms [User: 3070.7 ms, System: 2142.6 ms] Range (min … max): 695.9 ms … 740.1 ms 30 runs Relative speed comparison 1.89 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot 1.76 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 268.2 ms ± 11.3 ms [User: 636.6 ms, System: 518.5 ms] Range (min … max): 247.5 ms … 295.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 242.3 ms ± 3.3 ms [User: 656.8 ms, System: 538.0 ms] Range (min … max): 236.9 ms … 252.3 ms 30 runs Relative speed comparison 1.40 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot 1.27 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 201.0 ms ± 8.5 ms [User: 929.3 ms, System: 917.6 ms] Range (min … max): 170.3 ms … 218.5 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 190.7 ms ± 4.1 ms [User: 859.3 ms, System: 881.1 ms] Range (min … max): 184.6 ms … 202.4 ms 30 runs Relative speed comparison 1.24 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot 1.18 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.3 ms ± 1.1 ms [User: 40.5 ms, System: 39.4 ms] Range (min … max): 28.3 ms … 35.7 ms 50 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.1 ms [User: 33.8 ms, System: 39.0 ms] Range (min … max): 29.0 ms … 35.0 ms 50 runs Relative speed comparison 1.05 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot 1.06 ± 0.05 target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
…threshold This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue jj-vcs#4508, but it's slightly faster on my Linux machine. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. Benchmark: 1. original 2. per-directory spawn (this patch) 3. per-directory deleted files (omitted) 4. shorter path comparison (omitted) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 764.9 ms ± 16.7 ms [User: 3274.7 ms, System: 2183.3 ms] Range (min … max): 731.9 ms … 814.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 710.7 ms ± 9.1 ms [User: 3070.7 ms, System: 2142.6 ms] Range (min … max): 695.9 ms … 740.1 ms 30 runs Relative speed comparison 1.89 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot 1.76 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 268.2 ms ± 11.3 ms [User: 636.6 ms, System: 518.5 ms] Range (min … max): 247.5 ms … 295.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 242.3 ms ± 3.3 ms [User: 656.8 ms, System: 538.0 ms] Range (min … max): 236.9 ms … 252.3 ms 30 runs Relative speed comparison 1.40 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot 1.27 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 201.0 ms ± 8.5 ms [User: 929.3 ms, System: 917.6 ms] Range (min … max): 170.3 ms … 218.5 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 190.7 ms ± 4.1 ms [User: 859.3 ms, System: 881.1 ms] Range (min … max): 184.6 ms … 202.4 ms 30 runs Relative speed comparison 1.24 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot 1.18 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.3 ms ± 1.1 ms [User: 40.5 ms, System: 39.4 ms] Range (min … max): 28.3 ms … 35.7 ms 50 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.1 ms [User: 33.8 ms, System: 39.0 ms] Range (min … max): 29.0 ms … 35.0 ms 50 runs Relative speed comparison 1.05 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot 1.06 ± 0.05 target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
…threshold This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue #4508, but it's slightly faster on my Linux machine. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. Benchmark: 1. original 2. per-directory spawn (this patch) 3. per-directory deleted files (omitted) 4. shorter path comparison (omitted) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 764.9 ms ± 16.7 ms [User: 3274.7 ms, System: 2183.3 ms] Range (min … max): 731.9 ms … 814.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 710.7 ms ± 9.1 ms [User: 3070.7 ms, System: 2142.6 ms] Range (min … max): 695.9 ms … 740.1 ms 30 runs Relative speed comparison 1.89 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot 1.76 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 268.2 ms ± 11.3 ms [User: 636.6 ms, System: 518.5 ms] Range (min … max): 247.5 ms … 295.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 242.3 ms ± 3.3 ms [User: 656.8 ms, System: 538.0 ms] Range (min … max): 236.9 ms … 252.3 ms 30 runs Relative speed comparison 1.40 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot 1.27 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 201.0 ms ± 8.5 ms [User: 929.3 ms, System: 917.6 ms] Range (min … max): 170.3 ms … 218.5 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 190.7 ms ± 4.1 ms [User: 859.3 ms, System: 881.1 ms] Range (min … max): 184.6 ms … 202.4 ms 30 runs Relative speed comparison 1.24 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot 1.18 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.3 ms ± 1.1 ms [User: 40.5 ms, System: 39.4 ms] Range (min … max): 28.3 ms … 35.7 ms 50 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.1 ms [User: 33.8 ms, System: 39.0 ms] Range (min … max): 29.0 ms … 35.0 ms 50 runs Relative speed comparison 1.05 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot 1.06 ± 0.05 target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
Description
On a machine with many cores, snapshotting large repos is very CPU-intensive
Steps to Reproduce the Problem
jj st
(or justjj
) and measure the time it takes for the command to complete.export RAYON_NUM_THREADS=4
Expected Behavior
Similar performance in both cases.
Actual Behavior
jj
's default behavior:2.28 real 0.34 user 32.26 sys
jj
limited to four threads:1.13 real 0.16 user 0.49 sys
This one took some doing and profiling to figure out, as it didn't immediately make sense that the same working copy is so much faster to work with on a much smaller machine.
Specifications
The text was updated successfully, but these errors were encountered: