Multithreaded evaluator #10938

Draft
wants to merge 71 commits into
base: master

Conversation

edolstra
Member

@edolstra edolstra commented Jun 19, 2024

Motivation

This PR makes the evaluator thread-safe. Currently, only nix search and nix flake show make use of multi-threaded evaluation to achieve a speedup on multicore systems.

Unlike the previous attempt at a multi-threaded evaluator, this one locks thunks to prevent them from being evaluated more than once. The life of a thunk is now:

  • On the first forceValue() call, the thunk type goes from tThunk to tPending.
  • If another thread does a forceValue() on a thunk in the tPending state, it acquires a lock to register itself as "awaiting" that value, and sets the type to tAwaited.
  • Once the first thread has finished evaluating the thunk, it checks the type: if it is tAwaited, it updates the value and wakes up the threads that are waiting. If it is still tPending, it just updates the value normally.

Also, there now is a tFailed value type that stores an exception pointer to represent the case where thunk evaluation throws an exception. In that case, every thread that forces the thunk should get the same exception.
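The thunk life cycle described above can be illustrated with a minimal Python sketch. This is not the PR's C++ implementation: the `Thunk` class, its `force()` method and the `DONE` state are hypothetical names, and a single condition variable stands in for whatever atomics and locks the evaluator actually uses.

```python
import threading

# State names mirror the PR description; DONE and the class itself are hypothetical.
THUNK, PENDING, AWAITED, DONE, FAILED = "tThunk", "tPending", "tAwaited", "done", "tFailed"


class Thunk:
    def __init__(self, compute):
        self._compute = compute          # zero-argument callable producing the value
        self._state = THUNK
        self._payload = None             # the value, or the stored exception
        self._cond = threading.Condition()

    def force(self):
        with self._cond:
            owner = self._state == THUNK
            if owner:
                # First forcer: tThunk -> tPending.
                self._state = PENDING
            elif self._state in (PENDING, AWAITED):
                # Another thread is evaluating: register as a waiter and block.
                # (A thread re-forcing its own pending thunk would block forever,
                # since this sketch has no blackhole detection.)
                self._state = AWAITED
                while self._state == AWAITED:
                    self._cond.wait()
        if owner:
            try:
                result, final = self._compute(), DONE
            except Exception as exc:
                result, final = exc, FAILED
            with self._cond:
                had_waiters = self._state == AWAITED
                self._payload, self._state = result, final
                if had_waiters:
                    self._cond.notify_all()   # wake only if someone registered
        if self._state == FAILED:
            raise self._payload               # every forcer sees the same exception
        return self._payload
```

In this sketch the computation runs outside the lock, so other thunks can be forced concurrently, and a failed thunk re-raises the one stored exception object for every later `force()` call.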

To enable multi-threaded evaluation, you need to set the NR_CORES environment variable to the number of threads to use. You can also set NIX_SHOW_THREAD_STATS=1 to get some debug statistics.

Some benchmark results on a Ryzen 5900X with 12 cores and 24 hyper-threads:

  • NR_CORES=12 GC_INITIAL_HEAP_SIZE=8G nix flake show --no-eval-cache --all-systems --json github:NixOS/nix/afdd12be5e19c0001ff3297dea544301108d298 went from 23.70s to 5.77s.
  • NR_CORES=16 GC_INITIAL_HEAP_SIZE=6G time nix search --no-eval-cache github:NixOS/nixpkgs/bf8462aeba50cc753971480f613fbae0747cffc0?narHash=sha256-bPyv7hsbtuxyL6LLKtOYL6QsmPeFWP839BZQMd3RoUg%3D ^ went from 11.82s to 3.88s.

Note: it's good to set GC_INITIAL_HEAP_SIZE to a high value because stop-the-world garbage collection is expensive.

To do:

  • Infinite recursion detection through blackholing is currently disabled.
  • More commands should be multi-threaded, in particular nix flake check.
  • We should have some auto-parallelization of single evaluations (like NixOS system configurations). One way to do this would be to evaluate all attributes of a derivation in parallel.
  • This PR makes some high-contention data structures (in particular the symbol table) more multi-thread friendly, but there is more that can be done.
  • The Executor class currently executes work items in random order to reduce the probability that we execute a bunch of items at the same time that all depend on the same thunk, causing all but one to be blocked. This can probably be improved.
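The random-order idea from the last item can be sketched as follows. This is only an illustration using Python's standard thread pool, not the PR's Executor class; `run_shuffled` and its parameters are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_shuffled(work_items, num_threads, seed=None):
    """Run zero-argument callables in random order; return results in input order."""
    order = list(range(len(work_items)))
    random.Random(seed).shuffle(order)       # randomize the submission order
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # The pool's queue is FIFO, so items start roughly in shuffled order.
        futures = {i: pool.submit(work_items[i]) for i in order}
    results = [None] * len(work_items)
    for i, future in futures.items():
        results[i] = future.result()         # re-raises if the item failed
    return results
```

Shuffling makes it less likely that neighbouring items (which often share dependencies) are started at the same moment, at the cost of losing any deliberate ordering.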

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

edolstra added 30 commits May 20, 2024 10:09
This is a mapping from paths to "resolved" paths (i.e. with
`default.nix` added, if appropriate). `fileParseCache` and
`fileEvalCache` are now keyed on the resolved path *only*.

Previously, the optimistic concurrency approach in `evalFile()` meant
that a `nix search nixpkgs ^` would do hundreds of duplicated
parsings/evaluations. Now, we reuse the thunk locking mechanism to
ensure it's done only once.

This refactoring allows the symbol table to be stored as something
other than std::strings.

This allows symbol IDs to be offsets into an arena whose base offset
never moves, and can therefore be dereferenced without any locks.

This makes it less likely that we concurrently execute tasks that
would block on a common subtask, e.g. evaluating `libfoo` and
`libfoo_variant` are likely to have common dependencies.
@RossComputerGuy
Member

This overestimates the consumption. In the 64 core case, which takes about 10 seconds, .user + .system is only 130 seconds, or about 20% utilization.

Yeah, @tomberek, @djacu, and I were testing this last night and that was what came up to being what to graph.
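For reference, the utilization figure quoted above can be reproduced with a small calculation. The function name is hypothetical; the numbers are the approximate ones from the quote (130 s of user + system time, about 10 s of wall time, 64 threads).

```python
def cpu_utilization(cpu_seconds, wall_seconds, threads):
    """Fraction of the available thread time that was actually spent on CPU."""
    return cpu_seconds / (wall_seconds * threads)


# 130 s of CPU time over about 10 s of wall time on 64 threads.
print(f"{cpu_utilization(130.0, 10.0, 64):.0%}")  # prints "20%"
```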

The 51 threads case is remarkably fast. What's up with that?

I believe it was a failure but hyperfine didn't mark it as one. Rerunning the test for 51 cores gave similar numbers as 50 and 52.

It seems like a good idea to also check that each run gives a correct result in terms of the actual evaluation output.

Yeah, idk how nix reports errors and how hyperfine marks runs as errors. Eventually one time it did fail but adding --ignore-failures to hyperfine made hyperfine continue. I've noticed there's weird errors with string builtins sometimes and spurious errors printed a lot. This is one such error I ran into:

evaluating 'packages.aarch64-linux.nix-util'...
error (ignored): error:
       … while evaluating the attribute 'aarch64-darwin.cross.riscv64-unknown-linux-gnu.nix'
         at /nix/store/i4nv0mdcx8iifh3r71qd0pbp8al8kp1z-source/lib/attrsets.nix:984:20:
          983|     value:
          984|     { inherit name value; };
             |                    ^
          985|

       … while evaluating the attribute 'aarch64-darwin.cross.riscv64-unknown-linux-gnu'
         at /nix/store/i4nv0mdcx8iifh3r71qd0pbp8al8kp1z-source/lib/attrsets.nix:984:20:
          983|     value:
          984|     { inherit name value; };
             |                    ^
          985|

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: cannot coerce the partially applied built-in function 'concatStringsSep' to a string: «partially applied primop concatStringsSep»

This error only shows up occasionally and appears to be random: roughly one run hits it each time I run the complete benchmark sequence.
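A rough sketch of how each run's output could be checked, as suggested above. It assumes the command prints JSON (as `nix flake show --json` does in these benchmarks); `check_runs` and the `run` callback are hypothetical names, and `run` would typically wrap a `subprocess.run` call.

```python
import json


def check_runs(run, runs):
    """Call run() -> (exit_code, stdout) repeatedly and report suspicious runs.

    Returns a list of (run_index, reason) for runs that failed, printed
    invalid JSON, or differed from the first successful run.
    """
    reference = None
    problems = []
    for i in range(runs):
        code, out = run()
        if code != 0:
            problems.append((i, f"exit code {code}"))
            continue
        try:
            value = json.loads(out)
        except json.JSONDecodeError:
            problems.append((i, "invalid JSON"))
            continue
        if reference is None:
            reference = value            # first good run becomes the baseline
        elif value != reference:
            problems.append((i, "output differs from reference"))
    return problems
```

Comparing parsed JSON rather than raw text avoids false alarms from key ordering or whitespace, but it treats the first successful run as correct, which is itself an assumption.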

@roberth
Member

roberth commented Aug 8, 2024

That's not normal and likely caused by race conditions that this PR hasn't addressed yet.

@RossComputerGuy
Member

Yeah, that's the assumption. Unfortunately, I haven't found a common way to reliably reproduce the error. However, I have seen it at 10, 14, 32, and maybe 51 core counts. I wonder if there's some sort of commonality between those core counts which caused a race condition.

@roberth
Member

roberth commented Aug 8, 2024

I would expect the process to be sufficiently chaotic that the number doesn't really matter.

@djacu
Member

djacu commented Aug 9, 2024

I did not have errors with a script setup similar to @RossComputerGuy's. For reference, I am running an AMD Ryzen Threadripper 2950X 16-Core Processor. I did not increase NR_CORES to the maximum amount (32) but stopped at 24.

However, when I increased the number of runs for hyperfine I started getting spurious errors.

script

hyperfine \
  --runs 10 \
  --parameter-scan num_threads 1 24 \
  --export-json output.json \
  --show-output \
  "env NR_CORES={num_threads} GC_INITIAL_HEAP_SIZE=17G /nix/store/g4yy6s999kdbwqcjdb26rvnw18njc986-nix-2.24.0pre20240726_67ff326/bin/nix flake show --no-eval-cache --all-systems --json github:NixOS/nix/afdd12be5e19c0001ff3297dea544301108d298"

errors

SPURIOUS 0x7f7659435200
SPURIOUS 0x7f7659435200
SPURIOUS 0x7f7659435200
SPURIOUS 0x7f7659435200
SPURIOUS 0x7f77e7cd63a0
SPURIOUS 0x7f7659418700
error (ignored): error:
       … in the left operand of the update (//) operator
         at /nix/store/8ghgpja4h0rpxhhcldv94hzsr923bl8n-source/tests/nixos/default.nix:19:5:
           18|     })
           19|     // {
             |     ^
           20|       # allow running tests against older nix versions via `nix eval --apply`

       … while evaluating the attribute 'value'
         at /nix/store/i4nv0mdcx8iifh3r71qd0pbp8al8kp1z-source/lib/modules.nix:809:9:
          808|     in warnDeprecation opt //
          809|       { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
             |         ^
          810|         inherit (res.defsFinal') highestPrio;

       … while evaluating the option `result':

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: expected a list but found the partially applied built-in function 'map': «partially applied primop map»

@djacu
Member

djacu commented Aug 9, 2024

I see lines like SPURIOUS 0x7f7659435200 scattered throughout the output with different values of NR_CORES. At 9 cores it finally threw the error mentioned above (#10938 (comment)). I do have the data up to that point and it does look promising.

$ cat output.json | jq -r '.results | .[] | [.parameters.num_threads, .median] | @tsv' | uplot lineplot
      ┌─────────────────────────────────────────────────┐ 
   40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠑⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠘⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠈⢆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠈⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠑⠢⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠢⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠢⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠑⠢⠤⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠉⠒⠒⠢⠤⠤⠤⠤⢄⣀⣀⡀⠀⠀⠀⠀⠀⠀│ 
   10 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠉⠑⠒⠒⠒│ 
      └─────────────────────────────────────────────────┘ 
      1                                                 9
$ cat output.json | jq -r '.results | .[] | [.parameters.num_threads, .median] | @tsv' | uplot barplot
     ┌                                        ┐ 
   1 ┤■■■■■■■■■■■■■■■■■■■■ 32.359621253680004   
   2 ┤■■■■■■■■■■■■■■■ 23.72844723518            
   3 ┤■■■■■■■■■■■■ 19.23701058818               
   4 ┤■■■■■■■■■■ 15.88976078368                 
   5 ┤■■■■■■■■■ 14.07459731268                  
   6 ┤■■■■■■■■ 13.020973145180001               
   7 ┤■■■■■■■■ 12.56286847868                   
   8 ┤■■■■■■■ 11.62728899418                    
   9 ┤■■■■■■■ 11.13275702618                    
     └                                        ┘ 

output.json

@djacu
Member

djacu commented Aug 9, 2024

The 51 threads case is remarkably fast. What's up with that? 🛸
It seems like a good idea to also check that each run gives a correct result in terms of the actual evaluation output.

I found that 2 runs isn't sufficient to produce statistically meaningful results, and occasionally there would be runs with wildly different times. This is why I tried to go for 10 runs. It would be nice if the output data could include system and user times for each run instead of what I assume is the mean over all the runs. Here is a similar output to what you generated @roberth, but using the 10-run data from above.

$ cat process.sh 
cat output.json \
| jq -r '
  .results 
  | .[0].user as $first_user 
  | .[0].system as $first_system 
  | .[]
  | [.parameters.num_threads, (.user + .system) / ($first_user + $first_system)]
  | @tsv
' \
| uplot lineplot
$ ./process.sh 
     ┌─────────────────────────────────────────────────┐ 
   3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡠│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣀⠤⠤⠔⠊⠉⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡠⠤⠒⠊⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠔⠒⠊⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠤⠒⠊⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⣀⠤⠒⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⡠⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
   1 │⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     └─────────────────────────────────────────────────┘ 
     1                                                 9

switched to barplot

     ┌                                        ┐
   1 ┤■■■■■■■■■■ 1.0
   2 ┤■■■■■■■■■■■■■ 1.339834980540174
   3 ┤■■■■■■■■■■■■■■■ 1.5154225766675506
   4 ┤■■■■■■■■■■■■■■■■ 1.6041827253996215
   5 ┤■■■■■■■■■■■■■■■■■ 1.7081109410792572
   6 ┤■■■■■■■■■■■■■■■■■■ 1.8005707699572884
   7 ┤■■■■■■■■■■■■■■■■■■■ 1.9191750807373242
   8 ┤■■■■■■■■■■■■■■■■■■■■ 1.9985132513328854
   9 ┤■■■■■■■■■■■■■■■■■■■■■ 2.121231560486825
     └                                        ┘

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world, input locking label Aug 12, 2024
We should never call reset() on a value (such as vRes) that can be
seen by another thread.

This was causing random failures about 'partially applied built-in
function' etc.
@edolstra
Member Author

@djacu @RossComputerGuy The random "partially applied built-in function" failures should be fixed now (839aec2).

@RossComputerGuy
Member

The random "partially applied built-in function" failures should be fixed now

Sweet, thank you for fixing that. I'll take a look tonight.

@djacu
Member

djacu commented Aug 14, 2024

@edolstra updated numbers
3 runs/iteration
1-24 threads
0 errors

$ cat bench.sh 
hyperfine \
  --runs 3 \
  --parameter-scan num_threads 1 24 \
  --export-json output-2024-08-13-runs3-threads1-24.json \
  --show-output \
  "env NR_CORES={num_threads} GC_INITIAL_HEAP_SIZE=17G /nix/store/cvfbjr9kw0piqd8w740z9s2aplr2xbsp-nix-2.25.0pre20240813_d36ea2e/bin/nix flake show --no-eval-cache --all-systems --json github:NixOS/nix/afdd12be5e19c0001ff3297dea544301108d298"
$ cat process.sh 
cat $1 \
| jq -r '
  .results 
  | .[0].user as $first_user 
  | .[0].system as $first_system 
  | .[]
  | [.parameters.num_threads, (.user + .system) / ($first_user + $first_system)]
  | @tsv
' \
| uplot $2
$ ./process.sh output-2024-08-13-runs3-threads1-24.json lineplot
     ┌────────────────────────────────────────┐ 
   4 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⢄⡀⣀⣀⡠⠊⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⠢⠤⠒⠁⠀⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⡔⠒⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡜⠈⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠚⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⠀⠀⡠⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⠀⠀⡤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⠀⡠⠎⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     │⠀⠀⡜⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
   1 │⠀⣰⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
     └────────────────────────────────────────┘ 
     0                                       30
$ ./process.sh output-2024-08-13-runs3-threads1-24.json barplot
      ┌                                        ┐ 
    1 ┤■■■■■■ 1.0                                
    2 ┤■■■■■■■■ 1.3639814060404196               
    3 ┤■■■■■■■■■ 1.480349952258833               
    4 ┤■■■■■■■■■■ 1.6633418910550544             
    5 ┤■■■■■■■■■■■ 1.7539164370983664            
    6 ┤■■■■■■■■■■■■ 1.8979070454103046           
    7 ┤■■■■■■■■■■■■ 2.029101000488091            
    8 ┤■■■■■■■■■■■■■ 2.131390098817159           
    9 ┤■■■■■■■■■■■■■■■ 2.404445444041322         
   10 ┤■■■■■■■■■■■■■■ 2.3298078501324904         
   11 ┤■■■■■■■■■■■■■■■ 2.5203714645696014        
   12 ┤■■■■■■■■■■■■■■■ 2.507551812771401         
   13 ┤■■■■■■■■■■■■■■■■ 2.6150870825414865       
   14 ┤■■■■■■■■■■■■■■■■■ 2.744969656720737       
   15 ┤■■■■■■■■■■■■■■■■■■ 2.904456345800291      
   16 ┤■■■■■■■■■■■■■■■■■ 2.8652417233012377      
   17 ┤■■■■■■■■■■■■■■■■■■ 2.941829174340919      
   18 ┤■■■■■■■■■■■■■■■■■■■ 3.0719492885179607    
   19 ┤■■■■■■■■■■■■■■■■■■ 2.9945109021771774     
   20 ┤■■■■■■■■■■■■■■■■■■ 3.0065581585391272     
   21 ┤■■■■■■■■■■■■■■■■■■ 3.0109496232847355     
   22 ┤■■■■■■■■■■■■■■■■■■■ 3.1402539594223167    
   23 ┤■■■■■■■■■■■■■■■■■■■ 3.1824514771685575    
   24 ┤■■■■■■■■■■■■■■■■■■■■ 3.2984812341776344   
      └                                        ┘ 
$ cat process1.sh 
cat $1 \
| jq -r '
  .results 
  | .[]
  | [.parameters.num_threads, .median]
  | @tsv
' \
| uplot $2
$ ./process1.sh output-2024-08-13-runs3-threads1-24.json lineplot
      ┌────────────────────────────────────────┐ 
   40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢱⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠘⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⢇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠘⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⢱⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠣⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠈⠢⠤⢄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠑⠒⠒⠤⠤⠤⠤⠤⣀⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠑⠒⠒⠒⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
    0 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      0                                       30
$ ./process1.sh output-2024-08-13-runs3-threads1-24.json barplot
      ┌                                        ┐ 
    1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■ 32.2092498448   
    2 ┤■■■■■■■■■■■■■■■■■■ 23.3338981978          
    3 ┤■■■■■■■■■■■■■■ 18.2908208418              
    4 ┤■■■■■■■■■■■■ 16.075638893799997           
    5 ┤■■■■■■■■■■■ 14.2880983908                 
    6 ┤■■■■■■■■■■■ 14.1533989558                 
    7 ┤■■■■■■■■■■■ 13.6315487518                 
    8 ┤■■■■■■■■■■ 12.5929080038                  
    9 ┤■■■■■■■■■■ 12.3133236468                  
   10 ┤■■■■■■■■■ 11.6892627738                   
   11 ┤■■■■■■■■■ 11.3719563868                   
   12 ┤■■■■■■■■■ 11.3712700118                   
   13 ┤■■■■■■■■■ 11.5064187448                   
   14 ┤■■■■■■■■■ 11.170618042800001              
   15 ┤■■■■■■■■ 10.7183057678                    
   16 ┤■■■■■■■■ 10.8838255428                    
   17 ┤■■■■■■■■ 10.5056841208                    
   18 ┤■■■■■■■■ 10.2210227558                    
   19 ┤■■■■■■■■ 10.4710754908                    
   20 ┤■■■■■■■■ 10.2478243678                    
   21 ┤■■■■■■■■ 10.0193668008                    
   22 ┤■■■■■■■■ 9.9365577298                     
   23 ┤■■■■■■■■ 9.8233886288                     
   24 ┤■■■■■■■ 9.4230639278                      
      └                                        ┘ 

output-2024-08-13-runs3-threads1-24.json

@RossComputerGuy
Member

Ran with the same exact setup as before except --ignore-failures was removed.

outputs.json

$ cat output.json | jq -r '
  .results
  | .[0].user as $first_user
  | .[0].system as $first_system
  | .[]
  | [.parameters.num_threads, (.user + .system) / ($first_user + $first_system)]
  | @tsv
' \
| uplot barplot
      ┌                                        ┐
    1 ┤■■■■■ 1.0
    2 ┤■■■■■ 1.0742437880718498
    3 ┤■■■■■■ 1.1183476734942377
    4 ┤■■■■■■ 1.1480830061941265
    5 ┤■■■■■■ 1.1951060353515446
    6 ┤■■■■■■ 1.2463317894241261
    7 ┤■■■■■■■ 1.3569109849655925
    8 ┤■■■■■■■ 1.4240509543690296
    9 ┤■■■■■■■ 1.4800429131384885
   10 ┤■■■■■■■■ 1.619910014501121
   11 ┤■■■■■■■■ 1.6201380015193159
   12 ┤■■■■■■■■ 1.6599253300385328
   13 ┤■■■■■■■■■ 1.7008236813736564
   14 ┤■■■■■■■■■ 1.7818203835464186
   15 ┤■■■■■■■■■■ 1.9010991184423898
   16 ┤■■■■■■■■■■ 1.935250249295015
   17 ┤■■■■■■■■■■■ 2.0774385433183906
   18 ┤■■■■■■■■■■ 2.005914070473358
   19 ┤■■■■■■■■■■■ 2.1543125542759656
   20 ┤■■■■■■■■■■■ 2.094546009159931
   21 ┤■■■■■■■■■■ 2.049633643382846
   22 ┤■■■■■■■■■■■ 2.106576840230168
   23 ┤■■■■■■■■■■■ 2.213021136254079
   24 ┤■■■■■■■■■■■ 2.1888102420867597
   25 ┤■■■■■■■■■■■■■ 2.4744252796291466
   26 ┤■■■■■■■■■■■■ 2.2838692260756557
   27 ┤■■■■■■■■■■■■ 2.388376453510452
   28 ┤■■■■■■■■■■■■■ 2.5887737044010453
   29 ┤■■■■■■■■■■■■ 2.4440904883247994
   30 ┤■■■■■■■■■■■■■■ 2.6894382105095698
   31 ┤■■■■■■■■■■■■■ 2.5861891650635402
   32 ┤■■■■■■■■■■■■■ 2.5437229285101277
   33 ┤■■■■■■■■■■■■■■ 2.8384032833748694
   34 ┤■■■■■■■■■■■■■■ 2.754690940580082
   35 ┤■■■■■■■■■■■■■■ 2.863039314510525
   36 ┤■■■■■■■■■■■■■■ 2.697215881704366
   37 ┤■■■■■■■■■■■■■■■ 3.005053488799141
   38 ┤■■■■■■■■■■■■■■ 2.742781546449162
   39 ┤■■■■■■■■■■■■■■■■■■ 3.5338047959859553
   40 ┤■■■■■■■■■■■■■■■ 2.936502121193209
   41 ┤■■■■■■■■■■■■■■■■ 3.0931038842621277
   42 ┤■■■■■■■■■■■■■■■■ 3.072554403997825
   43 ┤■■■■■■■■■■■■■■■ 2.9806491596667444
   44 ┤■■■■■■■■■■■■■■■■ 3.1132977893074165
   45 ┤■■■■■■■■■■■■■■■■ 3.210063378210523
   46 ┤■■■■■■■■■■■■■■■■■ 3.2633512047356024
   47 ┤■■■■■■■■■■■■■■■■ 3.2375432707932914
   48 ┤■■■■■■■■■■■■■■■ 3.036618909286361
   49 ┤■■■■■■■■■■■■■■■■■■■ 3.71023827997656
   50 ┤■■■■■■■■■■■■■■■ 2.9999138473427407
   51 ┤■■■■■■■■■■■■■■■■ 3.175197220413717
   52 ┤■■■■■■■■■■■■■■■■■ 3.438028187255351
   53 ┤■■■■■■■■■■■■■■■■■ 3.4290124165272804
   54 ┤■■■■■■■■■■■■■■■■■■■ 3.735650675690233
   55 ┤■■■■■■■■■■■■■■■■■■ 3.597047732596688
   56 ┤■■■■■■■■■■■■■■■■ 3.141657271904648
   57 ┤■■■■■■■■■■■■■■■■■ 3.34309607808246
   58 ┤■■■■■■■■■■■■■■■■■■ 3.616016648237155
   59 ┤■■■■■■■■■■■■■■■■■■ 3.5622579473551563
   60 ┤■■■■■■■■■■■■■■■■■■ 3.529534568715364
   61 ┤■■■■■■■■■■■■■■■■■■■■■ 4.151842175352685
   62 ┤■■■■■■■■■■■■■■■■■■■■ 4.052837446502432
   63 ┤■■■■■■■■■■■■■■■■■■■ 3.6751877189356112
   64 ┤■■■■■■■■■■■■■■■■■■■ 3.6830928706505004
      └                                        ┘

@djacu
Member

djacu commented Aug 14, 2024

Better illustration of the performance increase per additional thread

$ cat process_time.sh 
cat $1 \
| jq -r '
  .results 
  | [.[].median] as $median
  | [ $median[:-1], $median[1:] ]
  | transpose
  | map( .[0] - .[1] )
' \
| uplot $2 \
--title 'Absolute Time Improvement'

cat $1 \
| jq -r '
  .results 
  | [.[].median] as $median
  | [ $median[:-1], $median[1:] ]
  | transpose
  | map( ( .[0] - .[1] ) / .[0] * 100 )
' \
| uplot $2 \
--title '% Time Improvement'
$ ./process_time.sh output-2024-08-13-runs3-threads1-24.json lineplot
              Absolute Time Improvement
      ┌────────────────────────────────────────┐ 
    9 │⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⢸⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⡇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⡇⢇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⡇⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⡇⠸⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⡇⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢸⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢸⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢸⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢸⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢸⠀⠀⠀⠀⠘⡄⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⡇⠀⠀⠀⠀⠀⢣⠀⡔⠉⢢⢀⠤⡀⠀⠀⠀⢀⣀⡀⠀⢀⡀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠤⠧⠤⠤⠤⠤⠤⠬⠮⠤⠤⠤⠥⠤⠬⠵⠦⠴⠥⠤⠵⠴⠥⠬⠧⡤⠮⠭⠭⠶⠶⠭⠬⠦⠤⠤⠤⠤⠤⠤│ 
   -1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      0                                       30
                  % Time Improvement
       ┌────────────────────────────────────────┐ 
    30 │⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⢸⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡎⢱⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡇⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡇⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡇⠀⠸⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⢰⠁⠀⠀⢇⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⢸⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⢸⠀⠀⠀⠀⠀⡇⠀⠀⣴⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⢸⠀⠀⠀⠀⠀⢣⠀⡰⠁⢇⢠⠢⡀⠀⠀⠀⠀⣀⡄⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡄⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⡇⠀⠀⠀⠀⠀⢸⡜⠀⠀⠘⠃⠀⠙⢄⠀⢀⠎⠀⢣⠀⡎⠉⢇⠀⡔⠒⠢⣀⣠⠊⠱⡀⠀⠀⠀⠀⠀⠀│ 
       │⠒⠓⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠚⠓⠞⠒⠒⠚⡾⠒⠒⠚⣶⠓⠒⠒⠒⠒⠒⠒⠓⠒⠒⠒⠒⠒⠒│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
   -10 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       └────────────────────────────────────────┘ 
       0                                       30
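For readers without jq, the same per-thread improvement calculation might be written in Python as below. The function names are hypothetical; the only assumption about the hyperfine export is the `results[].median` layout that the jq scripts above already rely on.

```python
import json


def medians_from_export(path):
    """Read the median wall time of each benchmark from a hyperfine JSON export."""
    with open(path) as f:
        return [result["median"] for result in json.load(f)["results"]]


def step_improvements(medians):
    """Return (absolute seconds saved, percent saved) for each additional thread."""
    return [
        (prev - cur, (prev - cur) / prev * 100)
        for prev, cur in zip(medians, medians[1:])
    ]


# Made-up medians for illustration: 1, 2 and 3 threads.
for saved, percent in step_improvements([32.0, 24.0, 18.0]):
    print(f"{saved:.1f}s saved ({percent:.0f}%)")
```

A negative entry means that adding a thread made the median slower, which is where the curves above dip below zero.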

@roberth
Member

roberth commented Dec 6, 2024

  • Infinite recursion detection through blackholing is currently disabled.

Relevant paper with a design that implements this.

It's quite a different architecture, but maybe we need that? Specifically, they check blackholes on the stack during GC.

I guess the alternative is to add more bookkeeping so that perhaps another thread could be tasked with periodically checking that no thread is deadlocked, and if they are, insert exception thunks, unblock them, and let them unwind their stacks.
That hinges on the overhead of the added bookkeeping.
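To make the bookkeeping idea a bit more concrete, here is a hedged sketch of the check such a watchdog thread could run. It assumes the evaluator could expose a map from each blocked thread to the thread that owns the thunk it is waiting on; that map and the `find_deadlock` function are hypothetical, not something the PR provides.

```python
def find_deadlock(waits_for):
    """Find a cycle in a wait-for graph.

    waits_for maps a blocked thread's id to the id of the thread that is
    evaluating the thunk it waits on; running threads have no entry.
    Returns the list of thread ids forming a cycle, or None.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a thread.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


print(find_deadlock({1: 2, 2: 3, 3: 1}))  # prints [1, 2, 3]
print(find_deadlock({1: 2, 2: 3}))        # prints None
```

A single thread waiting on itself (`{1: 1}`) is reported as a one-element cycle, which corresponds to the classic infinite-recursion case that blackholing would catch. Taking a consistent snapshot of the map is the part whose overhead would need measuring.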

Fwiw, they have

lock-free thunks

It sure would be nice not to regress single-threaded performance, and have a boost in multi-threaded as well.

I wouldn't oppose taking control of the stack instead of delegating all of that completely to bdwgc. Doing so would open the door to other benefits, such as a more efficient --debugger, printing warning traces without aborting, and a more reliable and/or higher max-call-depth. (OT: but not necessarily tail calls, because that's more or less s/if/while in forceValue)

Labels
c api Nix as a C library with a stable interface fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command repl The Read Eval Print Loop, "nix repl" command and debugger store Issues and pull requests concerning the Nix store with-tests Issues related to testing. PRs with tests have some priority

5 participants