
forceValue: make pos mandatory #5676

Merged 5 commits on Feb 3, 2022
Conversation

kamadorueda

@kamadorueda kamadorueda commented Nov 27, 2021

  • Make passing the position to forceValue mandatory;
    this way we remind people that the position is
    important for better error messages
  • Add pos to all forceValue calls

Mentions #3505

Member

@thufschmitt thufschmitt left a comment


Great!

I didn’t check whether all the calls to determinePos were justified, but in any case that’s a big improvement and a good thing for the future

@@ -53,7 +53,7 @@ void EvalState::forceValue(Value & v, const Pos & pos)

 inline void EvalState::forceAttrs(Value & v)
 {
-    forceValue(v);
+    forceValue(v, v.determinePos(noPos));
Member

What's the performance impact of these non-lazy calls to determinePos()?

BTW determinePos() is currently defined in eval.cc, but if we're going to call it everywhere (including on some hot paths), it should probably be moved to value.hh so that it can be inlined.

Member

@kamadorueda it would be really cool to have this merged before it gets conflicts 😊

Member Author

Here is what I did:

  1. I tried to inline determinePos, but it uses struct Pos, class Bindings, and others that are not yet defined at the point where determinePos would have to be declared for the compiler to inline it, so the compiler complains that their sizes are unknown:

[screenshot: compiler errors about incomplete types]

which means that we cannot inline it.

  2. So I benchmarked this PR (top) against nix-master (bottom), and there is certainly a difference:

[screenshot: benchmark comparison, 2022-01-21]

Apparently about 0.05 seconds of slowdown every 6 or 7 seconds.

Member Author

@edolstra
I added a few commits to slim down the critical path,
closed my browser and code editor, leaving only the terminal open,
and increased the benchmarking rounds to 100 (~10 minutes of running) to get more data points.

This is the result:

[screenshot: benchmark results]

Where there is a difference in execution time (rejected), the difference is negative (i.e. faster).

I don't think it should actually be getting faster, even if the data says so,
so I would be happy if someone could confirm the results on another machine:

nix-master $ clear; PAGER=cat benchmark -t 100 -f ../prev -- ./outputs/out/bin/nix-env -qaf ../nixpkgs
nix-pr $ clear; PAGER=cat benchmark -t 100 -f ../new -c ../prev -- ./outputs/out/bin/nix-env -qaf ../nixpkgs

Contributor

I changed the flake to include it but I'm still seeing the same behavior.

$ git diff
diff --git a/flake.lock b/flake.lock
index 4f89d76..623d5c8 100644
--- a/flake.lock
+++ b/flake.lock
@@ -17,11 +17,11 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1642786796,
-        "narHash": "sha256-FsKuowd1SZGNhsX+YeNPlodp3mecyVbLPxCcUMbkLU8=",
+        "lastModified": 1643948526,
+        "narHash": "sha256-XmRNI2zmR0iTX7LUJy93MG6GZoMHy1nlMNcp31XHi8w=",
         "owner": "nixos",
         "repo": "nixpkgs",
-        "rev": "1281c5be8ae76b231e181cde812b7b6cdc3028f8",
+        "rev": "d67ad28fc301305baeeb364a04f0565c5f5118c8",
         "type": "github"
       },
       "original": {
diff --git a/flake.nix b/flake.nix
index a55646b..4a0de74 100644
--- a/flake.nix
+++ b/flake.nix
@@ -22,6 +22,7 @@
               StatisticsTTest
               UnixGetrusage
               FileSlurp
+              pkgs.util-linux
             ];
 
             unpackPhase = "true";

$ ./result/bin/benchmark -t 3 -- sleep 1
run 0...
taskset: failed to set pid 6481's affinity: Invalid argument
run 1...
taskset: failed to set pid 6483's affinity: Invalid argument
run 2...
taskset: failed to set pid 6485's affinity: Invalid argument
note: discarding 10% highest outliers
maximum RSS:        median =  10546.0000  mean =  10546.0000  stddev =      2.8284  min =  10544.0000  max =  10548.0000
soft page faults:   median =    103.0000  mean =    103.0000  stddev =      2.8284  min =    101.0000  max =    105.0000
system CPU time:    median =      0.0007  mean =      0.0007  stddev =      0.0000  min =      0.0007  max =      0.0008
elapsed time:       median =      0.0009  mean =      0.0009  stddev =      0.0000  min =      0.0009  max =      0.0010

Member Author

@ConnorBaker I'm able to reproduce it now: benchmark is trying to pin itself to CPU 6, but you may not have that many CPUs on the cloud instance.

(My home computer has 16.)

[kamadorueda@nixos:/data]$ taskset -c 16 sleep 1
taskset: failed to set pid 3509's affinity: Invalid argument

[kamadorueda@nixos:/data]$ taskset -c 15 sleep 1
(ok)

[kamadorueda@nixos:/data]$ taskset -c 0 sleep 1
(ok)

Contributor

Thank you for sorting that out for me!

I used an m6g.metal instance and wasn't able to reproduce the taskset error (as you said, I think because of the core count).

$ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 64
  On-line CPU(s) list:  0-63
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 64
    Socket(s):          1
    Stepping:           r3p1
    BogoMIPS:           243.75
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
Caches (sum of all):    
  L1d:                  4 MiB (64 instances)
  L1i:                  4 MiB (64 instances)
  L2:                   64 MiB (64 instances)
  L3:                   32 MiB (1 instance)
NUMA:                   
  NUMA node(s):         1
  NUMA node0 CPU(s):    0-63
Vulnerabilities:        
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected

$ lsmem
RANGE                                  SIZE  STATE REMOVABLE   BLOCK
0x0000000000000000-0x000000007fffffff    2G online       yes     0-1
0x0000000100000000-0x0000003fffffffff  252G online       yes   4-255
0x0000008100000000-0x000000817fffffff    2G online       yes 516-517

Memory block size:         1G
Total online memory:     256G
Total offline memory:      0B

I think part of the issue was that when I specified the path to clone Nixpkgs to, I forgot to make it a relative path, so it was cloned to the root directory (/nixpkgs instead of /root/nixpkgs). It was able to do that because I haven't set up any users on the EC2 instance, so it defaults to root.

Anyway, probably not relevant any more, but here's what I observed on the m6g.metal instance:

$ PAGER=cat ./benchmark/bin/benchmark -t 20 -f ./prev -- ./nix-pr/bin/nix-env -qaf ./nixpkgs
# Output snipped
maximum RSS:        median = 2023162.0000  mean = 1873273.7778  stddev = 436250.4782  min = 674012.0000  max = 2023432.0000
soft page faults:   median = 506947.0000  mean = 469244.7778  stddev = 109796.7131  min = 167440.0000  max = 507034.0000
system CPU time:    median =      0.9361  mean =      0.9208  stddev =      0.0556  min =      0.8147  max =      0.9861
user CPU time:      median =     10.3891  mean =      9.5182  stddev =      2.5623  min =      2.4753  max =     10.4820
elapsed time:       median =     11.3407  mean =     10.8687  stddev =      1.3863  min =      7.0558  max =     11.4138
$ PAGER=cat ./benchmark/bin/benchmark -t 20 -f ./new -c ./prev -- ./nix-master/bin/nix-env -qaf ./nixpkgs
# Output snipped
maximum RSS:        median = 2023060.0000  mean = 1873229.1111  stddev = 436261.1472  min = 674060.0000  max = 2023512.0000  [not rejected, p=0.99976, Δ=-44.66667±436546.64994]
soft page faults:   median = 506948.5000  mean = 469238.2222  stddev = 109793.4156  min = 167443.0000  max = 507037.0000  [not rejected, p=0.99986, Δ=-6.55556±109868.26108]
system CPU time:    median =      0.9401  mean =      0.9402  stddev =      0.0398  min =      0.8621  max =      1.0020  [not rejected, p=0.23556, Δ=0.01946±0.04839]
user CPU time:      median =     10.3588  mean =      9.4907  stddev =      2.5674  min =      2.4120  max =     10.4415  [not rejected, p=0.97448, Δ=-0.02755±2.56658]
elapsed time:       median =     11.3278  mean =     10.8647  stddev =      1.3801  min =      7.0591  max =     11.3926  [not rejected, p=0.99318, Δ=-0.00397±1.38413]

Member Author

That is very nice. Those 5 "not rejected" results mean this PR did not introduce any relevant performance impact, at least from the point of view of evaluating nixpkgs.

Cool! Thank you

@edolstra edolstra added this to the nix-2.7 milestone Feb 3, 2022
@edolstra edolstra self-assigned this Feb 3, 2022
@edolstra edolstra merged commit 4c755c3 into NixOS:master Feb 3, 2022
@edolstra (Member) commented Feb 3, 2022

Thanks, merged it. Because I did still observe a small slowdown, I've made most calls to determinePos() lazy by passing them as a lambda to forceValue() (bd383d1).
