forceValue: make pos mandatory #5676
Conversation
Great! I didn't check whether all the calls to determinePos were justified, but in any case that's a big improvement and a good thing for the future.
src/libexpr/eval-inline.hh (outdated)
@@ -53,7 +53,7 @@ void EvalState::forceValue(Value & v, const Pos & pos)

 inline void EvalState::forceAttrs(Value & v)
 {
-    forceValue(v);
+    forceValue(v, v.determinePos(noPos));
What's the performance impact of these non-lazy calls to determinePos()?
BTW, determinePos() is currently defined in eval.cc, but if we're going to call it everywhere (including on some hot paths), it should probably be moved to value.hh so that it can be inlined.
@kamadorueda it would be really cool to have this merged before it gets conflicts 😊
Here's what I've done so far:
- I tried to inline determinePos, but it uses struct Pos, class Bindings, and others, which are not yet defined at the point where determinePos would have to be declared for the compiler to inline it. The compiler complains that it doesn't know their size, which means we cannot inline it.
- So I benchmarked this PR (top) vs nix-master (bottom), and there is certainly a difference: apparently 0.05 seconds every 6 or 7 seconds.
@edolstra I added a few commits to slim down the critical path, closed my browser and code editor (leaving only the terminal open), and increased the benchmarking rounds to 100 (~10 minutes of running) to have more data points. This is the result: where there is a difference in execution time ("rejected"), the difference is negative (i.e. faster). I don't think it should actually be getting faster, even if the data says so, so I'd be happy if someone could help me confirm the results on another machine.
nix-master $ clear; PAGER=cat benchmark -t 100 -f ../prev -- ./outputs/out/bin/nix-env -qaf ../nixpkgs
nix-pr $ clear; PAGER=cat benchmark -t 100 -f ../new -c ../prev -- ./outputs/out/bin/nix-env -qaf ../nixpkgs
I changed the flake to include it but I'm still seeing the same behavior.
$ git diff
diff --git a/flake.lock b/flake.lock
index 4f89d76..623d5c8 100644
--- a/flake.lock
+++ b/flake.lock
@@ -17,11 +17,11 @@
},
"nixpkgs": {
"locked": {
- "lastModified": 1642786796,
- "narHash": "sha256-FsKuowd1SZGNhsX+YeNPlodp3mecyVbLPxCcUMbkLU8=",
+ "lastModified": 1643948526,
+ "narHash": "sha256-XmRNI2zmR0iTX7LUJy93MG6GZoMHy1nlMNcp31XHi8w=",
"owner": "nixos",
"repo": "nixpkgs",
- "rev": "1281c5be8ae76b231e181cde812b7b6cdc3028f8",
+ "rev": "d67ad28fc301305baeeb364a04f0565c5f5118c8",
"type": "github"
},
"original": {
diff --git a/flake.nix b/flake.nix
index a55646b..4a0de74 100644
--- a/flake.nix
+++ b/flake.nix
@@ -22,6 +22,7 @@
StatisticsTTest
UnixGetrusage
FileSlurp
+ pkgs.util-linux
];
unpackPhase = "true";
$ ./result/bin/benchmark -t 3 -- sleep 1
run 0...
taskset: failed to set pid 6481's affinity: Invalid argument
run 1...
taskset: failed to set pid 6483's affinity: Invalid argument
run 2...
taskset: failed to set pid 6485's affinity: Invalid argument
note: discarding 10% highest outliers
maximum RSS: median = 10546.0000 mean = 10546.0000 stddev = 2.8284 min = 10544.0000 max = 10548.0000
soft page faults: median = 103.0000 mean = 103.0000 stddev = 2.8284 min = 101.0000 max = 105.0000
system CPU time: median = 0.0007 mean = 0.0007 stddev = 0.0000 min = 0.0007 max = 0.0008
elapsed time: median = 0.0009 mean = 0.0009 stddev = 0.0000 min = 0.0009 max = 0.0010
@ConnorBaker I'm able to reproduce it now: benchmark is trying to attach to CPU 6, but maybe you don't have that many CPUs on the cloud instance (my home computer has 16):
[kamadorueda@nixos:/data]$ taskset -c 16 sleep 1
taskset: failed to set pid 3509's affinity: Invalid argument
[kamadorueda@nixos:/data]$ taskset -c 15 sleep 1
(ok)
[kamadorueda@nixos:/data]$ taskset -c 0 sleep 1
(ok)
-c 6 attaches to the 7th CPU (CPU IDs are zero-based):
https://github.com/edolstra/benchmark/blob/096970f0ac3e74405db124902c39ecc8c0394713/benchmark.pl#L39
Thank you for sorting that out for me!
I used an m6g.metal instance and wasn't able to reproduce the taskset error (like you said, I think because of the core count).
$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 1
Stepping: r3p1
BogoMIPS: 243.75
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
Caches (sum of all):
L1d: 4 MiB (64 instances)
L1i: 4 MiB (64 instances)
L2: 64 MiB (64 instances)
L3: 32 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-63
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
$ lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff 2G online yes 0-1
0x0000000100000000-0x0000003fffffffff 252G online yes 4-255
0x0000008100000000-0x000000817fffffff 2G online yes 516-517
Memory block size: 1G
Total online memory: 256G
Total offline memory: 0B
I think part of the issue was that when I specified the path to clone Nixpkgs to, I forgot to make it a relative path, so it was cloned to the root directory (/nixpkgs instead of /root/nixpkgs). It was able to do that because I haven't set up any users on the EC2 instance I'm using, so it defaults to root.
Anyway, probably not relevant any more, but here's what I observed on the m6g.metal instance:
$ PAGER=cat ./benchmark/bin/benchmark -t 20 -f ./prev -- ./nix-pr/bin/nix-env -qaf ./nixpkgs
# Output snipped
maximum RSS: median = 2023162.0000 mean = 1873273.7778 stddev = 436250.4782 min = 674012.0000 max = 2023432.0000
soft page faults: median = 506947.0000 mean = 469244.7778 stddev = 109796.7131 min = 167440.0000 max = 507034.0000
system CPU time: median = 0.9361 mean = 0.9208 stddev = 0.0556 min = 0.8147 max = 0.9861
user CPU time: median = 10.3891 mean = 9.5182 stddev = 2.5623 min = 2.4753 max = 10.4820
elapsed time: median = 11.3407 mean = 10.8687 stddev = 1.3863 min = 7.0558 max = 11.4138
$ PAGER=cat ./benchmark/bin/benchmark -t 20 -f ./new -c ./prev -- ./nix-master/bin/nix-env -qaf ./nixpkgs
# Output snipped
maximum RSS: median = 2023060.0000 mean = 1873229.1111 stddev = 436261.1472 min = 674060.0000 max = 2023512.0000 [not rejected, p=0.99976, Δ=-44.66667±436546.64994]
soft page faults: median = 506948.5000 mean = 469238.2222 stddev = 109793.4156 min = 167443.0000 max = 507037.0000 [not rejected, p=0.99986, Δ=-6.55556±109868.26108]
system CPU time: median = 0.9401 mean = 0.9402 stddev = 0.0398 min = 0.8621 max = 1.0020 [not rejected, p=0.23556, Δ=0.01946±0.04839]
user CPU time: median = 10.3588 mean = 9.4907 stddev = 2.5674 min = 2.4120 max = 10.4415 [not rejected, p=0.97448, Δ=-0.02755±2.56658]
elapsed time: median = 11.3278 mean = 10.8647 stddev = 1.3801 min = 7.0591 max = 11.3926 [not rejected, p=0.99318, Δ=-0.00397±1.38413]
That is very nice: those 5 "not rejected" results mean this PR did not introduce any relevant performance impact, at least from the point of view of evaluating nixpkgs.
Cool! Thank you
(force-pushed from 4a639c4 to f1268a7)
- Make passing the position to `forceValue` mandatory; this way we remind people that the position is important for better error messages
- Add pos to all `forceValue` calls
(force-pushed from 60f3bed to 50efc54)
Thanks, merged it. Because I did still observe a small slowdown, I've made most calls to forceValue …

- Make passing the position to forceValue mandatory; this way we remind people that the position is important for better error messages
- Add pos to all forceValue calls

Mentions #3505