systems/architecture: bump default architecture to x86-64-v2 #202526

SuperSandro2000 · 2022-11-23T14:03:05Z

Compile everything by default with SSE4.2 to gain free performance.
This also removes support for any CPU older than ~~westmere~~ Nehalem.
Nehalem was chosen because the bootstrap gcc is too old to know westmere.

TODO:

Do we want to enable this?
Do all the hydra builders support this?
Write a changelog entry

SuperSandro2000 · 2022-11-23T14:05:44Z

Marking this as a draft so that no one merges it by accident, but I still want feedback and discussion on this.

K900 · 2022-11-23T14:06:20Z

RHEL is on x86_64-v2 by default now: https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level

K900 · 2022-11-23T14:07:14Z

Also, we should probably add the x86_64-vX feature levels instead of using uarch names, because we want the feature set without the uarch-specific codegen.

NickCao · 2022-11-23T15:38:48Z

The baseline is mostly defined by the gcc-N package, which is configured to produce baseline binaries when options like -march= are not used. - https://wiki.debian.org/ArchitectureSpecificsMemo#Architecture_baselines

Is that what this change does?

thiagokokada · 2022-11-24T14:53:36Z

x86-64-v2 (this PR) should be safe-ish to enable and brings a good performance boost.

I think Arch uses x86-64-v3 (haswell), but maybe this is too much.

SamLukeYes · 2022-11-24T15:13:20Z

> x86-64-v2 (this PR) should be safe-ish to enable and brings a good performance boost.
>
> I think Arch uses x86-64-v3 (haswell), but maybe this is too much.

AFAIK, most Arch packages are still using x86-64, though the x86_64_v3 architecture was added to devtools. The RFC is to keep both x86_64 and x86_64_v3 packages, instead of bumping the default architecture.

SuperSandro2000 · 2022-11-27T22:21:49Z

> Also we should probably add the x86_64-vX feature levels instead of using uarch names, because we want the featureset without the uarch specific codegen.

I added this suggestion to the PR.

> Is that what this change does?

Yes, on first look.

> x86_64_v3 packages, instead of bumping the default architecture.

We are currently not doing this because it would be equivalent to adding a new architecture: we would build everything twice.

I am also currently (trying) to build my systems with haswell/skylake on my private hydra, to see how things go. I'll probably upstream some fixes along the way, too. The first broken package I encountered was libxcrypt, which is already fixed on staging. Also, when bumping to skylake, I couldn't run the jemalloc tests on a znver3 machine.

Tungsten842 · 2022-11-28T08:42:13Z

Some more cpu features should be added: POPCNT, LAHF-SAHF, CMPXCHG16B...
https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
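
As a rough illustration of what the levels on that page require, the sketch below maps a set of CPU flags to the highest level they satisfy. It is not part of this PR: `x86_64_level` and `read_cpu_flags` are made-up helper names, and the flag spellings assume Linux's `/proc/cpuinfo` (for example `pni` for SSE3, `cx16` for CMPXCHG16B, `lahf_lm` for LAHF-SAHF, `abm` for LZCNT).

```python
# Sketch: map a set of /proc/cpuinfo flags to the highest x86-64 level.
# Flag names follow Linux's /proc/cpuinfo spelling (an assumption of this
# sketch); the baseline (v1) flags are taken for granted on any x86-64 CPU.
LEVELS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return 1..4: the highest level whose requirements are all present."""
    level = 1
    for candidate, required in LEVELS:
        if not required <= set(flags):
            break  # levels are cumulative, so stop at the first gap
        level = candidate
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag list of the first CPU entry (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


nehalem_like = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
print(x86_64_level(nehalem_like))  # 2
```

On a Linux machine, `x86_64_level(read_cpu_flags())` would give the level of the local CPU.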

thiagokokada

LGTM.

IMO, we should merge this ASAP to get early feedback on what it is going to break.

Flakebi · 2023-06-14T20:06:45Z

To clarify, I did compare the runtime performance of an application compiled with gcc and znver3.
That application happened to be clang (because compilation is where I care most about performance). The flags I passed to clang to compile SuperTuxKart were the same in both cases I compared.

I was hoping that any improvement I see with znver3 is at least as good as x86-64-v2, which should be a subset/more general optimizations.
Apparently, clang is not a workload that benefits from that.

K900 · 2023-06-14T20:09:38Z

The difference between v2 and higher feature levels is mostly vector extensions, so it's to be expected a compiler (which is mostly pointer chasing) doesn't gain much here.

oxalica · 2023-07-28T10:30:28Z

> The difference between v2 and higher feature levels is mostly vector extensions, so it's to be expected a compiler (which is mostly pointer chasing) doesn't gain much here.

I want to note that v3 DOES have some non-vector ops that benefit compilers, including the BMI2 3-operand shifts SHLX, SHRX and SARX (and the rotate RORX). Since they write to a separate destination register and do not require the shift count to be in CL, they can reduce register pressure and MOVs, especially when the shift count is not a constant. Bitshifts are heavily used everywhere, including time-sensitive hashing algorithms.
SHRX is also relied on by a 1-cycle-per-byte shift-based DFA algorithm, which can be used for lexing, UTF-8 validation or any other small-state DFA.
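
To make the shift-based DFA idea concrete, here is a minimal sketch in Python. It only shows the logic, not the performance: the 3-state automaton (accepting non-empty strings of ASCII digits) and all names are made up for illustration, and the comment about SHRX describes what a compiled version of the inner step could use, not something Python does.

```python
# Sketch of a shift-based DFA. Each state is identified by a bit offset
# (a multiple of 6) into a 64-bit word; each input byte has one such word
# holding, at offset s, the offset of the next state reached from state s.
START, NUMBER, ERROR = 0, 6, 12  # hypothetical 3-state DFA: non-empty digits
STATES = (START, NUMBER, ERROR)
MASK = 0x3F  # 6 bits are enough to hold any offset below 64


def next_state(state, byte):
    """Plain transition function that the packed table is built from."""
    if state == ERROR or not (0x30 <= byte <= 0x39):
        return ERROR
    return NUMBER


# One packed row per possible input byte (fields do not overlap, so the
# sum is the same as a bitwise OR).
TABLE = [sum(next_state(s, b) << s for s in STATES) for b in range(256)]


def accepts(data):
    state = START
    for b in data:
        # The whole step is one table load, one variable shift, one mask.
        # In compiled code the variable shift is where SHRX can be used.
        state = (TABLE[b] >> state) & MASK
    return state == NUMBER


print(accepts(b"2023"))  # True
print(accepts(b"20x3"))  # False
```

The point of the encoding is that the next state depends on the previous one only through the shift count, so the per-byte dependency chain is a single shift.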

peterhoeg · 2023-07-29T13:32:45Z

Here are some benchmarks from arch where the person running it kindly did a tl;dr which I am pasting here:

- there is no or negligible performance benefit of *-march=nehalem*, which corresponds to x86_64-v2,

- there is a moderate benefit of *-march=haswell* (x86_64-v3) - of around 10%-20% as compared to baseline for the tests performed

Link: https://lists.archlinux.org/pipermail/arch-general/2021-March/048739.html

I know it's a sample size of 1, but I would argue that the burden of proof with regards to demonstrating any benefit now lies with those proposing this change. The argument that "well, everyone else is doing it" doesn't carry much weight in this.

peterhoeg · 2023-07-29T13:50:54Z

One thing I forgot (apologies for the comment spam): if we do start looking at potential benefits, where do we stand to gain something from this (full disclosure: I'm very much not a compiler guy)? Either:

- the act of building software is faster because the compilers benefit from this (use less energy, can get critical fixes shipped faster, can build more with less and so on), or
- the resulting software runs faster (or uses fewer resources) - who doesn't like free speed?

I would imagine that most of the software that really stands to benefit already has optimizations for various cpus/cpu features where we would gain nothing from fiddling with the baseline flags.

So could we compile variants of the compilers with a better baseline and use that when available? This way there would be no sudden breakage for end users?

vcunat · 2023-07-29T13:56:49Z

You can't do this "when available" that easily. What would people with older computers do? Impurely switch their compiler variant? Or would we compile everything twice (e.g. base + -v3)?

RaitoBezarius · 2023-07-29T14:13:25Z

> Here are some benchmarks from arch where the person running it kindly did a tl;dr which I am pasting here:
>
> - there is no or negligible performance benefit of *-march=nehalem*, which corresponds to x86_64-v2,
> - there is a moderate benefit of *-march=haswell* (x86_64-v3) - of around 10%-20% as compared to baseline for the tests performed
>
> Link: lists.archlinux.org/pipermail/arch-general/2021-March/048739.html
>
> I know it's a sample size of 1, but I would argue that the burden of proof with regards to demonstrating any benefit now lies with those proposing this change. The argument that "well, everyone else is doing it" doesn't carry much weight in this.

This was mentioned in #202526 (comment) :).
It is not sufficient though.

> I would imagine that most of the software that really stands to benefit already has optimizations for various cpus/cpu features where we would gain nothing from fiddling with the baseline flags.
>
> So could we compile variants of the compilers with a better baseline and use that when available? This way there would be no sudden breakage for end users?

Someone needs to do the rigorous work of looking into which packages currently benefit the most; see Guix's prior art on that.

peterhoeg · 2023-07-29T14:50:34Z

> Or would we compile everything twice (e.g. base + -v3)?

Compile the compilers twice and abuse FOD… I said it was an idea - not that it was a *good* idea.

ghost · 2023-07-31T21:38:57Z

> (full disclosure: I'm very much not a compiler guy).

There's a bit of a self-selection problem here; the people who really care about this stuff are already building everything themselves. I build with -march= and -mcpu= set to exactly what -march=native/-mcpu=native would choose, on four different architectures.

But my nix is patched to remove the Hydra key, so I don't really care what cache.nixos.org does.

kjeremy · 2023-08-28T19:22:28Z

It looks like CentOS is investigating what a jump to v3 might look like: https://blog.centos.org/2023/08/centos-isa-sig-performance-investigation/

sergv · 2023-08-28T23:26:20Z

I'd be interested to revisit the reasons other distros, e.g. Red Hat, had for switching to x86-64-v2. Do they apply in NixOS's case? Or are they completely inapplicable to NixOS? It's a "do it because everyone else is doing it" kind of argument, but my point is: why is everyone else doing it? They stand to drop support for old hardware just like NixOS does, but they still decide to go ahead. What makes NixOS's situation different from everyone else's?

RaitoBezarius · 2023-08-28T23:29:29Z

Right now, in NixOS, we decided to postpone as no one came up with compelling evidence that v2 provides serious benefits. I am trying to assemble a recent unstable v2 and v3 system and build a "relevant" benchmark to see what's going on. But it takes time and resources. In the meantime, there's no reason to break older hardware, I suppose.

vcunat · 2023-08-29T06:05:48Z

AFAIK the other distros don't do it without a replacement and still also provide binaries for older CPUs. This PR's proposal didn't do that. (And actually I don't think this is worth doubling build and storage costs for the x86 infra parts.)

nyabinary · 2023-09-05T17:05:43Z

https://www.phoronix.com/news/CentOS-ISA-Experiment-Perform
CentOS is looking at v3 baseline

K900 · 2023-09-05T17:07:54Z

As I said on Matrix, CentOS can afford to do that because CentOS has very long release cycles, so people on CentOS N-1 will remain supported for many years to come. For NixOS, the switchover would take at most 6 months, which is not nearly enough time to cut off support for all pre-2015 hardware.

lucasew · 2023-09-06T12:27:11Z

Isn't there some kind of cflag to automatically create variants that use these features and decide at runtime which function sets will be used?

The examples cited here [1] have explicit definitions, but is there something that can kind of ramp up to more recent instructions automatically?

For example, a math heavy library could generate more than one variant of loop heavy objects and some kind of logic to decide at runtime using the CPUID instruction which code variant will run.

IDK tbh if this is possible in single file binaries. The common approach for BLAS-like libraries for example is to generate a dynamic library for each variant then decide at runtime which one to dlopen.

[1] https://lwn.net/Articles/691932/
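
The in-process flavour of this idea (one binary, several variants of a hot routine, a selection made once at startup) can be sketched as follows. This is only an illustration of the dispatch logic in Python: the `dot_*` functions are stand-ins for differently compiled versions of the same loop, `select_variant` is a made-up helper, and in a real program the flag set would come from CPUID rather than a literal.

```python
# Sketch of runtime dispatch: several variants of one routine are registered
# together with the CPU flags they need, and the best usable one is picked
# once at startup. The variants here are plain-Python stand-ins.


def dot_baseline(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def dot_avx2(xs, ys):
    # Stand-in for a build of the same loop compiled with AVX2/FMA enabled.
    return sum(x * y for x, y in zip(xs, ys))


# Ordered from most to least demanding; the last entry needs nothing extra.
VARIANTS = [
    ({"avx2", "fma"}, dot_avx2),
    (set(), dot_baseline),
]


def select_variant(cpu_flags, variants=VARIANTS):
    """Return the first variant whose required flags are all available."""
    for required, impl in variants:
        if required <= set(cpu_flags):
            return impl
    raise RuntimeError("no usable variant")


dot = select_variant({"sse4_2", "popcnt"})  # flags would come from CPUID
print(dot.__name__)  # dot_baseline
```

Because the choice is made once and cached in `dot`, the per-call cost is a single indirect call, which is also roughly how the per-library dlopen approach behaves.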

nixos-discourse · 2023-11-23T18:03:11Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/pre-rfc-moving-nixos-x86-64-baseline-to-x86-64-v3/35924/2

nixos-discourse · 2023-11-23T19:30:25Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/pre-rfc-gradual-transition-of-nixos-x86-64-baseline-to-x86-64-v3-with-an-intermediate-step-to-x86-64-v2/35924/1

LDprg · 2024-04-30T04:24:13Z

Any news on this? I would really like to see a v2 baseline for NixOS.

Atemu · 2024-04-30T06:51:46Z

See the discourse thread above.

To summarise:

To our current knowledge, there is no good data that proves a benefit from bumping the generic target to x86_64-v2. There might be a benefit but we don't actually know whether it exists or not. If it exists, it will be rather small though. (Note that there is a decent amount of bad data on this.)

There is no good data on how many users a bump beyond x86_64-v1 would exclude either. Anecdata suggests that there are at least some.

There are some packages where even the flawed existing data shows a significant benefit from march tuning beyond a reasonable doubt. glibc-hwcaps could be leveraged to enable such march tuning for those specific packages without excluding any users.

IMHO, a generic bump such as the one in this PR will require an RFC. Perhaps we should close this to signal this a bit more clearly.
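
For readers unfamiliar with glibc-hwcaps, the sketch below shows the lookup order it implies for a single library directory, assuming a recent glibc with this feature. `hwcaps_candidates`, the `/lib` directory and `libexample.so.1` are illustrative names only; nothing here describes how nixpkgs would actually lay out such packages.

```python
# Sketch of the glibc-hwcaps lookup order for one library directory:
# the most specific supported level is tried first, the plain directory last.
import posixpath


def hwcaps_candidates(libdir, soname, cpu_level):
    """List paths in the order a loader with glibc-hwcaps would try them.

    cpu_level is the highest x86-64 level (1..4) the CPU supports.
    """
    paths = [
        posixpath.join(libdir, "glibc-hwcaps", f"x86-64-v{level}", soname)
        for level in range(cpu_level, 1, -1)  # v1 has no subdirectory
    ]
    paths.append(posixpath.join(libdir, soname))
    return paths


for p in hwcaps_candidates("/lib", "libexample.so.1", 3):
    print(p)
# /lib/glibc-hwcaps/x86-64-v3/libexample.so.1
# /lib/glibc-hwcaps/x86-64-v2/libexample.so.1
# /lib/libexample.so.1
```

A machine that only supports the baseline gets just the last path, which is why this approach does not exclude any users.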

LDprg · 2024-04-30T07:29:54Z

What about duplicated NixOS channels (at least for stable), where one is the generic one and the other is compiled with x86-64-v3? This wouldn't deprecate any platform and it is also the way CachyOS handles this.

The problem with selectively compiling the packages that could improve the most with x86-64-v3 is that if I, for example, compile gcc this way, it will trigger a recompilation of my whole system. I did not find a way around this problem.

Maybe there should be a flag to prevent such a rebuild at the cost of reproducibility?

K900 · 2024-04-30T07:33:36Z

We do not have the hardware to build multiple channels. You can use system.replaceRuntimeDependencies if you really want to. But x86_64-v3 also does not improve things much, and we can likely get very close in terms of performance with some dynamic dispatch.

SuperSandro2000 requested review from alyssais, nbp, Ericson2314 and matthewbauer as code owners November 23, 2022 14:03

SuperSandro2000 marked this pull request as draft November 23, 2022 14:05

SuperSandro2000 added the 9.needs: community feedback label Nov 23, 2022

ofborg bot added 10.rebuild-darwin: 501+ 10.rebuild-darwin: 1001-2500 10.rebuild-linux: 501+ 10.rebuild-linux: 2501-5000 labels Nov 23, 2022

SuperSandro2000 force-pushed the architecture-sse42-avx branch from 36aa480 to 8754614 Compare November 27, 2022 22:20

ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 and removed 10.rebuild-darwin: 501+ 10.rebuild-darwin: 1001-2500 10.rebuild-linux: 501+ 10.rebuild-linux: 2501-5000 labels Nov 27, 2022

SuperSandro2000 force-pushed the architecture-sse42-avx branch from 8754614 to 94707c7 Compare November 28, 2022 10:58

thiagokokada changed the title ~~systems/architecture: bump default architecture to westmere~~ systems/architecture: bump default architecture to x86_64-v2 Nov 28, 2022

thiagokokada approved these changes Nov 28, 2022

ofborg bot added 10.rebuild-darwin: 501+ 10.rebuild-darwin: 1001-2500 labels Nov 28, 2022

SuperSandro2000 mentioned this pull request Jun 20, 2023

[RFC 0153] Non-legacy boot NixOS NixOS/rfcs#154

Draft

fabianhjr mentioned this pull request Jun 22, 2023

lib.systems.architectures: add microarchitecture levels #239120

Merged

thiagokokada mentioned this pull request Oct 11, 2023

babashka: use upstream version of Clojure tools #257473

Merged

OPNA2608 mentioned this pull request Nov 2, 2023

yaml-cpp: 0.7.0 -> 0.8.0 #249947

Merged

wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024

stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Apr 30, 2024

K900 closed this Apr 30, 2024

SuperSandro2000 deleted the architecture-sse42-avx branch April 30, 2024 07:42
