Update mozconfigs with -mavx2 -mfma, and correct issue in windows mozconfig rustflags #1216

Merged: 2 commits, Sep 7, 2024


Alex313031
Contributor

No description provided.

@mauro-balades
Member

Thanks! But wouldn't this lower the CPU requirements?

@mauro-balades
Member

Heya, unfortunately, AVX2 is already included in x86-64-v3, and for the sake of not "over-flagging", I'm going to close this PR...

Check out what v3 includes:

gcc -march=x86-64-v3 -Q --help=target
The following options are target specific:
  -m128bit-long-double        		[enabled]
  -m16                        		[disabled]
  -m32                        		[disabled]
  -m3dnow                     		[disabled]
  -m3dnowa                    		[disabled]
  -m64                        		[enabled]
  -m80387                     		[enabled]
  -m8bit-idiv                 		[disabled]
  -m96bit-long-double         		[disabled]
  -mabi=                      		sysv
  -mabm                       		[disabled]
  -maccumulate-outgoing-args  		[disabled]
  -maddress-mode=             		long
  -madx                       		[disabled]
  -maes                       		[disabled]
  -malign-data=               		compat
  -malign-double              		[disabled]
  -malign-functions=          		0
  -malign-jumps=              		0
  -malign-loops=              		0
  -malign-stringops           		[enabled]
  -mamx-bf16                  		[disabled]
  -mamx-complex               		[disabled]
  -mamx-fp16                  		[disabled]
  -mamx-int8                  		[disabled]
  -mamx-tile                  		[disabled]
  -mandroid                   		[disabled]
  -march=                     		x86-64-v3
  -masm=                      		att
  -mavx                       		[enabled]
  -mavx2                      		[enabled]
  -mavx256-split-unaligned-load 	[disabled]
  -mavx256-split-unaligned-store 	[disabled]
  -mavx5124fmaps              		[disabled]
  -mavx5124vnniw              		[disabled]
  -mavx512bf16                		[disabled]
  -mavx512bitalg              		[disabled]
  -mavx512bw                  		[disabled]
  -mavx512cd                  		[disabled]
  -mavx512dq                  		[disabled]
  -mavx512er                  		[disabled]
  -mavx512f                   		[disabled]
  -mavx512fp16                		[disabled]
  -mavx512ifma                		[disabled]
  -mavx512pf                  		[disabled]
  -mavx512vbmi                		[disabled]
  -mavx512vbmi2               		[disabled]
  -mavx512vl                  		[disabled]
  -mavx512vnni                		[disabled]
  -mavx512vp2intersect        		[disabled]
  -mavx512vpopcntdq           		[disabled]
  -mavxifma                   		[disabled]
  -mavxneconvert              		[disabled]
  -mavxvnni                   		[disabled]
  -mavxvnniint8               		[disabled]
  -mbionic                    		[disabled]
  -mbmi                       		[enabled]
  -mbmi2                      		[enabled]
  -mbranch-cost=<0,5>         		3
  -mcall-ms2sysv-xlogues      		[disabled]
  -mcet-switch                		[disabled]
  -mcld                       		[disabled]
  -mcldemote                  		[disabled]
  -mclflushopt                		[disabled]
  -mclwb                      		[disabled]
  -mclzero                    		[disabled]
  -mcmodel=                   		[default]
  -mcmpccxadd                 		[disabled]
  -mcpu=                      		
  -mcrc32                     		[enabled]
  -mcx16                      		[enabled]
  -mdaz-ftz                   		[disabled]
  -mdirect-extern-access      		[enabled]
  -mdispatch-scheduler        		[disabled]
  -mdump-tune-features        		[disabled]
  -menqcmd                    		[disabled]
  -mf16c                      		[enabled]
  -mfancy-math-387            		[enabled]
  -mfentry                    		[disabled]
  -mfentry-name=              		
  -mfentry-section=           		
  -mfma                       		[enabled]
  -mfma4                      		[disabled]
  -mforce-drap                		[disabled]
  -mforce-indirect-call       		[disabled]
  -mfp-ret-in-387             		[enabled]
  -mfpmath=                   		sse
  -mfsgsbase                  		[disabled]
  -mfunction-return=          		keep
  -mfused-madd                		-ffp-contract=fast
  -mfxsr                      		[enabled]
  -mgather                    		-mtune-ctrl=use_gather
  -mgeneral-regs-only         		[disabled]
  -mgfni                      		[disabled]
  -mglibc                     		[enabled]
  -mhard-float                		[enabled]
  -mharden-sls=               		none
  -mhle                       		[disabled]
  -mhreset                    		[disabled]
  -miamcu                     		[disabled]
  -mieee-fp                   		[enabled]
  -mincoming-stack-boundary=  		0
  -mindirect-branch-cs-prefix 		[disabled]
  -mindirect-branch-register  		[disabled]
  -mindirect-branch=          		keep
  -minline-all-stringops      		[disabled]
  -minline-stringops-dynamically 	[disabled]
  -minstrument-return=        		none
  -mintel-syntax              		-masm=intel
  -mkl                        		[disabled]
  -mlam=                      		none
  -mlarge-data-threshold=<number> 	65536
  -mlong-double-128           		[disabled]
  -mlong-double-64            		[disabled]
  -mlong-double-80            		[enabled]
  -mlwp                       		[disabled]
  -mlzcnt                     		[enabled]
  -mmanual-endbr              		[disabled]
  -mmemcpy-strategy=          		
  -mmemset-strategy=          		
  -mmitigate-rop              		[disabled]
  -mmmx                       		[enabled]
  -mmovbe                     		[enabled]
  -mmovdir64b                 		[disabled]
  -mmovdiri                   		[disabled]
  -mmove-max=                 		128
  -mmpx                       		[disabled]
  -mms-bitfields              		[disabled]
  -mmusl                      		[disabled]
  -mmwait                     		[enabled]
  -mmwaitx                    		[disabled]
  -mneeded                    		[disabled]
  -mno-align-stringops        		[disabled]
  -mno-default                		[disabled]
  -mno-fancy-math-387         		[disabled]
  -mno-push-args              		[disabled]
  -mno-red-zone               		[disabled]
  -mno-sse4                   		[disabled]
  -mnop-mcount                		[disabled]
  -momit-leaf-frame-pointer   		[disabled]
  -mpc32                      		[disabled]
  -mpc64                      		[disabled]
  -mpc80                      		[disabled]
  -mpclmul                    		[disabled]
  -mpcommit                   		[disabled]
  -mpconfig                   		[disabled]
  -mpku                       		[disabled]
  -mpopcnt                    		[enabled]
  -mprefer-avx128             		-mprefer-vector-width=128
  -mprefer-vector-width=      		none
  -mpreferred-stack-boundary= 		0
  -mprefetchi                 		[disabled]
  -mprefetchwt1               		[disabled]
  -mprfchw                    		[disabled]
  -mptwrite                   		[disabled]
  -mpush-args                 		[enabled]
  -mraoint                    		[disabled]
  -mrdpid                     		[disabled]
  -mrdrnd                     		[disabled]
  -mrdseed                    		[disabled]
  -mrecip                     		[disabled]
  -mrecip=                    		
  -mrecord-mcount             		[disabled]
  -mrecord-return             		[disabled]
  -mred-zone                  		[enabled]
  -mregparm=                  		6
  -mrelax-cmpxchg-loop        		[disabled]
  -mrtd                       		[disabled]
  -mrtm                       		[disabled]
  -msahf                      		[enabled]
  -mscatter                   		-mtune-ctrl=use_scatter
  -mserialize                 		[disabled]
  -msgx                       		[disabled]
  -msha                       		[disabled]
  -mshstk                     		[disabled]
  -mskip-rax-setup            		[disabled]
  -msoft-float                		[disabled]
  -msse                       		[enabled]
  -msse2                      		[enabled]
  -msse2avx                   		[disabled]
  -msse3                      		[enabled]
  -msse4                      		[enabled]
  -msse4.1                    		[enabled]
  -msse4.2                    		[enabled]
  -msse4a                     		[disabled]
  -msse5                      		-mavx
  -msseregparm                		[disabled]
  -mssse3                     		[enabled]
  -mstack-arg-probe           		[disabled]
  -mstack-protector-guard-offset= 	
  -mstack-protector-guard-reg= 		
  -mstack-protector-guard-symbol= 	
  -mstack-protector-guard=    		tls
  -mstackrealign              		[disabled]
  -mstore-max=                		128
  -mstringop-strategy=        		[default]
  -mstv                       		[enabled]
  -mtbm                       		[disabled]
  -mtls-dialect=              		gnu
  -mtls-direct-seg-refs       		[enabled]
  -mtsxldtrk                  		[disabled]
  -mtune-ctrl=                		
  -mtune=                     		generic
  -muclibc                    		[disabled]
  -muintr                     		[disabled]
  -munroll-only-small-loops   		[disabled]
  -mvaes                      		[disabled]
  -mveclibabi=                		[default]
  -mvect8-ret-in-mem          		[disabled]
  -mvpclmulqdq                		[disabled]
  -mvzeroupper                		[disabled]
  -mwaitpkg                   		[disabled]
  -mwbnoinvd                  		[disabled]
  -mwidekl                    		[disabled]
  -mx32                       		[disabled]
  -mxop                       		[disabled]
  -mxsave                     		[enabled]
  -mxsavec                    		[disabled]
  -mxsaveopt                  		[disabled]
  -mxsaves                    		[disabled]

  Known assembler dialects (for use with the -masm= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known choices for mitigation against straight line speculation with -mharden-sls=:
    all indirect-jmp none return

  Known indirect branch choices (for use with the -mindirect-branch=/-mfunction-return= options):
    keep thunk thunk-extern thunk-inline

  Known choices for return instrumentation with -minstrument-return=:
    call none nop5

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known preferred register vector length (to use with the -mprefer-vector-width= option):
    128 256 512 none

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
    vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

  Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids emeraldrapids alderlake raptorlake meteorlake graniterapids graniterapids-d bonnell atom silvermont slm goldmont goldmont-plus tremont gracemont sierraforest grandridge knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 lujiazui k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 znver4 btver1 btver2 generic native

  Known valid arguments for -mtune= option:
    generic i386 i486 pentium lakemont pentiumpro pentium4 nocona core2 nehalem sandybridge haswell bonnell silvermont goldmont goldmont-plus tremont sierraforest grandridge knl knm skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake rocketlake graniterapids graniterapids-d intel lujiazui geode k6 athlon k8 amdfam10 bdver1 bdver2 bdver3 bdver4 btver1 btver2 znver1 znver2 znver3 znver4
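
The full dump above is long, so here is a minimal sketch of how it could be filtered down to just the flags a given -march level turns on. The helper name `enabled_flags` is made up for illustration; it only assumes the two-column `flag [enabled]` layout shown in the output above.

```shell
# Hypothetical helper: read `gcc -Q --help=target` output on stdin and
# print only the flag names whose second column is "[enabled]".
enabled_flags() {
  awk '$2 == "[enabled]" { print $1 }'
}

# Usage (requires gcc on PATH):
#   gcc -march=x86-64-v3 -Q --help=target | enabled_flags
# To check specific features only:
#   gcc -march=x86-64-v3 -Q --help=target | enabled_flags | grep -E '^-m(avx2|fma)$'
```

With the output shown above, both -mavx2 and -mfma are among the enabled lines.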

@Alex313031
Contributor Author

@mauro-balades Well, the existing flags list everything up to -mavx, with -march=x86-64-v3 directly after. I was just following how the code currently is and adding -mavx2 and -mfma to complete the lines.

Also, RUSTFLAGS in the Windows config was inadvertently set to -C target-feature=+avx instead of -C target-feature=+avx2.
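
As a rough sketch of that fix (mozconfig files are sourced as shell), the corrected line might look like the following. This assumes RUSTFLAGS carries only this one option; the real Windows mozconfig may set more, and its other contents are not reproduced here.

```shell
# Sketch only: the rest of the Windows mozconfig is not shown.
# Before (as described in this comment):
#   export RUSTFLAGS="-C target-feature=+avx"
# After:
export RUSTFLAGS="-C target-feature=+avx2"
```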

Also, I should have put this in the notes: even though -march=xxx is supposed to imply all the individual -mxxx flags, my experience with Chromium and Firefox is that this often isn't what happens in practice, especially for third-party libraries such as WebRTC in the third_party dir of both projects.
The problem shows up most when a library keeps separate source files for different SIMD intrinsics but only compiles them when a -mxxx flag matching those intrinsics is passed.

For example, in Chromium, in //third_party/ffmpeg, if one passes -march=x86-64-v3, it won't compile the AVX2 files for the fast Fourier transform audio paths, but it will if you use -mavx2.
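
To illustrate the failure mode, here is a minimal sketch of a build script that gates its AVX2 sources on the literal flag text. It is a hypothetical stand-in, not the actual ffmpeg or Chromium build logic, and the function name `selects_avx2_sources` is invented for the example.

```shell
# Hypothetical stand-in for a third-party build script that only adds its
# AVX2 source files when it sees the literal -mavx2 flag in CFLAGS.
selects_avx2_sources() {
  case " $1 " in
    *" -mavx2 "*) return 0 ;;  # literal flag present
    *)            return 1 ;;  # -march=... alone is not recognized
  esac
}

# Compare the two flag sets discussed in this thread.
for cflags in "-O2 -march=x86-64-v3" "-O2 -march=x86-64-v3 -mavx2 -mfma"; do
  if selects_avx2_sources "$cflags"; then
    echo "AVX2 sources built:   $cflags"
  else
    echo "AVX2 sources skipped: $cflags"
  fi
done
```

A script written this way never asks the compiler what -march implies, so the redundant -mavx2 is the only thing it reacts to.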

So just to cover all the bases and prevent any ambiguity as to whether a flag is being passed, I like to do as this commit does, and pass all the -mxxx flags as well as the -march=xxx and/or -mtune=xxx flag.

I like to keep things simple too, but because I'm a speed freak, I never skimp on my compiler flags lolol. I even have a lot of -mllvm flags in Thorium, such as loop unrolling and a second vectorizer pass to "SIMDify" loops after they have been unrolled.

Also, are you the same guy I was talking to on Reddit about this, when I initially thought you had used the flags from Mercury?

mauro-balades reopened this Sep 6, 2024
@mauro-balades
Member

Yep, it's me, the reddit guy. Anyways, thanks for contributing!
