intel B580 issues #315

chocolate42 · 2024-12-24T11:06:53Z

The timings are much worse on prpll, but there are issues on gpuowl too: a lot of warnings about register spilling in kernels ~~and assertion failures. Sometimes a run crashes, sometimes it completes.~~

For some data to chew on, I ran most of the exponents that prpll tunes with, to exercise a lot of the FFTs; maybe there's a pattern. The same or similar issues remained with NO_ASM and DEBUG, with more register spills when DEBUG is on, presumably from the debug symbols.

gpuowl_b580_regspill.zip

While running with DEBUG,NO_ASM I noticed some roundoff output that wasn't being captured by the log, so here's a terminal copy:

gouowl_b580_roundoff.zip

It's worth noting that mfakto doesn't complete all its self tests, with differing amounts completing each time.

edit: Some of the exponents in the tune list are composite which may be clouding these results. Will filter them out and retry.
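Filtering the tune list down to prime exponents can be sketched as below. This is a minimal Python sketch, not the procedure actually used here; the sample list is illustrative (three known Mersenne prime exponents plus two composites), not the prpll tune list. Trial division is enough at this size because the square root of an exponent around 10^8 is only about 10^4. A PRP test on a composite exponent is pointless, since 2^p - 1 can only be prime when p is prime.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to isqrt(n); fine for exponents ~1e8-1e9."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def prime_exponents(exponents):
    """Keep only prime exponents, preserving the original order."""
    return [p for p in exponents if is_prime(p)]


if __name__ == "__main__":
    # Illustrative sample only, not the actual prpll tune list.
    sample = [77_232_917, 82_589_933, 82_589_931, 100_000_000, 74_207_281]
    print(prime_exponents(sample))
```

The composites in the sample are 82,589,931 (digit sum 45, so divisible by 3) and 100,000,000 (even); they are dropped and the other three are kept.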

chocolate42 · 2024-12-24T15:22:51Z

Here's a redo with definite prime numbers. No more crashes; that was a red herring.

gpuowl_prpll_ffttest.zip

I stopped the gpuowl test partway through because both NEO (intel runtime) and mesa have had updates. Time to rebase.

edit: Ran with the latest NEO. No change that I can notice.

gpuowl_prpll_neo_latest.zip

preda · 2024-12-27T21:45:56Z

So firstly, does PRPLL run correctly on B580?
The ROUNDOFF errors: are they because the selected FFT was too small, or because of some actual problem producing garbage that manifests itself as roundoff? The first case (FFT too small) is normal and fixed by selecting a larger FFT. The second case is an actual problem.
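Why a larger FFT fixes the first case can be shown with the bits-per-word ratio: the exponent's bits are spread over the FFT words, so a bigger FFT stores fewer bits per word and produces less roundoff. The Python sketch below assumes FFT sizes are counted in words with "M" meaning 1024 × 1024 (so 4M = 4,194,304 words); the `max_bpw` limit is a hypothetical user-supplied value, not PRPLL's actual threshold, and `smallest_fft` is an illustrative helper, not a PRPLL function.

```python
M = 1024 * 1024  # assumption: "4M" FFT = 4 * 2^20 words


def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of bits of the exponent stored in each FFT word."""
    return exponent / fft_words


def smallest_fft(exponent: int, sizes, max_bpw: float):
    """Return the smallest FFT size whose bits/word is within a hypothetical limit.

    max_bpw is NOT a real PRPLL threshold; it only illustrates that a larger
    FFT lowers bits/word (and thus roundoff). Returns None if none qualify.
    """
    for n in sorted(sizes):
        if bits_per_word(exponent, n) <= max_bpw:
            return n
    return None


if __name__ == "__main__":
    p = 77_232_917  # illustrative exponent
    sizes = [4 * M, 4 * M + M // 2, 5 * M]
    for n in sizes:
        print(f"{n / M:g}M: {bits_per_word(p, n):.2f} bits/word")
    print(smallest_fft(p, sizes, max_bpw=17.0))
```

With the hypothetical limit of 17 bits/word, 4M (about 18.4 bits/word) is rejected and 4.5M (about 16.4 bits/word) is chosen.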

Secondly, does PRPLL run fast on B580?
To measure performance, DEBUG should not be used! (DEBUG enables asserts in the OpenCL code and thus kills performance.)

We never had an opportunity to tune for the B580 to any degree, so it would not be completely surprising if it runs slow.

chocolate42 · 2024-12-28T10:05:54Z

PRPLL does run correctly. The issues with the data in the first post were down to me being a dummy and blindly running the exponents in the tune list as if they were prime; mostly they are not.

The latest commit George made, disabling NONTEMPORAL by default, made prpll much quicker. 4M seems to be in a good spot, but George says 512:15:512 is 10x slower than on the RadeonVII, which IMO probably means that the B580 gets relatively slower as the FFT size increases. https://www.mersenneforum.org/node/1062411?p=1064580#post1064580

The roundoff error in the link is because I intentionally chose a 4M FFT for that exponent to match what gpuowl uses/used for this benchmark: https://docs.google.com/spreadsheets/d/1Kxd8wQayP8FtdoKQ6k5kayM8cMNTaNJV/edit?gid=1315040372#gid=1315040372
