
intel B580 issues #315

Open
chocolate42 opened this issue Dec 24, 2024 · 3 comments

chocolate42 commented Dec 24, 2024

Timings are much worse on PRPLL, but there are issues on gpuowl too: lots of warnings about kernels spilling registers, plus assertion failures. Sometimes a run crashes; sometimes it completes.

For some data to chew on, I ran most of the exponents that PRPLL tunes with, to exercise a lot of the FFTs; maybe there's a pattern. The same or similar issues remained with NO_ASM and DEBUG, with more register spills when DEBUG was on, presumably from the debug symbols.

gpuowl_b580_regspill.zip

While running with DEBUG,NO_ASM, I noticed some roundoff output that wasn't captured by the log, so here's a terminal copy:

gouowl_b580_roundoff.zip

It's worth noting that mfakto doesn't complete all of its self-tests either, and a different number complete on each run.

edit: Some of the exponents in the tune list are composite, which may be clouding these results. I'll filter them out and retry.
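Filtering the tune list down to prime exponents only needs a primality check. A minimal sketch using plain trial division, which is fast enough for exponents around 10^8; the function names and the sample list are hypothetical and not part of gpuowl or PRPLL:

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; fine for exponents up to ~10^10."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def prime_exponents(exponents):
    """Keep only the prime exponents, preserving the original order."""
    return [p for p in exponents if is_prime(p)]


if __name__ == "__main__":
    # Hypothetical tune list mixing prime and composite exponents.
    tune_list = [82589933, 82589935, 77232917, 77232919]
    print(prime_exponents(tune_list))
```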

chocolate42 (Author) commented Dec 24, 2024

Here's a redo with exponents that are definitely prime. No more crashes, so that was a red herring.

gpuowl_prpll_ffttest.zip

I stopped the gpuowl test partway through because both NEO (the Intel runtime) and Mesa have had updates. Time to rebase.

edit: Ran it with the latest NEO. No change that I can notice.

gpuowl_prpll_neo_latest.zip

preda (Owner) commented Dec 27, 2024

First, does PRPLL run correctly on the B580?
The ROUNDOFF errors: are they because the selected FFT was too small, or because of some actual problem producing garbage that shows up as roundoff? The first case (FFT too small) is normal and is fixed by selecting a larger FFT. The second case is an actual problem.
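To illustrate the "FFT too small" case: when a number is squared through a floating-point FFT, every convolution output should land on an integer, and "roundoff" is how far the worst output lands from one. Packing more bits into each word (what a too-small FFT forces) makes the outputs larger, so the roundoff grows. A minimal sketch, assuming a plain zero-padded complex FFT on non-negative digits; this is a simplification for illustration, not PRPLL's actual weighted transform:

```python
import cmath
import random


def fft(a, invert=False):
    """Recursive radix-2 FFT on a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def square_roundoff(digits):
    """Square a number given as base-2^b digits using a float FFT.

    Returns (rounded convolution, max roundoff), where roundoff is the largest
    distance of any convolution output from the nearest integer.
    len(digits) must be a power of two.
    """
    n = 2 * len(digits)  # zero-pad so the cyclic convolution equals the acyclic one
    padded = [complex(d) for d in digits] + [0j] * (n - len(digits))
    spectrum = fft(padded)
    conv = [x.real / n for x in fft([s * s for s in spectrum], invert=True)]
    roundoff = max(abs(c - round(c)) for c in conv)
    return [round(c) for c in conv], roundoff


def exact_square(digits):
    """Schoolbook convolution with exact integers, for comparison."""
    out = [0] * (2 * len(digits))
    for i, x in enumerate(digits):
        for j, y in enumerate(digits):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    random.seed(1)
    for bits in (10, 16, 22):
        digits = [random.randrange(2 ** bits) for _ in range(256)]
        _, err = square_roundoff(digits)
        print(f"{bits} bits/word: max roundoff {err:.3g}")
```

Roundoff that stays small at a comfortable bits-per-word ratio but climbs toward 0.5 as the ratio rises is the normal case; large roundoff at a ratio that should be safe would point to the second case, an actual bug.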

Second, does PRPLL run fast on the B580?
To measure performance, DEBUG should not be used! (DEBUG enables assert() in the OpenCL code and thus kills performance.)

We never had an opportunity to tune for the B580 to any degree, so it would not be completely surprising if it runs slowly.

chocolate42 (Author) commented:

PRPLL does run correctly. The issues in the data from the first post came from me being a dummy and blindly running the exponents in the tune list as if they were prime; mostly they are not.
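For context on why composite exponents are never real candidates: if p = a·b with 1 < a < p, then 2^a − 1 divides 2^p − 1, so M(p) is known to be composite before any test runs. This doesn't explain the specific failures seen above; it only shows why such exponents fall outside normal use. A small sketch; the helper names are hypothetical:

```python
import math


def mersenne(p: int) -> int:
    """Return 2^p - 1."""
    return (1 << p) - 1


def trivial_factor(p: int):
    """For p >= 2: if p is composite, return 2^a - 1 for its smallest divisor a > 1.

    That value divides 2^p - 1, because 2^(a*b) - 1 is divisible by 2^a - 1.
    Returns None when p is prime.
    """
    for a in range(2, math.isqrt(p) + 1):
        if p % a == 0:
            return mersenne(a)
    return None


if __name__ == "__main__":
    for p in (11, 15, 25):
        print(p, trivial_factor(p))
```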

George's latest commit, which disables NONTEMPORAL by default, made PRPLL much quicker. 4M seems to be in a good spot, but George says 512:15:512 is 10x slower than on a Radeon VII, which IMO probably means that the B580 gets relatively slower as the FFT size increases. https://www.mersenneforum.org/node/1062411?p=1064580#post1064580
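To compare a spec like 512:15:512 against a size like 4M, it helps to convert the spec into a word count. A minimal sketch, assuming the WIDTH:MIDDLE:HEIGHT convention where each complex element holds two real words, so the FFT length in words is width × middle × height × 2; that convention and the helper names are assumptions here, and the sketch only handles the three-field form:

```python
def parse_dim(token: str) -> int:
    """Parse a dimension such as '512' or '1K'."""
    token = token.strip().upper()
    if token.endswith("K"):
        return int(token[:-1]) * 1024
    return int(token)


def fft_words(spec: str) -> int:
    """Convert a 'WIDTH:MIDDLE:HEIGHT' spec into an FFT length in words."""
    width, middle, height = (parse_dim(t) for t in spec.split(":"))
    # ASSUMPTION: two real words per complex element.
    return width * middle * height * 2


def human(words: int) -> str:
    """Format a word count in units of 2^20 (the 'M' in '4M')."""
    return f"{words / 2 ** 20:g}M"


if __name__ == "__main__":
    for spec in ("512:8:512", "512:15:512"):
        print(spec, human(fft_words(spec)))
```

Under this assumption 512:15:512 is a 7.5M FFT, nearly twice the words of 4M, so a per-iteration comparison against a 4M timing is not like-for-like.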

The roundoff error in the link is because I intentionally chose a 4M FFT for that exponent, to match what gpuowl uses (or used) for this benchmark: https://docs.google.com/spreadsheets/d/1Kxd8wQayP8FtdoKQ6k5kayM8cMNTaNJV/edit?gid=1315040372#gid=1315040372
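Whether an FFT is "too small" for an exponent comes down to bits per word (exponent divided by FFT length in words). A minimal sketch of picking the smallest FFT under a bits-per-word ceiling; the exponent and the 18.0 ceiling below are made-up numbers, since real limits depend on the FFT shape and the implementation:

```python
def bits_per_word(exponent: int, words: int) -> float:
    """Average number of exponent bits packed into each FFT word."""
    return exponent / words


def smallest_safe_fft(exponent: int, candidate_words, max_bits_per_word: float):
    """Return the smallest candidate FFT length within the ceiling, or None."""
    for words in sorted(candidate_words):
        if bits_per_word(exponent, words) <= max_bits_per_word:
            return words
    return None


if __name__ == "__main__":
    exponent = 80_000_000  # hypothetical exponent
    sizes = [4 * 2 ** 20, 4718592, 5 * 2 ** 20]  # 4M, 4.5M, 5M words
    for w in sizes:
        print(w, round(bits_per_word(exponent, w), 2))
    print("pick:", smallest_safe_fft(exponent, sizes, 18.0))  # hypothetical ceiling
```

Forcing a smaller FFT than this kind of rule would pick (as with the 4M choice above) raises bits per word, and the resulting ROUNDOFF errors are expected rather than a sign of a bug.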
