intel B580 issues #315
Comments
Here's a redo with definite prime numbers. No more crashes; that was a red herring. I stopped the gpuowl test partway through because both NEO (Intel runtime) and mesa have had updates. Time to rebase. edit: Runs with the latest NEO. No change that I can notice.
So firstly, does PRPLL run correctly on B580? Secondly, does PRPLL run fast on B580? We never had an opportunity to tune for B580 to any degree, so it would not be completely surprising if it runs slow.
PRPLL does run correctly. The issues with the data in the first post were my mistake: I blindly ran the exponents in the tune list as if they were prime, and mostly they are not. George's latest commit, which disables NONTEMPORAL by default, made prpll much quicker. 4M seems to be in a good spot, but George says 512:15:512 is 10x slower than RadeonVII, which IMO probably means that the B580 gets relatively slower as FFT size increases. https://www.mersenneforum.org/node/1062411?p=1064580#post1064580 The roundoff error in the link is because I intentionally chose 4M FFT for that exponent to match what gpuowl uses/used for this benchmark: https://docs.google.com/spreadsheets/d/1Kxd8wQayP8FtdoKQ6k5kayM8cMNTaNJV/edit?gid=1315040372#gid=1315040372
The timings are much worse on prpll, but there are issues on gpuowl too: a lot of warnings about register spilling in kernels, plus assertion failures. Sometimes a run crashes, sometimes it completes. For some data to chew on, I ran most of the exponents that prpll tunes with, to exercise a lot of the FFTs; maybe there's a pattern. The same or similar issues still occurred with NO_ASM and DEBUG, with more register spills with DEBUG on, presumably from the debug symbols.
gpuowl_b580_regspill.zip
While running with DEBUG and NO_ASM, I noticed some roundoff output that wasn't being captured by the log, so here's a terminal copy:
gouowl_b580_roundoff.zip
It's worth noting that mfakto doesn't complete all of its self tests, and a different number of them complete on each run.
edit: Some of the exponents in the tune list are composite which may be clouding these results. Will filter them out and retry.
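For anyone wanting to do the same filtering, here is a minimal sketch. It assumes "composite" refers to the exponent p itself (for 2^p - 1), and uses plain trial division, which is fast enough for exponents of the sizes tested here. The function names and the sample list are hypothetical; nothing here is part of prpll or gpuowl, and the list is not the actual tune list.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for exponents up to ~10^9."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    # Every prime above 3 has the form 6k +/- 1, so test i and i + 2.
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


def prime_exponents(exponents):
    """Keep only the prime exponents, preserving input order."""
    return [p for p in exponents if is_prime(p)]


if __name__ == "__main__":
    # Hypothetical sample values, not the real tune list.
    sample = [82589933, 82589935, 77232917, 100000000]
    print(prime_exponents(sample))
```

Note that this only checks that the exponent is prime; it says nothing about whether the corresponding Mersenne number is prime.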