T-Rex v0.26.1 - WARN: NVML: can't get fan speed for GPU #4, error code 999 #1310

Open
MiroslavKatic opened this issue May 13, 2022 · 6 comments


@MiroslavKatic

MiroslavKatic commented May 13, 2022

I have used every Windows T-Rex release over the last 12 months with the same overclock settings (command line, not Afterburner), and it was great and stable. With the latest release I'm seeing instability: it crashes every 8 to 12 hours.
NVIDIA Driver v512.59
Log:

GPU #0: MSI RTX 3060 - 49.40 MH/s, [T:50C, P:113W, F:50%, E:437kH/W], 187/192 R:2.6%
GPU #1: MSI RTX 3060 - 48.69 MH/s, [T:50C, P:113W, F:39%, E:431kH/W], 182/187 R:2.67%
GPU #2: MSI RTX 3060 - 49.37 MH/s, [T:50C, P:113W, F:62%, E:437kH/W], 205/208 R:1.44%
GPU #3: MSI RTX 3060 - 48.75 MH/s, [T:50C, P:113W, F:53%, E:431kH/W], 216/221 R:2.26%
GPU #4: MSI RTX 3060 - 48.07 MH/s, [T:50C, P:113W, F:40%, E:425kH/W], 195/197 R:1.02%
GPU #5: MSI RTX 3060 - 49.40 MH/s, [T:50C, P:113W, F:70%, E:437kH/W], 201/201 R:0%
Hashrate: 293.68 MH/s, Shares/min: 2.712 (Avg. 1.895), Avg.P: 678W, Avg.E: 433kH/W
Uptime: 3 hours 59 mins 12 secs | Algo: ethash | Driver: 512.59 | T-Rex 0.26.1
WD: 10 hours 16 mins 58 secs, shares: 1186/1206 R:1.66%, restarts 1
WD: ======== GPU CRASH LIST ========
WD: GPU#4: 1

20220513 09:16:40 [ OK ] 453/459 - 293.58 MH/s, 43ms ... GPU #2
20220513 09:16:45 WARN: NVML: can't get fan speed for GPU #4, error code 999
20220513 09:16:48 TREX: Can't stop device [ID=5, GPU #5], cuda exception: CUDA_ERROR_UNKNOWN
20220513 09:16:48 WARN: Miner is going to shutdown...
20220513 09:16:48 Main loop finished. Cleaning up resources...
20220513 09:16:48 ApiServer: stopped listening on 192.168.1.222:6661
20220513 09:17:17 WARN: WATCHDOG: T-Rex has a problem with GPU, terminating...
20220513 09:17:18 WARN: WATCHDOG: recovering T-Rex
20220513 09:17:18 WATCHDOG: 4 miner restarts till 'exit'
20220513 09:17:19 miner_start: executing user script [C:\Crypto\Miner\T-Rex\script_start.bat]
20220513 09:17:19 T-Rex NVIDIA GPU miner v0.26.1 - [Windows]
...
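The GPU status lines at the top of the log can be cross-checked with simple arithmetic. Judging by the numbers, E looks like hashrate divided by power (49.40 MH/s / 113 W ≈ 437 kH/W), and the `accepted/total R:x%` field looks like rejected shares over all shares (187/192 → 5/192 ≈ 2.6%). The Python sketch below parses the six lines copied from the log and checks both readings. These field interpretations are inferred from the logged values, not taken from T-Rex documentation, and the regex assumes exactly the line format shown here.

```python
import re

# Status lines copied verbatim from the log above.
STATUS_LINES = [
    "GPU #0: MSI RTX 3060 - 49.40 MH/s, [T:50C, P:113W, F:50%, E:437kH/W], 187/192 R:2.6%",
    "GPU #1: MSI RTX 3060 - 48.69 MH/s, [T:50C, P:113W, F:39%, E:431kH/W], 182/187 R:2.67%",
    "GPU #2: MSI RTX 3060 - 49.37 MH/s, [T:50C, P:113W, F:62%, E:437kH/W], 205/208 R:1.44%",
    "GPU #3: MSI RTX 3060 - 48.75 MH/s, [T:50C, P:113W, F:53%, E:431kH/W], 216/221 R:2.26%",
    "GPU #4: MSI RTX 3060 - 48.07 MH/s, [T:50C, P:113W, F:40%, E:425kH/W], 195/197 R:1.02%",
    "GPU #5: MSI RTX 3060 - 49.40 MH/s, [T:50C, P:113W, F:70%, E:437kH/W], 201/201 R:0%",
]

# Assumed layout: "<name> - <MH/s> MH/s, [T:..C, P:..W, F:..%, E:..kH/W], <accepted>/<total> R:<pct>%"
PATTERN = re.compile(
    r"GPU #(?P<id>\d+): .+? - (?P<mhs>[\d.]+) MH/s, "
    r"\[T:(?P<temp>\d+)C, P:(?P<watts>\d+)W, F:(?P<fan>\d+)%, E:(?P<eff>\d+)kH/W\], "
    r"(?P<acc>\d+)/(?P<total>\d+) R:(?P<rej>[\d.]+)%"
)


def parse(line):
    """Turn one status line into a dict of numeric fields."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    return {
        "id": int(m["id"]),
        "mhs": float(m["mhs"]),
        "watts": int(m["watts"]),
        "eff_khw": int(m["eff"]),
        "accepted": int(m["acc"]),
        "total": int(m["total"]),
        "reject_pct": float(m["rej"]),
    }


def reject_pct(accepted, total):
    """Rejected shares as a percentage of all shares, rounded to 2 decimals."""
    return round((total - accepted) * 100 / total, 2)


def check(gpu):
    """Recompute efficiency (kH/W) and reject rate (%) from the other fields."""
    # 1 MH = 1000 kH, so MH/s per W times 1000 gives kH/W.
    eff = round(gpu["mhs"] * 1000 / gpu["watts"])
    return eff, reject_pct(gpu["accepted"], gpu["total"])


if __name__ == "__main__":
    gpus = [parse(line) for line in STATUS_LINES]
    for gpu in gpus:
        eff, rej = check(gpu)
        print(f"GPU #{gpu['id']}: E={eff} kH/W (log {gpu['eff_khw']}), "
              f"R={rej}% (log {gpu['reject_pct']}%)")
    total_mhs = round(sum(g["mhs"] for g in gpus), 2)
    total_w = sum(g["watts"] for g in gpus)
    print(f"Total: {total_mhs} MH/s at {total_w} W")
    # Watchdog summary line from the log: "shares: 1186/1206 R:1.66%"
    print(f"WD reject rate: {reject_pct(1186, 1206)}%")
```

Every recomputed value matches the logged one: the per-GPU hashrates sum to the reported 293.68 MH/s, the six 113 W readings sum to 678 W, and the watchdog total works out to 20 rejected of 1206 shares ≈ 1.66%.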

Please, can you fix it?

@ESP4Ever

ESP4Ever commented May 13, 2022

hi

I have had the same problem since v0.25.x on a 3080 non-LHR GPU.

@tjayz

tjayz commented May 13, 2022

I have seen this error many times on different rigs, for different reasons. I have observed that with the v26 T-Rex release, if PL is not set, power draw has increased somewhat on every GPU. So, were you setting PL both before and after the upgrade to v26, or letting it run on auto from a locked core? Either way, on the GPU that gets the error, the memory clock (mclk) must be toned down to resolve it; in my experience, steps of 50 should do the trick.

@MiroslavKatic
Author

No.

The SAME overclock settings (via Windows command-line args) have been used in every single T-Rex version for almost 1 year.

--fan t:50 ^
--pl 67 ^
--cclock -460 ^
--mclock 1280 ^

This is the first T-Rex release that brings mining instability. Please fix this ASAP or I'll have to switch from T-Rex to another miner. Feel free to ask for additional information.

@tjayz

tjayz commented May 13, 2022

Did you update to driver 512.59 with T-Rex v26 or on v25.15? Is this happening on all GPUs or just #4? Are these LHR GPUs?

If you are not willing to adjust settings until a later version is released, I would encourage you to try a different miner and see whether you get the same error; that would help narrow down the issue.

To tide you over until a new version is released, try lowering mclk by 50, and/or increasing PL, and/or decreasing the fan target temperature on the affected GPUs.

@supzfly

supzfly commented May 14, 2022

Same issue. I haven't run my miner in a couple of days because of it, as it crashed the drivers outright, so I was waiting to see if a fix would be incoming. TUF 3080 LHR (65% PL; core temps are in the 50s and VRMs in the low 90s <-- copper-shim modded), Win10, 512.59 drivers, T-Rex v0.26.1.

Never had a problem before this version.

On the plus side, I was seeing 90-100 MH/s beforehand, and it did run for several hours before the problem, so nice work on the improvements :)

@MiroslavKatic
Author

MiroslavKatic commented May 14, 2022

> Did you update to driver 512.59 with v26 trex or on v25.15? Is this happening on all gpu or just number 4? Are these LHR gpus?
>
> If you are not willing to adjust settings until a later version is released, then I would encourage you to try a different miner to see if you get the same error to narrow down the issue.
>
> To hold you over till new version is released, try lowering mclk 50, and/or increase PL, and/or decrease fan target temp on the afflicted gpus.

As you can see: T-Rex v0.26.1 / NVIDIA Driver v512.59.
It is happening on all GPUs, randomly.
All of them are 3060 12GB LHR v2.

I have been running NBMiner flawlessly for almost 21 hours. I couldn't get past 10 hours on T-Rex v0.26.1 on the same system. To make things worse for T-Rex, I increased the memory clock from 1280 to 1320 in NBMiner, so the overclock is not the issue.


Please fix T-Rex ASAP. I would like to stay on T-Rex (NBMiner is a Chinese product, I think), but I can't switch back to T-Rex until you fix it.
