xmrig-amd 2.14.0. Rigs crashing after 5-6 hours of working. Out of memory. #241

Closed
rumatadest opened this issue Mar 13, 2019 · 25 comments

@rumatadest

rumatadest commented Mar 13, 2019

AMD rigs crash after 5-6 hours of running. Out of memory.
Same issue on xmr-stak.


xmrig-amd : v2.14.0
kernels 4.20.8 and 5.0.1
driver : amdgpu (built into the kernel)
OS : Ubuntu 18.04.1 without GUI
algo : cnR , monero
GPU : 6 x RX580 - 4G

---
---
dmesg
[29802.252972] Out of memory: Kill process 6417 (xmrig-amd) score 157 or sacrifice child
[29802.253100] Killed process 6417 (xmrig-amd) total-vm:68738522144kB, anon-rss:843408kB, file-rss:0kB, shmem-rss:4kB
[29802.278766] amdgpu_mn_invalidate_range_start_gfx+0x0/0x150 [amdgpu] callback failed with -11 in non-blockable context.
----
top
top - 13:44:02 up  4:47,  3 users,  load average: 0.77, 0.50, 0.46
Tasks: 223 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.9 us,  2.0 sy,  0.0 ni, 91.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3992336 total,   122152 free,  2548004 used,  1322180 buff/cache
KiB Swap:  2097148 total,  1545456 free,   551692 used.   128376 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                        
23543 root      20   0 64.017t 775940  60768 S  15.6 19.4  20:38.91 xmrig-amd      
---
uname -a
Linux rig2 5.0.1-050001-generic #201903100732 SMP Sun Mar 10 07:33:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
---
lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic
@rumatadest
Author

rumatadest commented Mar 13, 2019

After 2 hours swap usage has grown

top - 15:12:10 up  6:15,  3 users,  load average: 0.89, 0.60, 0.50
Tasks: 223 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  2.7 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3992336 total,   126076 free,  2649836 used,  1216424 buff/cache
KiB Swap:  2097148 total,   589052 free,  1508096 used.   123864 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                        
23543 root      20   0 64.018t 810704  34132 S  13.6 20.3  32:22.17 xmrig-amd        
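
The two `top` snapshots above put a rough number on the growth. A minimal sketch in Python, assuming only the swap "used" figures and wall-clock times printed in those two snapshots; the function name `growth_rate_kib_per_min` is made up for illustration.

```python
from datetime import datetime


def growth_rate_kib_per_min(t0: str, used0_kib: int, t1: str, used1_kib: int) -> float:
    """Average growth in KiB per minute between two same-day readings."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)
    minutes = elapsed.total_seconds() / 60
    return (used1_kib - used0_kib) / minutes


# Swap "used" values and timestamps copied from the two top outputs above.
rate = growth_rate_kib_per_min("13:44:02", 551692, "15:12:10", 1508096)
print(f"swap grew by about {rate / 1024:.1f} MiB per minute")
```

On these figures swap usage grows by roughly 10 MiB per minute, which is consistent with a 4 GB machine running out of memory within a few hours.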

@xmrig xmrig added the bug label Mar 13, 2019
@xmrig
Owner

xmrig commented Mar 14, 2019

This issue should be fixed in the dev branch; OpenCL kernels were leaked on every new block.
Thank you.
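
To illustrate the kind of leak described here, a minimal Python sketch follows. It is not the actual xmrig-amd code: it assumes a compiled kernel is created for each new block height and has to be released explicitly. The names `KernelCache`, `compile_fn`, and `release_fn` are hypothetical stand-ins for the OpenCL build and release calls.

```python
from collections import OrderedDict
from typing import Any, Callable


class KernelCache:
    """Keep at most `max_entries` per-height kernels; release the oldest on overflow."""

    def __init__(self, compile_fn: Callable[[int], Any],
                 release_fn: Callable[[Any], None], max_entries: int = 2) -> None:
        self._compile = compile_fn      # hypothetical: builds a kernel for a height
        self._release = release_fn      # hypothetical: frees a kernel
        self._max = max_entries
        self._kernels: "OrderedDict[int, Any]" = OrderedDict()

    def get(self, height: int) -> Any:
        if height in self._kernels:
            return self._kernels[height]
        kernel = self._compile(height)
        self._kernels[height] = kernel
        # Evict in insertion order. Skipping the release call here is the leak:
        # one kernel would stay allocated for every block ever seen.
        while len(self._kernels) > self._max:
            _, old = self._kernels.popitem(last=False)
            self._release(old)
        return kernel

    def __len__(self) -> int:
        return len(self._kernels)


# Stand-in compile/release functions so the sketch runs without OpenCL.
released = []
cache = KernelCache(compile_fn=lambda h: f"kernel@{h}", release_fn=released.append)
for height in range(1000, 1010):
    cache.get(height)
print(len(cache), "kernels kept,", len(released), "released")
```

With the eviction step in place the number of live kernels stays bounded no matter how many blocks arrive.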

@xmrig xmrig added this to the v2.14 milestone Mar 14, 2019
@xmrig
Owner

xmrig commented Mar 14, 2019

@netmebtc

netmebtc commented Mar 18, 2019

v2.14.1 does not seem to solve it. My Vega rig running 2.14.1 also works for 7-8 hours and then stops mining. The rig runs Windows 10 with AMD driver 18.6.1. I have five 6 x Vega rigs, and they all have the same issue.

@napaster

The problem still remains. My rig also works for 7-8 hours and then stops mining.

kernel: linux-catalyst 4.16.12-1
driver: catalyst-test 15.12-26
xmrig-amd 2.14.1-1
os: ArchLinux
GPU: 6xr270x

dmesg log:
Out of memory: Kill process 259 (xmrig-amd) score 714 or sacrifice child
Killed process 259 (xmrig-amd) total-vm:2608612kB, anon-rss:1365524kB, file-rss:9340kB, shmem-rss:0kB
INFO: task xmrig-amd:1677 blocked for more than 120 seconds.
Tainted: G O 4.16.12-1-catalyst #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xmrig-amd D 0 1677 1 0x00000000
Call Trace:
? __schedule+0x282/0x890
? select_task_rq_fair+0x620/0xbd0
schedule+0x32/0x90
schedule_timeout+0x311/0x4a0
? KCL_DEBUG_Print_Trace+0x32/0xf0 [fglrx]
__down+0x7d/0xd0
? __check_object_size+0xfb/0x180
down+0x3b/0x50
firegl_cmmqs_CWDDE_32+0x16a/0x480 [fglrx]
? _copy_from_user+0x37/0x60
? firegl_cmmqs_CWDDE32+0x8e/0x140 [fglrx]
? firegl_cmmqs_disabledriver+0x110/0x110 [fglrx]
? firegl_ioctl+0x1f4/0x260 [fglrx]
? ip_firegl_unlocked_ioctl+0xa/0x10 [fglrx]
? do_vfs_ioctl+0xa4/0x610
? __do_page_fault+0x237/0x570
? SyS_ioctl+0x74/0x80
? do_syscall_64+0x74/0x190
? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
INFO: task xmrig-amd:1705 blocked for more than 120 seconds.
Tainted: G O 4.16.12-1-catalyst #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xmrig-amd D 0 1705 1 0x00000000
Call Trace:
? __schedule+0x282/0x890
schedule+0x32/0x90
schedule_timeout+0x311/0x4a0
? KCL_DEBUG_Print_Trace+0x32/0xf0 [fglrx]
__down+0x7d/0xd0
? __check_object_size+0xfb/0x180
down+0x3b/0x50
firegl_cmmqs_CWDDE_32+0x16a/0x480 [fglrx]
? _copy_from_user+0x37/0x60
? firegl_cmmqs_CWDDE32+0x8e/0x140 [fglrx]
? firegl_cmmqs_disabledriver+0x110/0x110 [fglrx]
? firegl_ioctl+0x1f4/0x260 [fglrx]
? ip_firegl_unlocked_ioctl+0xa/0x10 [fglrx]
? do_vfs_ioctl+0xa4/0x610
? __schedule+0x28a/0x890
? SyS_ioctl+0x74/0x80
? do_syscall_64+0x74/0x190
? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
<6>[fglrx] ASIC hang happened

@rumatadest
Author

All rigs now have an uptime of 6 days.
But I also think that the problem is not fully resolved. RAM usage is still growing, though not as fast as before.
RAM usage on a rig with one faulty GPU grows faster than on rigs where all GPUs are healthy.

@MetallianFR68

I created issue #245 and will continue commenting here.
On my systems, the problem appears after exactly 22 hours of mining.
Windows 10 Pro - Adrenalin 18.6.1 - i3-4160 CPU - 8 GB RAM
Current memory usage after 6 hours of mining: 764 MB RAM for the xmrig process

@BKdilse

BKdilse commented Mar 19, 2019

I can also confirm that this problem exists in 2.14.1. I've tested 2 rigs, and they both start failing after several hours. XMR-Stak 2.10.2 has the same issue.

@QwertyJack

QwertyJack commented Mar 20, 2019

Same here on Windows 10 (2.14.1, Vega 56 x4 = 8 threads total, Adrenalin 18.6.1, 4 GB RAM, i5-4xxx).
After about 20 hours some threads (3~4 out of 8) drop to 0, and I have to reboot Windows.
BTW: This made me think my card was dying :-(

@k0ste

k0ste commented Mar 20, 2019

@SChernykh please take a look. There is definitely a memory leak that was not fixed by the 2.14.1 release, and it was also backported to xmr-stak by @psychocrypt.
Thanks.

@xmrig
Owner

xmrig commented Mar 20, 2019

I confirm this issue still exists. To help investigate it, I created a test proxy that can generate fake jobs with increasing height at a specified interval. I am starting to dig into this issue again.
Thank you.

Sample pool for miner config (4000 ms interval):

        {
            "url": "159.65.202.177:3333",
            "user": "x+4000",
            "pass": "x",
            "rig-id": null,
            "nicehash": false,
            "keepalive": false,
            "variant": -1,
            "enabled": true,
            "tls": false,
            "tls-fingerprint": null
        },
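
The same pool entry can be generated for other intervals. A minimal Python sketch, assuming (from the "4000 ms interval" note and the `x+4000` user string above) that the number after `+` in `user` is the job interval in milliseconds; that reading is an inference and is not confirmed in this thread. The helper name `make_pool_entry` is hypothetical.

```python
import json


def make_pool_entry(interval_ms: int, url: str = "159.65.202.177:3333") -> dict:
    """Build a pool entry like the sample above for a given fake-job interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return {
        "url": url,
        # Assumption: the test proxy reads the interval (ms) after the '+'.
        "user": f"x+{interval_ms}",
        "pass": "x",
        "rig-id": None,
        "nicehash": False,
        "keepalive": False,
        "variant": -1,
        "enabled": True,
        "tls": False,
        "tls-fingerprint": None,
    }


entry = make_pool_entry(4000)
print(json.dumps(entry, indent=4))
```

A shorter interval simulates new blocks arriving more often, which should make a per-block leak show up sooner.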

@gjimenezf

I have the same problem with 2.14.1

@xmrig
Owner

xmrig commented Mar 22, 2019

This is a driver issue. With recent drivers there seem to be no memory leaks; for old drivers (18.6.1, etc.) @SChernykh created workaround #249, which reduces the leak by a factor of 10.

However, after multiple driver installs/uninstalls and reverting to 18.6.1, the memory leak no longer exists on my PC. Somehow it got fixed; I am still investigating this issue.

@MetallianFR68

MetallianFR68 commented Mar 23, 2019

I'm using 18.6.1, but is there a more recent version we can use for mining?

@gjimenezf

I didn't have this problem with my driver when using the previous version of xmrig.

@MetallianFR68

Any update on that bug? It's very annoying. I have to restart my rigs very often.

@QwertyJack

@MetallianFR68 How about upgrading to the latest driver?

@MetallianFR68

I'm using Adrenalin 18.6.1 drivers. What version should I use?

@QwertyJack

Maybe 19.3.3 and 19.1.1 are worth a try.

@MetallianFR68

19.3.3 : hashrate divided by 10
18.12.3 : hashrate divided by 10, only one card working

@gjimenezf

No fix yet in the releases.

@MetallianFR68

Any update on this?
I have to restart my rigs every morning, and I lose 2 to 3 hours of mining each night.

@SChernykh
Contributor

@MetallianFR68 Update. Your. Drivers.

@SChernykh
Contributor

SChernykh commented Apr 12, 2019

Alternatively, you can try building the dev version https://github.com/xmrig/xmrig-amd/commits/dev which has a mitigation for the driver memory leak.

@xmrig
Owner

xmrig commented Sep 15, 2019

@xmrig xmrig closed this as completed Sep 15, 2019