z_null_int high disk I/O #6171
So I blew away all my filesystem snapshots except for the most recent one. I have a cron that makes them once every 2 weeks, and the oldest was in early February and the newest was 2017-05-15. Killing the snapshots also killed the constant disk I/O but I'm not entirely sure why. |
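For reference, a minimal sketch of how snapshots can be listed and pruned the way described above (the pool name "tank" and the snapshot name are placeholders, not the poster's actual names; `zfs destroy` is irreversible, so dry-run first):

```
# List all snapshots for the pool, oldest first
zfs list -t snapshot -o name,creation -s creation -r tank

# Dry-run the destroy to confirm what would be removed
zfs destroy -nv tank@snap-2017-02-01

# Actually remove the snapshot
zfs destroy tank@snap-2017-02-01
```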
I'm having the same issue on Ubuntu 17.04 with Linux 4.11.6 (Ubuntu mainline), SPL 0.7.0-rc4_4_gac48361c, and ZFS 0.7.0-rc4_65_gd9ad3fea3b, without having any snapshots. Together with
The system behaves without any major limitations (#5867 is still an issue, so that's only as far as I can tell). |
I'm experiencing the same with
|
For me it happens only during a scrub on an up-to-date Arch system. The scrub is super slow, 5-10 MB/s, and
|
I haven't had this issue happen in recent weeks. Each time it has happened, though, blowing away all my snapshots has caused it to immediately stop churning the disks. I never noticed any I/O performance degradation even when it was abusing the poor disks. It seemed like it was only doing a ton of work when the pool was idle? I wonder if the issue @dpendolino is describing is somehow orthogonal to this one?
I don't use deduping because the cost of doing so was super high the last time I tried it. And my pool structure is still the same as I had in my original report (two stripes, mirrored). I also changed how I'm handling snapshotting. The zfs-auto-snapshot package in the list above is handling it. I used to have a cron that ran twice monthly (on the 1st and 15th of the month) to create a snapshot. Now I have the much more frequent snapshots managed by zfs-auto-snapshot and haven't run into the issue since. |
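As a rough illustration of the twice-monthly cron approach described above (pool name and snapshot naming are assumptions, not the poster's actual script):

```
# /etc/crontab entry: recursively snapshot the pool on the 1st and 15th of every month
0 3 1,15 * * root /sbin/zfs snapshot -r tank@manual-$(date +\%Y-\%m-\%d)
```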
@tycho I am using the git version of zfs-auto-snapshot from the AUR. My scrub from yesterday is still chugging at ~10 MB/s =/ I currently have about 48 snapshots of my pool, perhaps I'll try deleting them all as soon as possible.
|
Same issue on CentOS Linux release 7.4.1708 (Core), kmod-spl-devel-0.7.3-1.el7_4.x86_64 |
Same issue over here; dedup and lz4 compression are enabled.
|
I've got this as well; however, I've got a mirrored pool, on Proxmox. Killing the PID just results in another one starting, though I do get slightly quicker speeds for a while after the kill. I just rebooted and the box improved, with speed back to normal. However, I did change swappiness to 1. |
I have the same issue. It looks like it gets much worse as soon as dedup is on or snapshots exist. |
I've moved my drive to ext4 as it was slowly drowning my system :( |
I got the same issue with high z_null_int I/O on Proxmox 5.1 with kernel 4.13.x and ZFS 0.7.3. I use 2 pools of ZFS RAID1, zfs_arc_max is limited to 1GB, there are no ZFS snapshots, and deduplication is off. Is this high z_null_int I/O normal for ZFS 0.7.3? I never caught it in iotop when it was on ZFS 0.6.x. Attached files are the ZFS details and a screenshot. |
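For anyone wanting to double-check that dedup and snapshots really are out of the picture, a quick sketch (the pool name "tank" is a placeholder):

```
# A dedup ratio of 1.00x means dedup has never been used on the pool
zpool get dedupratio tank

# Confirm the dedup property is off on all datasets
zfs get -r dedup tank

# Confirm there are no snapshots
zfs list -t snapshot -r tank
```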
I have the same 99.99% iowait on z_null_int with
I removed all snapshots; it doesn't change anything. |
Same issue on Proxmox 5.1, kernel 4.13.4 and ZFS 0.7.3. No snapshots or dedup, 2 pools (RAIDZ1 (with L2ARC and ZIL on SSD) and single drive). |
To help us narrow down what's causing this, could someone please provide the following debugging output while the issue is occurring:

# Enable TXG history (the last 32 TXGs)
echo 32 >/sys/module/zfs/parameters/zfs_txg_history

# Wait 30 seconds, then dump the TXG history and TX stats
cat /proc/spl/kstat/zfs/<pool>/txgs
cat /proc/spl/kstat/zfs/dmu_tx

# Enable the internal debug log
echo 1 >/sys/module/zfs/parameters/zfs_dbgmsg_enable

# Wait 30 seconds, then dump the internal log
cat /proc/spl/kstat/zfs/dbgmsg
|
@behlendorf here is mine
|
@behlendorf I should dump the log in a single shot |
I can probably spin up a Debian (Proxmox) box if you need more logs/info.
@AndCycle thank you for the logs. Unfortunately they don't clearly show what's causing the I/O. Could you try grabbing a couple of stack traces from a
@cooljimy84 thanks for the offer, but I think what would be most helpful would be a reproducer against the stock 0.7.3 release or master. I've been unable to reproduce this behavior locally. |
On Thu, Nov 30, 2017 at 11:10 PM, AndCycle ***@***.***> wrote:
@behlendorf <https://github.com/behlendorf> oops, for some reason my system
doesn't have the stack path; did I mess up my kernel config?
/proc/PID/stack only shows up if you have CONFIG_STACKTRACE in your kernel (shows up under "Kernel hacking" in menuconfig).
|
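If it helps anyone else collect the traces asked for above, here is a rough sketch (it assumes CONFIG_STACKTRACE is enabled and that the kernel threads are named z_null_int, as seen in iotop):

```
# Take a few samples of the kernel stacks of the z_null_int threads
for sample in 1 2 3; do
    echo "--- sample $sample ---"
    for pid in $(pgrep z_null_int); do
        echo "pid $pid:"
        cat /proc/$pid/stack
    done
    sleep 5
done
```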
@tycho thanks :) @behlendorf here it is
|
The stack traces tend to be non-useful for running tasks. If everyone reporting high CPU in the zio null threads has an l2arc device, I suspect that may be the cause since l2arc writes are dispatched to those threads. A good experiment would be to remove the l2arc device and see if the excessive CPU stops. |
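For anyone wanting to try the experiment @dweeezil suggests, a minimal sketch (pool and device names are placeholders):

```
# Identify the cache (L2ARC) device under the "cache" heading
zpool status tank

# Remove the cache device and watch whether the z_null_int I/O stops
zpool remove tank sdX

# Re-add it later if the behaviour does not change
zpool add tank cache sdX
```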
@dweeezil actually it's not high CPU usage, it just hangs there in iotop,
@AndCycle Indeed. That said, l2arc is one of the uses for those taskqs. I'd also suggest that anyone experiencing this issue might gain some information with |
I got the same results across different nodes. It seems every pool spawns its own 99.99% IO z_null_int thread in iotop. Could it be that 99.99% IO indicates ZFS is performing read/write mirror operations inside the pool vdevs and iotop is not able to list the read and write sizes properly? Somehow I could not use perf in Proxmox 5.1, as it comes with kernel 4.13 and Debian Stretch's linux-perf package is for kernel 4.9.
|
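For those who can run perf (it has to match the running kernel, which is the Proxmox problem noted above), a generic profiling sketch looks roughly like this; the sampling duration is arbitrary:

```
# Sample all CPUs, including kernel call stacks, for 30 seconds
perf record -a -g -- sleep 30

# Summarize where the time was spent
perf report --stdio | head -n 50
```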
Sorry, I don't have the knowledge to do debugging as I write Python and HTML most of the time,
https://www.andcycle.idv.tw/~andcycle/tmp/tmp/20171203/perf.data.gz -- |
I can easily reproduce this by running a 100% write load and constraining the ARC size, because the "IO" column in iotop is showing time spent waiting. In this case, most of the null zios are created by
Does anyone reporting this problem have constrained ARC or low ARC hit rates?
@AndCycle As to using
At least in the case of poor cache rates, the null zio threads will definitely show high IO wait times. |
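To answer the question about ARC hit rates, a quick sketch that computes the overall hit ratio from the kstat counters (the "hits" and "misses" counters in the arcstats kstat):

```
# Overall ARC hit ratio since boot, from /proc/spl/kstat/zfs/arcstats
awk '/^hits / {h=$3} /^misses / {m=$3} END {if (h+m > 0) printf "ARC hit ratio: %.1f%%\n", 100*h/(h+m)}' \
    /proc/spl/kstat/zfs/arcstats
```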
Ah, that is why. Nice finding, @dweeezil! On a 64GB Proxmox box, we limited the ARC to 1GB so there would be 3GB left for the host OS and 60GB for guest VMs. After the upgrade to ZFS 0.7.x, the ARC hit rate decreased to around 30%. It used to be 70-80% on the previous ZFS 0.6.x. |
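For reference, a sketch of how an ARC cap like that 1 GB limit is usually applied (the value is an example):

```
# Runtime change (takes effect immediately, not persistent across reboots)
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

# Persistent setting via modprobe options (or append to an existing file)
echo "options zfs zfs_arc_max=1073741824" > /etc/modprobe.d/zfs.conf
```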
@apollo13 according to the logs you posted the pool looks to be busy handling IO for real applications. In order to get to the bottom of this we're going to want to gather logs, using the
I'm reopening this issue so we can get to the bottom of it. |
@behlendorf http://apolloner.eu/~apollo13/out2.perf is the idle log. The time |
@apollo13 thanks for the speedy reply. Am I correct in assuming ZFS is being used as the root pool on this system? From what I can see in the logs, the following processes are all actively performing IO.
Does this match up with the top processes reported by
One way to get a much better sense of exactly what is being read from disk is to enable the read history. This keeps a rolling log of the last N blocks which needed to be read from disk. It won't include anything which was serviced from the ARC cache.
|
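A sketch of the read-history procedure described above (the pool name "tank" is a placeholder and the history size is arbitrary):

```
# Keep a rolling log of the last 1000 reads that had to go to disk
echo 1000 > /sys/module/zfs/parameters/zfs_read_history

# Let the workload run for a bit, then dump the log
cat /proc/spl/kstat/zfs/tank/reads
```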
@behlendorf Thank you for the detailed response, I'll answer as well as I can.
That said, the following tests were done without doing anything else.
For the idle case: http://apolloner.eu/~apollo13/proxmox_zfs/idle/ -- from the looks of it,
For the load case (with VMs running): http://apolloner.eu/~apollo13/proxmox_zfs/load/ -- basically constantly at 99% I/O with (actual) disk writes between 0 and 15 MB/s -- the disk writes from the KVM processes together seem to be less than 1 MB/s (probably even less than 100 KB/s). All in all the system still feels stable. |
I'm experiencing exactly the same (or a very similar) problem as @apollo13 for the past 2 months. I'm also using Proxmox with patched ZFS 0.7.4 (today I asked the Proxmox guys to update to the full 0.7.6 release). Just please note that this problem renders this server basically unable to do the majority of its backups.
My HW:
If I have any free time tomorrow I will replicate the problem and capture any required data. For now, here are just the parameters of the pools and some tests that I have done previously.
My tunables (I tried lowering the ARC to see if it has any effect, but from what I can see it doesn't affect the slowdowns at all):
DATA pool:
System Pool:
sdd & sde are the main data SSDs (where data are copied from). Command: As you can see, the system SSDs (including swap) are hammered with data (utilization 100%) during data transfers on other drives, which I do not understand...
This is, for example, a log from a backup - you can see how the speed starts to drop drastically (usually to the point of KB/s)
Usually during a transfer I will also get:
EDIT: Of course I also have this in iotop:
|
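For anyone trying to reproduce the per-device utilization numbers above, a simple way to watch them live (requires the sysstat package):

```
# Extended per-device statistics every 5 seconds; the %util column
# shows how busy each disk is
iostat -xm 5
```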
When the compressed ARC feature was added in commit d3c2ae1 the method of reference counting in the ARC was modified. As part of this accounting change the arc_buf_add_ref() function was removed entirely. This would have been fine, but the arc_buf_add_ref() function served a second undocumented purpose of updating the ARC access information when taking a hold on a dbuf. Without this logic in place a cached dbuf would not migrate its associated arc_buf_hdr_t to the MFU list. This would negatively impact the ARC hit rate, particularly on systems with a small ARC. This change reinstates the missing call to arc_access() from dbuf_hold() by implementing a new arc_buf_access() function.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#6171
Closes openzfs#6852
Closes openzfs#6989
We also had a problem with high IO load (z_null_int). The system's root was on the same ZFS pool as the data storage for VMs. We got rid of this problem by reinstalling the system on different disks (not using ZFS anymore for the root of the system). When we imported the pool in the reinstalled system (using the same version of ZFS) the problem was gone and did not appear again - it's been over 2 months now. Hope this info will help somehow. |
Kinda wanted to use ZFS on the root, for nice quick rollbacks after updates/upgrades. |
@cooljimy84 It was working for over a year before the first problem started for me, so it's something that was updated approximately 3-5 months ago. @vlcek did you transfer your Proxmox config without any problem? I know about the script from Dan Ziemecki but I actually never tested whether it transfers everything. |
@jkalousek we are not using proxmox (only debian and xen as hypervisor) so I cannot help you with that. |
@behlendorf well, I tried to install perf (from git), but during cloning the host system stopped responding (probably because of this problem), and after 15 minutes of waiting (the VMs were running fine) I did a hard reboot of the whole server, so I probably won't be able to provide a perf log... |
@behlendorf This is zfs_read for me during idle (all VMs running - CPU usage ~4%, overall system load ~1)
|
Just a simple report back: I upgraded to 0.7.6 and have had the system running for 24 hrs,
a simple report back as well (even if I didn't participate in the conversation before). I still see z_null_int with 99.99% in iotop using |
After upgrading to 0.7.6-1 on Proxmox (instead of the patched 0.7.4) on Friday, I can say that I also still see z_null_int in iotop, but after two nights full of backups the problem with slowdowns and freezing didn't occur (unlike every backup in the past 3 months), so I will keep testing through this week and report back if I hit any performance problems. |
for the PVE users in this issue: #7235 (comment) contains information about a test build of the PVE kernel with commit e9a7729 / PR#7170 backported |
I also ran into this problem with kernel 4.13.0-37, Ubuntu 16.04, with ZFS master using DKMS modules. Now in iotop the processes are
The biggest difference is that z_null_int doesn't even show up in iotop at all. I can switch back and forth easily and will try to upload screenshots from iotop later this evening.
Update: the problem started with 4.11 kernels; it works fine with 4.10 and below. |
@benbennett thanks for the additional information, that helps. To be clear, you were running master with commit e9a7729 and still observe the issue with 4.11 and newer kernels? Could you summarize a minimal environment and pool config needed to reproduce the issue? Is a clean Ubuntu 16.04 install with a 4.11 kernel from the mainline PPA sufficient? |
@behlendorf I am on master at cec3a0a, which contains the commit. I'm wondering if any of the Meltdown/Spectre patches are affecting it, but disabling them with the nopti and noretpoline options has no effect. I will have to check whether 4.10 has the kpti/spectre_v2 changes and the Intel firmware installed. |
The issue got solved for me with:
I cannot say whether it was the kernel update or the zfs update :/ |
A further data point from the proxmox forums (https://forum.proxmox.com/threads/z_null_int-with-99-99-io-load-after-5-1-upgrade.38136/page-3#post-207766):
So maybe the kernel got some relevant fixes too? |
On Sun, Apr 29, 2018 at 11:57:34PM -0700, Florian Apolloner wrote:
A further data point from the proxmox forums (https://forum.proxmox.com/threads/z_null_int-with-99-99-io-load-after-5-1-upgrade.38136/page-3#post-207766):
```
I just want to add that I needed upgrade to 4.13.16-2-pve for the 0.7.7 fix to work.
Pure ZOL upgrade didn't work until after kernel upgrade to the latest.
Even relatively new 4.13.13-6-pve kernel didn't work.
```
So maybe the kernel got some relevant fixes too?
in PVE the kernel contains the SPL and ZFS modules, so yes, of course
the kernel package version is relevant too ;)
|
Oh, with all the zfs packages on PVE I got a little bit confused; my mistake. |
Why is this issue still open? Is 0.7.9 affected in any way? |
The primary cause of this issue was determined to be #7289, which would primarily manifest itself on systems with low default kernel HZ values. The issue was resolved in 0.7.7. I'm closing this bug as resolved. |
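For anyone curious whether their kernel falls into the low-HZ category mentioned here, a quick check (the config file path varies by distro):

```
# Show the tick rate the running kernel was built with
grep 'CONFIG_HZ=' /boot/config-$(uname -r)
```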
System information
Describe the problem you're observing
I noticed that my hard drives are constantly making noise despite nothing actively issuing I/Os. I ran iotop first and found that z_null_int is very busy doing something. iostat agrees that the disks are being kept very busy.
Is there some way to discover what z_null_int is doing? And how to stop it?
The pool isn't doing any scrubbing or resilvering:
Describe how to reproduce the problem
Unknown. I don't know what caused it.
Include any warning/errors/backtraces from the system logs
Nothing relevant.