zfs randomly giving errors #2863
@uwkhahn This sure feels like another corrupted SA to me. If you could find the inode number of the affected directory and post the zdb output for it, that would confirm it.
@dweeezil Here's what I got:
One other bit of info: before the crash, I see this set of output from the rsync, which I'm assuming is related:
@uwkhahn Sorry to say, but you've got the corrupted SA (more correctly, a corrupted dnode) problem. I've been trying to track this down for the better part of a month or two but have not been able to reproduce it. I've been working on an overhaul of the way these SAs are created, but it's proving to be quite a bit more involved than I initially expected due to the differences in the ways that POSIX ACLs and SELinux ACLs interface with the kernel. Although I'm 99% sure I know exactly what form the corruption takes, you could confirm it by providing the output of a custom zdb command.

The problem is that in some cases, the SA layout in the dnode is not updated when kicking an SA xattr to the spill block. This problem is likely caused by a race condition and/or made more probable under memory pressure. Was there a lot else going on while this "etc" directory was created? Finally, I'd like to ask whether you know if this directory has been corrupted since its creation or whether some other operation may have been done to it after the fact.
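For reference, the kind of zdb invocation used to inspect a dnode's SA layout looks roughly like the sketch below; the dataset name, path, and object number are placeholders, and the "custom zdb" referenced in this thread carried extra debugging beyond stock zdb.

```sh
# Find the object (inode) number of the broken directory, then dump that
# dnode; at this verbosity zdb prints the SA layout and any spill block.
ls -id /backup/host/etc          # placeholder path; prints the inode number
zdb -ddddd tank/backup 12345     # placeholder dataset and object number
```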
Here's the first bit you asked for.
Here's the custom zdb output:
Be aware that I'm anonymizing the paths I'm posting here, so the pathname is slightly different in length and content; unless you tell me otherwise, I'm going to assume that's not an issue. At the times these issues occur, it appears that ZFS is using most of the system memory (there's 128GB of RAM). I'm going to move the affected directory and then try the rsync again to see whether the path is created corrupted again. I'll let you know what I see.
Ah, one further addition: there is a zvol being used over iSCSI that is active at the same time.
@uwkhahn Thanks, the problem is exactly the same one others have been having. The SA layout is wrong: it's 4 but should be 2. Could you please check the source file system and see whether the parent directory of "etc" has a default ACL? I'm looking for any clues which might help me reproduce this problem. I'm also going to look into how hard it would be to whip up a gross hack to detect and work around the problem after the fact.
Here's a "getfacl -n ." on the parent directory.
So if I destroy this filesystem and re-create it with xattr=on, will that be at risk of hitting this same bug? Also, with 128GB of RAM I've set zfs_arc_max to 107374182400 (100GiB). I note that I see significant usage beyond 100GiB (which drops when the zvol is not being actively written to), so I'm contemplating reducing it further. You were saying that you believed this might be related to memory pressure, so I wonder if you think this might help in our case. BTW, thanks for helping figure this out.
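For reference, a minimal sketch of how that ARC cap is typically applied on ZoL (the value is taken from the comment above; the config file name is arbitrary):

```sh
# Persist the cap across module loads:
echo "options zfs zfs_arc_max=107374182400" > /etc/modprobe.d/zfs-arc.conf
# Or adjust the running module immediately:
echo 107374182400 > /sys/module/zfs/parameters/zfs_arc_max
```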
@uwkhahn Thinking about the ARC oversubscription for a moment, I just re-read this thread and realized you're running a post-3.12 kernel with stock 0.6.3 packages, which means you haven't got the shrinker fixes. You might want to try cherry-picking ed6e9cc and openzfs/spl@802a4a2 (assuming you're using DKMS, they should patch cleanly into 0.6.3) and see whether the ARC stays under control better.
Thanks, I've not done a patched DKMS package before (I've done patched kernel packages directly, but not DKMS). I'll go figure that out (I'd take pointers to any good instructions). I don't suppose ZoL is close to getting a new release out the door, is it? :)
Ah... I see that there are no instructions on how to build the Debian DKMS packages, yet they exist for download. Any clue how they are currently generated?
You could just patch the sources in /usr/src and then rebuild the modules with DKMS.
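One possible sequence, assuming the Debian zfs-dkms 0.6.3 source layout (the paths and version strings are assumptions, not taken from this thread):

```sh
# Fetch the referenced commits as patches and apply them to the DKMS trees.
cd /usr/src/spl-0.6.3
wget -qO- https://github.com/openzfs/spl/commit/802a4a2.patch | patch -p1
cd /usr/src/zfs-0.6.3
wget -qO- https://github.com/openzfs/zfs/commit/ed6e9cc.patch | patch -p1
# Rebuild and reinstall the modules for the running kernel (SPL first).
for m in spl zfs; do
    dkms remove  -m $m -v 0.6.3 --all
    dkms install -m $m -v 0.6.3
done
```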
Hmm, the patch for SPL didn't apply cleanly...
Oh well, I thought they'd apply cleanly. They're pretty small. You can definitely try it with just the zfs change. |
Last night I removed the parent directory of the one with the bad SA; rsync ran again and indeed caused the I/O errors followed by the same crash. I also tried rsyncing data out of there, so I was only reading, and still got the immediate I/O error and then, later, the same kernel crash.

For now I've kept the same version of everything and moved the filesystems with the xattr=sa setting to an archived location. We're recompiling a Debian DKMS package using the developer build instructions, but note that the current head doesn't apply the Debian patches cleanly (apparently the Debian daily build is from a month ago).

I've started rsyncing the old data to new filesystems with xattr=on instead of sa. That has not been crashing, with the exception of when I rsync from the filesystem I've been talking with you about. We'll be trying newer versions soon, but I wanted to get our backups back on track; the affected filesystems are just moved so we can look at them later.
@uwkhahn So you have saved the corrupted filesystem? Good. I'm about halfway done with a patch that will implement a simple heuristic to detect this and work around the problem. I've resisted doing this over the past few weeks but figured that it'll get my eyes in front of the SA code yet again and maybe I'll get an idea as to what's actually causing this corruption. And it might help others recover their files more easily.
@uwkhahn Could you please run that zdb command again against the saved filesystem? While working on the hack alluded to in my previous post, I realized that your corrupted dnode actually contains an empty, but corrupted, packed nvlist dxattr, which has pointed me in a slightly different direction. That said, I'm fairly certain the work in #2809 will eliminate this whole issue.
I moved the filesystem, so the command is slightly different, but here's the output:
@uwkhahn Thanks. Except for the ZAP remnants in the slack area, it looks to be OK to me. I'll run a standalone decoder program on it later today to verify. |
Hey, just for reference, I followed the same process for another directory that's giving the input/output error.
Big breakthrough this weekend: I've got a 100% reliable reproducer and am hot on the trail of the problem. More information later. I think it will explain all currently reported instances of SA corruption.
Update (now that I'm not on a mobile device): The problem is that a transition from a dnode with a spill block to a dnode without a spill block doesn't work properly if the dnode isn't synced in between. The cause seems to be that the dbuf corresponding to the spill block (db_blkid of DMU_SPILL_BLKID) is still lying around to be synced even after the spill block has been removed at higher levels. The effect is that a perfectly valid dnode is written, except that it has an unnecessary spill block attached to it. This, in itself, doesn't cause much of a problem (though it is detected by my extra zdb debugging), but when subsequent changes are made to the dnode, such as re-extending an xattr into the spill area, fatal corruption can occur. I'm hoping to have a better description and solution within the next day or so. As best as I can tell at the moment, this problem is not specific to SA xattrs, but is exacerbated by their use because without them the spill code within ZFS doesn't get much exercise.
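The eventual commit message (quoted later in this thread) distills this into a two-step reproducer; a sketch of that sequence, with the dataset and file names as placeholders:

```sh
# Requires xattr=sa on the dataset.
touch /tank/fs/testfile
# Step 1: a long xattr value forces the SA to overflow into a spill block.
setfattr -n user.test -v "$(printf 'x%.0s' $(seq 1 400))" /tank/fs/testfile
# Step 2: before the dnode is synced, replace it with a short value so the
# spill block is freed; the stale spill dbuf could still be written out,
# leaving an unneeded spill block attached to an otherwise valid dnode.
setfattr -n user.test -v short_value /tank/fs/testfile
```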
Very interesting. It sounds like the spill block dbuf isn't getting properly undirtied in this case and is being left around. It wouldn't shock me if all the ZFS implementations suffer from this. We make much more use of the SA code and are more likely to notice it. |
I think I'm getting pretty close to the root of the problem on my test system on which I'm running the reproducers.
Further tracing shows that the dbufs for both the dnode and the spill are being dirtied properly. It looks like I went down the wrong track here; the problem is likely in the main SA code, but it is definitely dependent on the flushing of earlier data.
Ugh, you were right @behlendorf (as was my initial hunch; I was confusing dirtied with undirtied). Patch to be posted shortly. |
Thanks for all the work. Will this show up as a daily build at some point? I took some of bpkroth's work to try to get a Debian package with the latest code, and it hasn't gone well (zfs 7b2d78a and spl 917fef2). I then tried turning on xattr=sa on a new filesystem and rsyncing old data over. I crashed pretty soon after the rsync started (rsync -aviPHAX), and now I can't even traverse the directory structure without it eventually crashing. It is a very strange crash: initially it tells me that the program traversing the filesystem (e.g. find, ls, rm) is using 100% of the CPU, then it eventually freezes the machine. zpool iostat 2 doesn't really show anything going on. Hmm.
Hi, I've been working with Ken on this issue as well. Here are some other details we can offer: we were testing the git checkout build IDs Ken mentioned, first with a local build, then again with the packages in the wheezy-daily repository (details below). The testing procedure goes something like this:
When doing that, we end up with a crash in a couple of different places, as noted above. In the case of the zfs destroy, we just got this:
For the crash while walking the filesystem (either via find or rsync) we were basically getting "BUG: soft lockup" and/or "INFO: rcu_sched detected stalls on CPUs/tasks", and the rsync or find process would appear to be spinning at 100% in top (though I'd have expected it to be in D state, spinning in kernel land, instead). Anyway, those netconsole dumps are pretty long since they repeat until the panic occurs and dump for every CPU on the box, so I'll only post a snippet here:
Something else interesting to note, not necessarily related to the xattr=sa issue: rsync also descended into the .zfs/snapshot directory and ended up creating snapshots at the destination. Here are the (even more) broken package and kernel version details:
Anyway, we're going to downgrade back to the "stable" 0.6.3 release in your wheezy (non-daily) repo and then attempt to apply the patches you suggested (the shrinker fix and the xattr=sa fix) directly to the local DKMS source to see how that responds. Let us know if you need any more details. Thanks,
I was not able to cleanly apply the patch you provided for the dev source (4254acb) to the current stable source. Having a look at it, I came up with the following; I was curious whether it looks correct to you:
If a spill block's dbuf hasn't yet been written when a spill block is freed, the unwritten version will still be written. This patch handles the case in which a spill block's dbuf is freed and undirties it to prevent it from being written.

The most common case in which this could happen is when xattr=sa is being used and a long xattr is immediately replaced by a short xattr, as in:

```sh
setfattr -n user.test -v very_very_very..._long_value <file>
setfattr -n user.test -v short_value <file>
```

The first value must be sufficiently long that a spill block is generated and the second value must be short enough to not require a spill block. In practice, this would typically happen due to internal xattr operations as a result of setting acltype=posixacl.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Closes openzfs#2663
Closes openzfs#2700
Closes openzfs#2701
Closes openzfs#2717
Closes openzfs#2863
Closes openzfs#2884

Conflicts:
	module/zfs/dbuf.c
@uwkhahn Yes, that patch looks good. @bpkroth I'd be very interested to hear whether applying this patch resolves the issue for you. As for the .zfs/snapshot issue, I can explain what's going on there. A lesser-known feature of ZFS is that you can create/destroy/rename snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory. Because you must have had snapdir=visible set, rsync traversed into this directory and then created it at the destination with mkdir, which created a snapshot. That snapshot of course doesn't match what was at the source, but that's why it happened.
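For anyone unfamiliar with the feature Brian describes, it works roughly like this (dataset name is a placeholder):

```sh
zfs set snapdir=visible tank/fs
mkdir /tank/fs/.zfs/snapshot/demo    # equivalent to: zfs snapshot tank/fs@demo
rmdir /tank/fs/.zfs/snapshot/demo    # equivalent to: zfs destroy tank/fs@demo
```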
Brian Behlendorf notifications@github.com 2014-12-01 13:43:
We're working on testing it in a VM now. Having a little bit of trouble reproducing the original issue without the patch so far, but with the patch it doesn't appear to fail catastrophically, so that's good at least. That said, his patch is just the one for fixing the dirty spill block corruption. It seemed like the rest of the issues we were experiencing with the zfs 7b2d78a and spl 917fef2 builds were related to other code updates that were brought along with them.
Huh, that's kinda cool/weird. I'd have expected .zfs/snapshot and all of its children to be read-only. ... I was about to say that we didn't need snapdir=visible anyway, since the point was to allow users to browse backups for themselves via NFS and that was broken, but it looks like that's about to be fixed too. Thanks for the info. Cheers,
FYI, we've been running with Ken's patch in [1] and all volumes resynced fresh with xattr=sa now for about a week. Things seem stable and there's no zfs_iput_taskq task spinning. Thanks, [1] #2863 (comment) |
If a spill block's dbuf hasn't yet been written when a spill block is freed, the unwritten version will still be written. This patch handles the case in which a spill block's dbuf is freed and undirties it to prevent it from being written.

The most common case in which this could happen is when xattr=sa is being used and a long xattr is immediately replaced by a short xattr, as in:

```sh
setfattr -n user.test -v very_very_very..._long_value <file>
setfattr -n user.test -v short_value <file>
```

The first value must be sufficiently long that a spill block is generated and the second value must be short enough to not require a spill block. In practice, this would typically happen due to internal xattr operations as a result of setting acltype=posixacl.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Closes #2663
Closes #2700
Closes #2701
Closes #2717
Closes #2863
Closes #2884
We use ZFS in a home-grown backup solution that previously ran on ZFS on Solaris. It is a series of scripts doing periodic rsync and ZFS snapshot creation.
Now, in moving it to ZFS on Debian (Wheezy) Linux, I'm seeing a number of errors where a directory created by rsync gives me an error upon trying to cd into it. For example:
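The example output was elided above; based on the "Input/output error" reports later in the thread, the symptom looks like this (path anonymized):

```sh
$ cd /backup/host/etc
bash: cd: /backup/host/etc: Input/output error
```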
When I try to take a closer look with strace ls etc, I see the following:
The rsync we use looks like:
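The command itself was elided here; a later comment gives the flag set, so it was presumably along these lines (source and destination paths are placeholders):

```sh
# -A (ACLs) and -X (xattrs) are the flags that exercise the SA xattr code.
rsync -aviPHAX /source/host/ /backup/host/
```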
ZFS properties on the filesystem:
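The property listing was also elided; from details elsewhere in the thread, the settings relevant to this bug were along these lines (dataset name and SOURCE column are illustrative):

```sh
zfs get xattr,acltype,snapdir tank/backup/host
# NAME              PROPERTY  VALUE     SOURCE
# tank/backup/host  xattr     sa        local
# tank/backup/host  acltype   posixacl  local
# tank/backup/host  snapdir   visible   local
```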
Relevant Debian ZFS packages:
uname info:
Not long after seeing this problem, I see the kernel crash: