zpool rw import fails after attempting to destroy old corrupt fs #4030

Closed
bpkroth opened this issue Nov 21, 2015 · 2 comments

bpkroth commented Nov 21, 2015

Hi, lots of backstory first ...

A long time ago, we were setting up a home-grown rsync-backup-on-ZFS system on Linux, using POSIX ACLs via xattr=sa and daily snapshots, and ran into a bug (#2863) that caused some filesystem corruption.
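
For context, the properties in question would have been set along these lines (dataset name illustrative, commands reconstructed from memory):

    # store xattrs as system attributes and enable POSIX ACLs
    zfs set xattr=sa tank/rsyncbackup
    zfs set acltype=posixacl tank/rsyncbackup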

While we were working on the fix for that and testing patches, we'd stashed the bad fs away on the side (e.g. tank/archive/somethingsomething - I honestly don't remember what it was called now, but it unfortunately wasn't labeled as "don't look at me sideways - there be dragons here").

The patches (against 0.6.3-1~wheezy) fixed our problem at the time and we ran more or less happily in our main layout (tank/rsyncbackup/...) until early this week when we attempted to remove that old corrupted filesystem.

When that happened, we encountered a stack dump:

[8632513.332586] VERIFY(size != 0) failed
[8632513.332620] SPLError: 13576:0:(space_map.c:111:space_map_add()) SPL PANIC
[8632513.337836] SPL: Showing stack for process 13576
[8632513.337864] CPU: 1 PID: 13576 Comm: z_fr_iss_0/2 Tainted: P           O  3.16-0.bpo.3-amd64 #1 Debian 3.16.5-1~bpo70+1
[8632513.337914] Hardware name: Supermicro X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 3.0a 02/06/2014
[8632513.337963]  0000000000000000 0000000000000000 ffffffff8154144f 0000000000000000
[8632513.338015]  ffffffffa048368c ffffffffa0497ced 0000000000000026 ffff882029410000
[8632513.338067]  0000000000000001 ffff880ff8d0cd00 ffffffffa2a07de7 ffff88203f39ca08
[8632513.338120] Call Trace:
[8632513.338149]  [<ffffffff8154144f>] ? dump_stack+0x41/0x51
[8632513.338197]  [<ffffffffa048368c>] ? spl_debug_bug+0x7c/0xe0 [spl]
[8632513.338246]  [<ffffffffa2a07de7>] ? space_map_add+0x347/0x370 [zfs]
[8632513.338285]  [<ffffffffa29edb82>] ? metaslab_free_dva+0x112/0x1e0 [zfs]
[8632513.338325]  [<ffffffffa29efe3c>] ? metaslab_free+0x8c/0xc0 [zfs]
[8632513.338364]  [<ffffffffa2a4e06c>] ? zio_dva_free+0x1c/0x30 [zfs]
[8632513.338400]  [<ffffffffa2a4f07c>] ? zio_execute+0x9c/0x130 [zfs]
[8632513.338433]  [<ffffffffa048c6b6>] ? taskq_thread+0x236/0x4c0 [spl]
[8632513.338469]  [<ffffffff8109f330>] ? try_to_wake_up+0x310/0x310
[8632513.338500]  [<ffffffffa048c480>] ? task_done+0x150/0x150 [spl]
[8632513.338533]  [<ffffffff8108f491>] ? kthread+0xc1/0xe0
[8632513.338561]  [<ffffffff8108f3d0>] ? flush_kthread_worker+0xb0/0xb0
[8632513.338592]  [<ffffffff8154787c>] ? ret_from_fork+0x7c/0xb0
[8632513.338621]  [<ffffffff8108f3d0>] ? flush_kthread_worker+0xb0/0xb0

After a reboot, the system now won't do the standard zpool import/mount in read-write mode (it reports the same error as above), and instead we have to manually import it in read-only mode.

However, in read-only mode, it fails to mount all of the filesystems in the hierarchy.
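
For reference, the manual read-only import is roughly:

    # import the pool without allowing any writes to it
    zpool import -o readonly=on tank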

When we first started the system (on Linux, last November), things were mounted in this sort of organization:

tank
tank/rsyncbackup
tank/rsyncbackup/byhost
tank/rsyncbackup/byhost/$hostname
...
tank/rsyncbackup/byservice
tank/rsyncbackup/byservice/$servicename
...

Later (last February), we added support for other groups in our college by zfs renaming things to sit under a per-organization fs (a sketch of the renames follows the listing):

tank
tank/rsyncbackup
tank/rsyncbackup/cae
tank/rsyncbackup/cae/byhost
tank/rsyncbackup/cae/byhost/$hostname
...
tank/rsyncbackup/cae/byservice
tank/rsyncbackup/cae/byservice/$servicename
...
tank/rsyncbackup/ece/byhost
tank/rsyncbackup/ece/byhost/$hostname
...
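
The renames were roughly of this form (reconstructed from memory, not the exact commands):

    # create the per-organization parent, then move the old trees under it
    zfs create tank/rsyncbackup/cae
    zfs rename tank/rsyncbackup/byhost tank/rsyncbackup/cae/byhost
    zfs rename tank/rsyncbackup/byservice tank/rsyncbackup/cae/byservice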

Now, after the latest panic and the read-only zpool import, the system fails to mount everything because it doesn't see the organization-level hierarchies. It's almost as if the "tank/rsyncbackup" directory structure is stuck with one from the distant past.

However, "zfs list" still shows all of the expected filesystems and their snapshots and mountpoints.
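
For example, something like the following still lists the full expected layout (output omitted):

    # show each dataset's mountpoint and whether it's actually mounted
    zfs list -r -o name,mountpoint,mounted tank/rsyncbackup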

We tried just doing a simple "aptitude safe-upgrade" to get the latest zfs packages (v0.6.5.2-2-wheezy), but that had the same results.

At this point, we have another set of disks we haven't put into action yet, and would like to basically start rebuilding.

Ideally, we'd be able to

  1. mount the existing fs read-write so that we can continue to take backups while we
  2. create a new zpool on the new disks and zfs send/recv all of our other data over to them (for the next month or however long it takes; a rough sketch is just below).
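
A rough sketch of step (2), assuming step (1) works and using illustrative pool/device names:

    # create the new pool on the spare disks
    zpool create newtank raidz2 sdb sdc sdd sde

    # take a recursive snapshot and replicate the whole tree over
    zfs snapshot -r tank/rsyncbackup@migrate
    zfs send -R tank/rsyncbackup@migrate | zfs recv -d newtank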

At this point we think there are problems with the original pool and want to start with a clean one, but we want to avoid losing our historical data, and also want to become at least semi-operational while we wait on step (2) (~128T could take a while to transfer).

While reviewing the git commit logs between our old and current versions (zfs-0.6.3..zfs-0.6.5.2), I ran across the zfs_recover module parameter; specifically, commit ids 53b1d97 (which I think may have been a more complete fix for our original problem) and 7d2868d (which caught my attention due to the bad DVA message).

However, the zfs-module-parameters man page comment for the zfs_recover option says:

This should only be used as a last resort, as it typically results in leaked space, or worse.

Right now I'm just wondering whether you think it's safe to try that as a recovery attempt so that we can get step (1) of our plan going, or if you have other ideas (e.g. skip auto-mounting and try to reconstruct the mount hierarchy to avoid that stale-looking top-level fs, or ...)?
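
For the record, my understanding is that trying this would look roughly like the following (combined with skipping the automatic mounts); I haven't tested it:

    # enable recovery mode before importing (last resort!)
    echo 1 > /sys/module/zfs/parameters/zfs_recover

    # import the pool without mounting any of its filesystems
    zpool import -N tank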

Also, this problem appears to be with the free space map. That's a zpool item, not an fs item, correct? So, a zfs send/recv to a new pool shouldn't bring that issue with it, correct?

Thanks,
Brian

GregorKopka (Contributor) commented Nov 23, 2015

With a damaged space map, any write can nuke your pool by overwriting seemingly free space that isn't actually free. So only do read-write experiments if you have full and tested block-level backups of the old pool.

Since you can import read-only and zfs list the filesystems/snapshots: can you still zfs send them (for example to /dev/null, or through zstreamdump)? If so:
Build a new pool and start sending the old pool over into a temporary 'old' hierarchy on the new pool, while continuing to take backups into a 'new' one. When the transfer from the old pool is complete, you would have to recreate the newer snapshot history (by rsync'ing from the 'new' snapshots into the 'old' hierarchy, recreating the snapshot chain there). When that is done you can dump the 'new' one and put the 'old' one (which should now be historically correct from a snapshot perspective, and up-to-date with 'new') into production.
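
For example, to confirm the streams can be generated and parsed without storing them anywhere (snapshot name illustrative):

    # generate the full replication stream and just parse/summarize it
    zfs send -R tank/rsyncbackup@latest | zstreamdump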


bpkroth commented Nov 24, 2015

Thanks. That's pretty close to what we decided to do in the end. We skipped the full send/recv since it's mostly just backup data and we didn't need all of the incremental snapshots right away anyway (restores should be infrequent, and we can grab them on demand if necessary, provided we have enough space somewhere to zfs recv them), and in a couple of weeks/months we can just wipe the old corrupt disks.

I think before we reclaim the old disks I may check whether zfs_recover=1 would have worked, just for curiosity's sake.

The only trouble we're running into now is that the new pool is smaller than the old one, and we have a zvol we'd like to transfer over as well, but capacity-wise it gets pretty tight. We could probably make it a sparse zvol and use a sparse-aware dd to get it over, but I don't yet know how much free space is in it, so I'm not sure whether that will work.
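
Roughly what I have in mind (names and size illustrative; conv=sparse needs a reasonably recent GNU dd):

    # create a sparse (thin-provisioned) zvol on the new pool
    zfs create -s -V 4T newtank/somevol

    # copy block-for-block, skipping runs of zeros on the write side
    dd if=/dev/zvol/tank/somevol of=/dev/zvol/newtank/somevol bs=1M conv=sparse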

Anyways, thanks for your response.

Cheers,
Brian
