Import a pool with missing top-level vdevs #852
Comments
This isn't an unreasonable request. I've actually noticed a pattern of ZFS not supporting these types of recovery operations particularly gracefully. From talking with some of the original developers and looking at the code, it's pretty clear an all-or-nothing philosophy was adopted. Two good additional examples of this mindset are:
Anyway, I hear what you're saying, and longer term we certainly may work on improving the disaster recovery code so you can at least get something back from the pool without heroic efforts.
I encountered the same problem when trying to import a zpool with two mirrored hard drives. Additionally, both hard drives are still present, but the data on one of them was overwritten. Is there any way to mount the zpool mirror in read-only mode so data can be extracted even if only one hard drive with the original data is left? So I'd like to ask whether one of you zfsonlinux pros could write a small HowTo on "editing a ZFS label", or on how to make the pool import in such a condition?
I encountered the same problem when trying to import a zpool. Now I have to get the labels right on one of the drives and force-import the pool with -f. Any suggestions on the best way to change the labels?
To be clear, this issue only impacts top-level vdevs. If you set your pool up as a mirror and then destroy only one of the two drives, ZFS will be able to import the pool without issue. It's only the case where you create a striped pool and then destroy one drive that ZFS will refuse to import it. In the second case you've already lost half of your data, but it would still be nice to be able to import the pool and save whatever is left.
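For anyone who wants to see the difference, here is a minimal sketch of the striped case using small file-backed vdevs; the pool name and file paths are made up for illustration:

```sh
# Create a striped pool across two file-backed top-level vdevs.
truncate -s 256M /var/tmp/vdev0 /var/tmp/vdev1
zpool create stest /var/tmp/vdev0 /var/tmp/vdev1
zpool export stest

# Simulate losing one top-level vdev.
rm /var/tmp/vdev1

# The import is refused because a top-level vdev is missing.
zpool import -d /var/tmp stest

# By contrast, a pool created with "zpool create mtest mirror ..." would still
# import (in a DEGRADED state) after losing one side of the mirror.
```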
Yes, I agree. In my case I have a striped pool and ZFS refuses to import it with a "missing device" error. What I am trying to do is change or push labels to the hard drive that fell out of the pool.
Can someone please elaborate on how to change or write the ZFS label on the disk?
I'll second @jaggel's request. I don't have a damaged pool to fix, but this problem has captured my curiosity. I've managed to manually edit the labels to forge a missing device (I'm using filesystem files for testing), but now the pool just says the devices have corrupted data instead of being missing.
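For anyone poking at labels the same way, the on-disk labels can at least be inspected read-only with zdb before reaching for a hex editor. A small sketch, assuming the file-backed setup from the reproduction above:

```sh
# Dump the four ZFS labels of a device (L0/L1 at the start, L2/L3 at the end).
# Each label carries pool_guid, top_guid and guid, the values that have to
# line up for a forged device to be accepted.
zdb -l /var/tmp/vdev0

# Print the configuration of an exported pool from whatever devices remain
# (-e: exported/destroyed pool, -p: directory to search for devices).
zdb -e -p /var/tmp -C stest
```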
FYI, I've received at least half a dozen e-mails over the years asking about this. Unfortunately I was not able to remember the exact procedure, so I wasn't of great help to them, but it does indicate there is substantial demand for such an "import at all costs" feature.
Pavel Zakharov is working on this as part of his "SPA import and pool recovery" project. The slides are not available anywhere that I'm aware of. If they become available, I'd assume they'd be linked from here:
Pavel's work was integrated for 0.8 in commit 6cb8e53, which adds this feature. Note that since importing a pool with a missing top-level vdev almost certainly implies data loss, the pool must be imported read-only.
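For reference, a rough sketch of how that read-only recovery import is driven on 0.8 and later, assuming the zfs_max_missing_tvds module parameter documented in the module-parameters man page and a hypothetical pool named tank:

```sh
# Allow the import to proceed with up to one missing top-level vdev.
# This is an extreme-recovery knob; leave it at 0 in normal operation.
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds

# The pool has to come up read-only, since data on the missing vdev is gone;
# whatever remains can then be copied off.
zpool import -o readonly=on tank
```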
…tore (openzfs#852)

= Description
This patch enables evacuating the data of removing block vdevs to the object store vdev, if one exists and we can allocate from it. This functionality allows us to migrate block-based pools to object-store ones (as found on cloud engines/DOSE).

= Implementation Details and Changes

== Removal
If there is an object-store vdev in the pool, data evacuated from block-based vdevs always ends up in the object store vdev. When this happens, the open-context removal thread will perform the read ZIO from open context (as with normal device removal) but will keep that segment in memory and perform the write to the object later in syncing context (unlike normal removal), as we can only allocate from the object store in syncing context. I currently use the in-flight I/O limit of normal device removal to control the amount of memory we use in flight for those mappings. I may or may not change its value or introduce another knob in a follow-on commit (see Next Steps) after I spend more time analyzing its performance and memory overhead.

== Multiblock DVAs
For removal indirect mappings whose destination vdev is the object store, we split the destination mapping into 512-byte blocks with monotonically increasing BlockIds. This ensures that we can properly perform sub-segment reads and frees in the destination DVA. This part of the design will be further optimized in the future, as it currently introduces a lot of artificial split blocks when performing reads from those mappings (see Next Steps).

== Removing Disks and Marking noalloc
If there is an object-store vdev in the pool, we allow the removal of all block-based vdevs. We disallow marking object-store vdevs as non-allocating.

== Minor Side-Changes
* s/DVA_GET_OBJECTID/DVA_GET_BLOCKID for a more precise definition.
* I abstracted out some logic from `spa_vdev_copy_segment()` to be reused from `spa_vdev_copy_segment_object_store()`.
* `vdev_copy_arg_t` is now part of `spa_vdev_removal_t` so that we can access it from syncing context and update the number of bytes in flight to be written to the object store because of removal.
* `metaslab_group_alloc_object()` can now allocate more than one block (an n-blocks parameter), which allows us to allocate multiple block IDs at once for our multiblock DVAs.

= Testing
Automated tests have been added performing complete migrations from a block-based pool to an object-based one. Some of those tests are run with both 512-byte and 4K block sizes to simulate AWS/ESX and Azure. I've also run a few manual migrations in an Azure VM with Azure Blob instead of S3 to test prospective customer setups on Azure.

= Next Steps
I'll first work on the appstack bits to unblock QA for testing this. Then I'll come back and work on three things: (1) optimize the writing of multiblock DVAs during device removal, (2) create a tunable for selecting whether we want the zettacache to ingest all the data passed to the agent from a device removal, and (3) fine-tune the memory limit for device removal to the object store.

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Currently, when trying to import a pool with missing vdevs, I get this:
That's a shame, considering ZFS is technically capable of mounting a pool with missing top-level vdevs. In fact, I managed to make this pool import again (albeit in read-only mode) by adding empty disks and then forging a ZFS label for them with nothing more than a hexadecimal editor (ah... good times) and some printk debugging to match the GUIDs in the forged labels with the contents of the MOS.
Of course there is data loss, but data from the remaining devices is still salvageable, and thanks to the metadata ditto blocks the dataset/filesystem structure is still intact. The pool mostly works in this state, except for some glitches like being unable to unmount some datasets.
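Once a pool is imported read-only in that state, salvage is mostly a matter of copying out whatever still reads. A sketch with made-up pool and dataset names, expecting I/O errors for anything that lived on the missing vdev:

```sh
# See which datasets are still visible through the surviving metadata.
zfs list -r damagedpool

# Copy surviving files somewhere healthy; rsync reports errors for files whose
# blocks lived on the missing top-level vdev but keeps copying the rest.
rsync -a /damagedpool/some/dataset/ /backup/dataset/
```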
What I'm wondering is: ZFS seems perfectly capable of salvaging a pool with missing top-level vdevs, yet it doesn't let me do it, and I have to resort to some cumbersome fiddling with labels to make ZFS believe the disks are still there. Why isn't there a feature to do this automatically? Like, for example,
zpool import -M
which would mean "I know I have missing top-level vdevs, but just do what you can and try to salvage the data I have left"?