
Import a pool with missing top-level vdevs #852

Closed
dechamps opened this issue Jul 27, 2012 · 11 comments
Labels
Type: Feature Feature request or new feature


@dechamps
Contributor

Currently, when trying to import a pool with missing top-level vdevs, I get this:

  pool: disasterpool
    id: 10403888125469171240
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
config:

    disasterpool                      UNAVAIL  missing device
      /root/disaster/zpool/active/d1  ONLINE
      /root/disaster/zpool/active/d2  ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

That's a shame considering ZFS is technically capable of mounting a pool with missing top-level vdevs. In fact, I managed to make this pool import again (albeit in read-only mode) by adding empty disks and then forging a ZFS label for them with nothing more than a hexadecimal editor (ah... good times) and some printk debugging to match the GUIDs in the forged labels with the contents of the MOS.

Of course there is data loss, but data from the remaining devices is still salvageable and thanks to the metadata ditto blocks, the dataset/filesystem structure is still intact. The pool mostly works in this state, except for some glitches like being unable to umount some datasets.

What I'm wondering is: ZFS seems perfectly capable of salvaging a pool with missing top-level vdevs, yet it doesn't let me do it, and I have to resort to some cumbersome fiddling with labels to make ZFS believe the disks are still there. Why isn't there a feature to do this automatically? For example, a zpool import -M option that would mean "I know I have missing top-level vdevs, but just do what you can and try to salvage the data I have left"?
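
For anyone curious what the label fiddling involves, at least the inspection half can be done without a hex editor. A rough sketch (device paths match the output above; the forging itself I did by hand and won't pretend to have a clean recipe for):

    # Dump the four ZFS labels stored on a surviving device; this shows the
    # cached vdev tree, the pool GUID, and each vdev's guid/top_guid.
    zdb -l /root/disaster/zpool/active/d1

    # A forged label for the missing vdev has to carry GUIDs consistent
    # with what the surviving labels (and the MOS) expect.
    zdb -l /root/disaster/zpool/active/d2 | grep -E 'guid|txg'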

@behlendorf
Contributor

This isn't an unreasonable request.

I've actually noticed a pattern of ZFS not supporting these types of recovery operations particularly gracefully. From talking with some of the original developers and looking at the code, it's pretty clear an all-or-nothing philosophy was adopted. Two good additional examples of this mindset are:

  • The lack of an interface to recover what you can of a damaged file which has no redundancy. One has to resort to zdb to manually extract the damaged blocks (a rough zdb sketch follows this list).
  • There are still certain data structures which, if damaged, can prevent a pool from cleanly importing, even when ZFS does have redundant copies of them.
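
For reference, the kind of manual zdb extraction I mean looks roughly like this; the dataset, object number, and DVA values are illustrative, not from a real pool:

    # Walk the block pointers of the damaged file's object to find the
    # DVAs of the blocks that are still readable (dataset/object number
    # are illustrative).
    zdb -ddddd tank/data 1234

    # Read a single block straight off the vdev by vdev:offset:size,
    # decompressing it (':d'); the values come from the listing above.
    zdb -R tank 0:200400:20000:d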

Anyway, I hear what you're saying, and longer term we certainly may work on improving the disaster recovery code so you can at least get something back from the pool without heroic efforts.

@rockc

rockc commented Dec 14, 2012

I encountered the same problem when trying to import a zpool with two mirrored hard drives. In addition, both hard drives are still present, but the data on one of them was overwritten while the zpool was a) offline and b) not properly exported.

Is there any way to mount the zpool mirror in read-only mode so data can be extracted even if only one hard drive with the original data is left?
Parts of the system were used in production and the last backup is from 3 weeks ago.

So I'd like to ask whether one of you zfsonlinux pros could write a small HowTo on "editing a ZFS label" or on how to make the pool import in such a condition?

@accedent

I encountered the same problem when trying to import a zpool. Now I have to get the labels right on one of the drives and force-import the pool with -f. Any suggestions on the best way to change the labels?

@behlendorf
Contributor

To be clear, this issue only impacts top-level vdevs. If you set your pool up as a mirror and then destroy only one of the two drives, ZFS will be able to import the pool without issue. It's only the case where you create a striped pool and then destroy one drive that ZFS will refuse to import it. In that case you've already lost half of your data, but it would still be nice to be able to import the pool and save whatever is left.
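
As a rough illustration of the difference, with made-up pool and device names:

    # Mirror with one of two drives destroyed: a plain (forced, optionally
    # read-only) import works, because the surviving side is a complete
    # copy of the top-level vdev.
    zpool import -d /dev/disk/by-id -o readonly=on -f tank

    # Striped pool with a whole top-level vdev gone: this is the case the
    # issue is about; today it simply refuses with "missing device".
    zpool import -f disasterpool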

@accedent

Yes, I agree. In my case I have a striped pool and ZFS refuses to import it with a "missing device" error. What I am trying to do is change or push labels to the hard drive that fell out of the pool.

@jaggel

jaggel commented Sep 24, 2013

Can someone please elaborate on how to change / write the ZFS label on the disk?
I have the same problem and I really could use some help in order to restore at least some of my data...

@craigyk

craigyk commented Nov 5, 2013

I'll second @jaggel's request. I don't have a damaged pool to fix, but this problem has captured my curiosity. I've managed to manually forge labels for a missing device (I'm using filesystem files for testing), but now the pool just says the devices have corrupted data instead of just being missing.

@dechamps
Contributor Author

FYI, I've received at least half a dozen e-mails over the years asking about this. Unfortunately I was not able to remember the exact procedure, so I wasn't of great help to them, but it does indicate there is substantial demand for such an "import at all costs" feature.

@mailinglists35

mailinglists35 commented Jun 19, 2016

@behlendorf commented on Jul 27, 2012
[...]
Anyway, I hear what you're saying, and longer term we certainly may work on improving the disaster recovery code so you can at least get something back from the pool without heroic efforts

  1. Is #4168 ("zpool import complains about missing log device, suggests -m, then imports with the missing device anyways") depending on this issue?
  2. What does "longer term" mean now? Since this statement was made 4 years ago (wow, that seems quite long ago; thanks for all the hard development!), how long should a newbie expect "long term" to be from now on?

@rlaager
Member

rlaager commented Oct 12, 2016

Pavel Zakharov is working on this as part of his "SPA import and pool recovery" project. The slides are not available anywhere that I'm aware of. If they become available, I'd assume they'd be linked from here:
http://open-zfs.org/wiki/OpenZFS_Developer_Summit_2016

@behlendorf
Contributor

Pavel's work was integrated for 0.8 in commit 6cb8e53 which adds this feature. Note that since importing a pool with a missing top-level vdev almost certainly implies data loss, the pool must be imported read-only.
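
If I recall the integrated work correctly, the new behavior is gated by the zfs_max_missing_tvds module parameter; a rough usage sketch, with an illustrative pool name:

    # Tell the import code to tolerate (up to) one missing top-level vdev
    # for the next import attempt.
    echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds

    # The pool must then be imported read-only, since data on the missing
    # vdev is gone for good.
    zpool import -o readonly=on -f disasterpool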

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
…tore (openzfs#852)

= Description

This patch enables evacuating the data of block vdevs being removed
to the object store vdev, if one exists and we can allocate from
it. This functionality allows us to migrate block-based pools to
object-store ones (as found on cloud engines/DOSE).

= Implementation Details and Changes

== Removal

If there is an object-store vdev in the pool, data evacuated from
block-based vdevs always ends up in the object store vdev. When this
happens, the open-context removal thread will perform the read ZIO
from open context (as with normal device removal) but will keep
that segment in memory and perform the write to the object later
in syncing context (unlike normal removal), as we can only allocate
from the object store in syncing context. I currently use the
in-flight I/O limit of normal device removal to control the amount
of memory we use in-flight for those mappings. I may or may not
change its value or introduce another knob in a follow-on commit
(see Next Steps) after I spend more time analyzing its performance
and memory overhead.

== Multiblock DVAs

For removal indirect mappings whose destination vdev is the object
store, we split the destination mapping into 512-byte blocks with
monotonically increasing BlockIds. This ensures that we can properly
perform sub-segment reads and frees in the destination DVA. This
part of the design will further be optimized in the future as it
currently introduces a lot of artificial split blocks when performing
reads from those mappings (see Next Steps).

== Removing Disks and Marking noalloc

If there is an object-store vdev in the pool we allow the removal
of all block-based vdevs. We disallow marking object store vdevs
as non-allocating.
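
At the CLI level, the migration enabled by this change would look roughly
like the following; pool and device names are illustrative, and the
object-store vdev is assumed to already be part of the pool (its creation
syntax is specific to the DOSE builds):

    # Evacuate every block-based vdev; with an object-store vdev present
    # and allocatable, the evacuated data lands in the object store.
    zpool remove tank sdb
    zpool remove tank sdc

    # Watch the removal/evacuation progress until only the object-store
    # vdev remains.
    zpool status -v tank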

== Minor Side-Changes

* s/DVA_GET_OBJECTID/DVA_GET_BLOCKID for a more precise definition.
* I abstracted out some logic from `spa_vdev_copy_segment()` to be
  reused from `spa_vdev_copy_segment_object_store()`.
* `vdev_copy_arg_t` is now part of `spa_vdev_removal_t` so that we
  can access it from syncing context and update the amount of bytes
  inflight to be written to the object store because of removal.
* `metaslab_group_alloc_object()` can now allocate more than one block
  (an n-blocks parameter), which allows us to allocate multiple block
  IDs at once for our multiblock DVAs.

= Testing

Automated tests have been added performing complete migrations from
block-based pools to object-based ones. Some of those tests are run
with both 512 and 4K block sizes to simulate AWS/ESX and Azure.

I've also run a few manual migrations in an Azure VM with Azure Blob
instead of S3 to test prospective customer setups on Azure.

= Next Steps

I'll first work on the appstack bits to unblock QA for testing this.
Then I'll come back and work on 3 things:
(1) Optimize the writing of multiblock DVAs during device removal
(2) Create a tunable for selecting whether we want the zettacache
    to ingest all the data passed to the agent from a device removal.
(3) Fine tune the memory limit for device removal to the object store.

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>