zfs receive fails receiving a deduplicated stream "invalid backup stream" #2210
There is an old issue (https://www.illumos.org/issues/755) describing an "invalid backup stream" failure, but that fix should already be included in the ZoL sources, shouldn't it?
@pyavdr We did pick up the fix for the issue you referenced a long time ago. We'll need to dig into this and see what the cause is.
I needed to rearrange my partitions, so I did a zfs send to copy off my ZFS data, repartitioned, and then tried to zfs recv that data back. Unfortunately I seem to have independently found this bug. Now that the original ZFS data has been overwritten, is my only choice restoring from backups, or is there some sort of workaround? I tried using zfs-fuse, but it said "cannot receive: stream has unsupported feature". Would compiling ZFS myself perhaps make the receive work?
It may be possible to recv with another ZFS system. It isn't clear whether the bug happens on the send or the recv side. If it happens on the recv side, your chances of a good recv are better with another system which supports deduplicated streams. If you have a valid backup, try again without deduplicating the stream. @behlendorf
If it helps at all, the problem appears to be in zfs send. I set up a VM with nas4free (based on FreeBSD) and tried the zfs receive from there. It failed with the same error at about the same point (~17 GB in). I will begin restoring from my bacula backups now. I suggest that a warning be printed when a user tries to use the zfs send -D option. I have decent backups, but the next person might not be so lucky. Also, I would like to help if I can. I will try this weekend to create a smaller test case that doesn't contain all my personal data. I am not a programmer, but might at least be able to help test potential fixes.
I am experiencing the same problem on the official 0.6.3 release for Debian, as well as on the nightly build 0.6.3-10. The issue can be reproduced simply (suppose backup_pool is an empty pool and the copied file is larger than the pool block size; I am using a small deb archive here): cp python-gitzilla_2.0-1_all.deb /backup_pool/p.deb. The output from the last command is as follows: receiving full stream of backup_pool@snap1 into backup_pool@snap1. The problem is most probably on the receiving side, because the same error is reported when trying to receive an older stream created by a 0.6.1 (?) version, which I was able to receive successfully earlier.
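A minimal sketch of this reproduction, reusing the pool and snapshot names from the comment; the exact snapshot, send, and receive invocations are assumptions:

cp python-gitzilla_2.0-1_all.deb /backup_pool/p.deb   # file larger than the pool block size
zfs snapshot backup_pool@snap1                        # snapshot the pool root dataset
zfs send -D backup_pool@snap1 > /tmp/dedup.zfs        # write a deduplicated stream to a file
zfs recv -F backup_pool < /tmp/dedup.zfs              # fails: invalid backup stream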
For those stuck like me with no help yet from the ZFS on Linux team: if you have a deduped full stream of the entire pool, this regression can be worked around by downgrading to version 0.6.2 (on a Debian system with the DKMS SPL/ZFS 0.6.3 packages installed).
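A rough sketch of such a downgrade; the package names and pinned versions are assumptions and must match what your repository actually provides:

apt-get remove --purge zfs-dkms spl-dkms zfsutils                    # drop the 0.6.3 DKMS packages (assumed names)
apt-get install spl-dkms=0.6.2-1 zfs-dkms=0.6.2-1 zfsutils=0.6.2-1   # pinned 0.6.2 versions (placeholders)
modprobe zfs                                                         # load the 0.6.2 modules
zfs recv -F backup_pool < /path/to/full_stream.zfs                   # receive the deduped stream under 0.6.2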
Recommendation: if you are not sure that your backup stream is OK, back up your existing pool without the -D option, if possible, before destroying it. Hope it helps.
Sorry guys, I've been working on other issues. I've bumped this to the 0.6.4 tag and we'll definitely try to get it resolved since it's a regression.
Brian, thank you very much for your effort in ZFS on Linux development. To help at least a little with the resolution of this issue (I am not a C programmer), I ran a bisect over the ZFS Git repository using a Debian virtual machine. The first commit which failed to receive the deduped stream (see my reproduction procedure above) is, with relatively high probability, the source of the regression. Unfortunately, that commit introduces a complex code change. The previous commit received the deduped stream without problems. Hope this helps.
The problem is triggered here: https://github.com/zfsonlinux/zfs/blob/0426c16804e572f088dd293029ab39fa75e1e203/module/zfs/dmu_send.c#L1506. I'm looking into why. BTW, if you receive the stream into a new dataset (zfs receive pool/newfs), it seems to work, which of course creates everything a level deeper than it originally was. Also, this seems specific to recursive dumps.
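A sketch of that workaround, with dataset and file names being illustrative rather than taken from this thread:

zfs recv -F tank/zfstest3 < /backup/test.zfs   # original level: cannot receive ... invalid backup stream
zfs recv tank/restore < /backup/test.zfs       # new dataset: reportedly works, but everything
                                               # lands one level deeper than it originally was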
I have figured out the cause of this problem: in some circumstances, the guid of a WRITE_BYREF record's referenced dataset (refobject) is not recorded in the receive-side guid-to-dataset map. The restructured sync task change refactored many parts of the receive process, including, among other things, the contents of the receive cookie. I'm working on a fix for this problem.
@FransUrbo I don't think so, at least not from my reading of the issue. Update on this issue: there are actually two somewhat related problems. First, the referred-to object is not recorded in the guid-to-dataset map. I did manage to come up with an awful hack which fixes receiving into a top-level dataset (named as the pool), but it is not a general solution. I'll continue to work on a proper solution.
The cause is pretty much the same as was the case for #372; however, the reason for the incomplete guid-to-dataset map is different.
@dweeezil Were you able to determine why ZoL was impacted and not the other ports?
@behlendorf I've been working on the assumption that other ports will be affected, too. More information as I get it. I'll be testing my saved send stream file under Illumos and FreeBSD.
@behlendorf The problem occurs under OpenIndiana 151a8. I see no reason this problem wouldn't occur under any other implementation as well. I will still try FreeBSD when I get a chance. EDIT: any other implementation "with the restructured sync task code".
@behlendorf The problem occurs under FreeBSD 10. I've not had much time to work on a fix since the weekend. The old receive cookie had a couple of fields to help track the received dataset, but in the new code these have been replaced. I suspect that @ahrens could fix this in short order, because I'm trying to guess the intent of the new code whereas he actually wrote it. Regardless, I'll see if I can't come up with something that works properly.
@dweeezil Thanks for the clarification; for some reason I was under the impression this was Linux-specific.
@dweeezil do you have a simple set of commands to hit this issue? I'd like to try and reproduce it locally.
@prakashsurya I think I can come up with a (short) reliable reproducer pretty easily. I've been preoccupied with other issues and haven't been able to look at this one all week... making a bit of time now. It was a bit simpler than I thought: https://gist.github.com/dweeezil/78320cd167b41bc13cd3. Here's an abbreviated dump of the stream, with a comment added where the problem occurs. Here are a few interesting pointers into dmu_send.c: one issue is in the handling of WRITE_BYREF records against the guid-to-dataset map.
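A dump like that can be produced with zstreamdump; a sketch, with the stream file name being illustrative:

zstreamdump -v < /backup/testD.zfs                  # summarize the records of a saved stream
zfs send -R -D tank/zfstest2@now | zstreamdump -v   # or inspect a stream as it is generated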
@dweeezil Awesome! Thanks for the simple script. I can confirm it's not Linux-specific; I see it in an illumos-based VM.
Can somebody give this a try? It seems to work in my illumos based environment...
@prakashsurya I applied and checked your patch, but it doesn't work: "invalid backup stream".
@dweeezil When posting that patch, I had only tested it with the simple test script you provided earlier. Since then I've run it through some more testing and hit a panic, which I believe to be a NULL pointer dereference. So it's almost correct, but as you say, it needs some more work. I'll try to devote some more time to this and see what I can come up with.
... as opposed to sending properties of all snapshots of the relevant filesystem. The previous behavior results in properties being set on all snapshots on the receiving side, which is quite slow. Behavior of zfs send -R is not changed. References: http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/346 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#2729 Issue openzfs#2210
… needs 'readonly' (pool/backup) to ensure immutability. - Don't use deduplication due to issue 2210 <openzfs/zfs#2210>
Just ran into this myself. No data loss thanks to backups, but this is a really scary bug. Would it at least be possible to add a warning to zfs send -R until this can be fixed, as crowtrobot suggests above?
The problem is that the user learns this when receiving, not when sending. And while, yes, everyone should test their backups (indeed, that's how I found this myself), it's a really nasty surprise if you don't. Just skipping -D is a fine workaround, but it's one that is far from obvious to users:-)
The issue is that the receive potentially happens (much) later than the send. In my case the send goes to a file, which is then backed up offsite rather than being directly restored to a zpool, so the problem wouldn't be noticed until restore time. This is why I test my streams before I'm "done", since at least then I know there exists a version of ZFS that can receive them without error. I understand there are always risks inherent in this approach that make testing much more crucial, but I'd argue that at the moment a flag that claims to do one thing while secretly corrupting your backup without any warning goes beyond the "always test" threshold and into "we should warn the user" territory, ESPECIALLY since the -D flag is much more attractive if you're backing up to tape/etc. than if you're immediately sending to another zpool.
No, I don't have easy access to any of those systems. I'm just deleting it and recreating the backup from the original source, without -D. I'm not looking at actual data loss because I'm pretty cautious these days, but I'm remembering several youthful errors from many years ago where a bug like this could have cost me everything:-)
If you need a full receive, you can install ZFS 0.6.2, which is not affected by the problem, and receive under that version. This rescued me a while ago. The drawback is that you need to start completely from scratch.
Original author: Matthew Ahrens https://reviews.csiden.org/r/240/
I just hit this as well. It would be ideal to get this working properly, because deduplication could have a lot of wins when replicating an entire pool.
Just a heads-up for anyone stumbling upon this issue by searching for "cannot receive incremental stream: invalid backup stream": this error message can also occur if you have something running (like a cron job) that automatically takes snapshots on the pool you are zfs-receiving into. In my case, I use puppet to automatically configure a bunch of Linux boxes that all have a standard config. I have a pool named 'tank' on all of them, and during normal production use zfs-auto-snapshot runs every 15 minutes to ensure customer data is protected. I was restoring data to a newly created 'tank' with much larger drives. The zfs receive would run for a while, then zfs-auto-snapshot would take a snapshot, and the next snapshot in the received stream would cause this error.
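A sketch of the corresponding workaround: pause automatic snapshots on the target pool for the duration of the receive. The com.sun:auto-snapshot property is the one zfs-auto-snapshot conventionally honors; whether a given snapshot job respects it is an assumption to verify:

zfs set com.sun:auto-snapshot=false tank     # tell zfs-auto-snapshot to skip this pool
zfs recv -F tank < /backup/full_stream.zfs   # run the restore without competing snapshots
zfs inherit com.sun:auto-snapshot tank       # re-enable automatic snapshots afterwards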
I believe the PR posted here (openzfs/openzfs#159) should fix this issue, and it also needs a reviewer to get pushed upstream. If anyone here wants to take a look, I'd be very appreciative!
@prakashsurya @dweeezil @behlendorf @dankimmel I tried to patch the current head of 0.7.0-rc1 with the changes from openzfs/openzfs@b09697c and it works: receiving the deduplicated stream with snapshots succeeds. But I am unsure about it, because there are some ambiguous differences in the code and I can't check all of them. Find my patch file in the attachment. There should be an additional reviewer.
@pyavdr thanks for porting this. It'll be much easier to review and comment on if you open a new pull request against master with the patch. And make sure to include the test case from the original OpenZFS patch; you just need to add it to the test suite.
Closed by #5689
"zfs receive" fails when receiving a deduplicated stream with snapshots with "invalid backup stream"
When the stream isn't deduplicated, it runs smoothly.
To be sure, I patched another pool (part 2 of this issue) with the newest pull request from @dweeezil (Recent illumos/Delphix features #2149), but the problem persists.
Linux Suse122 3.7.7-2-default #1 SMP Wed Feb 13 09:33:05 UTC 2013 (d55bd47) x86_64 x86_64 x86_64 GNU/Linux
ZFS: Loaded module v0.6.2-219_ga15dac4, ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.6.2-23_g4c99541
NAME USED AVAIL REFER MOUNTPOINT
stor2/zfstest 4.67G 102G 4.67G /stor2/zfstest
stor2/zfstest3 49.9G 102G 49.8G /stor2/zfstest3
stor2/zfstest3@394806312 736K - 768K -
stor2/zfstest3@421188256 976K - 1.94G -
stor2/zfstest3@484463391 992K - 4.26G -
stor2/zfstest3@618833865 1.28M - 7.45G -
stor2/zfstest3@579117221 1.41M - 9.58G -
stor2/zfstest3@516414280 1.54M - 12.8G -
stor2/zfstest3@967496177 1.68M - 16.0G -
stor2/zfstest3@636523150 1.77M - 18.2G -
stor2/zfstest3@672405261 1.87M - 20.5G -
stor2/zfstest3@949169779 1.99M - 23.1G -
stor2/zfstest3@688080912 2.33M - 25.8G -
stor2/zfstest3@522831668 2.91M - 28.5G -
stor2/zfstest3@493854542 3.20M - 31.1G -
stor2/zfstest3@151707443 3.32M - 33.8G -
stor2/zfstest3@194586695 3.46M - 36.4G -
stor2/zfstest3@493432740 3.58M - 39.1G -
stor2/zfstest3@144417120 3.68M - 41.7G -
stor2/zfstest3@138155230 3.80M - 44.4G -
stor2/zfstest3@122410522 3.98M - 47.0G -
stor2/zfstest3@967710983 1.86M - 49.7G -
stor2/zfstest3@now 0 - 49.8G -
Suse122:/stor2/zfstest # zfs send -R -D stor2/zfstest3@now > /stor2/zfstest/test.zfs
Suse122:/stor2/zfstest # zfs destroy -r stor2/zfstest3
Suse122:/stor2/zfstest # zfs recv -v stor2/zfstest3 < /stor2/zfstest/test.zfs
receiving full stream of stor2/zfstest3@394806312 into stor2/zfstest3@394806312
received 2.11MB stream in 5 seconds (432KB/sec)
receiving incremental stream of stor2/zfstest3@421188256 into stor2/zfstest3@421188256
received 249MB stream in 14 seconds (17.8MB/sec)
receiving incremental stream of stor2/zfstest3@484463391 into stor2/zfstest3@484463391
cannot receive incremental stream: invalid backup stream
Linux linux-ts3r 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC 2014 (750023e) x86_64 x86_64 x86_64 GNU/Linux
ZFS: Loaded module v0.6.2-201_g8eac9fb, ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.6.2-23_g4c99541
linux-ts3r:/etc # zpool get all
NAME PROPERTY VALUE SOURCE
tank size 496G -
tank capacity 0% -
tank altroot - default
tank health ONLINE -
tank guid 8224401125096643018 default
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupditto 0 default
tank dedupratio 1.00x -
tank free 496G -
tank allocated 126K -
tank readonly off -
tank ashift 0 default
tank comment - default
tank expandsize 0 -
tank freeing 0 default
tank feature@async_destroy enabled local
tank feature@empty_bpobj enabled local
tank feature@lz4_compress enabled local
tank feature@spacemap_histogram active local
tank feature@enabled_txg active local
tank feature@hole_birth active local
tank feature@extensible_dataset enabled local
tank feature@bookmarks enabled local
NAME USED AVAIL REFER MOUNTPOINT
tank 49.8G 438G 32K /tank
tank/zfstest 47K 438G 47K /tank/zfstest
tank/zfstest2 49.8G 438G 49.8G /tank/zfstest2
tank/zfstest2@300726135 242K - 2.24G -
tank/zfstest2@296730124 470K - 4.81G -
tank/zfstest2@224726639 771K - 7.39G -
tank/zfstest2@902427907 908K - 9.98G -
tank/zfstest2@540759649 1.07M - 12.6G -
tank/zfstest2@255536517 1.49M - 15.1G -
tank/zfstest2@199845779 1.62M - 17.7G -
tank/zfstest2@566124845 1.74M - 20.3G -
tank/zfstest2@956409770 1.87M - 22.8G -
tank/zfstest2@037434686 2.14M - 25.5G -
tank/zfstest2@301863766 2.63M - 28.1G -
tank/zfstest2@725218939 2.92M - 30.7G -
tank/zfstest2@694801830 3.06M - 33.2G -
tank/zfstest2@643148554 3.20M - 35.9G -
tank/zfstest2@492741743 3.33M - 38.5G -
tank/zfstest2@243681899 3.46M - 41.1G -
tank/zfstest2@659482439 3.59M - 43.7G -
tank/zfstest2@999733757 3.75M - 46.2G -
tank/zfstest2@921130344 3.69M - 48.9G -
linux-ts3r:/tank/zfstest # zfs snap tank/zfstest2@now
linux-ts3r:/tank/zfstest # zfs send -R -D tank/zfstest2@now > /tank/zfstest/testD.zfs
linux-ts3r:/tank/zfstest # zfs destroy -r tank/zfstest
zfs recv -v tank/zfstest2 < /tank/zfstest/testD.zfs
receiving full stream of tank/zfstest2@300726135 into tank/zfstest2@300726135
linux-ts3r:/tank/zfstest # zfs recv -v tank/zfstest2 < /tank/zfstest/testD.zfs
receiving full stream of tank/zfstest2@300726135 into tank/zfstest2@300726135
received 283MB stream in 19 seconds (14.9MB/sec)
receiving incremental stream of tank/zfstest2@296730124 into tank/zfstest2@296730124
received 319MB stream in 13 seconds (24.6MB/sec)
receiving incremental stream of tank/zfstest2@224726639 into tank/zfstest2@224726639
cannot receive incremental stream: invalid backup stream