
Copying container from one server in an encrypted pool to another in an encrypted pool fails after performing a second copy/--refresh #14383

Open
benatbenjaminit opened this issue Nov 4, 2024 · 1 comment
Labels
Incomplete Waiting on more information from reporter

Comments


benatbenjaminit commented Nov 4, 2024

Required information

  • Distribution: Ubuntu

  • Distribution version: 22.04

  • The output of "snap list --all lxd core20 core22 core24 snapd":
    Name    Version      Rev    Tracking       Publisher   Notes
    core20  20240705     2379   latest/stable  canonical✓  base,disabled
    core20  20240911     2434   latest/stable  canonical✓  base
    core22  20240904     1621   latest/stable  canonical✓  base,disabled
    core22  20241001     1663   latest/stable  canonical✓  base
    lxd     6.1-efad198  29943  latest/stable  canonical✓  disabled
    lxd     6.1-78a3d8f  30130  latest/stable  canonical✓  -
    snapd   2.62         21465  latest/stable  canonical✓  snapd,disabled
    snapd   2.63         21759  latest/stable  canonical✓  snapd

  • The output of "lxc info" or if that fails:

    • Kernel version:
    • LXC version:
    • LXD version:
    • Storage backend in use:

Issue description


Using zfs as the storage backend, copying a container from an encrypted zfs pool on one server to an encrypted zfs pool on a new server works, but doing the same operation with --refresh fails with an error and the destination server's container storage is lost. Default non-encrypted pools work fine. I am almost certain I must be doing something wrong, otherwise others would be hitting this issue.

(Somewhat related: I have another server where I run a daily --refresh of my containers to an encrypted partition, which works fine. But after a while I get the same error (cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one), although in that case the storage is at least not lost on the destination server. I haven't been able to pinpoint exactly when it occurs, but it may have something to do with snapshots. I.e. every night a snapshot is taken, then every night a --refresh to the destination runs, with snapshots auto-expiring. But if the --refresh is not run for a while, the destination snapshots expire, and when the --refresh runs again it can't do the incremental properly. Then the only way to fix it is a full copy from src to dst. But with this new server I have just set up, I can't even get that far.)
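In case it helps with triage, a way to check the incremental precondition before a --refresh would be to compare the snapshots present on the source and the destination. This is only a sketch using the dataset path from my setup, so adjust as needed:

    # Run on both server A (source) and server B (destination):
    # list the ZFS snapshots backing the container
    sudo zfs list -t snapshot -o name rpool/lxd/encrypted/containers/c3

    # LXD's own view of the container's snapshots
    lxc info c3

If the two sides no longer share a common snapshot, I assume the refresh has to fall back to sending a full stream rather than an incremental one.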

Steps to reproduce

  1. On server A:

     Create a new container:
     lxc launch ubuntu:24.04 c3 -s encpool
     Creating c3
     Starting c3

     Copy it to server B:
     lxc stop c3
     lxc copy c3 serverB: -s encpool
     [success, no errors returned]

  2. On server B:

     sudo zfs list | grep c3
     rpool/lxd/encrypted/containers/c3  659M  668G  659M  legacy

  3. On server A:

     lxc copy c3 serverB: -s encpool --refresh
     Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Failed receiving volume "c3": Problem with zfs receive: ([exit status 1 write |1: broken pipe]) cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one

  4. On server B:

     At this stage, the container c3 has lost its storage:
     sudo zfs list | grep c3
     [no output returned]

lxc info --show-log c3
Name: c3
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/11/04 19:25 AEDT

Log:

zfs get encryption rpool/lxd/encrypted
NAME                 PROPERTY    VALUE        SOURCE
rpool/lxd/encrypted  encryption  aes-256-gcm  -

(same output on both servers)
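If it helps to isolate whether the refusal comes from LXD or from ZFS itself, the receive path could probably be exercised by hand against a throwaway dataset under the same encrypted parent. This is just a sketch: the @repro snapshot and the c3-repro dataset are made-up names, and it assumes passwordless sudo over SSH to serverB.

    # On server A: snapshot the container and send a full stream to a scratch dataset on server B
    sudo zfs snapshot rpool/lxd/encrypted/containers/c3@repro
    sudo zfs send rpool/lxd/encrypted/containers/c3@repro | ssh serverB sudo zfs receive rpool/lxd/encrypted/c3-repro

    # Send the same full stream again with -F, so the existing (encrypted) destination would
    # have to be rolled back/overwritten; if the limitation is in ZFS, this should hit the
    # same "cannot receive new filesystem stream" refusal seen above
    sudo zfs send rpool/lxd/encrypted/containers/c3@repro | ssh serverB sudo zfs receive -F rpool/lxd/encrypted/c3-repro

    # Clean up the scratch dataset and snapshot afterwards
    ssh serverB sudo zfs destroy -r rpool/lxd/encrypted/c3-repro
    sudo zfs destroy rpool/lxd/encrypted/containers/c3@repro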

lxc config show c3 --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 24.04 LTS amd64 (release) (20241004)
  image.label: release
  image.os: ubuntu
  image.release: noble
  image.serial: "20241004"
  image.type: squashfs
  image.version: "24.04"
  volatile.apply_template: copy
  volatile.base_image: 74957a5580288913be8a8727d121f16616805e3183629133029ca907f210f541
  volatile.cloud-init.instance-id: d264bee1-2e03-4030-a875-19b25e4a2a49
  volatile.eth0.hwaddr: 00:16:3e:9e:a7:f7
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.uuid: a89a025e-171b-47c2-bcf7-253e91280e9e
  volatile.uuid.generation: a89a025e-171b-47c2-bcf7-253e91280e9e
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: encpool
    type: disk
ephemeral: false
profiles:

lxc-info.txt

@simondeziel (Member) commented:

> Using zfs as the storage backend, copying a container from an encrypted zfs pool on one server to an encrypted zfs pool on a new server works, but doing the same operation with --refresh fails with an error and the destination server's container storage is lost. Default non-encrypted pools work fine. I am almost certain I must be doing something wrong, otherwise others would be hitting this issue.

Using an encrypted rpool and using lxc copy --refresh to another encrypted pool seems niche enough to me ;)

> (Somewhat related: I have another server where I run a daily --refresh of my containers to an encrypted partition, which works fine. But after a while I get the same error (cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one), although in that case the storage is at least not lost on the destination server. I haven't been able to pinpoint exactly when it occurs, but it may have something to do with snapshots. I.e. every night a snapshot is taken, then every night a --refresh to the destination runs, with snapshots auto-expiring. But if the --refresh is not run for a while, the destination snapshots expire, and when the --refresh runs again it can't do the incremental properly. Then the only way to fix it is a full copy from src to dst. But with this new server I have just set up, I can't even get that far.)

The first thing I'd try would be to run a fresher kernel/ZFS version.

Your lxc info says kernel_version: 5.15.0-52-generic, which is very out of date. If you can, please try the latest kernel and also the latest HWE (6.8.0) one.

This way we can rule out any kernel/ZFS bug that's already been fixed.
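On 22.04 the HWE stack can usually be installed with something along these lines (double-check the package name for your configuration):

    sudo apt update
    sudo apt install --install-recommends linux-generic-hwe-22.04
    sudo reboot

    # After the reboot, confirm which kernel and ZFS versions are actually in use
    uname -r
    zfs version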

@simondeziel simondeziel added the Incomplete Waiting on more information from reporter label Nov 7, 2024