The mbuffer settings relate to the remote system only, is this right? #629

Closed
jimklimov opened this issue Jan 16, 2024 · 1 comment · Fixed by #630

Contributor

jimklimov commented Jan 16, 2024

At least, this is what I see in practice, per znapzend logs (wrapped for readability) e.g.:

# zfs send -Lce -I 'rpool/home/abuild@znapzend-auto-2024-01-16T00:00:00Z' \
    'rpool/home/abuild@znapzend-auto-2024-01-16T11:51:05Z'\
    |ssh -o batchMode=yes -o ConnectTimeout=30 znapzend \
        'mbuffer -q -s 256k -W 600 -m 128M\
        |zfs recv -u -F pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild'

...although documentation examples (in znapzendzetup embedded man page) seem to imply that this is (was originally?) about the sender's local mbuffer:

  • Specify the path to your copy of the mbuffer utility.

  • Specify the path to your copy of the mbuffer utility and the port used on the destination. Caution: znapzend will send the data directly from source mbuffer to destination mbuffer, thus data stream is not encrypted.

  • znapzendzetup create --recursive --mbuffer=/opt/omni/bin/mbuffer ...

On one hand, having it remote-only constrains the destination host(s): mbuffer must be installed there, and run-time resources like RAM must be available for it.

On the other hand, if the main goal of mbuffer is to level out the burstiness of the original ZFS send stream generation (and/or, to an extent, of its consumption on the other side), so that the sender is not always blocked on the receiver and vice versa, then the mbuffer may as well run on the source system (assuming network speed is roughly constant).

Running the buffer on the sender also allows for more predictable RAM use: the sender controls how many streams it sends and how large their buffers are, but cannot control how many other systems are backing up to the same destination server at that moment, nor the impact of the many buffers spawned only there.

In fact, with manual replications I often end up having both (to level out network lags): zfs send | mbuffer | ssh "mbuffer | zfs recv"
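
A minimal sketch of that double-buffered manual pipeline, only to make the shape concrete. The helper name `build_pipeline` and its `src_buf`/`dst_buf` arguments are hypothetical (not znapzend settings), the dataset and host names are placeholders, and the mbuffer flags are simply borrowed from the log line above. It prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a replication pipeline with an
# optional mbuffer on each side. Pass "off" to skip a buffer.
build_pipeline() {
    snapshot=$1; host=$2; dst=$3; src_buf=$4; dst_buf=$5

    sender="zfs send '$snapshot'"
    if [ "$src_buf" != "off" ]; then
        # Sender-side buffer: smooths bursts from "zfs send"
        sender="$sender | $src_buf -q -s 256k -m 128M"
    fi

    receiver="zfs recv -u -F '$dst'"
    if [ "$dst_buf" != "off" ]; then
        # Receiver-side buffer: smooths bursts in "zfs recv" consumption
        receiver="$dst_buf -q -s 256k -m 128M | $receiver"
    fi

    printf '%s | ssh %s "%s"\n' "$sender" "$host" "$receiver"
}

# Both buffers enabled (placeholder names throughout)
build_pipeline 'rpool/home@snap1' backuphost 'pond/backup/home' mbuffer mbuffer
```

Passing `off` for either buffer drops that stage, so the same helper covers the sender-only, receiver-only and unbuffered cases.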

This issue is posted to begin a discussion about perhaps adding another group of settings (src_mbuffer and src_mbuffer_size?) to optionally use that instead of (or in addition to) an mbuffer on the destination system.

Technically, it could be more correct to track independent dst_N_mbuffer(_size) settings and keep the current one for source, but this might break some deployments upon upgrade?..

Finally, note that there may be local destinations, and running two mbuffers talking to each other on the same host is overkill. Although... if the user's znapzendzetup calls for it? Maybe warn, but honour their choice.

Contributor Author

jimklimov commented Jan 16, 2024

At least, I can confirm the observed (maybe not "desired") behavior in the codebase.

  • I see it prepare a @cmd with generic zfs send (no mbuffer) at:

    my @cmd;
    if ($lastCommon){
        @cmd = ([@{$self->priv}, 'zfs', 'send', @sendOpt, $incrOpt, $lastCommon, $lastSnapshot]);
    }
    else{
        @cmd = ([@{$self->priv}, 'zfs', 'send', @sendOpt, $lastSnapshot]);
    }

  • and then, if sending port-to-port (not in an SSH tunnel), it appends a | mbuffer -O port to the sender and prepends a mbuffer -I port | to the receiver, using the same $mbuffer path to the binary on both sides (very much not a given across different platforms):

    #if mbuffer port is set, run in 'network mode'
    if ($remote && $mbufferPort && $mbuffer ne 'off'){
        my $recvPid;
        my @recvCmd = $self->$buildRemoteRefArray($remote, [$mbuffer, @{$self->mbufferParam},
            $mbufferSize, '-4', '-I', $mbufferPort], [@{$self->priv}, 'zfs', 'recv', @recvOpt, $dstDataSetPath]);
        my $cmd = $shellQuote->(@recvCmd);
        my $subprocess = Mojo::IOLoop::Subprocess->new;
        $subprocess->run(
            #receive worker fork
            sub {
                print STDERR "# " . ($self->noaction ? "WOULD # " : "" ) . "$cmd\n" if $self->debug;
                system($cmd)
                    && Mojo::Exception->throw('ERROR: executing receive process') if !$self->noaction;
            },
            #callback
            sub {
                my ($subprocess, $err) = @_;
                $self->zLog->debug("receive process on $remote done ($recvPid)");
                Mojo::Exception->throw($err) if $err;
            }
        );
        #spawn event
        $subprocess->on(
            spawn => sub {
                my ($subprocess) = @_;
                my $pid = $subprocess->pid;
                $recvPid = $pid;
                $remote =~ s/^[^@]+\@//; #remove username if given
                $self->zLog->debug("receive process on $remote spawned ($pid)");
                push @cmd, [$mbuffer, @{$self->mbufferParam}, $mbufferSize,
                    '-O', "$remote:$mbufferPort"];
                $cmd = $shellQuote->(@cmd);

  • or, if sending in an SSH tunnel, the @mbCmd is inserted between $remote and $recvCmd (so it is not part of the sender's processes):

    }
    else {
        my @mbCmd = $mbuffer ne 'off' ? ([$mbuffer, @{$self->mbufferParam}, $mbufferSize]) : () ;
        my $recvCmd = [@{$self->priv}, 'zfs', 'recv' , @recvOpt, $dstDataSetPath];
        push @cmd, $self->$buildRemoteRefArray($remote, @mbCmd, $recvCmd);
        my $cmd = $shellQuote->(@cmd);
        print STDERR "# " . ($self->noaction ? "WOULD # " : "" ) . "$cmd\n" if $self->debug;
        system($cmd) && Mojo::Exception->throw("ERROR: cannot send snapshots to $dstDataSetPath"
            . ($remote ? " on $remote" : '')) if !$self->noaction;
    }
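
To summarize the two code paths above in shell terms, here is a rough sketch that prints (does not run) the resulting commands. The function name `show_transfer`, the SNAP/DST placeholders and the `-m SIZE` spelling of the size option are assumptions for illustration; only the `-4 -I port` / `-O host:port` pairing and the stripping of the `user@` prefix are taken from the Perl excerpts:

```shell
#!/bin/sh
# Print (not run) the commands for each transport mode.
# Arguments: remote (user@host), mbuffer path or "off", buffer size,
#            port ("" means SSH tunnel mode)
show_transfer() {
    remote=$1; mbuffer=$2; size=$3; port=$4
    if [ -n "$port" ] && [ "$mbuffer" != "off" ]; then
        # Network mode: an mbuffer on BOTH sides, same binary path for both
        host=${remote#*@}   # strip "user@" for the mbuffer -O target
        echo "receiver: ssh $remote '$mbuffer -m $size -4 -I $port | zfs recv DST'"
        echo "sender:   zfs send SNAP | $mbuffer -m $size -O $host:$port"
    elif [ "$mbuffer" != "off" ]; then
        # SSH tunnel mode: mbuffer runs only on the remote side
        echo "sender:   zfs send SNAP | ssh $remote '$mbuffer -m $size | zfs recv DST'"
    else
        echo "sender:   zfs send SNAP | ssh $remote 'zfs recv DST'"
    fi
}

show_transfer user@host /usr/bin/mbuffer 128M 9090
show_transfer user@host /usr/bin/mbuffer 128M ""
```

The second call shows the case this issue is about: in tunnel mode nothing buffers the stream on the sending host.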

And per git blame, this remote-ness of mbuffer dates back to the first commits (v0.0.1):

    ($remote, $dstDataSet) = $splitHostDataSet->($dstDataSet);
    my @cmd;
    if ($lastCommon){
        @cmd = (['zfs', 'send', '-I', $lastCommon, $lastSnapshot]);
    }
    else{
        @cmd = (['zfs', 'send', $lastSnapshot]);
    }
    my @mbCmd = $mbuffer ne 'off' ? ([$mbuffer, @{$self->mbufferParam}]) : () ;
    my $recvCmd = ['zfs', 'recv' , '-F', $dstDataSet];
    push @cmd, $self->$buildRemoteRefArray($remote, @mbCmd, $recvCmd);
    my $cmd = $shellQuote->(@cmd);
    print STDERR "# $cmd\n" if $self->debug;

and an explicit "check if executable is available on remote host" at:

    if ($self->cfg->{mbuffer} ne 'off'){
        # property set. check if executable is available on remote host
        my ($remote, $dataset) = $splitHostDataSet->($self->cfg->{dst});
        my $file = ($remote ? "$remote:" : '') . $self->cfg->{mbuffer};
        $self->zfs->fileExistsAndExec($file) or die "ERROR: executable '" . $self->cfg->{mbuffer} . "' does not exist on $remote\n";
    }
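
For illustration only, a check of this kind could look roughly like the following in shell. This is not how fileExistsAndExec is implemented (its body is not shown here); `check_exec` is a hypothetical name, and the sketch assumes key-based SSH access to the remote host:

```shell
#!/bin/sh
# Hypothetical stand-in for an "exists and is executable" check.
# Arguments: remote ("" for a local check, else [user@]host), file path
check_exec() {
    remote=$1; file=$2
    if [ -n "$remote" ]; then
        # Remote check: run "test -x" on the destination host
        ssh -o BatchMode=yes "$remote" "test -x '$file'"
    else
        # Local check
        test -x "$file"
    fi
}

if check_exec "" /bin/sh; then
    echo "/bin/sh is executable locally"
fi
```

With separate source and destination settings, a check like this would have to run once locally for the sender's path and once per destination host.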

So to minimize surprises in the field, any change here should keep honouring that single mbuffer path setting (used for each destination, and for the sender in port-to-port mode) unless it is overridden by the newly defined src_* and dst_N_* variants, and the old setting should then be documented as deprecated...
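
A minimal sketch of that precedence rule, assuming a more specific value (such as a src_* or dst_N_* setting) wins whenever it is set, including an explicit "off", and the legacy value is used otherwise. The function name `resolve_mbuffer` is hypothetical, and this is not the code that was eventually merged:

```shell
#!/bin/sh
# Hypothetical resolver: specific setting first, then legacy, then "off".
# Arguments: specific value (may be empty), legacy value (may be empty)
resolve_mbuffer() {
    specific=$1; legacy=$2
    if [ -n "$specific" ]; then
        # Explicit override, including an explicit "off"
        echo "$specific"
    elif [ -n "$legacy" ]; then
        # Fall back to the old single setting
        echo "$legacy"
    else
        echo off
    fi
}

# Destination without its own setting inherits the legacy path
resolve_mbuffer "" /opt/omni/bin/mbuffer
# An explicit "off" is not overridden by the legacy setting
resolve_mbuffer off /opt/omni/bin/mbuffer
```

Whether the source side should also inherit the legacy value is a separate design decision; calling the resolver with an empty legacy argument yields "off" for that case.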

jimklimov added a commit to jimklimov/znapzend that referenced this issue Jan 16, 2024
…ource and destinations [oetiker#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
oetiker pushed a commit that referenced this issue Jan 16, 2024
…tion (and source) system (#630)

* bin/znapzendzetup: update documentation for --mbuffer option variants

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

* Refactor "mbuffer(_size)" settings to handle different variants for source and destinations [#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

* README.md: clarify about requirements for a remote destination system

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

* README.md: update for src_mbuffer* [#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

* .github/workflows/spelling/expect.txt: update for mbuffer changes [#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

---------

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jan 17, 2024
…off [oetiker#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jan 17, 2024
…mbuffer, fall back to "off" [oetiker#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
oetiker added a commit that referenced this issue Mar 13, 2024
* lib/ZnapZend/Config.pm: checkBackupSets(): do not ignore src_mbuffer=off [#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

* lib/ZnapZend/Config.pm: checkBackupSets(): do not assign "undef" src_mbuffer, fall back to "off" [#629]

Signed-off-by: Jim Klimov <jimklimov@gmail.com>

---------

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Co-authored-by: Tobias Oetiker <tobi@oetiker.ch>