-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/syncoid: split in multiple systemd services and harden them #98455
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea. I've tested this pretty thoroughly in my setup, and it works with the changes I suggested.
This is the modified version of this PR I have been testing: lopsided98@0894223
]) (localPoolName c.target); | ||
serviceConfig.User = cfg.user; | ||
serviceConfig.Group = cfg.group; | ||
serviceConfig.Restart = mkDefault "on-failure"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good default. With the default StartLimit
settings, persistent failures cause the service to be restarted in a loop forever and never marked as failed. This prevents me from getting a notification that my backups are failing. If users want restarting they can easily configure it themselves with StartLimit
settings appropriate for their use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've removed Restart=
and RestartSec=
, but I do not have enough monitoring perspective to know if this is suitable or not like that.
# For syncoid to be able to create /var/lib/syncoid/.ssh/ | ||
# and to use custom ssh_config or known_hosts. | ||
home = "/var/lib/syncoid"; | ||
createHome = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer StateDirectory = "syncoid"
in the systemd service. I think it is the more idiomatic way of doing it.
For my use case, I actually preferred not having a stateful known_hosts
file because it makes it harder to change host keys (I normally manage them declaratively but now I would also have to manually edit /var/lib/syncoid/.ssh/known_hosts
as well). I'm fine with this change though if it helps your use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've moved to StateDirectory=
which ensures correct permissions and automatically binds the directory in RootDirectory=
.
Concerning the use case of /var/lib/syncoid/.ssh/
, when generating a /etc/ssh/ssh_known_hosts
with:
programs.ssh.knownHosts = {
mermet = {
hostNames = [ "mermet" "mermet.sourcephile.fr" ];
publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFvKN2sIpH782MFjaOpcty1Hs/T/TPNJpXI08H3O3oxl";
};
};
and using this hostname in :
services.syncoid.commands."backup@mermet.sourcephile.fr:rpool/var/mail".target =
"losurdo/backup/mermet/var/mail";
syncoid
now writes a /var/lib/syncoid/.ssh/known_hosts
specifying the IP address of the hostname:
80.67.180.129 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFvKN2sIpH782MFjaOpcty1Hs/T/TPNJpXI08H3O3oxl
instead of issuing the following warnings:
losurdo syncoid[19440]: Could not create directory '/var/empty/.ssh'.
losurdo syncoid[19440]: Failed to add the ED25519 host key for IP address '80.67.180.129' to the list of known hosts (/var/empty/.ssh/known_hosts).
Hence, if that IP address is dynamic, having such a stateful known_hosts
may be useful, though there are probably some --sshoption
that can help.
"+/run/booted-system/sw/bin/zfs" "allow" | ||
cfg.user "create,mount,receive,rollback" pool | ||
]) (localPoolName c.target); | ||
serviceConfig.User = cfg.user; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to set PrivateTmp = true
, otherwise different commands that have the same destination host and run in parallel end up using the same SSH control socket. When one command finishes, it kills the connection of any others using the same socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, see also jimsalterjrs/sanoid#532 (comment).
I've set PrivateTmp=true
and also taken the time to find working values for the security options, based upon systemd-analyze security syncoid-$NAME.service
and previous experiences like #98904 .
As always, it might still be too much hardening. Please bear with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardening doesn't cause any problems for me, although as always it is definitely an increased maintenance burden that requires in depth systemd knowledge to understand. I'd like to get someone else's opinion, although it seems to be nearly impossible to find reviewers for sanoid/syncoid PRs for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying out sanoid/syncoid, hopefully I'll have the experience in a few weeks to have actual opinions. I'll try to remember this (and other PRs) -- but feel free to ping me if it's still stuck in a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it would be great to get this merged. Make sure you use #83904, otherwise sanoid templates won't work right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NixOS test needs to be updated to account for the changed syncoid service name.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
Hardened with |
Rebased against latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lheckemann I've been running the latest version of this PR since last night without problems, so I'm good with merging this now if you want, as we discussed on IRC.
I've tested this, it seems to work, but some of the locking down seems to cause warnings (not errors though).
This is not the case when pushing the snapshots to a remote host, only when pulling snapshots from a remote host. |
@etu, oh, right, this is an important warning: syncoid calls
I am also relaxing |
Could it be that this has hardened the service so hard it has no more access to the ssh key in the end? I moved the ssh key into |
@phryneas yes, this PR chrooted Note that #147559 (awaiting reviews) uses the new systemd option |
Thank you @ju1m, I'm going to subscribe that future PR. |
@phryneas , note that you would usually reach |
Motivation for this change
Currently all syncoid commands are listed into a single script inside a single systemd service, hence if one command fails all the following commands are not run.
Ping @lopsided98
Things done
interval
option per command, defaulting to the globalinterval
.sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)