document pull-like operation #900

Closed
ThomasWaldmann opened this issue Apr 13, 2016 · 66 comments

Comments

@ThomasWaldmann
Member

ThomasWaldmann commented Apr 13, 2016

this is a FAQ (by people who have firewalls or want it for other reasons) and some people are evaluating setups with ssh -R (see some posts in #36).

this issue is to collect such setups and if evaluated successfully, add it to the documentation.

note: the debian/ubuntu package description says borg only supports push, maybe that can be removed after this ticket is closed.

so, if you successfully run a pull-like setup, the best thing you can do is to make a pull request that closes this ticket.


💰 there is a bounty for this

Note: to collect the bounty you need to run a reliable pull-like setup, do a pull request for our documentation, documenting the pull-related parts of the setup.

@ThomasWaldmann
Member Author

A pull setup that does not involve ssh is to just mount the source filesystems on the machine that runs borg.

@textshell
Member

For the use case where the normal push way is problematic because of firewalls etc.:

From axion on irc (slightly edited and simplified, so all errors are likely mine):

repo=ssh://${USER}@localhost:${PULL_PORT}${REPO_PATH}/${host}
ssh -R ${PULL_PORT}:localhost:22 ${host}         \
  BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes \
  borg create ${repo}::${archive} /some/path

This tunnels an ssh connection through another ssh connection, so it has some additional overhead.

Another way would be to use BORG_RSH and a pair of socat instances to avoid one layer of ssh encryption.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Sep 9, 2016

@textshell I don't think that BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes should be in there permanently, right?

Also, the repo=...{REPO_PATH}/${host} can be done that way, but is unrelated to pull-mode. Also, REPO_PATH there is rather the path that has the repos as subdirs, not the path of the repository itself.

@textshell
Member

@UNKNOWN_UNENCRYPTED it depends on whether PULL_PORT is always the same. If that can be arranged, then yes, it should not be in there.

@repo yes that can be simplified. I only did limited editing from axion’s pastebin, but i didn’t want this to get lost again.

I think the BORG_RSH + socat way would be nicer anyway (no ssh in ssh, no dependency on sshd running on the backup server, etc), but it needs a slightly more complex bash script.
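
For reference, the ssh-in-ssh variant above could be wrapped in a small function, roughly like this. It is only a sketch: the function name pull_via_ssh_tunnel and its arguments are made up for illustration, it assumes PULL_PORT is always the same (so the BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK override is left out, as discussed above), and it assumes paths without spaces, because ssh joins the remote command into a single string.

```shell
#!/bin/bash
# Sketch of the ssh-in-ssh pull. Run on the backup server (where the repo is).
# Arguments (all hypothetical example names):
#   $1 client host, $2 tunnel port, $3 user on the backup server,
#   $4 repository path on the backup server, $5 archive name,
#   remaining arguments: paths on the client to back up.
pull_via_ssh_tunnel() {
    local client="$1" pull_port="$2" repo_user="$3" repo_path="$4" archive="$5"
    shift 5
    local repo="ssh://${repo_user}@localhost:${pull_port}${repo_path}"

    # -R makes localhost:$pull_port on the client lead back to sshd (port 22)
    # on this machine, so borg on the client can reach the repository.
    ssh -R "${pull_port}:localhost:22" "$client" \
        borg create "${repo}::${archive}" "$@"
}

# Example call (not executed here):
#   pull_via_ssh_tunnel web1 2222 backup /srv/repos/web1 nightly /etc /home
```

The client still needs a key that is allowed to log in to the backup server through the tunnel, so this does not remove the trust in that direction.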

@leaf-node
Contributor

While having documentation for this workaround is great, wouldn't it be better to add this functionality to borg itself? This kind of syntax would be awesome:

$ borg create /path/to/repo::example.com-now user@example.com:/

@cwebber

cwebber commented Nov 19, 2016

I agree with @sudoman; it would be useful to know, are there architectural reasons this would be difficult? It feels like this would dramatically increase the number of scenarios for which Borg is a recommendable solution.

@textshell
Member

We first need to agree on a plan. For example i don’t like overloading the directory argument of borg create with additional magic. I personally lean towards a new subcommand.

(Note: server is where the repo is, client is the remote system that holds the data to be backed up)

Also, as this is not the main use case for borg, i think the design should minimize the changes needed. Thus i think the stdin, stdout and stderr of the ssh session should be used for the ui of the borg client on the remote system, not for data transfer, so that all interaction still works as expected. Thus the repository communication would need to be tunneled with an additional unix socket forward. I’m not sure what to do about borg serve’s stderr. Maybe it’s ok to just (implicitly) splice that in on the server side.

One way to implement borg pull would be to create a unix socket and listen on it, then ssh to the system to be backed up and run borg create there with special options that tell it which unix socket to connect to. The server would then wait for a connection on the unix socket, dup2 it to stdin and stdout, and invoke the RepositoryServer.

Still open is how key management is even supposed to work in this scenario. Maybe mandate a keyfile on the client in the default location?
We also need to ensure that there is a good way to secure this with the usual forced command stuff.

This would need minimal changes to the main borg code:

  • RemoteRepository would need to get an option to connect to a unix domain socket via a new option in create.
  • A new pull command needs to be implemented that does the initial setup and then chains into borg serve.

Of course this still requires a borg executable on the client.

So it doesn’t need any architectural changes but is a lot of fiddling with external ssh process interaction and the os module. So in the end i think it’s a task that is doable for anyone with sufficient motivation and decent python skills.

@enkore
Contributor

enkore commented Nov 19, 2016

It's pretty much what I do over at borgcube. I don't mean that as advertisement (wouldn't make any sense for a project that doesn't really work yet, does it?), rather, if someone wants to implement it in their system they can draw inspiration from there - the basics (pulling an archive from a client) work rather well.

You can also gauge how many changes it would likely need to do this; if it is even more tightly integrated into Borg itself it would probably mean much more changes than those presented in borgcube.

Which is the reason why I chose to put that into a separate project; however if someone wants to work toward integrating it into Borg itself I won't interfere of course, since I'm obviously inherently biased here.

@cwebber

cwebber commented Nov 20, 2016

I'm reading the ~solution posted by @textshell again, and I'm realizing... this reintroduces to the pull model one of the issues (well, issue depending on your setup) that I was hoping to avoid from the push model.
Consider a scenario where I have a backup machine running on say, my local LAN. I have a lot of backups on it. I don't want the machines I'm backing up from some remote VM hosting server to have access to this machine... the trust is in the backup machine accessing the other machines, not the reverse.

In the scenario being described, it sounds like both machines will have to have access to each other.

@textshell
Member

@cwebber The client only has access to the borg repository, not the whole backup server, in the scenario i posted (that is comparable to the access it would have in push mode with a correctly set up forced command and assuming sshd is not buggy). At least if RepositoryServer is started with restrictions to only allow access to the right borg repository and only via the socket that borg pull creates. (So no direct network or ssh access is needed)
I think doing the chunking and deduplication (and encryption) locally on the client is one of the core parts of borg. On the other hand it would be possible to have a pull script that creates an sshfs tunnel and does those on the server side. But i don’t think that really needs support in borg; that’s just an easy script to write, but it loses quite some of borg’s performance.

@horazont
Contributor

horazont commented Dec 1, 2016

FWIW, I have made a small hack which works with socat, thus saving the SSH-in-SSH overhead and obliterating the need for the remote machine to have an account on the local machine. Using --append-only and --restrict-to-path, this should be as safe as Borg is, but I’d like any feedback on that.

First, we create socat-wrap.sh, which we will use as BORG_RSH:

#!/bin/bash
exec socat STDIO TCP-CONNECT:localhost:12345

Locally, we run socat to offer the borg service:

socat TCP-LISTEN:12345,fork \
    "EXEC:borg serve --append-only --restrict-to-path $PATH_TO_REPOSITORIES --umask 077"

(omit the ,fork if you want to allow only exactly one borg command to be run)

Now we invoke borg on the remote using ssh, forwarding the port:

ssh -R 12345:localhost:12345 sourcehost \
    BORG_RSH="/home/horazont/socat-wrap.sh" \
    borg init -e none ssh://foo/$PATH_TO_REPOSITORIES/some_repository

foo is completely arbitrary; one could substitute anything here, because the socat-wrap.sh ignores its arguments.


Of course, it’s also possible to do the same with UNIX sockets, providing more isolation.

socat-wrap.sh (on the remote machine):

#!/bin/bash
exec socat STDIO UNIX-CONNECT:/home/horazont/borg-remote.sock

Locally, we run socat to offer the borg service on a socket:

socat UNIX-LISTEN:/home/horazont/borg-local.sock,fork \
    "EXEC:borg serve --append-only --restrict-to-path $PATH_TO_REPOSITORIES --umask 077"

Now we invoke borg on the remote using ssh, forwarding the socket (ssh -R takes remote_socket:local_socket):

ssh -R /home/horazont/borg-remote.sock:/home/horazont/borg-local.sock sourcehost \
    BORG_RSH="/home/horazont/socat-wrap.sh" \
    borg init -e none ssh://foo/$PATH_TO_REPOSITORIES/some_repository

ssh is friendly enough to automatically set very strict permissions on the socket on the remote side.

@ThomasWaldmann
Member Author

@horazont looks good. did you compare performance ssh vs. socat?

Is the socat-wrap.sh needed or could the socat command be used directly in BORG_RSH?

@horazont
Contributor

horazont commented Dec 1, 2016

@ThomasWaldmann socat doesn’t like the additional arguments borg is attempting to add. Not sure how to circumvent that.

re performance, I haven’t checked. My main motivation for finding this solution was that I didn’t want to set up an account for the remote to SSH into (even though that should be pretty safe with authorized_keys command restrictions). The appeal is that it works out-of-the-box, no configuration on either side needed (socat-wrap.sh can be scp’d on demand).

@ThomasWaldmann
Member Author

ah, of course. yeah, then such a script is the easiest way.

if you have that setup working ok, could you add a section to our docs about it and do a PR against 1.0-maint?

@fake666

fake666 commented Jan 14, 2017

i set up the socat-based solution from @horazont mentioned above, running nightly backups from various locations. i noticed that with larger backup targets, after a couple of days, i reproducibly get this error:

Traceback (most recent call last):
 File "/opt/lib/python3.5/site-packages/borg/repository.py", line 72, in __del__
   self.close()
 File "/opt/lib/python3.5/site-packages/borg/repository.py", line 192, in close
   self.lock.release()
 File "/opt/lib/python3.5/site-packages/borg/locking.py", line 298, in release
   self._roster.modify(EXCLUSIVE, REMOVE)
 File "/opt/lib/python3.5/site-packages/borg/locking.py", line 216, in modify
   elements.remove(self.id)
KeyError: (('storage', 31273, 0),)
$LOG ERROR Remote: Received SIGTERM.

after this happens once, the lockfile not having been deleted properly prohibits further backups...

i guess it has to do with one of the connections being closed prematurely?

edit: this error is reported on the server that "pulls" the backup from the client (i can only tell by the /opt/lib/... location - this setup is pretty confusing to debug).

@enkore
Contributor

enkore commented Jan 14, 2017

Maybe socat times out?

-T<timeout>
    Total inactivity timeout: when socat is already in the transfer loop and nothing has happened for <timeout> [timeval] seconds (no data arrived, no interrupt occurred...) then it terminates. Useful with protocols like UDP that cannot transfer EOF. 

Not sure if that's on by default.

@fake666

fake666 commented Jan 14, 2017

hm, i just realized i used kill $SOCAT_PID after the ssh command finished (i'm doing borg prune right after the backup finishes) - i replaced that with wait $SOCAT_PID now, i guess that should fix it...

thanks for the timeout hint, i now enabled socat logging with -lf and -d -d. if it happens again, we'll know for sure if there was a timeout!
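
The kill-to-wait change described above can be sketched as follows. This only illustrates the ordering and is not a confirmed fix for the traceback. The function name pull_once is made up, and it assumes the listener is started without ,fork, so that it serves one connection and then exits by itself (with ,fork the wait would never return).

```shell
#!/bin/bash
# Sketch: start a listener in the background, run the backup in the
# foreground, then wait for the listener instead of killing it.
#   $1: command or function that starts the listener (e.g. the socat line)
#   $2: command or function that runs the remote borg via ssh
# Both names are placeholders supplied by the caller.
pull_once() {
    local listener_cmd="$1" backup_cmd="$2"
    local listener_pid backup_status=0

    "$listener_cmd" &
    listener_pid=$!

    # Remember the exit status of the backup, but keep going.
    "$backup_cmd" || backup_status=$?

    # wait (not kill): the borg serve process behind the listener can
    # finish and release the repository lock before we move on.
    wait "$listener_pid"

    return "$backup_status"
}
```

A follow-up step such as borg prune would then run only after pull_once has returned.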

@fake666

fake666 commented Jan 23, 2017

it happened again :( no timeout though. here is the borg output:

------------------------------------------------------------------------------
Archive name: home-2017-01-23 05:50:53.078703
Archive fingerprint: 65a8b026c5801c11411d9bc63354517d71bb95bc41b3a8f86838a19c64d6170f
Time (start): Mon, 2017-01-23 05:51:00
Time (end):   Mon, 2017-01-23 05:51:38
Duration: 37.94 seconds
Number of files: 62977
------------------------------------------------------------------------------
                      Original size      Compressed size    Deduplicated size
This archive:                8.28 GB              8.28 GB             31.97 MB
All archives:               98.76 GB             98.76 GB              8.12 GB

                      Unique chunks         Total chunks
Chunk index:                   63962               780503
------------------------------------------------------------------------------
Exception ignored in: <bound method Repository.__del__ of <Repository /share/.../back>>
Traceback (most recent call last):
 File "/opt/lib/python3.5/site-packages/borg/repository.py", line 72, in __del__
   self.close()
 File "/opt/lib/python3.5/site-packages/borg/repository.py", line 192, in close
   self.lock.release()
 File "/opt/lib/python3.5/site-packages/borg/locking.py", line 298, in release
   self._roster.modify(EXCLUSIVE, REMOVE)
 File "/opt/lib/python3.5/site-packages/borg/locking.py", line 216, in modify
   elements.remove(self.id)
KeyError: (('storage', 32244, 0),)
$LOG ERROR Remote: Received SIGTERM.
Failed to create/acquire the lock /share/.../back/lock.exclusive (timeout).

and here's the socat log for that run:

2017/01/23 05:50:50 socat[32239] N listening on AF=2 0.0.0.0:12345
2017/01/23 05:50:53 socat[32239] N accepting connection from AF=2 127.0.0.1:42657 on AF=2 127.0.0.1:12345
2017/01/23 05:50:53 socat[32239] N forking off child, using socket for reading and writing
2017/01/23 05:50:53 socat[32239] N forked off child process 32244
2017/01/23 05:50:53 socat[32239] N forked off child process 32244
2017/01/23 05:50:53 socat[32239] N starting data transfer loop with FDs [7,7] and [6,6]
2017/01/23 05:50:53 socat[32244] N execvp'ing "borg"
2017/01/23 05:51:39 socat[32239] N socket 1 (fd 7) is at EOF
2017/01/23 05:51:39 socat[32239] N exiting with status 0

all looks normal.. but all the days before, when stuff was working fine, there was an additional few lines in the log:

2017/01/22 05:50:49 socat[31968] N listening on AF=2 0.0.0.0:12345
2017/01/22 05:50:52 socat[31968] N accepting connection from AF=2 127.0.0.1:37109 on AF=2 127.0.0.1:12345
2017/01/22 05:50:52 socat[31968] N forking off child, using socket for reading and writing
2017/01/22 05:50:52 socat[31968] N forked off child process 31975
2017/01/22 05:50:52 socat[31968] N forked off child process 31975
2017/01/22 05:50:52 socat[31968] N starting data transfer loop with FDs [7,7] and [6,6]
2017/01/22 05:50:52 socat[31975] N execvp'ing "borg"
2017/01/22 05:51:34 socat[31968] N socket 1 (fd 7) is at EOF
2017/01/22 05:51:34 socat[31968] N childdied(): handling signal 17
2017/01/22 05:51:34 socat[31968] N socket 1 (fd 7) is at EOF
2017/01/22 05:51:34 socat[31968] N socket 2 (fd 6) is at EOF
2017/01/22 05:51:34 socat[31968] N exiting with status 0

note the childdied() : handling signal 17

i'm at a loss.. what's going on here?

@marcpope

marcpope commented Feb 12, 2019 via email

@Daryes

Daryes commented May 5, 2019

Unless I'm blind, I don't think anyone spoke about the fact a complete pull system with sshfs started before Borg is doable, without a root login (specific sudo right on the remote target is required).
The trick lies with -o sftp_server and sudo :

sshfs user@host:/  /local/mount/dir  -o ro -o sftp_server="sudo /usr/lib/openssh/sftp-server"

Adjust sftp_server arg to the sshd_config subsystems entry.

To have this working, you'll need :

  1. a dedicated user on the remote server. It can be a system user without a password, but a home directory and a shell are required. It needs no specific group or rights aside from this sudoers file (adjust the username):
     # sudoers file : /etc/sudoers.d/borg
     borg ALL=NOPASSWD:/usr/lib/openssh/sftp-server
  2. this user also needs the public key of the user running Borg on the backup server in its ~/.ssh/authorized_keys.

Now try to connect to the target server with ssh, and retry with sshfs. You'll see all files can be accessed, due to sftp-server running as root.
Borg can now start to backup the remote server using the mount point.

Only limitation for now is the fact the backup will have inside the full path of the mount point. And this will also need to be set as a prefix on all paths to backup and exclude.
For example : borg create ... repo::backup-set /mount/point/etc /mount/point/boot /mount/point/home /mount/point/usr --exclude /mount/point/usr/cache/
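
The prefixing can be scripted. A minimal sketch, assuming paths without spaces or newlines; the helper name prefix_paths is made up for illustration and is not part of borg:

```shell
#!/bin/bash
# Print each given path with the sshfs mount point put in front of it,
# one path per line.
#   $1: mount point (a trailing slash is dropped)
#   remaining arguments: paths as they exist on the remote server
prefix_paths() {
    local mount_point="${1%/}"
    shift
    local p
    for p in "$@"; do
        # ${p#/} strips one leading slash so the result has no "//".
        printf '%s\n' "${mount_point}/${p#/}"
    done
}

# Example (not executed here), matching the command line above:
#   borg create ... repo::backup-set $(prefix_paths /mount/point /etc /boot /home /usr) \
#       --exclude "$(prefix_paths /mount/point /usr/cache/)"
```

The unquoted $(...) relies on word splitting, which is why paths containing whitespace are out of scope for this sketch.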

@jirib

jirib commented May 27, 2019

Of course, it’s also possible to do the same with UNIX sockets, providing more isolation.

socat-wrap.sh (on the remote machine):

#!/bin/bash
exec socat STDIO UNIX-CONNECT:/home/horazont/borg-remote.sock

Locally, we run socat to offer the borg service on a socket:

socat UNIX-LISTEN:/home/horazont/borg-local.sock,fork \
    "EXEC:borg serve --append-only --restrict-to-path $PATH_TO_REPOSITORIES --umask 077"

Now we invoke borg on the remote using ssh, forwarding the socket (ssh -R takes remote_socket:local_socket):

ssh -R /home/horazont/borg-remote.sock:/home/horazont/borg-local.sock sourcehost \
    BORG_RSH="/home/horazont/socat-wrap.sh" \
    borg init -e none ssh://foo/$PATH_TO_REPOSITORIES/some_repository

ssh is friendly enough to automatically set very strict permissions on the socket on the remote side.

Thanks! I used this reverse unix socket forwarding to back up a remote server. I didn't want to use a static reverse port and I could not figure out how to catch a dynamic port. Also, using unix sockets offers better isolation, as the default mask is 0177 and thus other users can't access the socket. People who would like to use it should check the AllowStreamLocalForwarding and StreamLocalBindUnlink options in sshd_config(5).

@horazont
Contributor

horazont commented Jun 6, 2019

A new round of fun with pull-like operation.

I wrapped the pulling side in systemd units:

borg-remote-repositories.socket

[Unit]
Description=Socket for accessing a specific path as borg repositories

[Socket]
ListenStream=/data/test/borg.sock
Accept=yes

borg-remote-repositories@.service

[Unit]
Description=Borg serve

[Service]
Type=simple
ExecStart=/usr/bin/borg serve --append-only --restrict-to-path /data/test/repos/ --umask 077
StandardInput=socket
StandardOutput=socket
StandardError=journal
User=remote-backups
Group=remote-backups
ProtectSystem=strict
PrivateTmp=yes
PrivateNetwork=yes
PrivateDevices=yes
ProtectKernelTunables=yes
RestrictAddressFamilies=
ReadWritePaths=/data/test/repos/

This makes the borg serve:

  • run under its own user (remote-backups -- make sure that user has rwx permissions on /data/test/repos and everything therein)
  • have ~no privileges on the system: no network access, no device access, no access to a shared tmp, no write access to the system etc.
  • be able to run multiple times, once for each client connecting to the socket

To execute a backup, one can use for example:

ssh -R /root/borg.sock:/data/test/borg.sock root@remote-host \
    BORG_RSH="'bash -c \"exec socat STDIO UNIX-CONNECT:/root/borg.sock\"'" \
    borg create -p \
        ssh://remote/data/test/repos/remotely-created::postgres-$(date --iso-8601=seconds) \
        /var/lib/postgresql-backups/ \
    ';' rm /root/borg.sock

The rm /root/borg.sock helps with cleanup in case the remote server cannot be configured to do StreamLocalBindUnlink.

(Of course, you’d normally not use root but instead a user with sudo privileges for exactly the required borg create commands.)

@ThomasWaldmann
Member Author

@fantasya-pbem did you see this ticket / bounty?

@fantasya-pbem
Contributor

Yeah, I find it quite difficult to go through all these comments and get the essence of what could be called a general recipe to do it. And I don't have experience with pull-like operations. I'll follow this issue and maybe one day find time to write some docs from it.

@ThomasWaldmann
Member Author

@fantasya-pbem ok, thanks. guess one needs to actually try it and in parallel write complete / consistent docs.

@pinpox

pinpox commented Nov 28, 2019

While having documentation for this workaround is great, wouldn't it be better to add this functionality to borg itself? This kind of syntax would be awesome:

$ borg create /path/to/repo::example.com-now user@example.com:/

Was this ever implemented?

@Skyr

Skyr commented Dec 20, 2019

I managed to get a pull setup running, turns out it looks extremely similar to @horazont's approach ;-) I also use socat to redirect from/to a unix domain socket. I baked it all into two shell scripts (one on the pull side, one on the machine to be backed up) - no systemd, "borg serve" is only spun up during the actual backup process.

A first step to streamline this approach would be a modification of borg to get rid of the socat workaround, i.e. redirecting stdin/stdout to a unix domain socket (sounds similar to #4749). This could look like the following:

  • Add an optional parameter --socket /path/to/socket
  • "borg serve" would use this socket
  • Add an extension for the repo URI: socket://path/to/repo/on/remote - this would use the socket passed via --socket and use it for communication

@ThomasWaldmann would you be interested in code changes implementing this? Or are you aiming for a more comfortable "all-in-one solution" which would seamlessly integrate the push like in the comment above by @binaryplease?

@ThomasWaldmann
Member Author

@Skyr I'd like to see a solution that does not need major modifications or additions to the RPC code (remote.py). That code is fragile, performance critical and not easy to debug.

@ghost

ghost commented Dec 30, 2019

Is this resolved? Seems like all @horazont needs to do is open a PR.

@jirib

jirib commented Apr 10, 2020

BTW latest OpenSSH added support for remote/local unix socket forwarding tokens, see https://bugzilla.mindrot.org/show_bug.cgi?id=3014

@stobbsm

stobbsm commented May 27, 2020

I'm creating a daemon to automate backups for me (need to learn go, and this fits it). Eventually, I'd like to be able to run the daemon on a server, that will tell my client to start a backup if one hasn't been done recently.

Does anyone see any issues doing this? Anyone interested in a similar thing?

@ThomasWaldmann
Member Author

closing due to #5230.

ThomasWaldmann pushed a commit that referenced this issue Jun 23, 2020
docs: describe socat pull mode, fixes #900

also: fix sphinx deprecation warning

borg/docs/conf.py:114: RemovedInSphinx40Warning: The app.add_stylesheet() is deprecated. Please use app.add_css_file() instead.
@ThomasWaldmann
Member Author

@BenediktSeidl solved this, but wants to give the bounty to the borg project. Thanks!

#5150 (comment)

So, I will claim it and transfer the funds back to borgbackup org, so they can be used for future bounties.

@ThomasWaldmann ThomasWaldmann modified the milestones: lithium, hydrogen Jun 23, 2020
@ThomasWaldmann
Member Author

Now claimed USD 50 bounty and transferred back to borgbackup org, https://www.bountysource.com/orders/119860?receipt=1

@tombyman
Contributor

tombyman commented Aug 5, 2020

This method requires only passwordless access from borg-server to borg-client. Not ssh-in-ssh.

Do this once on borg-server:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod go-w ~/.ssh/authorized_keys

Execute pull operation on borg-server:

(
  eval $(ssh-agent) > /dev/null
  ssh-add -q
  ssh -A borg-client "borg init -e none --rsh 'ssh -o StrictHostKeyChecking=no' $(id -un)@borg-server:repo"
  kill "${SSH_AGENT_PID}"
)

@ThomasWaldmann
Member Author

ThomasWaldmann commented Aug 5, 2020

@tombyman commenting on a closed issue / merged PR might not be the best way to push this.

So, maybe better open a new issue. Describe what the issue is and if you have a solution, make a PR that fixes the issue?

@tombyman
Contributor

tombyman commented Aug 7, 2020

Created issue #5287 and PR #5288.

fantasya-pbem added a commit to fantasya-pbem/borg that referenced this issue Dec 11, 2021
Backport from master borgbackup#5150: Document the socat pull mode described in borgbackup#900.