
Implement a "safe" append-/readonly-mode #1772

Open
MichaelHierweck opened this issue Oct 28, 2016 · 42 comments
Open

Implement a "safe" append-/readonly-mode #1772

MichaelHierweck opened this issue Oct 28, 2016 · 42 comments

Comments

@MichaelHierweck

MichaelHierweck commented Oct 28, 2016

The docs claim append-mode can be used to prevent hacked clients from permanently altering existing archives. This can be achieved by granting only append-mode access to the client. Then changes to the repository are appended to the transaction log/journal and can be reverted by removing the latest transactions from the journal.

First, this kind of manual rollback is not state-of-the-art. ;)

Second, disk space is not infinite. Sooner or later a trusted client (or the server itself) will need to free disk space. This requires "true" write access to the repository and is done by prune. However, archives that have been marked as (to-be-)deleted in append-mode will be wiped out by prune even if the retention policy specified along with the prune invocation should have preserved them.

See: #1689 and #1744

Therefore the trusted client that invokes prune on the repository is responsible for checking the integrity of the repository. But how could that be achieved? When a trusted client runs prune at a time when a hack of a client has not yet been detected, the prune action will apply any malicious transactions permanently. Then even archives that were created before the hack and should not have been purged according to the retention policy might be purged or compromised. This would make disaster recovery from borgbackup-based backups impossible.

I would like to suggest the implementation of a (new) safe append-, readonly-, worm- or whatever-mode that restricts clients to adding new archives and rejects any action that would delete or change existing archives. Prohibited actions should be rejected immediately and therefore should not go into the journal at all.

@ThomasWaldmann
Member

ThomasWaldmann commented Oct 28, 2016

Yes, currently one has to be sure about having a "valid" (untampered) repo state before writing to it with append-mode=0.

borg list repo, borg list archive, borg extract --dry-run archive can help here, but making really really sure might be difficult (and slow).
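
For illustration, such a pre-prune sanity check could be scripted from a trusted machine using only those read-only commands. A minimal sketch, assuming a hypothetical repository location and that `borg list --short` is used to obtain archive names:

```python
# Sketch of a pre-prune sanity check using only read-only borg commands.
# The repository URL is hypothetical; adjust to your setup.
import subprocess

REPO = "ssh://backup@host/./repo"

def run(*args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# list all archives in the repository
run("borg", "list", REPO)

# inspect the most recent archive: list its contents and dry-run an extract
names = subprocess.check_output(["borg", "list", "--short", REPO], text=True).split()
if names:
    run("borg", "list", f"{REPO}::{names[-1]}")
    run("borg", "extract", "--dry-run", f"{REPO}::{names[-1]}")
```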

We could have something better if we could disallow delete tags within a no-delete mode.

@ThomasWaldmann
Member

ThomasWaldmann commented Oct 30, 2016

I reviewed the code where repository.delete(id) is used:

  • by borg delete archive in Archive.delete() (via chunk_decref())
  • by borg debug delete-obj
  • by borg check --repair
    • with --verify-data in verify_data() to remove corrupt objects (so that they will be replaced by non-corrupt ones by later backups, hopefully)
    • in orphan_chunks_check() (to remove unreferenced objects)
  • by borg create in Archive.write_checkpoint() to remove the checkpoint archive item again after it has been saved/committed (so that the next checkpoint [or final] save/commit will replace it without creating unreferenced stuff)

The first ones are more or less expected and unproblematic (we just need to fail them early if there is no delete capability) - they don't need to be done from a not-that-much-trusted client (but can be done from a more trusted machine).

The last one is more problematic, can we solve it better than just switching off checkpoints completely?

@textshell
Member

We could also just keep checkpoints in "no delete" mode.
But I think the real problem is not "delete" operations, it is put. Mostly, put for the manifest is a very big hole. (We could ignore all other puts, because they are supposed to contain the same data, although we can't check that because of encryption.)
I think what we need for a safe append-only mode is that the appended archives are not stored in the repo manifest but managed by the borg server. I.e. we would need a new RPC operation "add_archive" that either takes the chunk-id or maybe even the whole archive chunk. That way the server could even implement a policy where only the last one in one connection is persisted. Thus there would not be a pile-up of archives for each checkpoint.
This of course is a bigger change, as all clients that interact with such a repo need to be able to see the append-only archives using further new RPC commands.
The trusted client might merge all of these into the manifest to create a repo that would be compatible with older clients again, or maybe just because it is more efficient.

Still problematic: A client can put chunks that claim to contain the data for some chunk-id but do not (either corrupted, or something else). I don't think there is anything we really can do about this. The trusted client could download and check these chunks, but that's a bit late. Also a bad client can put chunks not linked from any archive, although borg check would be able to clean this up.
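
To make the proposed "add_archive" operation a bit more concrete, here is a rough server-side sketch. None of these names exist in borg; the archives directory, the connection-id check and the stored metadata are all assumptions about how such a registry could look:

```python
# Hypothetical server-side registry for archives added by append-only clients.
# All names here (add_archive, connection_id, archives.d) are illustrative only.
import os
import time

class AppendOnlyArchiveStore:
    def __init__(self, path):
        self.path = path                    # e.g. <repo>/archives.d
        os.makedirs(path, exist_ok=True)

    def add_archive(self, archive_id, client_id, connection_id, replaces=None):
        """Register an archive chunk id on behalf of a client.

        `replaces` may name a checkpoint archive registered earlier in the
        *same* connection; anything else is refused, so a client can never
        remove archives from previous sessions."""
        if replaces is not None:
            old = os.path.join(self.path, replaces.hex())
            with open(old) as f:
                _, old_connection, _ = f.read().split()
            if old_connection != connection_id:
                raise PermissionError("may only replace a checkpoint of this connection")
            os.unlink(old)
        entry = os.path.join(self.path, archive_id.hex())
        with open(entry, "x") as f:         # "x": an existing entry is never overwritten
            f.write(f"{client_id} {connection_id} {int(time.time())}")

    def list_archives(self):
        """Used by the new 'list separately registered archives' RPC."""
        return [bytes.fromhex(name) for name in os.listdir(self.path)]
```

A trusted client could later read this registry, merge the entries into the manifest and clear them, as described above.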

@enkore
Contributor

enkore commented Nov 1, 2016

I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.

@ThomasWaldmann
Member

@textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.

I'd say this is pretty much doomed to be unsolvable without fundamental changes.

@textshell
Member

I don't think we need to lose all hope for something that works well enough. Fundamentally the borg model distrusts the server, so we can't get perfect security here. But I hope we can do enough that borg backups can have a reasonable trust level.

We basically want to prevent one evil client from interfering with other clients' backups and with backups of the client before it became evil. I don't think there is any way (in any setup) to make sure that a client doesn't sabotage its new backups.

Maybe we should think about the kinds of attacks here. One that springs to mind is, for example, the crypto trojan. An evil client just wants to destroy the backups to prevent undoing its damage. For evil clients that want to do data exfiltration we already have #672 or #1164. What are other major attacks an evil client might want to do?

One nice thing would be to be able to restrict clients to a certain prefix (or set of prefixes). This would likely be another --restrict-something option.

I think just using the first put is a viable strategy. Excluding the manifest (maybe just by refusing puts to its id in this mode), a bad client needs to predict the id of a chunk another client will want to save. This should be hard for most client-unique data. On the other hand it would be easy for data from, say, a distribution update. But restore errors in distribution files are just a hassle. Nothing that would force a user to, for example, pay ransom to a crypto trojan.

Going even further, a client could validate already-known chunks with a certain probability. This would guard against non-malicious corruption, or against a client that massively poisons the repository. Ideally it would check "new" chunks with a higher probability. (Detecting new chunks would mean tracking trusted chunks (i.e. those written by this client) separately on the client, which of course is more work.)
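
As a rough illustration of that probabilistic re-validation, a client-side sketch; `repo_get`, `decrypt` and `id_hash` are stand-ins for borg internals, not the actual API:

```python
# Sketch: spot-check a random sample of chunks this client believes are in
# the repo. repo_get/decrypt/id_hash are placeholders passed in by the caller.
import random

def spot_check_chunks(repo_get, decrypt, id_hash, known_chunk_ids, sample_rate=0.01):
    """Return the ids of sampled chunks whose stored data no longer matches."""
    suspicious = []
    for chunk_id in known_chunk_ids:
        if random.random() >= sample_rate:
            continue
        stored = repo_get(chunk_id)            # fetch the (encrypted) chunk
        plaintext = decrypt(chunk_id, stored)  # authenticate + decrypt
        if id_hash(plaintext) != chunk_id:     # poisoned or corrupted chunk
            suspicious.append(chunk_id)
    return suspicious
```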

@ThomasWaldmann
Member

Still don't get how one would defend against a low-level crap-chunk-putting client while being able to run delete or prune now and then (see first post).

@textshell
Member

Another threat scenario would be a user that uses some kind of cloud syncing solution.
Evil client syncs some file (thesis.tex) first. It now knows how this will be chunked and can poison those chunk ids with bogus data.
Now even if the file is synced to a good client later, that client can hardly fix the damage done by the evil client. I don't see a feasible way to defend against this, apart from the cloud syncing service also having backups. Then again, the evil client could also just replace the file in the synced folder with crap and hope it will be synced to the good client before it has backed up the correct version.

@textshell
Member

To summarize:
Add a new client restriction to borg that restricts delete and overwriting capabilities of a client.
Such a client:

  • can not write to the manifest
  • can not prune or delete anything
  • has to register new archives using a new remote call with the server
  • The server should save a secure client id with each archive that is registered in this way, for later validation.
  • The client should be able to replace a previous checkpoint that was created in the same connection with a new one. The server has to check that this is really in the same connection.
  • checkpoints that are only later "resumed" can not be deleted.
  • the chunks that would be deleted in checkpoint rollover need to be added as metadata in the most recent checkpoint while replacing checkpoints
  • puts to chunk ids that the server already has are ignored (they should contain the same data as already stored, or they are evil; see the sketch below)

All borg clients:

  • need to use a new API to load all separately registered archives, in addition to using the list from the manifest.
  • client could validate already known chunks with a certain probability to guard against corruption.

A trusted client that e.g. does prune:

  • needs to check that an archive was created by the expected client, otherwise report to the admin
  • might want to merge correct archives into the manifest and remove them from the separate list.
  • might want to check new chunks added (possibly a random sample)
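
For illustration, a tiny sketch of how the per-request part of these restrictions ("no manifest writes, no deletes, first put wins") could be enforced on the server side. The policy and handler names are made up; only the all-zero manifest id is an actual borg detail:

```python
# Sketch of server-side enforcement for a safe append-only mode.
# MANIFEST_ID (all zeros) is the fixed chunk id borg uses for the manifest.
MANIFEST_ID = bytes(32)

class SafeAppendOnlyPolicy:
    def __init__(self, repository):
        self.repository = repository        # dict-like stand-in for the repo backend

    def handle_put(self, chunk_id, data):
        if chunk_id == MANIFEST_ID:
            raise PermissionError("writing the manifest is not allowed in this mode")
        if chunk_id in self.repository:      # first put wins: ignore re-puts,
            return                           # they are either identical or evil
        self.repository[chunk_id] = data

    def handle_delete(self, chunk_id):
        # reject early, so nothing ever enters the transaction log
        raise PermissionError("delete is not allowed in this mode")
```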

@enkore
Contributor

enkore commented Nov 6, 2016

I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.

Status update for that: Prototype is working.

What I've been up to here is essentially a backup system built on top of Borg, where you only have one trusted party, a central backup server that controls access to repositories.

This works by having (among some higher level coordination that is kinda required to make it all work) a reverse proxy that the (untrusted!) clients use to access a view of the target repository.

This provides:

  • Clients can't read or mutate archives in the repository
  • Clients can't push bad data (id != id(data) -- they can still write bogus metadata etc. my plan is to thwart that in the cache sync phase on the server - ditto for bogus orphans [the RP can create a delta-index])
  • Clients don't know the location of the real repository
  • Clients don't get the encryption keys for the real repository
  • Hence clients could not access the data in the real repository even if they gained access to it
  • Clients don't maintain a cache, and no archive caches are needed anywhere
  • But still full deduplication across all clients

Code: https://github.com/enkore/borgcube (please heed the notes in the readme)

@MK-42

MK-42 commented Feb 13, 2017

Actually, I got a little lost in all those issues about 'hacked server', 'append-only', 'append-only not safe with prune' and so on. So excuse me if I'm not commenting in the right/most-appropriate place...

If I understood the current situation correctly:

  • --append-only will save your backup data in case some client tries to delete stuff from your repo (by only tagging chunks 'to-be-deleted', but not being able to delete them)
  • when leaving --append-only and executing --prune (or some other operation), that will delete everything that is to-be-pruned and tagged as 'to-be-deleted' by previous repo-accesses from --append-only runs.

are those assumptions correct? I'm new to borg and try to get my head around all this stuff, so please correct me if I'm wrong.

What I'm thinking about is:

  • the combination of --append-only and pruning from a trusted client is safe, as long as you are sure that your clients/your repo have/has not been tampered with when you do the pruning.

so what about introducing something like an 'incubation period', aka: prune all transactions that are older than [insert user-supplied time span here]. That would mean: if I have plenty of space, I will keep all transactions of the current year, but prune the stuff that is older than that year.
My intention with that is: if one of my clients gets evil, I will notice that at some point in time. If I have the transactions 'unpruned' since that client got evil, I can easily recover from that by deleting its transactions. The conclusion is: if I am sure that none of my clients were evil in the last year, I can prune the transactions that are older than one year without losing data.

That would allow me to save some space, prune now and then, and have some kind of 'incubation period' for noticing that one of my clients got evil without it tampering with all my backups.

Depending on their choice and trust in their machines, users could choose a reasonable 'incubation period' in which to notice that something went wrong before it could creep into their backups.

As I couldn't get my head around the --append-only logic completely, I'm not sure if that is even possible like that, but wanted to share that idea. Is it possible like this?

@ThomasWaldmann
Member

@MK-42 yes, that's correct.

repo commits do not have timestamps, so we can't consider time.

@enkore
Contributor

enkore commented Feb 13, 2017

In -ao mode there is the transaction log which could be parsed back, but this sort of thing definitely requires RPC updates -> something for 1.1+

Also I'm not super-convinced that this would be a big improvement over simple -ao, since it requires even more knowledge of internals to grasp and is even harder to use. Either is stop-gappy...

@lucassz

lucassz commented Jun 1, 2019

I created a $100 bounty. I encourage others who would find this useful to contribute!

@imperative

@textshell yes, put is also a problem. :| and we can not ignore non-manifest puts as we are defending against an evil client here. it could just put bad replacement chunks for all content data in the repo and the only way to notice is a very expensive --verify-data operation. it could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at content.

I'd say this is pretty much doomed to be unsolvable without fundamental changes.

To reiterate the problem and make sure that I understand it correctly now, after reading the ~10 various currently existing issues loosely requesting new types of read-only/write-only/etc. modes: they all seemingly stem from the fact that "--append-only" mode as it exists right now is mostly broken in real-life usage. (It is not technically broken, as it does what the docs say, but in reality most users will want to combine it with pruning old data on the server, which makes every deletion/corruption previously masked and prevented by the append-only mode permanent. Thus, if administrators want to use pruning, they are now expected to somehow inspect all repositories before every real prune (which is usually done often, using a scheduling mechanism), which is completely unrealistic. The only real use case for append-only, the only time it can prevent corruption/hacks, is when an attack has been detected immediately, and an administrator has been notified and reacted immediately to stop pruning batch jobs and started inspecting the state of the repository right after the attack. (Or if no prune commands are ever issued on a repository at all.))

The difficulty in implementing a fix seems to be rooted in the fact that the client-server model of Borg allows a client to issue simple low-level commands (who ever thought that up as a viable way to design it?) such as "PUT" or "GET" on individual blocks or indexes or repo files. Most of these commands are required for creating as well as for deleting and pruning, so simply banning certain low-level commands does not work: they are used by a normal "create" as well, and banning them would prevent any operation (even creating a new backup). Does this assessment sound correct?

If so, the only two ways we have to implement "real" append-only/write-only, in the meaningful way that many people expect, are:

  1. To implement a clever heuristic/well-planned analyzer or rights-management system on the server which will interpret and disentangle the stream of low-level commands sent to it by the client in order to make an educated guess about whether the high-level operation the client is trying to do is legitimate and valid, and then restrict/allow it accordingly.
  2. Change the underlying architecture of the client-server model in Borg, and finally stop exposing low-level commands that should only be done in the server to clients, giving a new API which will then be easily restricted.

Judging by the number of open issues, the breadth of discussion and the different ideas, the lack of consensus, the timespan, etc., solution 1) is proving to be very difficult to design and implement.

How far are the developers from deciding to invest in solution 2)? Is it a viable alternative at all, and how much reorganization would it require? How long would it take to implement? Can it be done? Would such a big change even be accepted as a pull request?

@textshell
Member

textshell commented Oct 13, 2019

@imperative Not exactly. The basic security model says that the server is the untrusted part. This is needed for (data at rest) encryption to be actually meaningful. So the server cannot do many high-level operations. This is on purpose. Of course the server can always drop data to make the backup disappear.

I've outlined my view of this in #1772 (comment), which I still think is viable.

This places a bit more trust in the server, as the server now sees encrypted archive data separately instead of all in one big block, but this should be tolerable, because it is still encrypted and the previous usage patterns are likely to leak the exact same data for creates (assuming the crypto is good). Prune/manifest compaction should not expose too many details either.

In a situation with multiple (untrusted) clients accessing one repository it still has the problem that an evil client can poison the repository with chunks claiming an id that does not match the contained data. In my model the (weak) defense against this is having the client check random chunks. A secure defense would be to have a client keep track of validated chunks and download and validate each chunk that is needed in an archive that this client did not yet validate.

For single-client repos this is not really a problem, as long as you keep in mind that only backups done before your client has been compromised are reliable. As those will always have their data in the repository before the evil client comes along, and already-existing chunks cannot be erased or replaced, it cannot spoil the old archives. (This defends against the crypto-malware use case.)

@diego-treitos

diego-treitos commented Oct 23, 2020

I am testing borgbackup and I also found this problem with its architecture.

I would like to share an idea, though I don't know if it is realistic. Could we implement the pruning on the server side? If the server saves the date when a chunk was last required by any archive, then maybe the server can delete the chunks whose date is older than the configured pruning threshold.

Say a chunk belongs to 3 archives: one a month old, one a week old and one a day old. The date for the chunk would be that of the day-old archive. If that chunk stops being used in new archives, it will retain that date, so when a month (or whatever period) passes, the server can remove the chunk, bypassing append-only.

This way:

  • You can still have an append-only so clients cannot remove the backups, but the server will free space.
  • You have to configure the pruning on the server side.
  • The amount of information required for the server to do the pruning is minimal and can be acquired by the trusted client.
  • The client cannot make a chunk older than it is, so I don't think this can be exploited.

However, I don't know if this is feasible or if I am getting anything wrong (probably I am). Anyway I just wanted to share it.
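
Purely to illustrate the idea (the next comments point out that the server cannot see chunk references on its own, so this assumes the client reports the referenced chunk ids), a minimal sketch:

```python
# Sketch of the proposed server-side expiry. Everything here is hypothetical:
# borg's server does not receive per-archive chunk references today.
import time

class ChunkAgeTracker:
    def __init__(self, retention_days=30):
        self.retention = retention_days * 86400
        self.last_referenced = {}            # chunk_id -> unix timestamp

    def archive_created(self, referenced_chunk_ids):
        """Called when a new archive is created; the client would have to
        report which chunk ids the archive references."""
        now = time.time()
        for chunk_id in referenced_chunk_ids:
            self.last_referenced[chunk_id] = now

    def expired_chunks(self):
        """Chunk ids not referenced by any archive created within the
        retention window; these could be freed server-side."""
        cutoff = time.time() - self.retention
        return [cid for cid, ts in self.last_referenced.items() if ts < cutoff]
```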

@enkore
Contributor

enkore commented Oct 23, 2020

The server doesn't know when an archive references a chunk due to encryption.

@diego-treitos

diego-treitos commented Oct 24, 2020

The server doesn't know when an archive references a chunk due to encryption.

I guess so, but I was wondering if it could be possible for the server to store that information (the client could send it). The amount of information is minimal and it doesn't look like it could disclose anything about the contents of the backup. Just having that information should make it possible to prune the archives server-side, which would be a big improvement in security.

The only additional information required is to associate a chunk with a date.

@jose1711

Hello, I am very new to borg so please forgive me if I am not making any sense, just trying to understand the core of this problem. If I understand correctly, a model situation could be as follows:

  1. user on untrusted clientA (configured with append-only) makes a full backup to a remote host
  2. untrusted clientA gets hacked, hacker locates important/sensitive data, encrypts them with his key, server is in full hacker's control
  3. user is not aware of the issue and continues to work as usual
  4. backups are run according to schedule
  5. storage space on a remote host is thinning - user decides to run prune from a trusted clientB, all good backups of sensitive documents are now removed by prune operation
  6. hacker contacts the user and asks for ransom

If the above is correct, is there anything that can be done to prevent this other than manually checking the diff between the current and to-be-pruned archives? I don't really see effective countermeasures - the most primitive one would be to check whether a file in the archive at least has a readable header (matching its extension).

@ThomasWaldmann
Member

manifest entries (incl. archive timestamps) are generated clientside and are not readable by the server, because they are encrypted.

@ThomasWaldmann
Member

ThomasWaldmann commented Nov 3, 2022

#1772 (comment) my recent ideas still leave some problems unresolved:

A malicious client could spam the repo with lots of "good looking", but fake and useless archives, making it hard for the admin to find the good archives. The encrypted manifest and archive contents are completely under client control. A helpful countermeasure would be if the server added a server-side timestamp to the manifest entries so that fake clientside-made timestamps could be recognized as fake.

But even that doesn't help if malicious behaviour is not recognized for a longer time: an admin wanting to reclaim repo space might just prune away the good archives of the past and keep the more recent bad archives.

@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0b4, 2.0.0rc1 Nov 25, 2022
@jmcclelland

I think making it "good" is better than waiting to make it perfect. As long as server side pruning could be controlled by the timestamps generated by the server, I think --put-only-mode seems like a solid solution.

Specifically, I don't think we need a solution that solves the problem of malicious behavior not being recognized for a longer time. The specific use-case this ticket addresses is ransomware. A clever attacker might try to encrypt your data over a period of time, but as soon as anything remotely important to your organization is poisoned or encrypted, you will find out, and you will go to your backup and ensure you can restore it.

For groups that are still sensitive to this problem, you can always set your prune policy to a longer time period.

@spacefrogg

I like the ideas put forth in "Thought experiment".

I don't see the use for the separate archives directory. I believe the aIDs share the namespace with chunk IDs and could just be kept in data. By never allowing a PUT to change the data under an existing ID, you also cannot overwrite aIDs.

Below, I want to address some attacks and countermeasures.

Attack: push out archives

Malicious client uploads empty archives to prune out the last good ones.

Countermeasure: Refuse prune of archives with a negative delta in referenced size of data of more than X (not determining the actual chunks, only the amount of data they reference) (exception for the last archive, obviously)

Conditions: Server can reliably determine the amount of referenced data from archive headers without the help of the client.

Attack: push out good archives with bogus data

Malicious client uploads malicious archives of the same size as previous archives to prune out the last good ones.

Countermeasure: To-be-pruned archives must reference at least X% of the same chunks as one of the archives that are to stay

Conditions:

  • As in previous attack
  • Additionally, when keeping unrelated / weakly related data distributed over different archives in the same repo, all of the last instances must be deleted simultaneously. The server must assume that they are all related, and deleting any one of them alone triggers the countermeasure.

Attack: prevent deletion

Malicious client uploads a completely unrelated archive to prevent rightful deletion.

Countermeasure:

  • Extend the lifecycle of archives by a final "this is now considered garbage" state that is based on server-side timestamps and is completely under server control. Archives in that state are allowed to be pruned. This countermeasure is purely for the convenience of the server hoster, as the client that has once sent its data away must expect that it is kept indefinitely anyhow.
  • Chosen time frame determines the maximum interval in which the client must check its capability to successfully restore.

As the attacks and countermeasures lay out, you cannot defend against bogus data and allow deletion at the same time. The best you can do is to delay deletion by a reasonable amount of time to give the client side time to react to malicious activity.
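
As a rough sketch of the two countermeasures above, under the same (later disputed) assumption that the server can see each archive's referenced data size and chunk set:

```python
# Sketch of the "push-out" countermeasures; candidate/kept_archives are
# hypothetical objects with .referenced_size (bytes) and .chunk_ids (a set).
def may_prune(candidate, kept_archives, max_shrink=0.5, min_overlap=0.2):
    """Return True if pruning `candidate` should be allowed."""
    if not kept_archives:
        return True                          # exception for the last archive

    # countermeasure 1: refuse if the staying archives reference much less
    # data than the archive about to be pruned (push-out via empty archives)
    largest_kept = max(a.referenced_size for a in kept_archives)
    if largest_kept < (1.0 - max_shrink) * candidate.referenced_size:
        return False

    # countermeasure 2: the pruned archive must share enough chunks with at
    # least one archive that stays (push-out via same-size bogus archives)
    return any(
        len(candidate.chunk_ids & a.chunk_ids) >= min_overlap * len(candidate.chunk_ids)
        for a in kept_archives
    )
```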

@spacefrogg

By thinking about these attacks some more, I think there are some misconceptions regarding the pruning implementation. Right now, pruning works by keeping a number of archive versions. So the time interval to defend against a "push-out attack" depends on the attacker's ability to generate new versions; the faster they can, the smaller the defense interval.

The attack model consists of three major components:

  • A partially trusted server, which is trusted to the extent that:
    • It keeps and returns the client's data on request
  • A completely untrusted client, which is only allowed to append:
    • It may have been taken over by a malicious entity and only fakes backups
    • The only possible defense against it is to not let it violate the pruning contract and to regularly check that restores actually work
  • A completely trusted client, which may also delete.

The simplest solution against the remaining attacks (the ones not already addressed by the upgraded append protocol) is to move checking the pruning contract from the client side to the server side and augment it with timestamps. This way, you can enforce that an archive must have at least an age of T before it can be pruned. The time span T is the recovery interval in which the untrusted client must check successful data recovery.

This changes the attack model such that the trusted client must also trust the server's timestamps while informing the server about the intended pruning contract and recovery interval T.

Trust could additionally be extended to fix the pruning contract and interval T at repository creation time or as borg serve command parameters. This way, there is no need for a trusted client and the untrusted one can release the prune as per pre-defined contract.

  • Attack 1: The untrusted client fakes the prune.
  • Defense 1: Nothing. The user must expect the data on the server to never get deleted anyhow.
  • Attack 2: The server maliciously changes the pruning parameters to delete all archives.
  • Defense 2: Nothing. The server could never have stored your archives in the first place and could delete all data at any time.

So, no other changes than keeping timestamps (and additionally, pruning parameters) on the server side are necessary to get comparable security guarantees. Also, this can be blown up to arbitrarily complex time-driven retention policies by making T a list of intervals.
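
A compact sketch of that server-side check: archives get a server-assigned timestamp when they are registered, and an archive may only be pruned once it is older than T. The class and method names are hypothetical:

```python
# Sketch: server-side enforcement of a minimum archive age T before pruning.
# Names and storage are illustrative; nothing like this exists in borg today.
import time

class PruneContract:
    def __init__(self, min_age_days):
        self.min_age = min_age_days * 86400
        self.registered_at = {}              # archive_id -> server timestamp

    def register_archive(self, archive_id):
        # called when an (append-only) client registers a new archive
        self.registered_at[archive_id] = time.time()

    def may_prune(self, archive_id):
        ts = self.registered_at.get(archive_id)
        if ts is None:
            return False                     # unknown archive: refuse
        return time.time() - ts >= self.min_age

    def delete_archive(self, archive_id):
        if not self.may_prune(archive_id):
            raise PermissionError("archive is younger than the recovery interval T")
        del self.registered_at[archive_id]
```

Making T a list of intervals, as suggested above, would give the more complex time-driven retention policies.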

@ThomasWaldmann
Member

ThomasWaldmann commented Jan 8, 2023

@jmcclelland

"server side pruning" is not possible, because archive content is encrypted and the server usually does not have the key. The archives reference the content chunks and chunks need to be refcounted, so borg only deletes what's not used (referenced) any more. Only the client can do that as it can decrypt the archive.

"For groups that are still sensitive to this problem, you can always set your prune policy to a longer time period."

For this to work, we must make sure that a malicious client alone cannot fake the timestamps of archives, e.g. by generating a thousand empty archives with fake timestamps spread over years. That could lead to all valid backups being pruned away in one prune run.

@ThomasWaldmann
Member

ThomasWaldmann commented Jan 8, 2023

@spacefrogg there needs to be a way to list the directory of archives. Currently it is easy, because the manifest contains a list of archive IDs stored at fixed chunkid 0, so it can be easily discovered and then we have the list we need. Problem is that adding/removing an archive needs a read-modify-write operation and thus locking of the repo to make this operation safe.

if we just have the archives stored under their aID in the middle of many many data chunks, they are hard to discover and list. maybe we could use some extra bit to tag these chunks in the repo index, then it would be "only" a repo index scan (which is in memory). or, as i proposed, just put the manifest entries as separate files into a separate directory, then we can just list that directory.

the countermeasures you propose require the server to know about archive contents. but this is not the case and not wanted for borg. content data and metadata in archives shall be encrypted and not readable by the server.

@ThomasWaldmann
Member

ThomasWaldmann commented Jan 8, 2023

@spacefrogg with the server only seeing mostly encrypted chunks (with the only exception of the server-generated timestamps) I don't think a server-side part of the prune policy can work. The server is just seeing DELs for misc. chunkIDs and does not really know what these chunks contain (content data or archive metadata). So I guess we just need that secure trusted client that does the pruning.

@ThomasWaldmann ThomasWaldmann modified the milestones: 2.0.0rc1, 2.0.0b6 Jan 8, 2023
@ThomasWaldmann
Member

^ we can't do such a big change in rc1. thus, if we do this, it will be in b6 (and it will likely delay borg2 quite a bit due to the amount of changes and testing required).

@spacefrogg

When you consider listing the archives in a separate directory, then the list of archives is not secret to the server. If it is not secret, why not let the server manage chunkid 0? Change the protocol such that the untrusted client just announces the aID to append to the list of archives. Another way would be to use a manifest but keep it under its natural chunkid and not chunkid 0. The server generates a new manifest from the current one, either adding an archiveID or admitting a pruning request. The old manifest's chunkid is no longer referenced and it will get garbage collected. No read-modify-write involved.

N.B. That's basically how git tree objects work.

@ThomasWaldmann
Member

That separate directory (or in general: namespace) would just show a list of IDs, but not archive names. The only "new thing" that would be disclosed is the server-side added timestamps, if we add those.

@PhrozenByte
Contributor

PhrozenByte commented Feb 24, 2023

^ we can't do such a big change in rc1. thus, if we do this, it will be in b6 (and it will likely delay borg2 quite a bit due to the amount of changes and testing required).

I know the struggle of such decisions, so I was thinking about whether borg2 might be able to prevent attackers from exploiting this limitation by using organizational measures - and I believe that the new borg transfer can make these attacks lead nowhere, at the cost of increased storage usage. My idea is to let a trusted client copy new archives (and only new archives) to a secondary (off-site) repo using borg transfer on a regular basis (e.g. daily). Even though this surely is no solution to the issue, it might make the decision about delaying borg2 easier when a viable workaround is known.

Detailed explanation:

The borg client running on the backed-up server connects to the storage server (running with borg serve --append-only) and creates backups there (i.e. borg create) as usual. At some point the backed-up server is hacked and an attacker manages to corrupt the repo, including not only new archives, but also all older ones. Since borg serve was running in append-only mode no immediate damage was done, but unfortunately the attack wasn't noticed quickly enough and a trusted client wrote to the repo (e.g. using borg prune) some time later. The damage is done.

About 6.5 years ago @enkore more or less had the idea to create a backup system on top of a borg repo - and with borg transfer we can do something similar: another trusted client running on a different machine regularly (e.g. daily) pulls new archives from the storage server to its local repo ("secondary repo") using borg transfer. This trusted client never speaks to the malicious client, but only to the storage server. It only pulls new archives (e.g. by diffing the current borg list with the one from a day ago - a simple shell script should do the job) and simply ignores any deletions (e.g. borg --repo=LOCAL_SECONDARY_REPO transfer --other-repo=REMOTE_PRIMARY_REPO --match-archives=NAME_OF_SINGLE_ARCHIVE). The trusted client therefore only adds new archives to the secondary repo, but never touches older archives, simply because it doesn't have to. Therefore a malicious client might still prevent creating new backups, but won't be able to tamper with older archives in the secondary repo.

The secondary repo can have its own completely separate retention policies. To prevent archive timestamp tampering we should never transfer archives that report to be older than the latest archive in the secondary repo (again, this might be a matter of a shell script running borg, not of borg itself - even though a new archive filter for borg transfer to accomplish this would be awesome, e.g. borg transfer --new-archives-only).

Storing a third copy of the data off-site (i.e. a backup of the backup) is best practice (following the "3-2-1 backup rule") anyway.
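
For concreteness, a rough sketch of that daily pull job (the repo locations are placeholders and the exact list invocation is an assumption; --other-repo and --match-archives are the borg transfer options mentioned above):

```python
# Sketch of the daily pull job: transfer only archives that do not yet exist
# in the secondary repo, never touching anything already there.
import subprocess

PRIMARY = "ssh://storage/./primary-repo"     # placeholder
SECONDARY = "/backup/secondary-repo"         # placeholder, local to the trusted client

def archive_names(repo):
    # assumes an invocation that prints one archive name per line
    out = subprocess.check_output(["borg", "--repo", repo, "rlist", "--short"], text=True)
    return set(out.split())

new_archives = archive_names(PRIMARY) - archive_names(SECONDARY)
for name in sorted(new_archives):
    subprocess.run(["borg", "--repo", SECONDARY, "transfer",
                    "--other-repo", PRIMARY, "--match-archives", name],
                   check=True)
```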

Since I don't know the implementation of borg transfer I might be horribly wrong here, so: Are my assumptions correct and are older backups in the secondary repo safe? Or might borg transfer pass the corruption down to the secondary repo?

@ThomasWaldmann
Member

ThomasWaldmann commented Feb 25, 2023

@PhrozenByte the default of borg transfer is to transfer all archives from the src repo to the dst repo which are not already present at dst, so I guess you don't need additional scripting for that.

But you would need it for implementing that "only newer from src than latest in dst" policy (maybe borg transfer could be tweaked a bit to offer that also).

borg transfer just adds stuff. But I guess it would need a slower paranoid mode which decompresses chunks and calls assert_id to verify that chunkid == id_hash(plaintext) - usually it just takes the compressed chunk (and does not decompress it) to be faster.

One problem of course persists: as soon as you start deleting/pruning archives, you should be sure to not delete stuff you still need (e.g. because all newer archives only contain crap).

@PhrozenByte
Contributor

PhrozenByte commented Feb 25, 2023

One problem of course persists: as soon as you start deleting/pruning archives, you should be sure to not delete stuff you still need (e.g. because all newer archives only contain crap).

By limiting the transfer to archives that are newer than the latest archive in the secondary repo, only new archives (in terms of both "didn't exist before" and "of a later date") can reach the secondary repo. Therefore an attacker can't "inject" older crappy archives into the secondary repo to tamper with the retention policy, thus e.g. a borg prune --keep-daily won't delete more than what is expected (with the exception of borg prune --keep-last maybe).

The crucial point with a retention policy is that you must ensure that you keep archives from before the attack - always, even with the best "safe append-/readonly-mode" imaginable. If you think that you will detect any attack within 30 days, you must keep at least one 31-day-old archive. Verifying chunkid == id_hash(plaintext) is a good thing (especially if the malicious client is not the backed-up server), but if an attacker has control over the backed-up server, the malicious client might simply compress and encrypt crap to begin with, something borg can't possibly ever notice. The important thing is that borg transfer won't ever overwrite chunks of older archives, just add (possibly crappy) new chunks.

This workaround surely is no solution to the issue, but prevents damage in the worst imaginable situation in which the backed up server is the malicious client, with existing code (except for borg transfer --new-archives-only maybe, but that is optional and can be mimicked with some shell script) - or am I missing something?

@ThomasWaldmann
Member

borg transfer already has the --newer option, so one could say "only transfer archives newer than 1d".

If the transfer job is run regularly, that might be easier to use than managing the timestamp using additional scripting (and for that, borg would also need something like --newer-than=timestamp).

@gmatht

gmatht commented Oct 18, 2024

By definition the server knew what state it was in. Perhaps it is easier to back up the server (obviously this wouldn't work if we wanted the server to enforce complex retention strategies, but it would be enough for simple "n days ransomware protection"). Btrfs snapshots should work. overlayfs trickery seems to work too. On a 213M borg repository, running mount -t overlay -o upperdir=/tmp/overlay/,lowerdir=$HOME/test.borg,workdir=/tmp/work overlay /tmp/scratch.borg/, appending to a dozen files and doing a borg create only gives a manageable 172K /tmp/overlay. On the other hand, the most convenient way to do this would be with a cp -lR, if that actually worked. Perhaps BorgBackup should break hardlinks?

@Best-HeyGman

On the other hand the most convienient way to do this would be with a cp -lR, if this actually worked. Perhaps BorgBackup should break hardlinks?

Wait, why wouldn't cp -lR work?
