Implement a "safe" append-/readonly-mode #1772
Comments
Yes, currently one has to be sure about having a "valid" (untampered) repo state before writing to it with append-mode=0. borg list repo, borg list archive and borg extract --dry-run archive can help here, but making really sure might be difficult (and slow). We could have something better if we could disallow delete tags within a no-delete mode.
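For illustration, a minimal sketch of such a pre-prune sanity check, wrapping only the read-only CLI commands named above; the repository URL is a placeholder, and passing these checks does not prove the repo is untampered, it only catches obvious breakage before append-only is relaxed.

```python
#!/usr/bin/env python3
"""Hypothetical pre-prune sanity check: run read-only borg commands and
abort if any of them fails."""
import subprocess
import sys

REPO = "ssh://backup@server/./repo"  # placeholder

def run(*args):
    print("+", " ".join(args))
    return subprocess.run(args, check=False).returncode

def main():
    if run("borg", "list", REPO) != 0:
        sys.exit("listing the repository failed - do not prune")
    # list and dry-run-extract every archive; slow, but strictly read-only
    archives = subprocess.run(
        ["borg", "list", "--short", REPO],
        capture_output=True, text=True, check=True
    ).stdout.split()
    for name in archives:
        if run("borg", "extract", "--dry-run", f"{REPO}::{name}") != 0:
            sys.exit(f"dry-run extract of {name} failed - do not prune")
    print("basic checks passed; pruning is still at your own risk")

if __name__ == "__main__":
    main()
```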
I reviewed the code where
The first ones are more or less expected and unproblematic (we just need to fail them early if there is no delete capability) - they don't need to be done from a not-that-much-trusted client (but can be done from a more trusted machine). The last one is more problematic; can we solve it better than just switching off checkpoints completely?
We could also just keep checkpoints in "no delete" mode. Still problematic: a client can put chunks that claim to contain the data for some chunk-id but do not (either corrupted, or something else). I don't think there is anything we can really do about this. The trusted client could download and check these chunks, but that's a bit late. Also, a bad client can put chunks not linked from any archive, although borg check would be able to clean this up.
I'm working on some ideas in this direction, but don't want to commit to anything until I see how it pans out.
@textshell yes, put is also a problem. :| And we cannot ignore non-manifest puts, as we are defending against an evil client here. It could just put bad replacement chunks for all content data in the repo, and the only way to notice is a very expensive --verify-data operation. It could also additionally replace all metadata to make everything look valid (even for borg check --verify-data) as long as you do not (manually) look at the content. I'd say this is pretty much doomed to be unsolvable without fundamental changes.
I don't think we need to lose all hope for something that works well enough. Fundamentally the borg model distrusts the server, so we can't get perfect security here. But I hope we can do enough that borg backups can have a reasonable trust level. We basically want to prevent one evil client from interfering with other clients' backups and with backups made by the client before it became evil. I don't think there is any way (in any setup) to make sure that a client doesn't sabotage its new backups.

Maybe we should think about kinds of attacks here. One that springs to mind is the crypto trojan: an evil client just wants to destroy the backups to prevent undoing its damage. For evil clients that want to do data exfiltration we already have #672 or #1164. What are other major attacks an evil client might want to do? One nice thing would be to be able to restrict clients to a certain prefix (or set of prefixes). This would likely be another --restrict-something option.

I think just using the first put is a viable strategy. Excluding the manifest (maybe just by refusing puts to its id in this mode), a bad client needs to predict the id of a chunk another client will want to save. This should be hard for most client-unique data. On the other hand it would be easy for data, say, from a distribution update. But restore errors in distribution files are just a hassle - nothing that would force a user to, for example, pay ransom to a crypto trojan.

Even further, a client could validate already known chunks with a certain probability. This would guard against non-malicious corruption or a client massively poisoning the repository. Ideally it would check "new" chunks with a higher probability. (Detecting new chunks would mean tracking trusted chunks, i.e. those written from this client, separately on the client, which of course is more work.)
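A rough sketch of that probabilistic re-validation idea (not borg code): it assumes a hypothetical client-side set of trusted chunk IDs and placeholder `repo.get`/`key.decrypt` calls standing in for the real repository and crypto layers, and it uses plain SHA-256 where borg actually derives chunk IDs with a keyed MAC.

```python
import hashlib
import random

def spot_check(repo, key, seen_ids, trusted_ids, probability=0.01):
    """Re-download a random subset of referenced chunks and verify that the
    plaintext still hashes to the claimed chunk ID. repo/key and their methods
    are placeholders for borg internals."""
    suspicious = []
    for chunk_id in seen_ids:
        # check chunks that are "new to this client" more aggressively
        p = probability if chunk_id in trusted_ids else probability * 10
        if random.random() >= p:
            continue
        ciphertext = repo.get(chunk_id)                # hypothetical fetch
        plaintext = key.decrypt(chunk_id, ciphertext)  # hypothetical decrypt
        # borg really uses a keyed MAC for IDs; plain SHA-256 is illustrative
        if hashlib.sha256(plaintext).digest() != chunk_id:
            suspicious.append(chunk_id)
        else:
            trusted_ids.add(chunk_id)
    return suspicious
```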
Still don't get how one would defend against a low-level crap-chunk-putting client while being able to run delete or prune now and then (see first post).
Another threat scenario would be a user who uses some kind of cloud syncing solution.
To summarize:
All borg clients:
A trusted client that e.g. does purge:
Status update for that: Prototype is working. What I've been up to here is essentially a backup system built on top of Borg, where you only have one trusted party, a central backup server that controls access to repositories. This works by having (among some higher level coordination that is kinda required to make it all work) a reverse proxy that the (untrusted!) clients use to access a view of the target repository. This provides:
Code: https://github.com/enkore/borgcube (please heed the notes in the readme)
Actually, I got a little lost in all those issues about 'hacked-server', 'append-only', 'append-only not safe with prune' and so on. So excuse me if I'm not commenting in the right/most-appropriate place... If I understood the current situation correctly:
Are those assumptions correct? I'm new to borg and trying to get my head around all this stuff, so please correct me if I'm wrong. What I'm thinking about is:
So what about introducing something like an 'incubation period', aka:

That would allow saving some space, pruning now and then, and having some kind of 'incubation period' for me to notice that one of my clients went evil without it tampering with all my backups. Depending on the user's choice and trust in their machines, they could choose a reasonable 'incubation period' during which to notice that something went wrong before it could creep into their backups. As I couldn't get my head around the
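To make the 'incubation period' idea concrete, a toy sketch (purely illustrative, not based on borg's repository code, names invented): deletion requests are recorded server-side with a timestamp and only executed once they have aged past the configured period, so an admin has a window to cancel them after noticing a compromised client.

```python
import json
import time
from pathlib import Path

INCUBATION = 30 * 24 * 3600  # e.g. 30 days, chosen by the admin

class DelayedDeletes:
    """Record delete requests instead of executing them; apply them only
    after they have 'incubated' for the configured time."""
    def __init__(self, state_file="pending-deletes.json"):
        self.path = Path(state_file)
        self.pending = json.loads(self.path.read_text()) if self.path.exists() else {}

    def request_delete(self, key_hex):
        self.pending.setdefault(key_hex, time.time())
        self._save()

    def due(self, now=None):
        now = now or time.time()
        return [k for k, t in self.pending.items() if now - t >= INCUBATION]

    def cancel_all(self):
        """Admin noticed a compromised client: forget every pending delete."""
        self.pending.clear()
        self._save()

    def _save(self):
        self.path.write_text(json.dumps(self.pending))
```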
@MK-42 yes, that's correct. Repo commits do not have timestamps, so we can't consider time.
In -ao mode there is the transaction log, which could be parsed back, but this sort of thing definitely requires RPC updates -> something for 1.1+. Also, I'm not super-convinced that this would be a big improvement over simple -ao, since it requires even more knowledge of internals to grasp and is even harder to use. Either is stop-gappy...
I created a $100 bounty. I encourage others who would find this useful to contribute!
To reiterate the problem and make sure that I understand it correctly now, after reading the ~10 various currently existing issues loosely requesting new types of read-only/write-only/etc. modes: they all seemingly stem from the fact that "--append-only" mode, as it exists right now, is mostly broken in real-life usage. (It is not technically broken, as it does what the docs say, but in reality most users will want to combine it with pruning old data on the server, which makes every deletion/corruption previously masked and prevented by the append-only mode permanent. Thus, if administrators want to use pruning, they are now expected to somehow inspect all repositories before every real prune (which is usually done often, via a scheduling mechanism), which is completely unrealistic. The only real use case for append-only, the only time it can prevent corruption/hacks, is when an attack has been detected immediately and an administrator has been notified and reacted immediately, stopping pruning batch jobs and inspecting the state of the repository right after the attack. Or if no prune commands are ever issued on a repository at all.)

The difficulty in implementing a fix seems to be rooted in the fact that the client-server model of Borg allows a client to issue low-level simple commands (who ever thought that up as a viable way to design it?) such as "PUT" or "GET" on individual blocks or indexes or repo files, and most of these commands are required for creating, deleting, removing and purging alike, so simply banning certain low-level commands does not work: they are used in a normal "create" command as well, and banning them would prevent any operation (even creating a new backup).

Does this assessment sound correct? If so, the only two ways we have to implement "real" append-only/write-only, in a meaningful way that many people expect, are
Judging by the number of open issues, the breadth of discussion and the different ideas, the lack of consensus, the timespan, etc., solution 1) is proving to be very difficult to design and implement. How far are developers from the decision to invest in solution 2)? Is it a viable alternative at all, and how much reorganization would it require? How long would it take to implement? Can it be done? Would such a big change even be accepted as a pull request?
@imperative Not exactly. The basic security model says that the server is the untrusted part. This is needed for (data at rest) encryption to be actually meaningful. So the server cannot do many high-level operations. This is on purpose. Of course the server can always drop data to make the backup disappear.

I've outlined my view of this in #1772 (comment), which I still think is viable. This places a bit more trust in the server, as now the server sees encrypted archive data separately instead of all in one big block, but this should be tolerable, because it is still encrypted and the previous usage patterns are likely to leak the exact same data for creates (assuming the crypto is good). Prune/manifest compaction should not expose too many details either.

In a situation with multiple (untrusted) clients accessing one repository, it still has the problem that an evil client can poison the repository with chunks claiming an id that does not match the contained data. In my model the (weak) defense against this is having the client check random chunks. A secure defense would be to have a client keep track of validated chunks and download and validate each chunk needed in an archive that this client did not yet validate.

For single-client repos this is not really a problem, as long as you keep in mind that only backups done before your client was compromised are reliable. As those will always have their data in the repository before the evil client comes along, and already existing chunks cannot be erased or replaced, it cannot spoil the old archives. (This defends against the crypto malware use case.)
I am testing

I would like to share an idea which I don't know if it is realistic. Could we implement pruning on the server side? If the server saves the last date when a chunk was required by any archive, then maybe the server can delete the chunks whose date is older than the one configured for pruning. If a chunk belongs to three archives - one a month old, one a week old and one a day old - the date for the chunk would be that of the day-old archive. If that chunk stops being used in new archives, it will retain that date, so when a month (or whatever period) passes, the server can remove the chunk, bypassing append-only. This way:
However, I don't know if this is feasible or if I am getting anything wrong (probably I am). Anyway, I just wanted to share it.
The server doesn't know when an archive references a chunk due to encryption.
I guess so, but I was wondering if it could be possible for the server to store that information (the client could send it). The amount of information is minimal and it doesn't look like it could disclose anything about the contents of the backup. Just having that information should allow pruning the archives server-side, which is a big improvement in security. The only additional information required is associating a chunk with a date.
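Purely to illustrate the proposal (this is not how borg's repository works, and the encryption objection above still applies): the client would report, for each chunk, the timestamp of the newest archive referencing it; the server keeps the maximum per chunk and may expire chunks whose newest reference is older than the retention cut-off. All names below are invented, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

class ChunkAgeIndex:
    """Hypothetical server-side table: chunk ID -> timestamp of the newest
    archive (as reported by the client) that references the chunk."""
    def __init__(self):
        self.newest_ref = {}

    def report(self, chunk_id: bytes, archive_ts: datetime):
        """Client reports that an archive with this timestamp uses the chunk."""
        current = self.newest_ref.get(chunk_id)
        if current is None or archive_ts > current:
            self.newest_ref[chunk_id] = archive_ts

    def expired(self, keep: timedelta, now=None):
        """Chunks whose newest reported reference is older than the cut-off."""
        now = now or datetime.now(timezone.utc)
        cutoff = now - keep
        return [cid for cid, ts in self.newest_ref.items() if ts < cutoff]
```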
Hello, I am very new to
If the above is correct, is there anything that can be done to prevent this, other than manually checking the diff between the current and to-be-pruned archives? I don't really see effective countermeasures - the most primitive would be checking whether a file in the archive at least has a readable header (matching its extension).
manifest entries (incl. archive timestamps) are generated client-side and are not readable by the server, because they are encrypted.
#1772 (comment) my recent ideas still leave some problems unresolved: a malicious client could spam the repo with lots of "good looking", but fake and useless archives, making it hard for the admin to find the good archives. The encrypted manifest and archive contents are completely under client control. A helpful countermeasure would be if the server added a server-side timestamp to the manifest entries, so that fake client-side timestamps could be recognized as fake. But even that doesn't help if malicious behaviour is not recognized for a longer time: an admin wanting to reclaim repo space might just prune away the good archives of the past and keep the more recent bad archives.
I think making it "good" is better than waiting to make it perfect. As long as server-side pruning could be controlled by the timestamps generated by the server, I think

Specifically, I don't think we need a solution that solves the problem of malicious behavior not being recognized for a longer time. The specific use case this ticket addresses is ransomware. A clever attacker might try to encrypt your data over a period of time, but as soon as anything remotely important to your organization is poisoned or encrypted, you will find out, and you will go to your backup and ensure you can restore it. Groups that are still sensitive to this problem can always set their prune policy to a longer time period.
I like the ideas put forth in "Thought experiment". I don't see the use for the separate

Below, I want to address some attacks and countermeasures.

Attack: push out archives

A malicious client uploads empty archives to prune out the last good ones.

Countermeasure: refuse prune of archives with a negative delta in referenced size of data of more than X (not determining the actual chunks, only the amount of data they reference) (exception for the last archive, obviously).

Conditions: the server can reliably determine the amount of referenced data from archive headers without the help of the client.

Attack: push out good archives with bogus data

A malicious client uploads malicious archives of the same size as previous archives to prune out the last good ones.

Countermeasure: to-be-pruned archives must reference at least X% of the same chunks as one of the archives that are to stay.

Conditions:
Attack: prevent deletion

A malicious client uploads a completely unrelated archive to prevent rightful deletion.

Countermeasure:
As the attacks and countermeasures lay out, you cannot defend against bogus data and allow deletion at the same time. The best you can do is to delay deletion by a reasonable amount of time to give the client side time to react to malicious activity.
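A schematic sketch of the two countermeasures proposed above, under the assumption (questioned later in this thread) that the server could read the referenced size and chunk set from archive headers; the data layout and thresholds are invented for illustration.

```python
def prune_is_plausible(to_prune, to_keep, max_shrink=0.5, min_overlap=0.3):
    """Each archive is a dict with 'size' (total referenced bytes) and
    'chunks' (set of referenced chunk IDs). Whether a server could ever see
    this data is exactly what is disputed below."""
    if not to_keep:
        return False
    kept_max = max(a["size"] for a in to_keep)
    for victim in to_prune:
        # attack 1: empty/tiny archives pushing out much larger good ones
        if victim["size"] > kept_max / (1 - max_shrink):
            return False
        # attack 2: same-sized but unrelated (bogus) archives; a prunable
        # archive must share enough chunks with at least one kept archive
        overlap = max(
            len(victim["chunks"] & kept["chunks"]) / max(len(victim["chunks"]), 1)
            for kept in to_keep
        )
        if overlap < min_overlap:
            return False
    return True
```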
By thinking about these attacks more, I think there are some misconceptions regarding the pruning implementation. Right now, pruning works by keeping a number of archive versions. So the time interval to defend against a "push-out attack" depends on the attacker's ability to generate new versions; the faster the attacker, the smaller the defense interval. The attack model consists of three major components:
The simplest solution against the remaining attacks (the ones not already addressed by the upgraded append protocol) is to move checking the pruning contract from the client side to the server side and augment it with timestamps. This way, you can enforce that an archive must have at least an age of T before it can be pruned. The time span T is the recovery interval in which the untrusted client must check successful data recovery. This changes the attack model such that the trusted client must also trust the server's timestamps while informing the server about the intended pruning contract and recovery interval T. Trust could additionally be extended to fix the pruning contract and interval T at repository creation time or as
So, no other changes than keeping timestamps (and additionally, pruning parameters) on the server side are necessary to get comparable security guarantees. Also, this can be blown up to arbitrarily complex time-driven retention policies by making T a list of intervals.
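A sketch of that server-side contract check, assuming the server records its own timestamp when an archive object first appears; nothing like this exists in borg today, and the names are invented.

```python
from datetime import datetime, timedelta, timezone

# recovery interval T, hypothetically fixed at repository creation time
RECOVERY_INTERVAL = timedelta(days=30)

def may_prune(archive_id, seen_at, now=None, interval=RECOVERY_INTERVAL):
    """Server-side rule: an archive may only be deleted once it is older
    than T, measured by the server's own clock (seen_at = when the server
    first stored it, as a UTC-aware datetime). Client-supplied timestamps
    are deliberately ignored."""
    now = now or datetime.now(timezone.utc)
    return now - seen_at >= interval
```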
"server side pruning" is not possible, because archive content is encrypted and the server usually does not have the key. The archives reference the content chunks and chunks need to be refcounted, so borg only deletes what's not used (referenced) any more. Only the client can do that as it can decrypt the archive.
For this to work, we must make sure that a malicious client alone cannot fake timestamps of archives, e.g. by generating a thousand empty archives with fake timestamps spread over years. That could lead to all valid backups being pruned away in one prune run.
@spacefrogg there needs to be a way to list the directory of archives. Currently it is easy, because the manifest contains a list of archive IDs stored at fixed chunkid 0, so it can be easily discovered and then we have the list we need. The problem is that adding/removing an archive needs a read-modify-write operation, and thus locking of the repo, to make this operation safe.

If we just have the archives stored under their aID in the middle of many, many data chunks, they are hard to discover and list. Maybe we could use some extra bit to tag these chunks in the repo index; then it would be "only" a repo index scan (which is in memory). Or, as I proposed, just put the manifest entries as separate files into a separate directory - then we can just list that directory.

The countermeasures you propose require the server to know about archive contents. But this is not the case and not wanted for borg: content data and metadata in archives shall be encrypted and not readable by the server.
@spacefrogg with the server only seeing mostly encrypted chunks (with the only exception of the server-generated timestamps), I don't think a server-side part of the prune policy can work. The server is just seeing DELs for misc. chunkIDs and does not really know what these chunks contain (content data or archive metadata). So I guess we just need that secure trusted client that does the pruning.
^ We can't do such a big change in rc1. Thus, if we do this, it will be in b6 (and it will likely delay borg2 quite a bit due to the amount of changes and testing required).
When you consider listing the archives in a separate directory, then the list of archives is not secret to the server. If it is not secret, why not let the server manage chunkid 0? Change the protocol such that the untrusted client just announces the aID to append to the list of archives. Another way would be to use a manifest but keep it under its natural chunkid and not chunkid 0. The server generates a new manifest from the current one, either adding an archiveID or admitting a pruning request. The old manifest's chunkid is no longer referenced and it will get garbage collected. No read-modify-write involved. N.B. that's basically how git tree objects work.
That separate directory (or in general: namespace) would just show a list of IDs, but not archive names. The only "new thing" that would be disclosed is the server-side added timestamps, if we add those.
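To make the directory idea more concrete, a toy sketch (layout and names invented, not borg's actual on-disk format): each archive entry is one opaque encrypted blob stored under its ID, and the server-side mtime of that file can serve as the trustworthy timestamp.

```python
import os
from pathlib import Path

ARCHIVES_DIR = Path("repo/archives")  # invented layout, not borg's real format

def add_archive_entry(archive_id_hex: str, encrypted_entry: bytes):
    """Client writes one opaque blob per archive; no read-modify-write of a
    shared manifest, so no repo-wide lock is needed for this step."""
    ARCHIVES_DIR.mkdir(parents=True, exist_ok=True)
    (ARCHIVES_DIR / archive_id_hex).write_bytes(encrypted_entry)

def list_archive_entries():
    """Server (or any client) lists IDs plus server-side timestamps; the
    entries themselves stay encrypted and unreadable for the server."""
    for entry in sorted(ARCHIVES_DIR.iterdir()):
        yield entry.name, os.stat(entry).st_mtime
```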
I know the struggle of such decisions, so I was thinking about whether borg2 might be able to prevent attackers from exploiting this limitation by using organizational measures - and I believe that the new

Detailed explanation: The borg client running on the backed-up server connects to the storage server (running with

About 6.5 years ago @enkore more or less had the idea to create a backup system on top of a borg repo - and with

The secondary repo can have its own completely separate retention policies. To prevent archive timestamp tampering we should never transfer archives that report to be older than the latest archive in the secondary repo (again, this might be a matter of a shell script running

Storing a third copy of the data off-site (i.e. a backup of the backup) is best practice (following the "3-2-1 backup rule") anyway. Since I don't know the implementation of
@PhrozenByte the default of

But you would for implementing that "only newer from src than latest in dst" policy (maybe
One problem of course persists: as soon as you start deleting/pruning archives, you should be sure to not delete stuff you still need (e.g. because all newer archives only contain crap).
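A sketch of the selection half of that policy, using only `borg list --json` (which exists today); the actual transfer step is left out because the exact borg2 transfer invocation may differ, and the repo paths are placeholders.

```python
import json
import subprocess
from datetime import datetime

def archives(repo):
    """Return [(name, timestamp)] for a repo, via `borg list --json`."""
    out = subprocess.run(["borg", "list", "--json", repo],
                         capture_output=True, text=True, check=True).stdout
    return [(a["name"], datetime.fromisoformat(a["time"]))
            for a in json.loads(out)["archives"]]

def newer_than_dst(src_repo, dst_repo):
    """Archives in src strictly newer than the newest archive in dst; older
    ones are ignored, so a malicious client cannot inject backdated archives
    into the secondary repo."""
    dst = archives(dst_repo)
    latest = max(ts for _, ts in dst) if dst else datetime.min
    return [name for name, ts in archives(src_repo) if ts > latest]

if __name__ == "__main__":
    for name in newer_than_dst("/backup/primary", "/backup/secondary"):
        print("would transfer:", name)   # actual `borg transfer ...` call omitted
```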
By limiting the transfer to archives that are newer than the latest archive in the secondary repo, only new archives (in terms of both "didn't exist before" and "of a later date") can reach the secondary repo. Therefore an attacker can't "inject" older crappy archives into the secondary repo to tamper with the retention policy, thus e.g. a

The crucial point with a retention policy is that you must ensure that you keep archives from before the attack - always, even with the best "safe append-/readonly-mode" imaginable. If you think that you will detect any attack within 30 days, you must keep at least one 31-day-old archive. Verifying

This workaround surely is no solution to the issue, but it prevents damage in the worst imaginable situation, in which the backed-up server is the malicious client, with existing code (except for
If the transfer job is run regularly, that might be easier to use than managing the timestamp using additional scripting (and for that, borg would also need something like
By definition, the server knows what state it was in. Perhaps it is easier to back up the server itself (obviously this wouldn't work if we wanted the server to enforce complex retention strategies, but it is enough for simple "n days ransomware protection"). Btrfs snapshots should work. Overlayfs trickery seems to work too. On a 213M borg repository, running
Wait, why wouldn't
The docs claim append-mode can be used to prevent hacked clients from permanently altering existing archives. This can be achieved by granting only append-mode access to the client. Then changes to the repository are appended to the transaction log/journal and can be reverted by removing the latest transactions from the journal.
First, this kind of manual rollback is not state-of-the-art. ;)
Second, disk space is not infinite. Sooner or later a trusted client (or the server itself) will need to free disk space. This requires "true" write access to the repository and is done by prune. However, archives that have been marked as (to-be-)deleted in append-mode will be wiped out by prune, even if the retention policy specified along with the prune invocation should have preserved them.
See: #1689 and #1744
Therefore, the trusted client that invokes prune on the repository is responsible for checking the integrity of the repository. But how could this be achieved? When a trusted client runs prune at a time when a hack of a client has not been detected yet, the prune action will apply any malicious transactions permanently. Then even archives that were created before the hack, and that should not have been purged according to the retention policy, might be purged or compromised. This would make disaster recovery from borgbackup-based backups impossible.
I would like to suggest the implementation of a (new) safe append-, readonly-, worm- or whatever-mode that restricts clients to adding new archives and rejects any action that would delete or change existing archives. Prohibited actions should be rejected immediately and therefore should not go into the journal at all.
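As a rough illustration of "reject immediately, never journal", a hypothetical filter at the repository RPC layer; the operation names are invented and borg's real RemoteRepository API differs.

```python
class NoDeleteRepositoryProxy:
    """Hypothetical wrapper around a repository connection that lets PUT-style
    operations through but refuses anything that removes or rewrites data,
    before the request can reach the transaction log."""

    ALLOWED = {"put", "get", "list", "commit"}    # invented op names
    FORBIDDEN = {"delete", "destroy", "rename"}   # rejected outright

    def __init__(self, backend):
        self.backend = backend

    def call(self, op, *args, **kwargs):
        if op in self.FORBIDDEN or op not in self.ALLOWED:
            # fail early: the request never becomes part of the journal
            raise PermissionError(f"operation {op!r} not permitted in no-delete mode")
        return getattr(self.backend, op)(*args, **kwargs)
```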