
Support for Amazon S3? #136

Open
victorhooi opened this issue Nov 24, 2014 · 13 comments

Comments

@victorhooi

Is there any way at all to use Amazon S3 as a remote store? I gather that it's built on git, and hence currently it works as long as your remote host has SSH and the git command.

But I'm curious if a S3 backend would be feasible at all?

@positron

I had this idea as well. Obviously using plain S3 wouldn't work since you need something running on the back-end, but using the newly announced AWS Lambda might allow you to use S3 as a file store and short-running lambda processes to do the backend processing.

Caveat: I know next to nothing about attic or Lambda.

@siteroller

+1 - If there were some way to support Amazon, it would be hugely valuable.
Considering that even a little bit of bit rot can destroy a huge number of backups, it is critically important that the backup is stored in the most durable way possible.
Amazon offers redundant storage for $0.03/GB/month or less, and I would trust Amazon far more than any other VPS, even if I could find one that offered 100 GB of space for the same $3/month.
Even better would be a way to back up to Glacier. Meanwhile, this is a deal breaker.

@Ernest0x
Contributor

Perhaps it would help if those of you who are interested could try something like s3ql and report the results.

@siteroller

First I've heard of S3QL; it looks neat, but I haven't been able to get it to install on Ubuntu 12.04.
Will post once we have results.

Considering that S3QL does dedup and encryption, what does Attic add in this case?
Why S3QL instead of any of the many alternatives?

@jdchristensen
Contributor

victorhooi: attic is not based on git. It needs to have a copy of attic running on the remote host, so I don't think it could use Amazon S3 directly. But if you mount some Amazon S3 storage locally, e.g. using S3QL, it should work fine. (It would be interesting if someone tested this and commented on how fast it is.)

siteroller: I don't know much about S3QL, but it sounds to me like it does deduplication based on fixed block positions. Attic uses a rolling hash method to determine block boundaries, so it should be more space efficient.
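The difference between the two deduplication styles can be seen in a toy sketch. This is not Attic's real chunker; the shift-based hash, the 64-byte average chunk size, and the boundary mask below are arbitrary choices for illustration. The point is that a content-defined chunker whose boundary test depends only on the last few bytes re-synchronizes after an insertion, while fixed-position blocks all shift:

```python
import random

def fixed_chunks(data, size=64):
    """Split at fixed offsets: inserting one byte shifts every later block."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, mask=0x3F):
    """Toy content-defined chunking: cut wherever the low bits of a
    running hash match `mask` (average chunk ~ mask + 1 = 64 bytes).
    Because each byte's contribution is shifted left on every step, the
    low 6 bits of `h` depend only on the most recent 6 bytes, so
    boundaries depend on local content, not absolute file position."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF
        if (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(0)
data = bytes(random.randrange(256) for _ in range(4096))
shifted = b"!" + data  # one byte inserted at the front

# Fixed-size blocks: essentially nothing is reusable after the insertion.
fixed_shared = set(fixed_chunks(data)) & set(fixed_chunks(shifted))

# Content-defined chunks: only the chunk(s) at the very start change.
cdc_shared = set(cdc_chunks(data)) & set(cdc_chunks(shifted))
print(len(fixed_shared), len(cdc_shared), len(cdc_chunks(data)))
```

With the seeded data above, nearly all content-defined chunks survive the one-byte insertion, while almost no fixed-size blocks do; that locality is what lets a rolling-hash chunker keep deduplicating data that has shifted.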

@Ernest0x
Contributor

@siteroller: S3QL and Attic are two different things. On one hand, Attic is an archiving utility that needs a filesystem to create its repositories on. On the other hand, S3QL can provide such a filesystem on top of S3 (or other storage services). I have not tested this combination myself, there may be some performance issues, but it seems a feasible scenario for those who want to use S3. Of course, as you said, there are alternatives to S3QL. So far, I am not aware of any results with either S3QL or any other alternative.

The extra layers (deduplication, encryption, etc.) that S3QL provides may be useless when used in combination with Attic, since Attic does these things too at the repository level. So it makes sense to turn them off at the filesystem level in order to reduce performance penalties. That said, if it were possible to turn off compression of Attic repositories (currently it is not), it would be an interesting experiment to see what the benefit of filesystem-level deduplication (done by S3QL) on top of repository-level deduplication (done by Attic) would be, so that data across multiple Attic repositories is deduplicated too.

@jscinoz

jscinoz commented Apr 7, 2015

@positron Unfortunately, Amazon's Lambda functions only officially support Node.js (although Python is available in the environment), and can only run for 60 seconds per request. It might be possible to hack something together that did a request per block or per X blocks, but I imagine that would require some fairly substantial changes to attic.

It might be possible to use boto to automatically spin up a minimal Docker image on ECS to run attic serve and, once any necessary processing was complete, store the data in S3, but this seems like a bit of a hack and is sure to be inefficient. You could explore more involved but potentially more efficient solutions, such as analysing the repository via an ECS-hosted instance of attic while having the client upload files directly to S3, but this quickly grows rather complex, and I'd argue it is somewhat beyond the scope of a simple backup tool.

It may well be best to back up to a locally mounted s3ql instance, although, as @Ernest0x has pointed out, this does result in some duplication of functionality (compression, encryption, and deduplication).

@geckolinux

Hi everyone,

I'm interested in using Attic to backup my webserver to an Amazon S3 bucket. I've been using Duplicity, but I'm sick of the full/incremental model, as well as the difficulty of pruning backups. I love the ease of use and features that Attic provides, but I don't really understand the internals and I'm not sure if it will work with Amazon S3 storage.

Specifically, I'm considering mounting my S3 bucket over FUSE, using one of the following three options:

Any comments on which, if any, would be more appropriate? And how tolerant would Attic be of S3's "eventual consistency" weirdness?

Additionally, I need to plan against the worst-case scenario of a hacker getting root access to my server and deleting the backups on S3 using the stored credentials on my server. To eliminate this possibility, I was thinking about enabling S3 versioning on the bucket so that files deleted with my server's S3 user account can still be recovered via my main Amazon user account. Then, I would have S3 lifecycle management configured to delete all versions of deleted files after X amount of time. In this case,

  • How much of my S3 data would Attic routinely need to download in order to figure out which files have changed and need to be backed up? (I'm worried about bandwidth costs.)
  • How much accumulated clutter and wasted space could I expect from files that Attic "deletes" (which will actually be retained on S3 due to the versioning)?
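The versioning-plus-lifecycle idea above can be expressed as a bucket lifecycle rule that expires noncurrent object versions after a chosen window. As a rough sketch (the 90-day retention and the rule ID are placeholder values), in the JSON form accepted by `aws s3api put-bucket-lifecycle-configuration`:

```json
{
  "Rules": [
    {
      "ID": "expire-deleted-attic-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
    }
  ]
}
```

With a rule like this, versions that Attic's compaction "deletes" linger (and cost money) for the retention window before S3 removes them for good, which bounds the wasted space in the second question above.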

Again, my concerns are based on me not really understanding all the black magic that happens with all the chunks and indexes inside an Attic repository, and how much they change from one backup to the next.

Thanks in advance for the help!

@ammojamo

ammojamo commented Aug 5, 2015

Another possible solution: what about using s3cmd sync?

This would involve making a backup first to a local directory, then running s3cmd sync --remove-deleted ... to sync the local directory to s3.

Caveats:

  • I haven't tested this
  • Not sure how efficient it would be for large backups

@positron

positron commented Aug 5, 2015

@sb56637 Not related to this issue, but you should use IAM roles so that if a hacker gets root access to your server, the only permission they have is s3:PutObject on a single bucket.
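A minimal policy along those lines, attached to the server's IAM role, might look like the following (the bucket name is a placeholder; note that if attic works against a locally mounted S3 filesystem it will in practice also need read and list permissions, so a strict PutObject-only policy fits the "append-only uploader" case being described):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}
```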

@ovizii

ovizii commented Nov 6, 2015

I'm a bit late to the party but I only discovered borgbackup now :-)

I'm about to convert from duplicity, which I must say has done a brilliant job for me for years because it supports all sorts of dumb remotes.

So my question is:

Are there any other remotes on your roadmap for borgbackup? i.e. S3 or SFTP?

I'm about to start using borgbackup; I can use NFS instead of (S)FTP, and I'm also about to give iSCSI a try.

@ThomasWaldmann
Contributor

@ovizii please note that this is the issue tracker for attic.

@ovizii

ovizii commented Nov 6, 2015

LOL, really sorry about that, will locate the appropriate place and ask again.
