
gzip backup to reduce file size #7

Closed

stvhanna opened this issue Sep 8, 2021 · 8 comments

stvhanna commented Sep 8, 2021

Hi Elliott, thanks for creating a useful tool! What are your thoughts on adding an option (it could even be the default) to gzip the database backup dump to reduce the file size?

This would reduce the storage required for backups and thus save money. Thanks again @eeshugerman for your hard work on this!

eeshugerman (Owner) commented

Howdy! We use pg_dump's custom format, which is already compressed by default (docs), so gzip on top of that would only add compute cost.
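For illustration, the difference boils down to something like this (a sketch; `mydb` is a placeholder database name):

```sh
# Custom-format dump (-Fc): pg_dump compresses it internally by default,
# so gzipping the output on top of that mostly just burns CPU.
pg_dump -Fc mydb > backup.dump
```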

eeshugerman (Owner) commented Sep 8, 2021

It might make sense to expose the compression-level setting, but in my experience that's usually best left untouched. (Never mind, one could use PGDUMP_EXTRA_OPTS for this.)
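For example, something along these lines (a sketch; `-Z` is pg_dump's compression-level flag, 0 = none through 9 = max):

```sh
# Set in the container's environment; passed through to pg_dump by the backup script
PGDUMP_EXTRA_OPTS='-Z 9'   # higher levels shrink the dump but cost more CPU
```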

stvhanna (Author) commented Sep 8, 2021

@eeshugerman You're right, the PG docs clearly state that pg_dump's custom format is "compressed by default". I'm comparing the backup sizes of the same small database using your s3 remote backup tool and this local backup tool (https://github.com/prodrigestivill/docker-postgres-backup-local), which uses gzip.

The backup from your tool is 245KB, compared to 20KB from the other tool. I'm surprised by the result, since I'd expect PG's default compression to be about as good as gzip, if not better. If you have time, can you run that test to confirm?
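A local approximation of the two approaches, for anyone who wants to reproduce this (a sketch; `mydb` is a placeholder):

```sh
pg_dump -Fc mydb > backup.dump        # custom format, internal compression
pg_dump mydb | gzip > backup.sql.gz   # plain SQL piped through gzip
ls -lh backup.dump backup.sql.gz      # compare the resulting sizes
```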

eeshugerman (Owner) commented

Dang, that's a big difference! Yep, I'll look into it.

eeshugerman (Owner) commented Sep 8, 2021

@stvhanna Would you mind testing with a larger DB, say a few hundred MB? I'm wondering if this is just fixed overhead that becomes insignificant at scale.
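If you don't have one handy, something like this should conjure a table in that ballpark (a sketch; `testdb` is a placeholder and the row count may need tuning):

```sh
createdb testdb
# ~5M rows of md5 hex text works out to a few hundred MB on disk
psql testdb -c "CREATE TABLE filler AS
  SELECT g AS id, md5(g::text) AS payload
  FROM generate_series(1, 5000000) AS g;"
```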

stvhanna (Author) commented Sep 8, 2021

@eeshugerman Good point, let me try a larger DB and report back.

stvhanna (Author) commented Sep 9, 2021

@eeshugerman On a 1GB DB, your tool compressed the backup to 480MB while the other tool got it to 457MB, so you were right that the gap is mostly overhead. I don't see a reason to change your implementation, as the difference is not significant. Thanks for the prompt responses and a great tool! You can close this issue. :)

eeshugerman (Owner) commented

Got it, thanks for looking into this!
