borg2: there is no csize #6763

Merged
merged 12 commits into from
Jun 14, 2022

ThomasWaldmann
Member

@ThomasWaldmann ThomasWaldmann commented Jun 11, 2022

"what would we lose/win if we would just not track csize (the compressed size of a chunk)", see #2357.

  • do not have csize in the item.chunks list entries
  • do not have csize in the ChunkIndex entries
  • do not track csize in Statistics
  • but: my other PR borg2: repoindex improvements #6705 would make borg track csize in the repo index

What we lose (some stuff could be just done differently):

  • archive info: compressed size and deduplicated-compressed size stats (the latter is based on csize and refcount)
  • archive list: placeholders for csize / dcsize

What we win:

  • get rid of some ugly / complicated code
  • repo-scope recompression would be easily doable, because there is no csize to "fix" in the archives (and in the chunks index)
  • no super expensive fetch_missing_csize and no zero_csize_ids any more for the (still experimental) AdHocCache
  • 10% lower memory use for the chunks index and the item.chunks list (in memory and in the stored archive)
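
To make the memory figure concrete, here is a rough back-of-the-envelope sketch. The entry layout is an assumption for illustration and is not taken from this PR: a 32-byte chunk id as key, followed by 32-bit `refcount`, `size` and `csize` values. Hash table overhead (empty buckets) is ignored.

```python
import struct

# Assumed layout, for illustration only: a 32-byte chunk id key
# followed by little-endian uint32 values.
KEY_SIZE = 32
with_csize = KEY_SIZE + struct.calcsize("<III")    # refcount, size, csize
without_csize = KEY_SIZE + struct.calcsize("<II")  # refcount, size

# Fraction of each entry that is saved by dropping csize.
saving = 1 - without_csize / with_csize
print(f"{with_csize} -> {without_csize} bytes per entry ({saving:.1%} smaller)")
```

Under these assumptions, dropping one 4-byte field from each entry gives a saving in the same ballpark as the 10% quoted above.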

Also:

  • I reimplemented the deduplicated size info, but now based on the uncompressed sizes.
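
One way such a deduplicated size could be computed from uncompressed sizes is sketched below. This is not the code from this PR: `deduplicated_size`, `archive_chunks` and `repo_refcounts` are hypothetical names, and the rule used (a chunk counts if all of its references come from this one archive) is an assumption.

```python
from collections import Counter


def deduplicated_size(archive_chunks, repo_refcounts):
    """Sum the uncompressed sizes of chunks referenced only by this archive.

    archive_chunks: iterable of (chunk_id, size) pairs, one per reference.
    repo_refcounts: mapping chunk_id -> total reference count in the repo.
    """
    archive_chunks = list(archive_chunks)
    # How often this archive references each chunk.
    local_refs = Counter(chunk_id for chunk_id, _ in archive_chunks)
    # Uncompressed size of each distinct chunk.
    sizes = {chunk_id: size for chunk_id, size in archive_chunks}
    # A chunk is unique to this archive if nothing else references it.
    return sum(
        size
        for chunk_id, size in sizes.items()
        if local_refs[chunk_id] == repo_refcounts[chunk_id]
    )


chunks = [("a", 100), ("b", 50), ("a", 100), ("c", 70)]
refcounts = {"a": 2, "b": 3, "c": 1}
print(deduplicated_size(chunks, refcounts))  # 170: "a" (100) and "c" (70)
```

Chunk "b" is also referenced from elsewhere, so it does not count towards this archive's deduplicated size.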

@sophie-h
Contributor

Pika Backup currently reports an 'additional backup space used' during backup. It reports the deduplicated size; I'm not even sure how the compression is counted here.

I don't care that much about compression here, but the deduplicated size was always a good indicator for me during a backup: if it's suddenly uploading a lot of data when you expected everything to be covered by deduplication, you might have missed excluding something.

However, none of this matters during scheduled backups.

@ThomasWaldmann ThomasWaldmann force-pushed the remove-csize branch 4 times, most recently from f745242 to 4ac880e Compare June 12, 2022 14:00
@ThomasWaldmann
Member Author

@sophie-h I re-added the "deduplicated size", just changed its meaning:

It is now based on the uncompressed size (this is because we do not have the compressed size any more). So this can still be used as an indicator for sudden unexpected backup growth (because some exclude is missing or some crap unintentionally got backed up).

@ThomasWaldmann
Member Author

ThomasWaldmann commented Jun 12, 2022

What's currently missing is an indication of how well the compression worked (info, stats, placeholders).

But I guess we could re-add this later in a different way. After #6689 is merged, we have the "csize" (or something closely related) in the repo index, so after adding an API method, we could compute some statistics based on that.

Or, maybe even more efficient: we could just log compressor stats at the end of a backup.
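
A minimal sketch of that idea, assuming a wrapper that counts bytes going into and coming out of the compressor. `CompressorStats` is a hypothetical name, not an existing borg class, and `zlib` stands in for whatever compressor is configured.

```python
import zlib


class CompressorStats:
    """Count uncompressed and compressed bytes across all compress() calls."""

    def __init__(self):
        self.uncompressed = 0
        self.compressed = 0

    def compress(self, data: bytes) -> bytes:
        out = zlib.compress(data)
        self.uncompressed += len(data)
        self.compressed += len(out)
        return out

    @property
    def ratio(self) -> float:
        # Compressed size as a fraction of the input; 1.0 if nothing was seen.
        if not self.uncompressed:
            return 1.0
        return self.compressed / self.uncompressed


stats = CompressorStats()
for chunk in (b"a" * 1000, b"hello world " * 50):
    stats.compress(chunk)

# Logged once at the end of the backup.
print(f"compressed {stats.uncompressed} bytes to {stats.compressed} bytes "
      f"(ratio {stats.ratio:.2f})")
```

Because the counters live next to the compressor, nothing needs to be stored per chunk in the archive or in the chunks index.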

@ThomasWaldmann ThomasWaldmann changed the title there is no csize borg2: there is no csize Jun 12, 2022
@ThomasWaldmann ThomasWaldmann merged commit fc8a289 into borgbackup:borg2 Jun 14, 2022
@ThomasWaldmann ThomasWaldmann deleted the remove-csize branch June 14, 2022 10:16
2 participants