borg create without a cache – Prototype, Do Not Merge #2350

Closed
wants to merge 4 commits into from

Conversation

enkore
Contributor

@enkore enkore commented Mar 28, 2017

@enkore enkore changed the title borg create without a cache borg create without a cache – Protype, Do Not Merge Mar 28, 2017
@enkore enkore changed the title borg create without a cache – Protype, Do Not Merge borg create without a cache – Prototype, Do Not Merge Mar 28, 2017
# All chunks from the repository have a refcount of MAX_VALUE, which is sticky,
# therefore we can't/won't delete them. Chunks we added ourselves in this transaction
# (e.g. checkpoint archives) are tracked correctly.
init_entry = ChunkIndexEntry(refcount=ChunkIndex.MAX_VALUE, size=0, csize=ChunkIndex.MAX_VALUE)
Contributor Author

csize=0 is another safe choice and has a more compact representation (1 byte vs 5 bytes).
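A rough sketch of where the "1 byte vs 5 bytes" figure could come from, assuming it refers to msgpack's integer encoding and assuming `ChunkIndex.MAX_VALUE` is `2**32 - 1025` (any value between `2**16` and `2**32 - 1` gives the same size). `msgpack_uint_size` is a hypothetical helper written for this illustration; it is not part of borg or of the msgpack library.

```python
def msgpack_uint_size(n: int) -> int:
    """Encoded size in bytes of a non-negative int, following the msgpack spec."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n <= 0x7F:
        return 1  # positive fixint: the value itself is the single byte
    if n <= 0xFF:
        return 2  # uint 8: type byte + 1 payload byte
    if n <= 0xFFFF:
        return 3  # uint 16: type byte + 2 payload bytes
    if n <= 0xFFFFFFFF:
        return 5  # uint 32: type byte + 4 payload bytes
    if n <= 0xFFFFFFFFFFFFFFFF:
        return 9  # uint 64: type byte + 8 payload bytes
    raise ValueError("value does not fit into a msgpack integer")


# Assumption: the sticky maximum value of the chunk index.
ASSUMED_MAX_VALUE = 2**32 - 1025

size_zero = msgpack_uint_size(0)
size_max = msgpack_uint_size(ASSUMED_MAX_VALUE)
print(size_zero, size_max)
```

Under these assumptions, `csize=0` is stored as a single byte, while a value near the 32-bit maximum needs a type byte plus four payload bytes.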

        self.files[path_hash] = msgpack.packb(entry)
        self._newest_mtime = max(self._newest_mtime or 0, st.st_mtime_ns)

    def commit(self, config):
        config.set('cache', 'manifest', 'not in sync')
Contributor Author

this must be a hex string
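A minimal sketch of why a free-text marker is a problem, assuming the stored value is later decoded with `binascii.unhexlify`. The helper `manifest_id_from_config` and the all-zero placeholder ID are hypothetical choices for this illustration; the PR does not say which hex value should be used instead.

```python
import binascii
import configparser

config = configparser.ConfigParser(interpolation=None)
config.add_section('cache')


def manifest_id_from_config(cfg):
    """Decode the stored manifest ID; raises binascii.Error for non-hex text."""
    return binascii.unhexlify(cfg.get('cache', 'manifest'))


# The free-text marker from the diff cannot be decoded as hex.
config.set('cache', 'manifest', 'not in sync')
try:
    manifest_id_from_config(config)
    decoded_ok = True
except binascii.Error:
    decoded_ok = False

# Hypothetical alternative: an all-zero 32-byte ID, stored as a hex string.
PLACEHOLDER_ID = bytes(32)
config.set('cache', 'manifest', binascii.hexlify(PLACEHOLDER_ID).decode('ascii'))
roundtrip = manifest_id_from_config(config)
```

A hex-encoded placeholder survives the round trip, whereas the text marker fails as soon as any reader tries to decode it.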

Member

@ThomasWaldmann ThomasWaldmann left a comment

I did not review all the size/csize stuff.

@@ -2329,6 +2331,8 @@ def process_epilog(epilog):
                                help='only display items with the given status characters')
         subparser.add_argument('--json', action='store_true',
                                help='output stats as JSON (implies --stats)')
+        subparser.add_argument('--avoid-cache-sync', dest='avoid_cache_sync', action='store_true',
+                               help='Avoid synchronizing the local cache')
Member

"avoid" sounds a bit like it would still do it for some cases, just not for most.

--no-chunks-cache-sync ?
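For illustration, the suggested name could be wired up roughly like this. This is a standalone sketch with plain `argparse`; the parser name, the `dest` value and the help text are assumptions, not the code that ended up in borg.

```python
import argparse

# Standalone parser for the sketch; borg builds its subparsers differently.
parser = argparse.ArgumentParser(prog='create-sketch')
parser.add_argument('--no-chunks-cache-sync', dest='no_chunks_cache_sync',
                    action='store_true',
                    help='do not synchronize the local chunks cache')

args_default = parser.parse_args([])
args_flag = parser.parse_args(['--no-chunks-cache-sync'])
print(args_default.no_chunks_cache_sync, args_flag.no_chunks_cache_sync)
```

A `no-` prefix reads as an unconditional switch, which matches the `store_true` semantics better than "avoid".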

@@ -32,7 +32,7 @@
TAG_DELETE = 1
TAG_COMMIT = 2

-LIST_SCAN_LIMIT = 10000 # repo.list() / .scan() result count limit the borg client uses
+LIST_SCAN_LIMIT = 100000 # repo.list() / .scan() result count limit the borg client uses
Member

did you check the other places where this is used, whether it could have some negative impact?

@@ -180,6 +182,7 @@ def __init__(self, repository, key, manifest, path=None, sync=True, do_files=Fal
self.timestamp = None
self.lock = None
self.txn_active = False
self.txn_set = ['config']
Member

nitpick: naming a list a set is a bit strange. also, I usually avoid having a data type in the name.

txn_files maybe?

if do_files:
    self.files = FilesCache(self)
else:
    self.files = DummyFilesCache()
Member

files = FC if do_files else DFC
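Spelled out, the shorthand presumably means a single conditional expression. A minimal sketch with hypothetical stub classes standing in for the real `FilesCache` and `DummyFilesCache`:

```python
class FilesCache:
    """Stand-in for the real files cache (hypothetical stub)."""

    def __init__(self, cache):
        self.cache = cache


class DummyFilesCache:
    """Stand-in for a files cache that remembers nothing (hypothetical stub)."""


class CacheSketch:
    def __init__(self, do_files=False):
        # One expression instead of a four-line if/else block.
        self.files = FilesCache(self) if do_files else DummyFilesCache()


with_files = CacheSketch(do_files=True)
without_files = CacheSketch(do_files=False)
```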

        return None

    def memorize_file(self, path_hash, st, ids):
        return None
Member

has no return value, so just use "pass"? (see same below in commit())

"""
Return whether a chunk with *id* was seen. Optionally verify *size* for
enhanced collision resistance.
"""
Member

Currently, this seems to either return a boolean (seen / not seen) or an int (refcount, >0 = seen, 0 = not seen).
Can we avoid the type confusion?
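One way to avoid the mixed return type is to always return the refcount as an `int`, with 0 meaning "not seen". A minimal sketch, assuming a plain dict in place of the real chunk index; `ChunkCacheSketch` and its `add` method are hypothetical names for this illustration.

```python
class ChunkCacheSketch:
    """Dict-backed stand-in for a chunk cache (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # chunk id -> (refcount, size)

    def add(self, chunk_id, size):
        refcount, _ = self.chunks.get(chunk_id, (0, size))
        self.chunks[chunk_id] = (refcount + 1, size)

    def seen_chunk(self, chunk_id, size=None) -> int:
        """Return the refcount of *chunk_id*; 0 means the chunk was not seen."""
        refcount, stored_size = self.chunks.get(chunk_id, (0, None))
        if refcount and size is not None and stored_size != size:
            raise ValueError(
                "chunk has same id [%r], but different size (stored: %d new: %d)!"
                % (chunk_id, stored_size, size))
        return refcount


cache = ChunkCacheSketch()
before = cache.seen_chunk(b'abc')
cache.add(b'abc', 10)
after = cache.seen_chunk(b'abc', size=10)
```

Callers can still write `if cache.seen_chunk(...)`, since 0 is falsy, but the return type no longer depends on the code path.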

result = self.cache.repository.list(limit=LIST_SCAN_LIMIT, marker=marker)
if not result:
    break
pi.show(len(result))
Member

Guess this should be:

pi.show(increase=len(result))

Member

btw, if you have a rather huge LIST_SCAN_LIMIT, this will be a rather jumpy progress indicator.
ofc, it depends on repo size...
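The difference between the two calls can be sketched as follows, assuming the first positional parameter of `show()` is an absolute position and `increase` is a relative step, as the suggested fix implies. `ProgressSketch`, `list_ids` and `scan` are hypothetical stand-ins, not borg's progress indicator or repository API.

```python
class ProgressSketch:
    """Hypothetical counter with a show(current=None, increase=1) signature."""

    def __init__(self, total):
        self.total = total
        self.counter = 0

    def show(self, current=None, increase=1):
        if current is not None:
            self.counter = current   # absolute position
        else:
            self.counter += increase  # relative step
        return self.counter


def list_ids(all_ids, limit, marker=None):
    """Stand-in for a paged listing: return up to *limit* ids after *marker*."""
    start = 0 if marker is None else all_ids.index(marker) + 1
    return all_ids[start:start + limit]


def scan(all_ids, limit):
    """Page through all ids and advance the progress by each page's length."""
    pi = ProgressSketch(total=len(all_ids))
    marker = None
    while True:
        result = list_ids(all_ids, limit=limit, marker=marker)
        if not result:
            break
        pi.show(increase=len(result))
        marker = result[-1]
    return pi.counter


ids = list(range(25))
counted = scan(ids, limit=10)

# Passing the page length positionally sets the position instead of adding to it.
wrong = ProgressSketch(total=25)
for page_len in (10, 10, 5):
    wrong.show(page_len)
```

With `limit=10` the counter moves in steps of 10, 10 and 5, which also shows why a very large page size makes the indicator jump.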

def add_chunk(self, id, chunk, stats, overwrite=False):
    if not self.txn_active:
        self.begin_txn()
    assert not overwrite, 'Logic Bug'
Member

maybe a few more words than just 'Logic Bug'?
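A more descriptive assertion might look like the sketch below. The message wording is hypothetical, and the simplified function signature is an assumption; the source does not state why `overwrite` is unexpected in this code path.

```python
def add_chunk(chunk_id, chunk, overwrite=False):
    """Simplified stand-in for the method in the diff (hypothetical signature)."""
    # Say what went wrong and for which chunk, not just that something did.
    assert not overwrite, (
        'add_chunk() was called with overwrite=True for chunk id %r, '
        'which this code path does not expect' % (chunk_id,))
    return chunk_id


try:
    add_chunk(b'\x01\x02', b'data', overwrite=True)
    message = None
except AssertionError as exc:
    message = str(exc)
```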

raise Exception("chunk has same id [%r], but different size (stored: %d new: %d)!" % (
id, stored_size, size))
return refcount
return id in self.chunks
Member

bool vs. int, see above.

2 participants