borg create without a cache – Prototype, Do Not Merge #2350
# All chunks from the repository have a refcount of MAX_VALUE, which is sticky,
# therefore we can't/won't delete them. Chunks we added ourselves in this transaction
# (e.g. checkpoint archives) are tracked correctly.
init_entry = ChunkIndexEntry(refcount=ChunkIndex.MAX_VALUE, size=0, csize=ChunkIndex.MAX_VALUE)
csize=0 is another safe choice and has a more compact representation (1 byte vs 5 bytes).
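The byte counts in this comment match msgpack's integer encoding. A minimal sketch of that size rule follows, assuming the entry is serialized with msgpack and that ChunkIndex.MAX_VALUE lies between 2**16 and 2**32 - 1. The helper `msgpack_uint_len` and the constant `ASSUMED_MAX_VALUE` are hypothetical and not taken from borg.

```python
def msgpack_uint_len(n: int) -> int:
    """Encoded size in bytes of a non-negative int under the msgpack spec."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n <= 0x7F:
        return 1  # positive fixint: the value lives in the type byte itself
    if n <= 0xFF:
        return 2  # uint 8: type byte + 1 payload byte
    if n <= 0xFFFF:
        return 3  # uint 16: type byte + 2 payload bytes
    if n <= 0xFFFFFFFF:
        return 5  # uint 32: type byte + 4 payload bytes
    if n <= 0xFFFFFFFFFFFFFFFF:
        return 9  # uint 64: type byte + 8 payload bytes
    raise ValueError("value does not fit into a msgpack uint 64")


# Assumption: a placeholder for ChunkIndex.MAX_VALUE, not read from borg.
ASSUMED_MAX_VALUE = 2**32 - 1025

size_zero = msgpack_uint_len(0)
size_max = msgpack_uint_len(ASSUMED_MAX_VALUE)
```

Under these assumptions `size_zero` is 1 and `size_max` is 5, which is the difference the comment points out.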
self.files[path_hash] = msgpack.packb(entry)
self._newest_mtime = max(self._newest_mtime or 0, st.st_mtime_ns)

def commit(self, config):
    config.set('cache', 'manifest', 'not in sync')
This must be a hex string; 'not in sync' is not one.
I did not review all the size/csize stuff.
@@ -2329,6 +2331,8 @@ def process_epilog(epilog):
help='only display items with the given status characters')
subparser.add_argument('--json', action='store_true',
help='output stats as JSON (implies --stats)')
subparser.add_argument('--avoid-cache-sync', dest='avoid_cache_sync', action='store_true',
help='Avoid synchronizing the local cache')
"avoid" sounds a bit like it would still sync in some cases, just not in most.
How about --no-chunks-cache-sync?
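For illustration, here is how the suggested flag name could be declared with argparse. This is a sketch only: the parser, the `dest` name and the help text are assumptions, not part of the PR.

```python
import argparse

# Stand-in parser; in the PR this would be the existing "create" subparser.
parser = argparse.ArgumentParser(prog='create-sketch')
parser.add_argument('--no-chunks-cache-sync', dest='no_chunks_cache_sync',
                    action='store_true',
                    help='do not synchronize the local chunks cache')

# The option defaults to False and becomes True only when given.
default_args = parser.parse_args([])
flagged_args = parser.parse_args(['--no-chunks-cache-sync'])
```

A "no-..." name states unconditionally that the sync is skipped, which is the ambiguity the comment is about.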
@@ -32,7 +32,7 @@
TAG_DELETE = 1
TAG_COMMIT = 2

-LIST_SCAN_LIMIT = 10000 # repo.list() / .scan() result count limit the borg client uses
+LIST_SCAN_LIMIT = 100000 # repo.list() / .scan() result count limit the borg client uses
Did you check the other places where this is used, and whether the larger limit could have a negative impact there?
@@ -180,6 +182,7 @@ def __init__(self, repository, key, manifest, path=None, sync=True, do_files=Fal
self.timestamp = None
self.lock = None
self.txn_active = False
self.txn_set = ['config']
Nitpick: naming a list a "set" is a bit strange. Also, I usually avoid having a data type in the name.
Maybe txn_files?
if do_files:
    self.files = FilesCache(self)
else:
    self.files = DummyFilesCache()
This could be a one-liner: self.files = FilesCache(self) if do_files else DummyFilesCache()
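Spelled out, the shorthand in this comment is a conditional expression. A small runnable sketch follows; the three classes are simplified stand-ins for the ones in the PR, not their real implementations.

```python
class FilesCache:
    """Stand-in for the PR's FilesCache; it only remembers its owner."""

    def __init__(self, cache):
        self.cache = cache


class DummyFilesCache:
    """Stand-in for the PR's no-op files cache."""


class CacheSketch:
    """Hypothetical owner class showing only the assignment in question."""

    def __init__(self, do_files=False):
        # One conditional expression replaces the four-line if/else.
        self.files = FilesCache(self) if do_files else DummyFilesCache()


with_files = CacheSketch(do_files=True)
without_files = CacheSketch(do_files=False)
```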
return None

def memorize_file(self, path_hash, st, ids):
    return None
This has no return value, so just use "pass"? (Same for commit() below.)
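A sketch of what the no-op methods could look like with "pass". The class body is a simplified stand-in for the PR's dummy class; only the method names and signatures are taken from the diff.

```python
class DummyFilesCache:
    """Stand-in no-op files cache; the methods deliberately do nothing."""

    def memorize_file(self, path_hash, st, ids):
        # Nothing to remember; "pass" signals that no value is produced.
        pass

    def commit(self, config):
        # Nothing to write out either.
        pass


dummy = DummyFilesCache()
memorize_result = dummy.memorize_file(b'path-hash', None, [])
commit_result = dummy.commit(config=None)
```

Both calls still evaluate to None, so callers see no difference; the change only makes the intent clearer to readers.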
"""
Return whether a chunk with *id* was seen. Optionally verify *size* for
enhanced collision resistance.
"""
Currently, this seems to either return a boolean (seen / not seen) or an int (refcount, >0 = seen, 0 = not seen).
Can we avoid the type confusion?
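One way to avoid the mixed return type is to always return the refcount as an int, with 0 meaning "not seen". The sketch below assumes a plain dict from chunk id to (refcount, size); `ChunksSketch` is a hypothetical stand-in, not the PR's class. Returning `refcount > 0` from every path would be the equally consistent bool variant.

```python
class ChunksSketch:
    """Hypothetical stand-in: maps chunk id -> (refcount, size)."""

    def __init__(self):
        self.chunks = {}

    def seen_chunk(self, id, size=None):
        """Return the refcount of chunk *id* as an int; 0 means not seen."""
        refcount, stored_size = self.chunks.get(id, (0, None))
        # Only compare sizes for chunks that are actually known.
        if size is not None and refcount and stored_size != size:
            raise Exception("chunk has same id [%r], but different size (stored: %d new: %d)!" % (
                id, stored_size, size))
        return refcount


index = ChunksSketch()
index.chunks[b'known'] = (2, 10)
known_refcount = index.seen_chunk(b'known', size=10)
unknown_refcount = index.seen_chunk(b'unknown')
```

Callers can keep writing `if cache.seen_chunk(id):` because 0 is falsy and any positive refcount is truthy.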
result = self.cache.repository.list(limit=LIST_SCAN_LIMIT, marker=marker)
if not result:
    break
pi.show(len(result))
Guess this should be:
pi.show(increase=len(result))
By the way, with a rather huge LIST_SCAN_LIMIT this will be a rather jumpy progress indicator.
Of course, that depends on the repo size...
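To illustrate the `increase=` point: a sketch with a hypothetical stand-in class, assuming a `show(current=None, increase=1)` signature in which a positional argument sets the absolute position rather than adding to it. `ProgressSketch` is not borg's progress indicator.

```python
class ProgressSketch:
    """Hypothetical stand-in for a percent progress indicator."""

    def __init__(self, total):
        self.total = total
        self.counter = 0

    def show(self, current=None, increase=1):
        if current is not None:
            self.counter = current    # absolute position
        else:
            self.counter += increase  # relative step
        return self.counter * 100 // self.total


batch_sizes = [3, 3, 2]  # e.g. lengths of successive list() results
positional = ProgressSketch(total=8)
by_increase = ProgressSketch(total=8)
for n in batch_sizes:
    positional.show(n)            # overwrites the counter with the batch size
    by_increase.show(increase=n)  # accumulates the batch sizes
```

Under this assumption `positional.counter` ends at 2, the size of the last batch, while `by_increase.counter` reaches 8. Each call also moves the counter by a whole batch, so a larger batch limit means fewer and bigger steps.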
def add_chunk(self, id, chunk, stats, overwrite=False):
    if not self.txn_active:
        self.begin_txn()
    assert not overwrite, 'Logic Bug'
maybe a few more words than just 'Logic Bug'?
raise Exception("chunk has same id [%r], but different size (stored: %d new: %d)!" % (
    id, stored_size, size))
return refcount
return id in self.chunks
bool vs. int, see above.
#2313