WIP: Refactor bitfield to replicator & reduce bitfield memory by relying on storage #736
Draft: lejeunerenard wants to merge 36 commits into main from refact-bitfield-to-replicator
Conversation
Now that the replicator is updating the bitfield in `onhave`, signaling the replicator also updates the in-memory bitfield.
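A minimal sketch of the shape this describes, assuming a `setRange`-style bitfield API: core code only signals the replicator, and the replicator's `onhave` handler is what writes the in-memory bitfield. The names are illustrative, not the exact code in this PR.

```js
class Replicator {
  constructor (bitfield) {
    this.bitfield = bitfield
  }

  onhave (start, length, drop) {
    // Updating the bitfield here means callers no longer touch it directly;
    // signaling the replicator is enough.
    this.bitfield.setRange(start, length, !drop)
    // ...then broadcast the range to connected peers as before.
  }
}
```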
mafintosh reviewed Oct 1, 2025
lib/core.js (Outdated)
    this.replicator.onupgrade()
    this.replicator.onhave(start, length, drop)
    this.replicator.onupgrade()
Should be before onhave so it doesn’t signal out of bounds blocks
Every place that updates or reads the bitfield now needs to be async, since those operations go to storage. As a result, many internal functions inherited the bitfield's asynchrony.
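A minimal sketch of what a storage-backed bitfield could look like, assuming hypothetical `readPage`/`writePage` storage methods; the point is only that `get` and `set` must return promises because pages are loaded on demand.

```js
class StorageBackedBitfield {
  constructor (storage) {
    this.storage = storage
    this.pages = new Map() // pageIndex -> Uint32Array, loaded on demand
  }

  async get (index) {
    const page = await this._page(index >> 15, false)
    if (!page) return false
    const bit = index & 0x7fff
    return (page[bit >> 5] & (1 << (bit & 31))) !== 0
  }

  async set (index, value) {
    const page = await this._page(index >> 15, true)
    const bit = index & 0x7fff
    if (value) page[bit >> 5] |= (1 << (bit & 31))
    else page[bit >> 5] &= ~(1 << (bit & 31))
    await this.storage.writePage(index >> 15, page) // assumed storage API
  }

  async _page (i, create) {
    if (this.pages.has(i)) return this.pages.get(i)
    const page = (await this.storage.readPage(i)) || (create ? new Uint32Array(1024) : null)
    if (page) this.pages.set(i, page)
    return page
  }
}
```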
Await these calls to reduce potential timing issues when reading from the bitfield.
`_updateNonPrimary()` reads from the bitfield to clamp the range request via `clampRange()`. This clamping can hit a race condition where the bitfield is updated by `onhave`'s `_setBitfieldRanges()` mid-read, causing the clamp to resolve the requests before the "primary" can respond.
Before this change, the bitfield's synchronous implementation implicitly batched reads and writes, so writing could never conflict with reading. Now that the bitfield is asynchronous, two layers of locking restore that guarantee. An internal lock keeps operations sequential, so even if they are called without awaiting, or between event loop turns because of an external message, they will not execute simultaneously. An external lock claims the bitfield roughly per protocol message, protecting read and write operations that are intended to be sequential from interleaving. Because the external lock should in theory cover the internal lock's scenario by keeping access to a single chain of async calls, it may be possible to remove the internal lock in the future. Ideally neither lock would be necessary, but for now they solve the issues above.
Enabling locks here fixes the test `bigger download range` in `test/replicate.js`. That test would flake when the download event fired after the request had already been resolved. This happened because `_updateNonPrimary()` resolved the range request before the primary processing could emit, due to the race where the bitfield was updated while being read, yielding a false clamped range. The bitfield was primarily updated in `core.verify()`, and putting a lock around the entire `ondata` call chain lets other `data` messages queue up instead of verifying blocks and updating the bitfield before previous requests could respond.
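A minimal sketch of the locking idea, assuming a simple promise-based mutex (the PR may use an existing lock utility instead): acquiring the lock around a read-modify-read of the bitfield prevents a concurrent `onhave` from updating pages mid-read.

```js
class Mutex {
  constructor () {
    this._tail = Promise.resolve()
  }

  lock () {
    const prev = this._tail
    let release
    this._tail = new Promise(resolve => { release = resolve })
    return prev.then(() => release) // resolves with a release() function
  }
}

// Usage: clamp a range while holding the lock so no write can interleave.
async function clampRangeLocked (bitfield, mutex, start, end) {
  const release = await mutex.lock()
  try {
    while (start < end && await bitfield.get(start)) start++
    while (end > start && await bitfield.get(end - 1)) end--
    return [start, end]
  } finally {
    release()
  }
}
```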
Used for bitfield locks.
The `_request*` methods are assumed to be synchronous in the replication state machine. To avoid converting the entire state machine to async, the pages are loaded preemptively and checked synchronously. This currently happens in `_requestSeek()` & `_requestRangeBlock()`.
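A minimal sketch of the preloading idea, assuming hypothetical `loadPages`/`getSync` helpers on the bitfield: the async work happens before the synchronous state machine runs, so the `_request*` path never has to await.

```js
async function prepareRangeCheck (bitfield, start, end) {
  await bitfield.loadPages(start, end) // bring the relevant pages into memory first

  return function firstMissingBlock () {
    // Purely synchronous: the pages are guaranteed to be loaded already.
    for (let i = start; i < end; i++) {
      if (!bitfield.getSync(i)) return i
    }
    return -1 // nothing missing in this range
  }
}
```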
Ensures the bitfield remains unchanged while iterating through the want call.
Part of the previous commit to await all `replicator.onupgrade()` calls.
Without this lock, the `_update()` call in the `onsync` event doesn't know about the remote bitfield update. While `onsync` doesn't access the `localBitfield`, it does rely on updates to the `remoteBitfield`, which happen alongside the `localBitfield` updates. This fixes the 'restore after cancelled block request' test, which could fail because the sync after the new append wouldn't trigger a `block` request, since the `b` peer assumed the `a` peer didn't have the block. The test only waits for an append event (which doesn't guarantee the `upload` event on `a` has fired) and destroys the connection afterwards.
Because `.broadcastRange()` called during `onopen` is now async, the peer can be added to the replicator after a synchronous close is called on the protomux channel. `onclose` assumed synchronous calls, so it assumes the peer was already added before it is closed. With the peer not yet added, it isn't removed from the replicator and `replicator.destroy()` will loop forever. Also added a destroy method to the bitfield for destroying the locks. Not required for the fix, but it seems reasonable to destroy them regardless.
Now that bitfield operations go to disk and not just to memory, they are slower and need more time for larger numbers of blocks.
This caused timing errors when attempting to read from storage after it had been closed.
Now that checking the bitfield is async, iterating through ranges is also async, so the `_ranges` array can be mutated elsewhere while awaiting. This means the current index of the range request can be inaccurate when the request is resolved. To prevent this, the request's index is looked up synchronously at the moment the request is resolved, so the index is accurate and another request isn't clobbered by the popped head. Since this logic already exists for unrefing the request (for gc'ing & cancelling), `_unref()` is reused. A success boolean is needed to update the index in the `_updateNonPrimary()` ranges loop, so all request `_unref()`s now return a success bool.
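A minimal sketch of the resolve-time lookup described above, with illustrative names rather than the exact implementation: instead of trusting an index captured before an await, the request is located in the ranges array when it actually resolves, and removal reports success so the caller knows whether to advance its loop index.

```js
function unrefRange (ranges, req) {
  const i = ranges.indexOf(req)
  if (i === -1) return false // already removed elsewhere while awaiting
  const head = ranges.pop() // swap-remove: move the popped head into the freed slot
  if (i < ranges.length) ranges[i] = head
  return true
}

async function resolveDoneRanges (ranges, isDone) {
  for (let i = 0; i < ranges.length;) {
    const req = ranges[i]
    // isDone may await the bitfield, so ranges can change underneath us.
    if (await isDone(req) && unrefRange(ranges, req)) {
      continue // the slot now holds a different request; re-check index i
    }
    i++
  }
}
```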
This PR is currently only the first step of moving the bitfield into the replicator, since the bitfield is only required by the replicator for checking local vs remote blocks.
This refactor will require updating the following known locations:
- Hyperbee's `preload`
- Hyperblobs's prefetcher `_update`