This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Fix/minimize rebuild #15

Merged
merged 9 commits into master from fix/minimize-rebuild
Jul 29, 2021

Conversation

gammazero

This PR builds on, and is a replacement for, PR #13.

This PR addresses two issues with the pinner:

  1. An unintended shutdown of IPFS can leave pins saved in an incomplete state, requiring the indexes to be rebuilt at the next start up.
  2. A defect in the pin index rebuild process caused all indexes to be rebuilt, whether needed or not.

The solutions to these problems:

  1. Pins are automatically synced after each modification to pins and indexes. This default behavior can be disabled and re-enabled if needed for high-volume operations (see the batching sketch after this list). Additionally, syncing pins is done separately from syncing the dag service, so a long-duration dag service sync does not prolong the time that pins are in an incomplete "dirty" state.
  2. The rebuild process fixes a problem with setting the correct ID when loading pins. Additionally, pin and index data are no longer loaded into large in-memory maps during this process.
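
As a rough illustration of the batching pattern in item 1, here is a minimal Go sketch. The interface and method names (`SetAutosync`, `PinCID`, `Flush`) are assumptions that stand in for the pinner's actual API, not verified signatures.

```go
package pinbatch

import "context"

// batchPinner captures only the calls this sketch relies on. The names are
// assumptions mirroring the behavior described above, not the verified
// go-ipfs-pinner API.
type batchPinner interface {
	SetAutosync(on bool) bool // returns the previous setting
	PinCID(ctx context.Context, cid string, recursive bool) error
	Flush(ctx context.Context) error
}

// pinMany turns off per-operation syncing for a large batch of pins, restores
// the previous setting when done, and flushes once so pins and indexes are
// persisted together.
func pinMany(ctx context.Context, p batchPinner, cids []string) error {
	prev := p.SetAutosync(false)
	defer p.SetAutosync(prev)

	for _, c := range cids {
		if err := p.PinCID(ctx, c, true); err != nil {
			return err
		}
	}
	return p.Flush(ctx)
}
```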

Additional tests have been added to test the correctness of indexes and the rebuild process.

This PR builds on PR #13 to sync the pinner on every pin operation. This PR does the following that #13 does not:

- Syncs the dag service separately from pin data
- Does not release and immediately reacquire the lock to sync; syncs while still holding the pinner lock
- Syncs only pin data for most operations (a sketch of this lock-and-sync pattern follows this list)
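
A minimal sketch of the second and third points, assuming simplified names (`withPinLock`, a `syncer` interface, a "/pins" prefix) that are illustrative rather than the actual dspinner internals:

```go
package pinsync

import (
	"context"
	"sync"
)

// syncer abstracts "persist pending writes under this prefix"; it stands in
// for a datastore Sync call and is an assumption of this sketch.
type syncer interface {
	Sync(ctx context.Context, prefix string) error
}

type pinner struct {
	lock     sync.Mutex
	autoSync bool
	dirty    bool
	pinDs    syncer
	dagDs    syncer // synced elsewhere, separately from pin data
}

// withPinLock applies a pin mutation and, if autosync is on, syncs only the
// pin keys while still holding the lock. The dag service is synced separately
// so a slow dag sync does not extend the window in which pins are dirty.
func (p *pinner) withPinLock(ctx context.Context, mutate func() error) error {
	p.lock.Lock()
	defer p.lock.Unlock()

	if err := mutate(); err != nil {
		return err
	}
	p.dirty = true
	if !p.autoSync {
		return nil
	}
	if err := p.pinDs.Sync(ctx, "/pins"); err != nil {
		return err
	}
	p.dirty = false
	return nil
}
```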

In addition to syncing pin data, this PR also revises how indexes are rebuilt.  Instead of reading through all pins and loading all previous index data, only a single pass through the pins is now needed to rebuild missing indexes resulting from incomplete add or delete operations. This is operationally much simpler, and it also does not require storing entire pin sets or index sets in memory as the previous solution did.
Fixes an error that would cause all indexes to be rebuilt, whether needed or not, when the dirty flag was detected.  This would greatly prolong the rebuilding process, as every index for every pin would be rewritten.  No other problems resulted, other than the unnecessary rebuild of all indexes and the large amount of memory used during rebuild for large pin sets.
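
The single-pass idea can be sketched roughly as follows. The pin walker, index interface, and field names here are assumptions for illustration; the real dspinner code differs in detail.

```go
package rebuildsketch

import "context"

type pin struct {
	ID        string
	CID       string
	Recursive bool
}

// index is a minimal stand-in for a CID-to-pin-ID index.
type index interface {
	HasValue(ctx context.Context, key, value string) (bool, error)
	Add(ctx context.Context, key, value string) error
	Delete(ctx context.Context, key, value string) error
}

// rebuildIndexes makes one streaming pass over the stored pins. For each pin
// it ensures an entry exists in the correct index and removes any entry left
// in the wrong index by an interrupted add or delete, without holding the
// whole pin set or index set in memory.
func rebuildIndexes(ctx context.Context, forEachPin func(func(pin) error) error, recursive, direct index) error {
	return forEachPin(func(p pin) error {
		want, other := recursive, direct
		if !p.Recursive {
			want, other = direct, recursive
		}
		ok, err := want.HasValue(ctx, p.CID, p.ID)
		if err != nil {
			return err
		}
		if !ok {
			if err := want.Add(ctx, p.CID, p.ID); err != nil {
				return err
			}
		}
		// Remove a stale entry from the other index, if one was left behind.
		return other.Delete(ctx, p.CID, p.ID)
	})
}
```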

Tests have been added to verify the correct functioning of the indexer and rebuilding process.

@petar left a comment


It looks reasonable. The highest value here appears to be the more efficient index rebuilding. This pinner implementation is becoming hard to review and verify. Going forward it should teach us to do the right thing: use transactions or write-ahead logging. I suspect that every bug-fix iteration on this implementation is comparable in time to re-implementing the whole thing using transactions. In any event, this looks better than the previous version!

@gammazero (Author)

Using transactions for this is the better fix. However, this means that the pinner must be given a transaction-capable datastore, and the pinner will need to be able to get the transaction interface from the datastore -- which is not difficult to solve. If there are pinners currently using datastores that do not support transactions, though, this will require a migration to a new datastore, and that may be a larger-scoped change.
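
For reference, go-datastore already defines a TxnDatastore interface, so detecting transaction support could look roughly like the sketch below; the exact NewTransaction signature varies between go-datastore versions, so treat everything beyond the type assertion as an assumption.

```go
package txncheck

import (
	"context"

	ds "github.com/ipfs/go-datastore"
)

// supportsTransactions reports whether the pinner's datastore can group pin
// and index writes into a single transaction. TxnDatastore is part of
// go-datastore; the surrounding code is illustrative only.
func supportsTransactions(d ds.Datastore) (ds.TxnDatastore, bool) {
	txnDs, ok := d.(ds.TxnDatastore)
	return txnDs, ok
}

func pinWithTxnIfPossible(ctx context.Context, d ds.Datastore) error {
	if _, ok := supportsTransactions(d); ok {
		// Group pin + index writes via NewTransaction here (its signature
		// differs across go-datastore versions), committing once at the end.
		return nil
	}
	// Fall back to per-operation syncing for non-transactional datastores.
	return nil
}
```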

@gammazero gammazero requested a review from petar July 19, 2021 20:28

@petar left a comment


LGTM.

@aschmahmann should we deploy in the clusters before merging to master?

@aschmahmann

Running some live tests here seems reasonable to me if we're pretty confident that things should work and we're close to merge (i.e. we don't want to mess up the cluster nodes if we can avoid it). I would only deploy to one at a time in case there are issues.

As for transactions: I wouldn't worry too much about people using non-transactional datastores, although we can take care of this in a subsequent PR. The main datastores that I know of that are in use here are LevelDB and Badger, both of which should support transactions. Maybe there's also a map datastore, but we could either make one that supports transactions or just use the in-memory LevelDB datastore.

dspinner/pin.go Outdated
@@ -981,15 +981,15 @@ func (p *pinner) rebuildIndexes(ctx context.Context) error {
 		indexer = p.cidRIndex
 		// Delete any direct index
 		err = p.cidDIndex.Delete(ctx, indexKey, pp.Id)
-		log.Infof("deleting stale pin index for cid %v", pp.Cid.String())
+		log.Errorf("deleting stale pin index for cid %v", pp.Cid.String())
@gammazero (Author)

This appears to log the error message whether or not a stale pin was actually found. Perhaps Delete needs to return a boolean, or there should be a find for the pin first.
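
Something along these lines could implement that suggestion; DeleteIfPresent and the surrounding names are hypothetical, not the existing index API.

```go
package indexsketch

import "context"

// cidIndex is a hypothetical index whose delete reports whether an entry was
// actually removed, so the log line only fires for real stale-index deletions.
type cidIndex interface {
	DeleteIfPresent(ctx context.Context, key, value string) (removed bool, err error)
}

func dropStaleDirect(ctx context.Context, idx cidIndex, cidKey, pinID string, logf func(string, ...interface{})) error {
	removed, err := idx.DeleteIfPresent(ctx, cidKey, pinID)
	if err != nil {
		return err
	}
	if removed {
		logf("deleted stale direct pin index for cid %s", cidKey)
	}
	return nil
}
```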


Let's not log deletions for now. I'll update the PR.

@petar petar merged commit f708928 into master Jul 29, 2021
@gammazero gammazero deleted the fix/minimize-rebuild branch July 30, 2021 16:39
@aschmahmann aschmahmann mentioned this pull request Aug 23, 2021