Skip to content
This repository has been archived by the owner on Jun 27, 2023. It is now read-only.

Make switchToSharding more efficient #120

Merged
merged 1 commit into from
Jun 16, 2022

Conversation

hsanjuan
Copy link
Contributor

When automatically switching a BasicDirectory to a HAMTDirectory because it
grew too big, the current code:

  • loops every link in the BasicDirectory
  • reads each node referenced by those links
  • adds the nodes to a new HAMTDirectory shard, which in turn:
    • writes the nodes to the DAG service (they were just read from there!)
    • makes a link out of them (identical to the link in the BasicDirectory!)

This would happen to about (~4000 nodes), which are fully read and written for nothing.

This PR adds a new SetLink method to the HAMT Shard which, instead of taking
an ipld.Node like Set(), takes directly an ipld.Link. Then it updates
switchToSharding() to pass the links in the BasicDirectory directy, rather
than reading all the nodes.

Note that switchToBasic() works like this already, only using the links in the
HAMT directory.

When automatically switching a BasicDirectory to a HAMTDirectory because it
grew too big, the current code:

* loops every link in the BasicDirectory
* reads each node referenced by those links
* adds the nodes to a new HAMTDirectory shard, which in turn:
  * writes the nodes to the DAG service (they were just read from there!)
  * makes a link out of them (identical to the link in the BasicDirectory!)

This would happen to about (~4000 nodes), which are fully read and written for nothing.

This PR adds a new SetLink method to the HAMT Shard which, instead of taking
an ipld.Node like Set(), takes directly an ipld.Link. Then it updates
switchToSharding() to pass the links in the BasicDirectory directy, rather
than reading all the nodes.

Note that switchToBasic() works like this already, only using the links in the
HAMT directory.
@hsanjuan hsanjuan self-assigned this Jun 15, 2022
Comment on lines +244 to +246
// FIXME: We don't need to set the name here, it will get overwritten.
// This is confusing, confirm and remove this line.
newLink.Name = ds.linkNamePrefix(0) + name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy-pasted from Set() above.

Copy link
Contributor

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks 🙏.

// given link. This avoids writing the given node, then reading it to making a
// link out of it.
func (ds *Shard) SetLink(ctx context.Context, name string, lnk *ipld.Link) error {
hv := newHashBits(name)
Copy link
Contributor

@aschmahmann aschmahmann Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Not required, but if in the future someone wants to extract this into reusable ds.SwapLink function for similar types of optimizations they can do so. Although it's really just a few copied lines of code.

@aschmahmann aschmahmann merged commit e0ef9c4 into master Jun 16, 2022
@aschmahmann aschmahmann deleted the feat-improve-perf-hamt-switching branch June 16, 2022 01:07
@hsanjuan
Copy link
Contributor Author

Thank you for merging so quickly.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants