Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to new merkledag walk functions #6499

Merged
merged 1 commit into from
Jul 16, 2019
Merged

Conversation

Stebalien
Copy link
Member

EnumerateChildrenAsync has been renamed to WalkParallel to reflect the fact that:

  1. It visits the root.
  2. It's parallel, not async.

To mirror this change, EnumerateChildren has also been renamed to Walk and now behaves the same (except that it's not parallel).

@MichaelMure
Copy link
Contributor

You already merged ipfs/go-merkledag#41 so I'll comment here.

First, considering how easy it is to add the root cid before using a graph walking function, I'd say that enumerating only children is more useful/clean as a reusable algorithm. Filtering the root after the fact works, but I'm sure you'll agree it's not that great.

With the renaming forcing a re-evaluation of all the call sites, what about providing a more complete set of functions ?

  • EnumerateChildren <-- no root, same as before
  • EnumerateChildrenDepth
  • EnumerateChildrenParallel <-- what EnumerateChildrenAsync should have been
  • EnumerateChildrenDepthParallel
  • Walk <-- with root
  • WalkDepth
  • WalkParallel
  • WalkDepthParallel

At Infura, we actually have an additional need: we have a bunch of bad blocks that make walking the graph fail. An example: Error: missing an unmarshaller for tag 1. Arguably, those could/should be cleaned manually but I'd say that having a bunch of process inside go-ipfs failing as soon as there is a bad block somewhere (GC in particular) is no fun.

Having a way to make those walk not fail when a bad block is found would be very useful. Now, having 16 functions is no fun either, so maybe it's time to leverage the functional options pattern ? It would also allow to get more control for, say, the concurrency factor, a depth limit ...

type WalkOptions struct {
	WithRoot    bool
	MaxDepth    int
	StopOnError bool
	Concurrency int
}

type WalkOption func(*WalkOptions)

func WithRoot() WalkOption {
	return func(walkOptions *WalkOptions) {
		walkOptions.WithRoot = true
	}
}

func ContinueOnError() WalkOption {
	return func(walkOptions *WalkOptions) {
		walkOptions.StopOnError = false
	}
}

func Concurrent() WalkOption {
	return func(walkOptions *WalkOptions) {
		walkOptions.Concurrency = defaultConcurrentFetch
	}
}

func Concurrency(worker int) WalkOption {
	return func(walkOptions *WalkOptions) {
		walkOptions.Concurrency = worker
	}
}

...

We would end up with only two base functions (Depth has a different visit function):

func Walk(ctx context.Context, getLinks GetLinks, c cid.Cid, visit func(cid.Cid) bool, options ...WalkOption) error {
	return nil
}

func WalkDepth(ctx context.Context, getLinks GetLinks, c cid.Cid, visit func(cid.Cid, int) bool, options ...WalkOption) error {
	return nil
}

@MichaelMure
Copy link
Contributor

@Stebalien what do you think about my proposal ?

Copy link
Contributor

@michaelavila michaelavila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads much better. I ran CI again as the sharness tests had some unrelated failures (they went away). FWIW, I also think @MichaelMure's proposal is interesting. I like the idea of capturing some of these things as functional options.

@Stebalien
Copy link
Member Author

In general, I completely agree that we need better dag traversal functions. The goal of this PR was to fix the immediate inconsistencies.

I chose Walk over EnumerateChildren as:

  1. It was cleaner to adapt EnumerateChildrenAsync to WalkParallel than EnumerateChildrenParallel (I'm lazy).
  2. 7/9 uses I've seen work better with Walk than with EnumerateChildren (and the other two cases were actually just duplicates of eachother).

Note: IPLD actually has a more complicated read-ahead, in-order walker: https://godoc.org/github.com/ipfs/go-ipld-format#Walker (example https://github.com/ipfs/go-unixfs/blob/06cb1c76adea6c76310b8b8c15444d06db02e745/io/dagreader.go#L171). However, it's complicated enough that I haven't bothered to try using it.

@Stebalien Stebalien force-pushed the fix/enumerate-children branch 2 times, most recently from c39e608 to bfd56d7 Compare July 16, 2019 23:32
EnumerateChildrenAsync has been renamed to WalkParallel to reflect the fact
that:

1. It visits the root.
2. It's parallel, not async.

To mirror this change, EnumerateChildren has also been renamed to Walk and now
behaves the same (except that it's not parallel).
@Stebalien Stebalien force-pushed the fix/enumerate-children branch from bfd56d7 to 9738d81 Compare July 16, 2019 23:34
@Stebalien
Copy link
Member Author

I'm going to merge this as it strictly improves things.

@Stebalien Stebalien merged commit 920572b into master Jul 16, 2019
@Stebalien Stebalien deleted the fix/enumerate-children branch July 16, 2019 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants