Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change pinning to happen in a callback #1274

Merged
merged 4 commits into from
May 30, 2015
Merged

change pinning to happen in a callback #1274

merged 4 commits into from
May 30, 2015

Conversation

whyrusleeping
Copy link
Member

this does the pinning for the add command inline (as it was intended to be) instead of after the fact. This has a significant perf improvement over the current speed of add, and that improvement increases the larger the file being added is.

@whyrusleeping whyrusleeping added the topic/perf Performance label May 21, 2015
@whyrusleeping whyrusleeping self-assigned this May 21, 2015
@whyrusleeping whyrusleeping added the status/in-progress In progress label May 21, 2015
@jbenet jbenet mentioned this pull request May 21, 2015
52 tasks
@whyrusleeping whyrusleeping force-pushed the refactor/importer branch 2 times, most recently from 89d7625 to 8dae027 Compare May 22, 2015 03:08
if err != nil {
res.SetError(err, cmds.ErrNormal)
return
}

mp := n.Pinning.GetManual()
mp.RemovePinWithMode(rnk, pin.Indirect)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this being removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it is added below (for all files found) and then re-added here?

why not just not pin anything until the end? i.e. remove the call to pin below and only pin recursively here after adding?

is the worry that something will GC in-between, while adding? (if so, ok makes sense).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just not pin anything until the end?

thats what this PR is trying to avoid doing. in order to pin after the fact, you have to pull every single block in the dag back off the disk, and write the pin for it. This is really expensive when we get into the 50+GB range of add operations. (or in the 10MB+ range on my pi)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the worry that something will GC in-between, while adding? (if so, ok makes sense).

This too, a poorly timed gc could end really badly for us. (although its still not perfectly safe)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mark and sweep is TRTTD

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, May 27, 2015 at 11:42:36AM -0700, Jeromy Johnson wrote:

is the worry that something will GC in-between, while adding? (if
so, ok makes sense).

This too, a poorly timed gc could end really badly for us. (although
its still not perfectly safe)

I think the right approach here is to have transactional changes to
the pin tree (see also the commit sessions discussed in #1187).
Coupling that with something like Git's gc.pruneexpire to avoid
garbage-collecting recently-created objects (say, in the last week?)
would give us better safety here. It would also protect folks from
accidentally GC'ing recent objects they'd forgotten to tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coupling that with something like Git's gc.pruneexpire

that's a great solution. of course it comes from git.

i'm really concerned about safety here--

@whyrusleeping @tv42 -- what's the plan long term?

@jbenet
Copy link
Member

jbenet commented May 27, 2015

+1 to the general idea.

  • indirect remove question (i think it's likely TRTTD), but am worried about leaking pins in crash scenarios. @tv42's suggestion for a mark-sweep gc may be the easier option. (i.e. don't store "indirect" pins, calculate them on gc. it's still O(n) so mark-sweep likely similar perf.
  • callback - why not provide the actual block? or mdag node?
  • callback - what is the bool for? named parameters + doc:
  • rebase

@wking
Copy link
Contributor

wking commented May 27, 2015

On Wed, May 27, 2015 at 11:37:19AM -0700, Juan Batiz-Benet wrote:

  • why not provide the actual block? or mdag node?
  • what is the bool for? named parameters + doc:

See also my callback suggestions in #1291. The post-chunk/layout
callback there both accepts and returns *dag.Node objects.

@whyrusleeping
Copy link
Member Author

yeah, i guess i used a key because i already had them in both places i was calling the callback. a node makes sense.

@whyrusleeping
Copy link
Member Author

@jbenet @wking comments addressed, if no further CR, this is RFM

@wking
Copy link
Contributor

wking commented May 28, 2015

On Thu, May 28, 2015 at 03:08:23PM -0700, Jeromy Johnson wrote:

@jbenet @wking comments addressed, if no further CR, this is RFM

I'd still drop ‘root’ from your NodeCB signature, and I'd like to drop
indirect pinning in favor of a time-based grace period before
garbage-collecting nodes 1. But as it stands this PR is still a
step in the right direction, so I'm +1 to merge.

@@ -5,6 +5,13 @@ import (
"github.com/ipfs/go-ipfs/pin"
)

// NodeCB is callback function for dag generation
// the `root` flag signifies whether or not this is
// the root of a dag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean this is weird, right, because every node is the root of a dag.

maybe

the `last` flag signifies whether or not this is the last 
(top-most root) node being added. useful for things like
only pinning the first node recursively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@jbenet
Copy link
Member

jbenet commented May 29, 2015

I'd still drop ‘root’ from your NodeCB signature,

I wold like that too, but i think this needs to switch to mark and sweet.

and I'd like to drop
indirect pinning in favor of a time-based grace period before
garbage-collecting nodes [1].

Well indirect pins need to be permanent. (or you mean only indirect pinning the root?) i think indirect pins can be dropped entirely if we go with mark-and-sweep as @tv42 suggested a while back.

@wking
Copy link
Contributor

wking commented May 29, 2015

On Fri, May 29, 2015 at 12:27:34PM -0700, Juan Batiz-Benet wrote:

and I'd like to drop indirect pinning in favor of a time-based
grace period before garbage-collecting nodes 1.

Well indirect pins need to be permanent. (or you mean only indirect
pinning the root?) i think indirect pins can be dropped entirely if
we go with mark-and-sweep as @tv42 suggested a while back.

Yeah, or tri-color marking 1 or whatever. Explicitly using
pin-markers for every object in a recursive tree seems like you're
forcing the client to help you mark, when that should really be
something handled internally by the GC implementation. Of course,
you'd want the time-based grace period or some such to bridge the gap
between adding an ancestor node and pinning the root node, but that
seems manageable.

@jbenet
Copy link
Member

jbenet commented May 29, 2015

Let's continue the gc discussion here: ipfs/notes#11

jbenet added a commit that referenced this pull request May 30, 2015
change pinning to happen in a callback
@jbenet jbenet merged commit 5d43ebb into master May 30, 2015
@jbenet jbenet removed the status/in-progress In progress label May 30, 2015
@jbenet jbenet deleted the refactor/importer branch May 30, 2015 00:06
wking referenced this pull request in sgotti/oci-store Jul 2, 2016
ariescodescream pushed a commit to ariescodescream/go-ipfs that referenced this pull request Apr 7, 2022
change pinning to happen in a callback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/perf Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants