feat: improve dag/import #9721
Conversation
Force-pushed from 5178319 to d62b713.
That's way too clean looking, it's suspicious.
- don't bypass the CoreApi
- don't use a goroutine and return channel for `importWorker`, when what's happening is really just a synchronous call
- only `PinLock()` when we are going to pin
- use `cid.Set` instead of an explicit map
- fail the request early if any pinning fails; no need to try to pin more if the request already failed
Force-pushed from d62b713 to 2ec59ae.
There is no
Yes my bad, idk how the code managed to be so bad before.
Correct me if I'm wrong: previously it would concurrently import the DAG and pin it in two different goroutines. This causes a situation where we reread from the blockstore what we just wrote to the blockstore.
Also, if the rereads are faster than the writes, we would, for a split second, try to fetch over bitswap blocks that we are about to receive in the CAR.
Now you do that serially, which makes the latency worse because the rereads are not parallelized with the writes.
I think this could be solved by using a new consumer-driven API for pinning: you start a pin, the pinner code maintains internal state (`cid.Set` and friends), and it has some callback where the consumer code (`ipfs dag import`) can add more blocks to the pin state.
So I would expect something like this (pseudocode, `?` means error handling omitted):
```go
bs := api.Blockstore()
c, roots, ? := car.NewReader(req.Body)
pinstate := api.Pin().AddClientDriven(bs, roots, options.Pin.Recursive(true)) // we need bs in case we have blocks out of order
forLoop:
for {
	block, err := c.Next()
	switch err {
	case nil:
	case io.EOF:
		break forLoop
	default:
		// handle error
	}
	? = bs.Put(block)
	? = pinstate.Add(block)
}
? = pinstate.Commit() // ensure that the pin state is complete
```
This still has performance similar to the two-goroutine solution but does not double-read.
Implementing `AddClientDriven` sounds somewhat straightforward (implement a tricoloring (generator or event loop) pattern).
No. Because of https://github.com/ipfs/kubo/blob/master/core/commands/dag/import.go#L55, that goroutine is nothing more than a function call. And yes, all the blocks would be written, then read again.
As per above, I'm not changing that, merely making it explicit. The underlying problem really is that there is no guarantee that the input CAR file has all the blocks required for the pin. (Nor, btw, that there are no extra blocks, but that's another matter.) To make things a little more funny, it seems like kubo accepts multiple CAR files in the same request (is there really a point?), so the DAG can be spread over those multiple files. This is why I have to wait for all CARs to be imported before starting to check the DAGs, and why this process needs to 1) accept and write the blocks and 2) check the pinned DAG completeness. It'd be great to have a way to do that on the fly while writing, but I don't think that exists?
As per above, that doesn't make sense imho, but if you go in that direction I would very much like you to consider ipfs/boxo#378. It would be so much better to have that command, where there is knowledge of the context, what blocks are being written and so on, check that DAG and finally trigger
I see, this all makes sense now.
It does sound like https://github.com/ipfs/go-ipfs-pinner/issues/25 is what I wanted actually, I just didn't know about it.
Thx for the refactor, the code is much better now.
Unrelated pubsub sharness failure. (I should really fix this flaky CI)
@MichaelMure if you're worried about slow