
Add graphsync/do-not-send-first-blocks extension #225

Closed
hannahhoward opened this issue Sep 28, 2021 · 6 comments
Labels
effort/hours Estimated to take one or several hours estuary issues identified from estuary usage or requested by code owners of estuary exp/intermediate Prior experience is likely helpful P1 High: Likely tackled by core team if no one steps up

Comments

@hannahhoward
Collaborator

What

Since selector traversals are deterministic, the simplest way to resume a previous request is to tell the responder not to send the first N blocks, corresponding to the blocks you had already received when the request was interrupted.
Add an extension to go-graphsync that allows the requestor, when making a graphsync request, to instruct the responder not to send full block data for the first N blocks.

Proposed Implementation

When sending a request, the requestor sends an extension named graphsync/do-not-send-first-blocks and encodes the number of blocks to skip as a CBOR-encoded int in the data field.

The IPLD schema is as follows:

type DoNotSendFirstBlocks Int

This issue covers implementing built-in support for this extension in the go-graphsync responder.

Acceptance Criteria

As a client I can call:

var gs graphsync.Exchange
var ctx context.Context
var p peer.ID
var root ipld.Link
var selector ipld.Node
var blocksToSkip int64

gs.Request(ctx, p, root, selector, graphsync.DoNotSendFirstBlocks(blocksToSkip))

The requestor will properly encode the graphsync/do-not-send-first-blocks extension, and the responder will not send the specified number of blocks at the beginning of the selector traversal. Assuming the client has the first N blocks stored locally, the request will finish as normal.

Proposed Improvement

A possible improvement would be to have the go-graphsync requestor implementation itself attempt to load blocks locally until it no longer can, and then it would encode this extension automatically to avoid receiving those blocks.

This would obviate the need for higher-level libraries (i.e. go-data-transfer) to track how many blocks are sent.

@hannahhoward added the need/triage, estuary, effort/hours, exp/expert, exp/intermediate, and P1 labels, and removed the need/triage and exp/expert labels, on Sep 28, 2021
@whyrusleeping
Member

This sounds good to me, though I'm not sure the proposed improvement is actually an improvement. Disk IOPS on giant files get expensive.

@dirkmc
Collaborator

dirkmc commented Sep 29, 2021

Sounds good to me 👍

@dirkmc
Collaborator

dirkmc commented Sep 29, 2021

Should we just call it "block-offset"?

@dirkmc
Collaborator

dirkmc commented Sep 29, 2021

> tell the responder to not send the first N blocks

In this case is N the number of unique blocks? It's easier for higher layers to consistently keep track of the number of unique blocks.

@hannahhoward
Collaborator Author

> In this case is N the number of unique blocks? It's easier for higher layers to consistently keep track of the number of unique blocks.

Is it though? We track giant lists of CIDs in go-data-transfer, but what if we just incremented a counter in DataReceived?

@hannahhoward
Collaborator Author

@whyrusleeping

The proposed improvement just makes things easier for the higher-level library. I agree, it's not super efficient.

marten-seemann pushed a commit that referenced this issue Mar 2, 2023
associate graphsync requests to dt requests, record channel sends as they happen