Add option to tell link system storage is trusted and we can skip hash on read #149

hannahhoward · 2021-03-05T02:34:20Z

Goals

When the underlying storage system is local, we may want to skip hash on read to increase performance. This adds a boolean value that tells the LinkSystem to skip hash checking when loading nodes.

Extra bonus: this will be fine as an alternative to the bytesBuffer optimization!

warpfork · 2021-03-05T12:58:51Z

Devils advocate / another option to consider:

What if BlockReadOpener was instead defined to return a boolean: func(LinkContext, Link) (content io.Reader, plzTrustMe bool, err error)?

Would still hit the goal of making hashing disableable
Lets the storage layer express whether or not it believes its content is already validated -- which means we could be loading part of a walk from trusted storage, and other part from not-yet-trusted network content, and it all sorts itself out
Still lets the storage functions be free of the complexity of having to deeply understand hashing (all currently drafted approaches seem to do this, just remembering it outloud for clarity).

Would this be relevant? Do we have mixed trust walks like that (or might we, in the future)?

Would it make sense to combine these things (maybe have LinkSystem contain a flag that must be one of {FollowStorageTrustHint|AlwaysHashReads|NeverHashReads})?

Thinking out loud at the moment, do not yet have strong opinions.

warpfork · 2021-03-05T13:03:25Z

Should we have a way to tell a storage system that the io.Reader it returned gave us bollocks that failed hash verification?

I don't know if that's come up yet, but I could imagine that being relevant information if the storage system is network-backed and involves a piece of code with enough agency to want to Do Something in the network protocol based on higher level errors like that.

hannahhoward · 2021-03-05T23:44:21Z

In short, I think the cases where we need to be making a distinction based on CID as opposed to the storage layer are very few. More likely we will have different storage layers for which we will construct different link systems. I think this a much less cumbersome solution that solves the main problem for now.

warpfork · 2021-03-08T12:14:41Z

Distinction based on CID: agree, rare to never. Let's go with 'never'.

Working with different storages: Less sure about this. Sometimes, true, one can just use different LinkSystem.

I think sometimes using a whole different LinkSystem won't suit though. For example: if I'm doing graphsync-y things... I'm evaluating a Selector, so I'm in the middle of one big traversal.Walk call -- so I have one LinkSystem of note. And in grahpsync it would be natural that some of the data I'm loading would be being streamed to me from a network peer (so it needs hashing), while maybe some of the data is blocks I already had locally, so I might not want to rehash them.

I think to be able to compose the two different storages in that case would be solvable in one of two ways:

make the hash verification the problem of the storage reader function. (Not great: pushes complexity to the storage function.)
let the storage function return info that hints whether this storage should be considered trustworthy, and let the LinkSystem decide what to do. (Not great: expands the size of the storage reader function interface slightly.)

Between those two, the latter seems less bad to me.

warpfork · 2021-03-08T12:19:54Z

I think I'd be game to merge this approach with the bool in LinkSystem, too, if we want to try it immediately. (In terms of how this would box us in and create more interface churn, it's really mild: As long as the default is "yes, hash", and any future change we could imagine doing also defaults to "yes, hash", then the changes only affect people doing Interesting things, and those people are probably interested in those changes so they should be minimally problematic.)

But I'm still grinding through this train of thought, because if we do end up adding some more hint bools to the BlockReadOpener function interface, it would probably also be ideal to do those now-ishly, since #143 will already be causing an interface change there.

Have boolean that specifies whether the storage is trusted. if it is, skip hashing on reads

hannahhoward · 2021-03-24T01:31:59Z

@warpfork this is rebased -- can we merge it? I removed the byte buffer optimization. which ultimately makes this code simpler.

I think the number of cases where you're data is untrusted but also still being read locally such that you have it in a bytes.Buffer is very rare, and I think we should just live without for now.

warpfork · 2021-03-24T08:31:02Z

With the forward-looking note I still bet way north of 50/50 on if we will still end up wanting to add an info bit about this to BlockReadOpener somehow in the future -- yeah, I'm also good with this. Merging.

Base automatically changed from linksystem to master March 12, 2021 16:50

ribasushi mentioned this pull request Mar 23, 2021

ipfs dag export is slower than ipfs cat ipfs/kubo#8004

Open

feat(linksystem): add option to disable hash on read

6a262a3

Have boolean that specifies whether the storage is trusted. if it is, skip hashing on reads

hannahhoward force-pushed the feat/configure-hash-on-load branch from df2f100 to 6a262a3 Compare March 24, 2021 01:28

warpfork merged commit dc342a9 into master Mar 24, 2021

aschmahmann mentioned this pull request May 14, 2021

Release v0.9.0 ipfs/kubo#8058

Closed

71 tasks

mvdan deleted the feat/configure-hash-on-load branch August 13, 2021 13:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to tell link system storage is trusted and we can skip hash on read #149

Add option to tell link system storage is trusted and we can skip hash on read #149

hannahhoward commented Mar 5, 2021

warpfork commented Mar 5, 2021

warpfork commented Mar 5, 2021

hannahhoward commented Mar 5, 2021

warpfork commented Mar 8, 2021

warpfork commented Mar 8, 2021

hannahhoward commented Mar 24, 2021

warpfork commented Mar 24, 2021

Add option to tell link system storage is trusted and we can skip hash on read #149

Add option to tell link system storage is trusted and we can skip hash on read #149

Conversation

hannahhoward commented Mar 5, 2021

Goals

warpfork commented Mar 5, 2021

warpfork commented Mar 5, 2021

hannahhoward commented Mar 5, 2021

warpfork commented Mar 8, 2021

warpfork commented Mar 8, 2021

hannahhoward commented Mar 24, 2021

warpfork commented Mar 24, 2021