Fix Tsize values, fix HAMT and make recursive builder output match go-unixfs #21
Conversation
I did not find good documentation, the last time I went through this, on what the size of directories should be set to. Is that something we can have a spec document on and point to in comments?
Note that there is no agreement on what [...]. cc @achingbrain
Kind of lacking on the spec front. There's this vague area between the encoding format layer (specs I was responsible for: https://ipld.io/specs/codecs/dag-pb/spec/) and the unixfs layer (https://github.com/ipfs/specs/blob/master/UNIXFS.md), and these changes land in the middle of it. When writing the dag-pb spec, I didn't consider it necessary to go into detail about the proper contents of the various fields; I probably thought that was an IPFS spec concern. But it looks like IPFS speccing was higher level still, so our specs for this are in the form of code for now. At one stage there was talk of deprecating [...].
I've changed the test for the sharded directory and added another one, so we now test the boundary condition where it flips from unsharded to sharded by adding an extra file. Again, I constructed the data using go-ipfs and found the boundary condition with it. The CIDs and final reported size are calculated from go-ipfs outputs too. No extra fixes were required; this lib is getting the boundary right.
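(Purely as an illustration of the boundary test being described here: a rough Go sketch of its shape. The helpers, constant, and package name are hypothetical placeholders, not this repository's API; the real test compares CIDs against data generated with go-ipfs.)

```go
// Hypothetical shape of the boundary-condition test described above. The
// helpers buildDirWithEntries and isHAMTShard and the constant
// entriesAtBoundary are placeholders, not this repository's API.
package sketch

import "testing"

const entriesAtBoundary = 100 // placeholder; the real boundary was found empirically with go-ipfs

func buildDirWithEntries(t *testing.T, n int) interface{} { t.Helper(); return nil } // placeholder builder
func isHAMTShard(node interface{}) bool                   { return false }           // placeholder check

func TestShardingBoundary(t *testing.T) {
	t.Skip("illustrative sketch only")
	below := buildDirWithEntries(t, entriesAtBoundary) // largest directory that stays unsharded
	if isHAMTShard(below) {
		t.Fatal("expected a plain dag-pb directory at the boundary")
	}
	above := buildDirWithEntries(t, entriesAtBoundary+1) // one extra file tips it into a HAMT
	if !isHAMTShard(above) {
		t.Fatal("expected a HAMT shard past the boundary")
	}
}
```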
Good to know. I certainly saw maintaining absolute byte compatibility with things like the switchover point as a somewhat lower priority for new implementations. As long as either style can be decoded properly, I'm less worried about them generating and switching over at exactly the same point.

I know there are some edge cases that would like it, but it's equally true that by setting the expectation of 'these are the only bytes that should be generated for this input' we make it harder to evolve the format and move to updated versions of unixfs.
OK, this started as something entirely different, which I'll get to in another PR later. But while using this I noticed that `Tsize` isn't getting set properly on directories. Then while fixing that, I noticed that the HAMT doesn't quite work properly, so I fixed that too.

There are two main tests in here, one for a smaller directory structure and one for a very large one that breaks out the sharding. I've generated the CIDs from go-ipfs by `ipfs add`ing them, and the final Tsize values were calculated by looking at the link sizes in the root blocks and adding the root block size to the total: `ipfs dag get bafybei... | jq '[.Links[].Tsize | tonumber] | add'` & `ipfs block get bafybei... | wc -c`.
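(The arithmetic that jq pipeline performs can also be sketched in Go, just to make the Tsize bookkeeping explicit: the total reported size is the sum of the links' Tsize values plus the size of the root block itself. The `link` type below is a stand-in for a decoded dag-pb link, not a type from this repo.)

```go
package main

import "fmt"

// link is a stand-in for a decoded dag-pb link; only the Tsize field matters here.
type link struct {
	Name  string
	Tsize uint64
}

// totalSize mirrors `ipfs dag get ... | jq '[.Links[].Tsize | tonumber] | add'`
// plus `ipfs block get ... | wc -c`: the children's subtree sizes plus the root block itself.
func totalSize(rootBlock []byte, links []link) uint64 {
	total := uint64(len(rootBlock))
	for _, l := range links {
		total += l.Tsize
	}
	return total
}

func main() {
	links := []link{{Name: "a.txt", Tsize: 1024}, {Name: "b.txt", Tsize: 2048}}
	rootBlock := make([]byte, 120) // pretend the encoded root node is 120 bytes
	fmt.Println(totalSize(rootBlock, links)) // prints 3192
}
```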
It's a breaking change because all of the builder `BuildX()` functions now return `datamodel.Link, uint64, error`, so the size is included (mainly because it needs to be for recursive internal collection, but the reported size is an accurate representation of the structure's weight, so it should be useful in some situations too).

While in here, I replaced the block size calculation hack (load the saved block as raw and use its byte length) with a new, even more exotic hack: count the bytes as they are pushed out of the encoder, by wrapping the LinkSystem, EncoderChooser and Encoder at each save point. Not trivial, but it saves the round-trip out of the block store. @warpfork @mvdan I think the need for the block size at the `Store()` call for well-formed dag-pb graphs makes a pretty strong case that this should be easier to retrieve out of the API. Perhaps `Store()` should just return 3 values?
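(For anyone skimming: the core of that byte-counting trick, independent of go-ipld-prime's actual interfaces, is just wrapping the writer the encoder emits into and tallying what passes through. A minimal sketch, not the PR's actual wrapper types:)

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingWriter wraps any io.Writer and tallies the bytes that pass through,
// so the encoded block's size is known at store time without re-reading it.
type countingWriter struct {
	w io.Writer
	n uint64
}

func (cw *countingWriter) Write(p []byte) (int, error) {
	n, err := cw.w.Write(p)
	cw.n += uint64(n)
	return n, err
}

func main() {
	var store bytes.Buffer // stand-in for the real block store writer
	cw := &countingWriter{w: &store}
	io.WriteString(cw, "encoded dag-pb bytes would be written here")
	fmt.Println("stored", cw.n, "bytes") // size is available as soon as the encoder finishes
}
```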