Blocks don't have a nice size #2053
Comments
good point, we should probably adjust the block size to fit neatly there. the extra 14 bytes is the protobuf wrapping on the data. |
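For reference, the 14 bytes can be reproduced with a little arithmetic. The sketch below assumes the wrapping is a merkledag protobuf node whose `Data` field (field 1) holds a unixfs message with a `Type` varint (field 1), the chunk bytes (field 2), and a `filesize` varint (field 3). That layout is an assumption about the format in use at the time, and the helper names are made up for illustration.

```python
def varint_len(n: int) -> int:
    """Number of bytes protobuf needs to encode n as a varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def wrapped_size(chunk_size: int) -> int:
    """Size of a chunk once wrapped, under the assumed layout."""
    # Inner unixfs message: Type, Data, filesize.
    type_field = 1 + 1                                     # tag + enum value
    data_field = 1 + varint_len(chunk_size) + chunk_size   # tag + length + bytes
    filesize_field = 1 + varint_len(chunk_size)            # tag + varint value
    inner = type_field + data_field + filesize_field
    # Outer merkledag node: a single length-delimited Data field.
    return 1 + varint_len(inner) + inner


chunk = 2 ** 18
print(wrapped_size(chunk) - chunk)  # overhead in bytes for the default chunk size
```

Under these assumptions the overhead for a 2^18-byte chunk works out to 14 bytes, which matches the number reported in the issue.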
@robcat great observation -- agreed. this will shift a bit with incoming ipld, too. we should chunk as
|
Yes, this would be the obvious fix. An alternative solution would be to make a special exception for blobs: use 2^n sizes for chunks, and the blockstore could transparently strip the wrapping when writing and then re-add it when reading it from the filesystem (the special status of the stored block may be encoded in the filename or in one attribute). |
This makes a lot of sense. and we should definitely investigate how this would play out. Optimizing this will matter significantly, as we're trying to put enormous amounts of data on ipfs :) -- the timeline for implementation can wait (if easy do it sooner), but certainly want to figure it out soon for specs. I suspect this would get a TON easier if we were to allow the ipld format that's just the raw data (no wrapping) -- obviously strictly for leaves, without any links. This has been requested many times, (and im not opposed to it, we just need to make damn sure that we distinguish it correctly). |
Sorry if I go a bit offtopic, but while I was focused on the file archival scenario I forgot to think about the links. Why are the links stored together with the binary data in the first place? |
@jbenet My last post was not just a rant about links: if they were stored separately, this block size issue would be immediately solved. |
@robcat hash links are part of the data model. this is why it's a merkleized data structure. an entire object is one item. i understand that this seems odd when you think of ipfs as just a way to track posix files. but ipfs is much more beyond that, the point is to make ipfs objects a data model where people can construct arbitrary data structures -- and dump all their existing ones -- and have merkle links as first class primitives. this is not a great rendering, but take a look:
|
btw, a storage model that rips out the merkle links, storing them separately, and then puts them back together when constructing the object logically (and to send it, and hash it to make sure things are ok, etc) screams implementation pain and suffering. this is why #875 will be hard. caveat is that by extending IPLD to allow just raw data edge nodes, we can make #875 easy to implement, and get most of the way you want here. also note that none of this prevents us from maintaining indexes of the links in a graph database (for backlinks etc), but that's a different layer. (there's a lot wrapped into these decisions) |
@jbenet The current graph structure (as in the IPFS paper):
If I just want to walk the graph, I necessarily have to get all the binary blobs and hash them (to confirm the object integrity), even if I don't really care about them. That's because But walking the graph is such a common use case! The result is that, in practice, objects become specialized: they usually contain either only links or only binary data. This is the structure I have in mind:
Pro:
Contra:
|
We're still on different pages. see https://github.com/ipfs/specs/blob/ipld-spec/merkledag/ipld.md and the talk i linked earlier. ipfs objects will be arbitrary data objects. the great majority of them will not be posix files nor blocks of them. they will be small pieces of interlinked data in applications. |
Sorry @jbenet, I watched and read all the linked material, but in my view they sit on different levels. IPLD is the definition of a common language that allows one to interpret and canonicalize the nodes of the graph; here I was instead discussing implementation details (do we really need to specify a serialization format for objects that are already flat?)
I understand this vision perfectly, but maybe we have "dual" views. According to me, the Also, we maybe don't agree on what is small. According to me, the content of the But if we are not still understanding each other, I would like to ask:
|
IPLD is not yet deployed, we're moving away from the current format to IPLD. This transition is not complete yet. The old format will continue to be supported -- backwards compatible -- so links do not break.
I don't understand the question. they will be stored in the same serialized object. See https://github.com/ipfs/go-ipld/
It's not "binary data" specifically. it's just "data".
You distinguish "data carrying nodes" as leaves, and special. IPLD does not, and embeds data directly in the same object. Also, you do not have the same data model as IPLD (full json-like tree), you have the old merkledag link-list model. Other points for consideration:
|
This patch fixes some of the inconsistencies in the IPLD spec, as well as resolving some existing issues, namely:
- ipfs/kubo#2053 - allow raw data in leaf nodes
- ipfs/kubo#1582 - use EJSON for encoding (non-UTF8) binary data
- mandate CBOR Strict Mode to avoid having to deal with duplicated keys
- change `link` to `content` (although we could also use `data` or something similar), as file content needn't be merkle-links, and can in fact be directly embedded into small files
- removed the extraneous `subfiles` wrapper, as file content can be either (a merkle-link to) a bytestring, a unicode string, or a list thereof
@jbenet I've suggested some changes to the IPLD spec in ipfs/specs#111 to address this issue, comments welcome :) |
Here - #2122 - we are also talking about limiting block size to 256KiB to allow better alignment and packing in storage (as block size is usually 4KB but it can be bigger), which was the initial topic of this issue. |
Note that it should be possible to adapt my filestore code (#875, #2634) to separate out the payload and store it separately, and basically implement what I think @robcat is proposing. This was actually something I was going to propose, and it can have some nice performance benefits. For example, the garbage collector only cares about the links in a node; by separating out the payload you could implement something like adding a GetLinks() method to the If this idea is implemented it might not necessarily be a good idea to set the block size to 256KiB, as that means the payload could then have odd sizes, and the odd sizes might not be compatible with a filesystem's deduplication. If the payload size is exactly 256KiB then the filesystem has a much better chance of deduplicating the block, if that data is also present in another file. Something to think about. |
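To make the garbage-collector point concrete, here is a rough sketch of a store that keeps links and payloads apart, so a reachability walk never touches the payloads. Everything in it (`SplitStore`, `get_links`, `reachable`, plain string keys instead of hashes) is hypothetical and only illustrates the idea; it is not the filestore code from #875 or #2634.

```python
from typing import Dict, Iterable, List, Set


class SplitStore:
    """Toy store that keeps links and payloads in separate maps (hypothetical)."""

    def __init__(self) -> None:
        self.links: Dict[str, List[str]] = {}
        self.payloads: Dict[str, bytes] = {}
        self.payload_reads = 0  # counts how often a payload was fetched

    def put(self, key: str, links: List[str], payload: bytes) -> None:
        self.links[key] = list(links)
        self.payloads[key] = payload

    def get_links(self, key: str) -> List[str]:
        # Only the small link index is consulted here.
        return self.links[key]

    def get_payload(self, key: str) -> bytes:
        self.payload_reads += 1
        return self.payloads[key]


def reachable(store: SplitStore, roots: Iterable[str]) -> Set[str]:
    """Mark phase of a GC: follow links from the roots, never read payloads."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        stack.extend(store.get_links(key))
    return seen


store = SplitStore()
store.put("root", ["a", "b"], b"")
store.put("a", [], b"x" * 1024)
store.put("b", [], b"y" * 1024)
store.put("orphan", [], b"z" * 1024)

live = reachable(store, ["root"])
garbage = set(store.links) - live
print(sorted(live), sorted(garbage), store.payload_reads)
```

The walk finds the live set using only the link index, so the payload files stay untouched until a block is actually read or deleted.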
That is true, but you have to consider how many nodes will be pure raw data nodes that also happen to be in a file the user publishes, and how many will be linking nodes, store metadata, or use custom structures. Users are mostly consumers these days (the 1% to 99% rule). Your filestore is a nice thing, but for the consuming majority it won't be that useful. Another thing is that the filesystem will only be able to dedup it if the content of the file begins at a 4KiB boundary, which would require exactly On the other hand, if raw nodes were to be introduced (a DAG that consists only of raw data that IPFS does not interpret), they would store precisely 256KiB of a given file, aligning it perfectly with filesystem boundaries and allowing for filesystem deduplication. Linking nodes are also 256KiB and align to filesystem boundaries without wasting any space. |
@Kubuxu, my idea was that the metadata be stored separately from the payload (i.e. the contents of the file). For example, the metadata could be stored in a leveldb (as it will be small) while the payload could be stored in a separate file on disk. The size of the metadata in this case will be irrelevant. As for raw nodes without any metadata, that will also solve the problem. Has anyone given any thought to how that would be implemented? Without any sort of header how would you know what you are getting when you retrieve a block? |
To summarize my point from IRC: unless raw nodes are used for all data, at some point in the future it might make sense to store the data separately from the metadata in the repo. If the data is stored separately, then it is the size of the data component of the block that is important as far as optimizing storage goes. I would propose we limit the size of the data to 256KiB - 4KiB unless raw nodes are being used. The block size can then still be limited to 256KiB. |
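The proposed cap can be checked numerically. This is only a sketch: the 4KiB filesystem block and the 14 bytes of metadata are assumptions taken from earlier in the thread, and `check` is an illustrative helper.

```python
FS_BLOCK = 4 * 1024                  # assumed filesystem block size
MAX_BLOCK = 256 * 1024               # limit on the whole serialized block
MAX_PAYLOAD = MAX_BLOCK - FS_BLOCK   # proposed cap on the data component


def check(payload: int, metadata: int) -> dict:
    """Report whether a payload is block-aligned and the node fits the limit."""
    return {
        "payload_aligned": payload % FS_BLOCK == 0,
        "block_within_limit": payload + metadata <= MAX_BLOCK,
    }


print(check(MAX_PAYLOAD, 14))  # payload capped at 256KiB - 4KiB
print(check(MAX_BLOCK, 14))    # payload of a full 256KiB
```

With the cap, the payload stays a whole number of 4KiB blocks and the wrapped node still fits in 256KiB. A full 256KiB payload is aligned too, but the wrapping pushes the node over the block limit.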
@kevina see my suggestions in ipfs/specs#111 |
let's get this done alongside the ipld stuff. With the new CIDv1 stuff we will be able to address completely raw blocks! |
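As a rough illustration of why CIDv1 helps here: the identifier itself carries a codec code, so a block can be stored as bare bytes and still be interpreted correctly on retrieval. The sketch below builds such an identifier by hand, assuming the usual layout of version, codec, then multihash, with the multicodec codes 0x55 (raw) and 0x12 (sha2-256) and lowercase base32 multibase. `raw_cid_v1` is a made-up name, not an API from go-ipfs.

```python
import base64
import hashlib

RAW_CODEC = 0x55   # multicodec code for raw binary
SHA2_256 = 0x12    # multihash code for sha2-256


def raw_cid_v1(block: bytes) -> str:
    """Build a CIDv1 string for a block stored with no wrapping at all."""
    digest = hashlib.sha256(block).digest()
    # Multihash: hash function code, digest length, digest.
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # CIDv1: version, content codec, multihash (all codes fit in one byte).
    cid_bytes = bytes([0x01, RAW_CODEC]) + multihash
    # Multibase base32: lowercase, unpadded, prefixed with "b".
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


block = b"\x00" * (2 ** 18)   # exactly 256KiB on disk, no extra bytes
print(len(block), raw_cid_v1(block))
```

Because the codec lives in the identifier rather than in the block, the stored file can be exactly the chunk size.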
@whyrusleeping note this: ipfs/specs#130 (comment)
|
Raw blocks are now a thing, running Closing this issue now, please go ahead and reopen this or open a new issue if further discussion is needed. |
(also worth noting that you can |
Most filesystems have blocks of a standard size of 4k or 16k.
The blocks generated by the ipfs default chunker are stored in files of 2¹⁸+14 bytes (262158 bytes).
These extra bytes don't really play nice with most filesystems and disks (an extra block is allocated for just these 14 bytes).
Are these bytes an essential part of the block that can't be moved elsewhere (e.g. in its filename or in the database)?
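A quick way to see the cost, as a simplified sketch: it assumes the filesystem hands out space in whole blocks and ignores features such as tail packing or inline data. `allocated_bytes` is an illustrative helper.

```python
def allocated_bytes(file_size: int, fs_block: int) -> int:
    """Bytes reserved on disk when space is allocated in whole blocks."""
    blocks = -(-file_size // fs_block)  # integer ceiling division
    return blocks * fs_block


for fs_block in (4096, 16384):
    for size in (2 ** 18, 2 ** 18 + 14):
        used = allocated_bytes(size, fs_block)
        # columns: fs block size, file size, bytes allocated, bytes wasted
        print(fs_block, size, used, used - size)
```

With 4k blocks a 2^18-byte file fills 64 blocks exactly, while the 14 extra bytes force a 65th block that is almost entirely empty.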