What does it mean to be a directory in the UnixFS layer? #5157
Yes. Sharded directories came later so they got a different type.
So, all links must go in the … However, sharded directories span multiple nodes. Therefore, for better performance, we cache these shards when we load them. That's why we have the children map.
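Here is a minimal Go sketch of the caching idea in this comment. It assumes a hypothetical loadShard helper that stands in for fetching and decoding a child shard from the DAG. Shard, child, and loadShard are illustrative names, not the go-unixfs API. The shard keeps its child links as they are stored in the DAG and fills a children map the first time each child is loaded, so later accesses skip the fetch.

```go
package main

import "fmt"

// Shard is a hypothetical, heavily simplified sharded-directory node.
type Shard struct {
	links    map[int]string // child CIDs as stored in the DAG, keyed by slot
	children map[int]*Shard // child shards already loaded (the cache)
	fetches  *int           // counts simulated DAG fetches, for demonstration only
}

// loadShard is a hypothetical stand-in for fetching and decoding the child
// shard identified by cid; real code would go through a DAG service.
func loadShard(cid string, fetches *int) *Shard {
	*fetches++
	return &Shard{links: map[int]string{}, children: map[int]*Shard{}, fetches: fetches}
}

// child returns the child shard in slot idx, loading it from the DAG only
// on first access and serving it from the children map afterwards.
func (s *Shard) child(idx int) (*Shard, error) {
	if c, ok := s.children[idx]; ok {
		return c, nil
	}
	cid, ok := s.links[idx]
	if !ok {
		return nil, fmt.Errorf("no child in slot %d", idx)
	}
	c := loadShard(cid, s.fetches)
	s.children[idx] = c
	return c, nil
}

func main() {
	fetches := 0
	root := &Shard{
		links:    map[int]string{7: "hypothetical-child-cid"},
		children: map[int]*Shard{},
		fetches:  &fetches,
	}
	root.child(7)
	root.child(7)
	fmt.Println("DAG fetches:", fetches) // DAG fetches: 1
}
```

The fetch counter only exists to make the cache hit visible. The second call to child is served from the children map.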
Oh, could you point me to that part of the code? I thought a shard was encoded in a single …
So, while working on #5160 I realized that MFS is the only consumer of the UnixFS directory abstraction; the definition of the interface is basically whatever the MFS layer needs from UnixFS.
I'm not sure what kind of restriction that is. I could just save the links in the …
Take a look at …
You could. However, tools like …
So, there are multiple types of UnixFS directories. At the end of the day, they should all expose a standard interface (really, just iterate and lookup). However, we're always going to have multiple, very different representations. Currently, we have: …
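The standard interface mentioned in this comment (really, iterate and lookup) might be sketched in Go as follows. Directory, Iterate, Lookup, Link, and PlainDirectory are hypothetical names chosen for illustration and are simplified (no contexts, CIDs as plain strings); they are not the go-unixfs API. The point is that a caller such as listNames depends only on the two methods, so a sharded implementation could be swapped in without the caller noticing.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Link is a simplified, hypothetical DAG link: an entry name and the CID it points to.
type Link struct {
	Name string
	Cid  string
}

// ErrNotFound is returned by Lookup when no entry has the requested name.
var ErrNotFound = errors.New("entry not found")

// Directory is the minimal contract every representation (plain or sharded) would satisfy.
type Directory interface {
	// Iterate calls fn for each entry, stopping at the first error.
	Iterate(fn func(Link) error) error
	// Lookup returns the entry with the given name, or ErrNotFound.
	Lookup(name string) (Link, error)
}

// PlainDirectory keeps all of its links in a single node.
type PlainDirectory struct {
	links []Link
}

func (d *PlainDirectory) Iterate(fn func(Link) error) error {
	for _, l := range d.links {
		if err := fn(l); err != nil {
			return err
		}
	}
	return nil
}

func (d *PlainDirectory) Lookup(name string) (Link, error) {
	for _, l := range d.links {
		if l.Name == name {
			return l, nil
		}
	}
	return Link{}, ErrNotFound
}

// listNames depends only on the Directory interface, never on a concrete type.
func listNames(d Directory) ([]string, error) {
	var names []string
	err := d.Iterate(func(l Link) error {
		names = append(names, l.Name)
		return nil
	})
	sort.Strings(names)
	return names, err
}

func main() {
	var d Directory = &PlainDirectory{links: []Link{{"b.txt", "cid-b"}, {"a.txt", "cid-a"}}}
	names, _ := listNames(d)
	fmt.Println(names) // [a.txt b.txt]
	l, err := d.Lookup("a.txt")
	fmt.Println(l.Cid, err) // cid-a <nil>
}
```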
We actually have the same split in files: we have files that are raw bytes and sharded files. For context, HAMT stands for hash array mapped trie. To insert a key/value pair (name/file) into a HAMT, you first hash the key. Then, roughly, …
However, instead of storing sparsely populated arrays as this algorithm suggests, we use a bitfield. To look up a value in the array, you: …
Extra 👍 for the detailed HAMT explanation.
This issue has been addressed in a couple of PRs now (the last one being ipfs/go-unixfs#16), so we can close it.
It would be great to add this information on how the UnixFS HAMT works to the UnixFS spec, which currently has no information on how HAMTShards are actually encoded and meant to be used. This is the best explanation I found when googling for a spec of the UnixFS HAMT.
The Directory type of a UnixFS object (contrary to its name) doesn't actually indicate a directory, or at least it's not the only possible type of a directory: the HAMTShard type also indicates a directory, leading to confusing comparisons like https://github.com/ipfs/go-ipfs/blob/ecf7d157a6d5e525b122079367f4b6c2ba25e951/mfs/dir.go#L159-L161
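To make the confusion concrete, here is a minimal Go sketch of a check that has to accept two type values for a directory. The numeric values follow the DataType enum in the UnixFS protobuf definition (Raw = 0, Directory = 1, File = 2, Metadata = 3, Symlink = 4, HAMTShard = 5); the DataType type and the isDirectory helper are hypothetical names, not go-ipfs code.

```go
package main

import "fmt"

// DataType mirrors the UnixFS protobuf DataType enum values.
type DataType int

const (
	Raw       DataType = 0
	Directory DataType = 1 // plain, single-node directory
	File      DataType = 2
	Metadata  DataType = 3
	Symlink   DataType = 4
	HAMTShard DataType = 5 // sharded directory
)

// isDirectory reports whether a node of type t is a directory.
// Testing only t == Directory would silently miss sharded directories.
func isDirectory(t DataType) bool {
	return t == Directory || t == HAMTShard
}

func main() {
	for _, t := range []DataType{Raw, Directory, File, HAMTShard} {
		fmt.Printf("type %d: directory=%v\n", t, isDirectory(t))
	}
}
```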
So Directory is only the "plain implementation" type of a directory, not the directory type (a very similar confusion arises with the File and Raw types).
What are the requirements for a UnixFS directory? There seems to be no interface that defines one; at this point, the closest thing is the Directory structure from the unixfs.io package (https://github.com/ipfs/go-ipfs/blob/ecf7d157a6d5e525b122079367f4b6c2ba25e951/unixfs/io/dirbuilder.go#L28-L36), but this structure emerged just to accommodate the HAMT implementation (#3042; it evolved from a previous directoryBuilder structure), not as a clear, documented definition of a directory.
The biggest consequence of all this (IMO) is that a new reader needs to go through the *hamt.Shard structure pointer to understand what happens when content is added to a directory (with the major danger of diving into the really complex code of the hamt package).
So the main objective is to give the user a clear explanation of what to expect from a directory, possibly through a documented interface, and to use that interface to hide the HAMT directory variant as much as possible. This would avoid confusing code lines when following the execution path of how files are added to a directory in the MFS hierarchy:
https://github.com/ipfs/go-ipfs/blob/ecf7d157a6d5e525b122079367f4b6c2ba25e951/unixfs/io/dirbuilder.go#L101-L107
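One way to picture such a documented interface is a single directory type whose callers never learn which representation is active. This Go sketch is hypothetical throughout: Dir, AddChild, Find, the threshold-based switch, and the map standing in for a HAMT are illustrative assumptions, not the go-ipfs or go-unixfs code linked above.

```go
package main

import "fmt"

// backend is a hypothetical internal interface; callers of Dir never see it.
type backend interface {
	add(name, cid string)
	find(name string) (string, bool)
}

// plain models a single-node directory: every entry lives in one list.
type plain struct{ entries [][2]string }

func (p *plain) add(name, cid string) {
	for i, e := range p.entries {
		if e[0] == name {
			p.entries[i][1] = cid
			return
		}
	}
	p.entries = append(p.entries, [2]string{name, cid})
}

func (p *plain) find(name string) (string, bool) {
	for _, e := range p.entries {
		if e[0] == name {
			return e[1], true
		}
	}
	return "", false
}

// sharded stands in for a HAMT; a Go map is used only to keep the sketch short.
type sharded struct{ m map[string]string }

func (s *sharded) add(name, cid string) { s.m[name] = cid }

func (s *sharded) find(name string) (string, bool) {
	cid, ok := s.m[name]
	return cid, ok
}

// Dir is the only type callers use; the active representation stays hidden.
type Dir struct {
	b         backend
	threshold int // hypothetical switch point, chosen arbitrarily
}

func NewDir(threshold int) *Dir { return &Dir{b: &plain{}, threshold: threshold} }

// AddChild adds or replaces an entry. When a new name would push a plain
// directory past threshold entries, it first migrates to the sharded form.
func (d *Dir) AddChild(name, cid string) {
	if p, ok := d.b.(*plain); ok {
		if _, exists := p.find(name); !exists && len(p.entries) >= d.threshold {
			s := &sharded{m: make(map[string]string, len(p.entries)+1)}
			for _, e := range p.entries {
				s.add(e[0], e[1])
			}
			d.b = s
		}
	}
	d.b.add(name, cid)
}

// Find looks an entry up without revealing which representation holds it.
func (d *Dir) Find(name string) (string, bool) { return d.b.find(name) }

// IsSharded exists only to make the switch observable in this demo.
func (d *Dir) IsSharded() bool {
	_, ok := d.b.(*sharded)
	return ok
}

func main() {
	d := NewDir(3)
	for _, n := range []string{"a", "b", "c", "d"} {
		d.AddChild(n, "cid-"+n)
	}
	cid, ok := d.Find("d")
	fmt.Println(cid, ok, d.IsSharded()) // cid-d true true
}
```

Because the switch happens inside AddChild, code following the MFS execution path would only ever see Dir, and the sharded variant stays an implementation detail.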
For example, looking for common characteristics of the plain and the HAMT (which I mostly do not understand) implementations, I observe that in both cases the children are still referenced as DAG links (the HAMT keeps a map for quick access to them, but the links remain at the DAG layer). On the other hand, whereas the plain implementation holds no information in the Data field, the HAMT stores the bit field there.