[Feature Request] Recognize and render an mtime metadata field for "Directory" unixfs nodes #3451
I think we should use the metadata unixfs type for this: https://github.com/ipfs/go-ipfs/blob/master/unixfs/pb/unixfs.proto#L18 We haven't gotten around to actually doing anything with it yet, but it's basically what it was designed for.
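For reference, the Metadata machinery in the linked unixfs.proto looks approximately like this (paraphrased; see the file itself for the authoritative definition and field numbers):

```protobuf
syntax = "proto2";

package unixfs.pb;

message Data {
  enum DataType {
    Raw = 0;
    Directory = 1;
    File = 2;
    Metadata = 3;  // a node that wraps another node to describe it
    Symlink = 4;
    HAMTShard = 5;
  }

  required DataType Type = 1;
  optional bytes Data = 2;
  optional uint64 filesize = 3;
  repeated uint64 blocksizes = 4;
}

// The payload carried by a Metadata-typed node; currently only MimeType.
message Metadata {
  required string MimeType = 1;
}
```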
@whyrusleeping I briefly considered it, and ran into the following trouble:

The metadata node links to the object it describes. Metadata nodes are designed to only point to a single child node.
Hm... then... how does it work with a file in a dir? Let's take QmW4UsYZRtxSBtQgCkHEsFpJDMNYv8j4eFpEU97w1fEHtp:

Where do various Metadata nodes attach, and how does a containing (parent) directory find out about them for display?
Think about metadata nodes as semi-transparent wrappers. They can wrap any other unixfs node to augment it with some extra information. If you point directly to a file, and not the metadata object, then you obviously don't have metadata for that file.
Understood. Not ideal from a low-level PoV, as it doubles the number of nodes one needs to use to represent a directory listing, but it is workable. I will add some tentative examples to the "unorthodox-dag" I am putting together.
@whyrusleeping thinking about it more... I still think it is incorrect to have an extra metadata node as a standalone object. Consider the earlier example listing. One would ideally want to expand this with:
Currently, for any of these we would need to do an extra fetch for every entry in the directory, which even for moderately large directories will be expensive. If we add metadata nodes, then there are potentially two lookups needed per entry. Instead, if we forget about the existence of metadata at the unixfs level, and instead redefine:

Then a lot of the problems with lookups go away, and link nodes can remain lean if they simply omit the metadata sub-structure. @jbenet thoughts on the above?

P.S. Yes, I am aware of the IPLD specification effort. But it doesn't seem to cover any of this either...
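One way to read this proposal, sketched as a hypothetical extension of the merkledag PBLink (the existing three fields are paraphrased from merkledag.proto; the LinkMetadata sub-message and its field numbers are illustrative, not an accepted format):

```protobuf
syntax = "proto2";

// Existing merkledag link fields (paraphrased from merkledag.proto).
message PBLink {
  optional bytes Hash = 1;   // multihash of the target object
  optional string Name = 2;  // entry name within the directory
  optional uint64 Tsize = 3; // cumulative size of the target

  // Hypothetical: inline, content-unaffecting metadata carried by the
  // link itself, so a single directory block suffices for a listing.
  optional LinkMetadata Meta = 4;
}

// Hypothetical sub-structure; omitted entirely when no metadata exists,
// keeping links exactly as lean as they are today.
message LinkMetadata {
  optional int64 MtimeSeconds = 1;
  optional uint32 MtimeNanos = 2;
  optional uint32 Mode = 3;
}
```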
Hey @mib-kd743naq -- some quick comments:
To confirm we're understanding each other: there would be at most one Metadata object for each file, not one Metadata object per metadata key-value entry. It's one dict. So instead of N nodes for the N entries in a dir, 2N nodes, and not kN (where k is the number of metadata entries used).

The reasoning for separating metadata out and making it a wrapper around the file (instead of embedding it within the file node OR in another node pointed to by the file node) is to make sure that all raw-data versions of the file are deduped, and that changing the mode does not constitute creating a new root object. Granted, the bulk of the data is at the leaves, so those wouldn't change anyway, only the very root, but I'm not sure that preventing the extra node fetch is ideal here.

Let me ask it this way: why is an additional node a problem? Is it (1) the network cost of fetching the extra node, or (2) the local datastore cost of loading it?

If it's (1), then we can proceed and optimize bitswap instead. If it's (2), then we should be working on optimizing the datastore. I don't think either of these is a reason not to create a larger graph, in that a good IPFS implementation should make additional nodes not very expensive. They are indeed expensive now, so could we work on making them not expensive, instead of hiding the expense by changing the data model?

Separately, I would like to know whether you think, perf concerns completely aside, the metadata object wrapper is confusing or bad from a data-modeling perspective. I.e., is this the wrong thing to do? Or just a "currently expensive but optimizable" thing to do?
Quick note: since the addition of IPLD and "raw data" nodes, a "unixfs-file" has become differentiated from the raw data it contains. Therefore, it may be that storing the metadata directly in the file is no longer that big of an issue. That said, we should note that we would need to do this twice, once for files and once for dirs, instead of once (once for Metadata, wrapping both files and dirs). Another point pro Metadata: it can be fetched comfortably on its own.
@jbenet this is not what I was suggesting. I am suggesting that the metadata become a property of the unixfs directory link itself, i.e. having this thing become this:

Could you reread my original comment/question with that in mind and respond to that? A lot of what you wrote above is very reasonable, but alas does not apply to what I said. Sorry for the confusion... :/ If my reasoning still isn't clear, I will try to explain this in much greater detail sometime toward the end of next week: work is going crazy...
I would like to push this forward (willing to contribute code), as having optional mtime metadata could enable novel use cases.
I am perfectly okay with the current Metadata wrappers. Compared to @mib-kd743naq's suggestion, the only advantage I see in separate Metadata nodes is smaller memory/disk consumption when metadata is not present (and thus less to copy around). The different-root-hash argument is invalid as far as I can see, because the directory's root hash also changes with the introduction of a wrapper node. Metadata retrieval cost for large directories could be prohibitive, but I agree with @jbenet that that can be solved with bitswap.

If we are to extend the Metadata node with an optional ModTime field, is everyone okay with the timespec type proposed by @mib-kd743naq? I considered the possibility of having only second-level accuracy, but I believe that metadata should be preserved as accurately as possible.
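For concreteness, a timespec-style ModTime extension of the existing Metadata message might look like the sketch below. The field names and numbers are illustrative only; the actual wire format was being discussed in ipfs/notes#60.

```protobuf
syntax = "proto2";

message Metadata {
  required string MimeType = 1;  // existing field

  // Illustrative timespec-style modification time: whole seconds since
  // the Unix epoch plus a nanosecond remainder, preserving the source
  // filesystem's full accuracy rather than truncating to seconds.
  optional int64 ModTimeSeconds = 2;
  optional uint32 ModTimeNanos = 3;
}
```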
@kalmi thank you for pushing this further! Really glad someone else showed interest, as I myself got completely swamped at work and likely won't be able to get back to IPFS goodness for another 2 months :(

On the subject of where the metadata should live: I think I was misunderstood by pretty much everyone in this thread. I am still convinced that the mtime and other content-unaffecting metadata should be part of the link structure itself. The reason isn't efficiency during data access, but efficiency during metadata access (i.e. browsing). To illustrate this differently: one of the projects for which I need this functionality is serving a standard ftp mirror over http, but backed by IPFS. Under my proposal, in order to display this listing one needs to download very few blocks. Under the "separate metadata nodes" proposal one needs to grab thousands of them. This is a non-trivial difference. All of this notwithstanding, I will take anything at this point.
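The lookup-cost argument can be put in back-of-the-envelope numbers. The sketch below compares how many block fetches a client needs just to render a directory listing with mtimes under the layouts discussed in this thread; the model is a deliberate simplification (it assumes an unsharded directory whose entry list fits in one block) and the function names are mine, not anything from the codebase.

```python
# Simplified fetch-count model for rendering an N-entry directory
# listing, including each entry's mtime, under three layouts.

def fetches_metadata_in_link(n_entries: int) -> int:
    # mtime lives in the link itself: the single directory block
    # already carries every entry's name and mtime.
    return 1

def fetches_metadata_wrapper(n_entries: int) -> int:
    # Each entry is wrapped in a separate Metadata node: the directory
    # block, plus one wrapper fetch per entry to learn its mtime.
    return 1 + n_entries

def fetches_metadata_in_node(n_entries: int) -> int:
    # mtime embedded in each file/dir node: still one extra fetch per
    # entry, since the directory's links alone do not expose it.
    return 1 + n_entries

for n in (10, 3000):
    print(n,
          fetches_metadata_in_link(n),
          fetches_metadata_wrapper(n),
          fetches_metadata_in_node(n))
```

For a mirror directory with a few thousand entries, the link-level layout needs one block where the other two need thousands, which matches the "very few blocks" versus "thousands of them" contrast above.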
@mib-kd743naq, I totally get what you are saying, and yes, the performance implications also seem scary to me, but I also get the others who wish to keep PBLink lean and continue on the existing path, the Metadata node. I am bound to accept that it can eventually be solved efficiently with bitswap selectors if @jbenet says so.
I suggest adding a --save-mtime boolean flag to ipfs add that would wrap all unixfs nodes in Metadata containing a ModTime. The default behaviour wouldn't change one bit in either case. After we have that, I will also suggest adding an Experimental.TrustModTimes boolean config option in a different issue. When this config option is true, IPFS could make use of some optimizations to make repeated adds and gets faster. I will get into that later in more detail. First, I just want to get ModTime into the Metadata structure, and implement the --save-mtime and --restore-mtime flags and their behaviour. I would love to have some assurance that these additions would be welcome.
Moved timespec message format definition discussion to: ipfs/notes#60 |
Actually, this was never explained. There was an unambiguous "we are not modifying UnixFS anymore, all new work is going into IPLD", but there was no reasoning nor timeline behind it. I follow and am excited about many new developments: the OpenBazaar integration work, the Peergos stuff, and many other neat things built on top of IPFS. But as @mguentner said back in November: the fs-stuff in IPFS is still not really usable. @jbenet @whyrusleeping: could you please add some clarity on what plans, if any, there are to be able to faithfully represent content like this? I know it is not sexy work, but still, inquiring minds want to know ;)
*** THIS IS A BREAKING CHANGE *** as per [1]: "Required is forever"

Nevertheless this seems like a good idea at this time: there are no known producers ( nor consumers ) of MetaData nodes, and the current requirement of MimeType has an extremely narrow application scope. This change could very well be rejected in lieu of implementing a new type of node ( e.g. TheRealMetadata ) in the DataType enum.

Based on ipfs/kubo#3451 (comment)

License: MIT
Signed-off-by: Mib Kd743naq <mib.kd743naq@gmail.com>

[1] https://developers.google.com/protocol-buffers/docs/proto#specifying-field-rules
Will be fixed by #6920.
This commit was moved from ipfs/go-unixfs@e94be52
( Please move to notes if this is the wrong forum )

Version information: N/A
Type: Feature/Enhancement
Priority: P4
Description:

Compare the experience of visiting a conventional mirror listing and the experience of an IPFS-rendered directory. The presence of mtime ( at least for me ) is a very valuable indicator of "how stale is this mirror exactly?". I am aware that the number can be faked, and this is fine: it can't be authoritative anyway, only the content can be. Yet it is really useful when eyeballing things.

IPFS currently does not make a provision for storing this information ( neither the current implementation nor whatever I could find about the IPLD plans ). I think it would be a large step in usability to make this possible.
On implementation: allowing specification of mtime within a UnixFS ( or other type of ) content-node itself is a bad idea. Instead, mtimes ( and possibly other future metadata ) should be an optional property of links. This fits nicely into the sharded directory implementation we have going as well. Something like an optional metadata sub-structure on each link ( note: this could very well be an IPLD-only feature, using mdag to kick-start discussion ).
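A hedged sketch of what such link-level metadata could look like in a dag-style JSON view; every field name below is illustrative, not an implemented format, and the hashes are placeholders:

```json
{
  "data": { "type": "Directory" },
  "links": [
    { "name": "debian-cd", "hash": "Qm...", "size": 123456,
      "meta": { "mtime": { "secs": 1480000000, "nsecs": 0 } } },
    { "name": "README", "hash": "Qm...", "size": 1024 }
  ]
}
```

Entries without metadata ( like the second link ) simply omit the sub-structure, so links without mtimes stay exactly as lean as today's.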
Thoughts?