Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file metadata and permissions to UnixFSv1 #220

Closed
wants to merge 5 commits into from

Conversation

mikeal
Copy link

@mikeal mikeal commented Aug 30, 2019

Prior discussion in #217

mikeal added a commit to ipfs/js-ipfs-unixfs that referenced this pull request Aug 30, 2019
This is based on the upcoming spec spec change ipfs/specs#220
@mikeal
Copy link
Author

mikeal commented Aug 30, 2019

Question: what happens when a message containing these new properties is parsed by someone using the old protobuf schema? Are the properties ignored or does it fail?

unixfs/README.md Outdated
/* file metadata */
optional uint32 mtime = 7;
optional uint32 atime = 8;
optional uint32 ctime = 9;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djdv pointed out that windows supports creation time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not supporting fractional time is an oversight. Especially provided we might be stuck with this stop-gap for quite a while. I urge you to consider doing the following:

message Timespec {
       // Use of int64 is deliberate - negative epoch is super unlikely
       // Using sint64 would introduce zig-zag encoding ( harder to eyeball )
       // The varint representing the time of writing this is 5 bytes long
       // It will remain so until October 26, 3058 ( 34,359,738,367 )
       required int64 UnixEpoch = 1;

       // fixed32 ( always 4 bytes, no varint ), as nanosecs are often > 2^28
       // https://developers.google.com/protocol-buffers/docs/proto#scalar
       // NOTE: on the wire this value will be *little* endian
       optional fixed32 UnixEpochNsec = 2;
}

And using Timespec for mtime/ctime/etc ( atime is a really odd choice to carry but since it is optional it's not a big deal )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to offer some perspective from the field in terms of what I have and have not, ever, needed when working with package management as well as containerland:

  • mtime comes up sometimes.

  • atime is completely useless. It will often get set again in the course of... anything else you're doing. The atime vs relatime vs noatime flags on common filesystems have also muddied the field beyond all recognition. There is approximately one program in the universe known to look at atime, and it's something you've never heard of. atime should be disregarded. atime provides purely negative value.

  • ctime is completely useless. It can't even be expressed unless you're writing filesystem drivers. While it so happens that we are, and we can, that doesn't mean it's a good idea, that it's useful, or that we should. For example, if we were to support commands that just unpack/manifest files on an existing (e.g. ext4) filesystem, we would not be able to preserve ctime anyway. Most other tooling exists within this limitation as well, and as a result, the world has evolved to effectively disregard ctime for any significant purposes; e.g., if there was a piece of software that would not work correctly if ctimes changed, it would fail to work in any container system, anything that uses tarballs, etc; effectively, it would be nonviable, and resultingly, such software tends not to exist.

I'd think long and hard before perpetuating these pieces of metadata just because we 'can'. They're remarkably useless, unused, and in many contexts unusable.

  • fractional timestamps I also have very mixed feelings on. We can observe that most tar systems don't do them, in practice. Yes, I know there's an extension for that. Go check if your tar command actually does that by default. Go stdlib tar doesn't. Docker and such systems have elided fractional components for ages as well, since they've been using the go stdlib tar libraries. By and large, nobody seems to have noticed or cared. We can look at the rate of serious problems being reported about timestamp precision in e.g. docker and use it as a barometer for how much fractional timestamps matter. I'd say it's "not much".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only want to store what is needed for the top level use cases (package managers) given this is not a permanent or even our ideal solution to these problems. My assumption is that they need something close to what stat gives them, which includes atime and ctime, but if nobody needs this we shouldn’t preserve this data.

My experience with fractional timestamps is that most FS API’s don’t give them to you by default and require an additional flag or option in order to see them, which didn’t meet the bar I had set, but again, if package managers need this then we can do it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that high precision 64bit timestamps are in the current schema for unixfsv2.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what’s the consensus here? keep all 3 or cut down to ctime and mtime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a prerequisite for adding these info like 'ctime' to unixfs should be to have documented answers to:

  • If a command like ipfs get is going to unpack content from unixfs into a tree of dirs and files on a regular (say, ext4, just to pick something) filesystem, and this were run by a non root user, what should the behavior be?
  • Is there any use to ctime other than being effectively a random number generator in the input data, which users cannot for any practical purposes control? Should commands to add files to unixfs have a default behavior to set it to a "zero" value? If so, does that change the expected behavior of either 'unpack' or mount commands?

... because whoever implements code around this will discover these questions about five minutes after we merge this.

(Or maybe not, and we quietly introduce a defacto random number generator to our nice content addressable system without noticing, which would be... worse.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note I have fairly strong and (I'd like to think) fairly informed opinions about this, because I've previously wrestled with it in the Timeless project... and while I'm not here to tell you I had The Solution over there, I will tell you that underestimating the complexity of the normalizing filters system that's almost certain to end up growing around this area would be a serious mistake. It's a touchy thing and has been a source of many headaches and bugs for me. Treat it cautiously and especially be wary of adding data if there's no plan for what its normalizers would look and act like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could split this PR into two parts - merge the mode property (which seems relatively uncontentious and solves some real-world problems our users have today) and bottom out the *time related stuff separately?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s two questions here that seem to be getting merged together:

  1. Should anyone ever be allowed to add these time properties to the File.
  2. How should IPFS use these properties, and should that behavior be optional or by default.

I’m inclined to follow @achingbrain’s advise and split it at this point. However, I’m going to keep mtime in the schema since it seems to be the least controversial and it gives us some room to experiment with *time issues in IPFS. Without it being in the schema we have no room to even experiment sufficiently.

We can work through the *time issues in the initial implementations and decide before we ship whether or not we actually want to use it in IPFS.

unixfs/README.md Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

Question: what happens when a message containing these new properties is parsed by someone using the old protobuf schema? Are the properties ignored or does it fail?

Ignored.

unixfs/README.md Outdated Show resolved Hide resolved
@warpfork
Copy link
Member

warpfork commented Sep 2, 2019

I'd like to just mention a couple links to prior art that's not merely prior art, but also particularly easy to read and review for inspirations:

Both of these (as well as the specs of tar, which I'm assuming everyone's at least given a cursory glance at already) are highly worth a quick skim just to see what other people have covered when trying to map this terrain.

@ivan386
Copy link

ivan386 commented Sep 2, 2019

I think all file or dir metadata must be in separate metadata block. This will be more flexible than put it in block with links to sub blocks. Don't make mistakes as in bit torrent. Separate metadata block can be included in dirrectory block for each file or directory that it contains. It will protect links between dir block and data that it contains. Metadata block can wrap first file or dir block if it need (wery small files or dirs). It can contain links to blocks in different format (raw).

Put it here:

message Metadata {
	optional string MimeType = 1;	


	/* file metadata */
	optional uint32 mtime = 7;
	optional uint32 atime = 8;
	optional uint32 ctime = 9;
	
	/* file permissions */
	optional uint32 permissions = 10;
	optional uint32 uid = 11;
	optional uint32 gid = 12;
}

@mib-kd743naq
Copy link

It will protect links between dir block and data that it contains.

@ivan386 could you elaborate what do you mean by this?

@achingbrain @warpfork note that my suggestions are just that: suggestions, based on the obscenely long time I've spent thinking about all this. The proposed PR is not usable by me in its entirety ( due to 1:1 mapping of entries/metadata-objects ), and that is fine: a properly of a robust framework is the ability to have multiple experiments run simultaneously. Please do not take my suggestions seriously enough to potentially derail this PR ;)

@ivan386
Copy link

ivan386 commented Sep 2, 2019

It will protect links between dir block and data that it contains.

@ivan386 could you elaborate what do you mean by this?

@mib-kd743naq

Every change of metadata change block hash that contain it. If include file/dir metadata to it root block it will be many root hashes for same file/dir with diffrent metadata. Leaf nodes will be same but root block different. This root block can be lost becouse it wery rare. It break chain and file will be inacceseble from parent block. All file data will be awaliable in IPFS but we can't get it without root block.

Only way to return that root block in IPFS is to generate it from file data again. But if file root block will contain ctime or other metadata we can't gues what it was. And we will generate diffrent root hash without ability recover old root hash for that file.

Separate Metadata block wery small for each file/dir it can be included in parent dir block by using of identity hash. This dir block will contain all files metadata and links to file/dir root block without metadata that always same for same data and not so rare. In this case we can lost links to root blocks to files/dirs only with parent dir block. And can recover root file block only by using file data.

@mikeal
Copy link
Author

mikeal commented Sep 3, 2019

Every change of metadata change block hash that contain it. If include file/dir metadata to it root block it will be many root hashes for same file/dir with diffrent metadata. Leaf nodes will be same but root block different. This root block can be lost becouse it wery rare. It break chain and file will be inacceseble from parent block. All file data will be awaliable in IPFS but we can't get it without root block.

The file node is already, effectively, a root node of metadata pointing to linked file data. It’s not clear to me what another layer of indirection (metadata message) would preserve if it’s still what everyone needs to link to. The current use case for the metadata message is to define a content-type for HTTP which makes sense since filesystems do not store this information and people may want to serve the same file with different content types. This new metadata is all related to the representation of the file in a filesystem, which is what the File message already is, and adding a layer of indirection would degrade performance.

We discussed this in the issue and the consensus seemed to be that this was the best path forward.

@ivan386
Copy link

ivan386 commented Sep 3, 2019

@mikeal content-type is optional field. mtime can be used in HTTP to.

This example link to dir block that contain:

  1. direct link to root
    This link will work untill file root block will be lost. Root block have many copys.
  2. link to separate metadata block
    This link will work untill metadata block will be lost. Metadata block have no so many copys as file root block. It can be lost soon and file will be inaccessible by that link.
    Upd: metadata block already lost. Link not work.
    It example of file with changet root block.
  3. metadata included in dir block
    This link will work untill file root block will be lost. Metadata block included in parent block. Here metadata block will be lost only with parent dir block.

All three links point to same file.

@momack2
Copy link
Contributor

momack2 commented Sep 3, 2019

Wanted to quickly check whether "mode" will fit the use case described for guix such that executable bit can be contained within file metadata? According to @Ambrevar at IPFS camp, that's one of the few remaining blocker to get the guix-ipfs-cache over the finish line - https://github.com/ipfs/camp/blob/master/DEEP_DIVES/package-managers/README.md#guix-patch-notes-that-may-be-helpful-in-other-package-manager-efforts

@Stebalien
Copy link
Member

@momack2 yes. The mode allows encoding the executable bit along with all the other bits (sticky, read, write, etc.). Although, we should probably define how we're using the mode as we don't want bits like "directory".

@Ambrevar
Copy link

Ambrevar commented Sep 5, 2019

@momack2: Just to confirm what @Stebalien said, yes that would do it! Note that the link to https://github.com/fps/guix-ipfs-cache is a different (more hacky) proof of concept. The proper patch waiting to be merged can be reviewed at https://issues.guix.gnu.org/issue/33899.

@mib-kd743naq
Copy link

Want to raise one last question from my corner of the ring, I am afraid this is being lost...

In #220 (comment) @Stebalien said:

Although, we should probably define how we're using the mode as we don't want bits like "directory".

Given that this detour is taken for the sake of the package management / nix-like-things groups: does it actually make sense to do this all as a new generic wrapper node that we will have to support forever?

To reiterate on (again @Stebalien)'s point in #217 (comment):

Names are already a part of the directory. Making metadata a part of the directory isn't all that odd. I'd expect most tools to reference files relative to directories anyways.

As far as I can tell, for the original set of users waiting on this, there is no usecase where a piece of data needs POSIX attributes without a filename. The filename is already only available as a directory wrapper node, there is no other way to attach name to an "anonymous stream of data". What makes other attributes different enough to warrant a non-directory anonymous wrapper-node?

@mikeal
Copy link
Author

mikeal commented Sep 12, 2019

The current document represents the closest thing we have to a consensus but it doesn’t feel like it’s solid enough to merge yet given the current spec in master is quite stable.

At this point, I think that we should move forward with an implementation in at least one language (probably JS) and come back to the spec once we have some feedback from the implementation. I’ll update the PR to js-unixsfs to reflect the most recent updates to the spec and we can move forward from there.

@mikeal
Copy link
Author

mikeal commented Sep 20, 2019

Update:

  1. This change has been given the version number “1.5” ;)
  2. After a good sync discussion it was decided that a better approach is to put this metadata in the directory rather than the file message. This PR can be closed and @Stebalien is working on a new proposal.

@mikeal mikeal closed this Sep 20, 2019
@@ -37,6 +37,12 @@ message Data {

optional uint64 hashType = 5;
optional uint64 fanout = 6;

/* file permissions */
optional uint32 mode = 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrew which bits of the mode do we need for package managers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achingbrain may know this too...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Executable is the main one, I think everything else is "nice to have".

@achingbrain
Copy link
Member

After a good sync discussion it was decided that...

@mikeal was this documented anywhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants