Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symlink does not consider if a file has been modified #245

Closed
twifty opened this issue Jun 14, 2017 · 11 comments · Fixed by #345
Closed

Symlink does not consider if a file has been modified #245

twifty opened this issue Jun 14, 2017 · 11 comments · Fixed by #345

Comments

@twifty
Copy link

twifty commented Jun 14, 2017

This is one of those edge cases nobody likes to think about.

A few things I've noticed with symlinks:

  1. If the file.basename is renamed before being piped into symlink, it will generate broken links.
  2. If the file.contents is modified, a symlink is still generated (an error would be preferred).

Admittedly, I'm still using v2.4.4, but I don't see anything in master to suggest this has been fixed.

In my use case, I'm wrapping gulp to play nicely with a PHP framework. I've had to override dest to auto rename files and to log written filenames. One of the problems I'm trying to overcome is that if file contents have not been modified, they must be linked rather than copied. As it stands I have no way to detect this without also overriding src and possibly hashing the contents.

A great solution would be to add a revision property to the file object. This would start out at 0 and be increased any time the content is modified. This will also help solve problem (2) above.

In the mean time, can anybody suggest an ideal workaround for detecting changed contents?

@erikkemperman
Copy link
Member

Currently symlink just looks at the basename of the incoming vinyl file (the target of the symlink) and then creates a link with the same name in the destination directory. So if you change the basename (to something that doesn't exist) you will indeed get a broken symlink.

I guess it's reasonable to expect this to work, i.e. that the basename change only applies to the source but not the target of the link, but not sure how that could be implemented. I suppose we could examine the vinyl's history but that would be problematic with several renames (how far back in history should we go?).

We've looked at detecting changes to the contents, specifically to reflect accurate modification times in the vinyl's stat property:
gulpjs/vinyl#72

But at the time I did not see a clean way to do this. It's not enough to look at just overwrites of the contents property (with a setter) because that won't catch modification of an existing buffer. Wouldn't want to monkeypatch that, I thought. Honestly if you want to be sure, checksum / hashing doesn't sound too bad?

@twifty
Copy link
Author

twifty commented Jun 16, 2017

In my own implementation I simply added a checksum to the vinyl object and use the first history entry as the target. I'm guessing the performance shouldn't be an issue since most of the time, when specifying sources with symlinking in mind, one would pass the buffer: false option. But the file rename is a problem. In a real world case, file names would have to be made url safe, which cannot be done with the current implementation of symlink.

I don't see how detecting the target would be a problem? wouldn't it always be the first history entry, resolved if that itself is a symlink? Any other filename in the chain cannot be trusted to point to a physical file/inode. Even if a file is created on the fly, as with something like gulp-concat, those files are in-memory (have a modified buffer) which should be impossible to create symlinks from.

@erikkemperman
Copy link
Member

I think -- but would need a bit more time to figure out for sure -- that it might be tricky in multi-stage pipelines like this:

gulp.src('in/foo')
  .pipe(rename({basename: 'bar'}))
  .pipe(gulp.dest('out'))  // writes out/bar
  .pipe(rename({basename: 'qux'}))    
  .pipe(gulp.symlink('out')) // expect symlink from out/qux to out/bar, not to in/foo

Pretty contrived though, I admit :-)

@twifty
Copy link
Author

twifty commented Jun 16, 2017

Ahh, I never thought about continuing to use the stream after a dest call.

I would class that as an edge case though, which can easily be fixed by either the user A. piping to another stream which pushes a new file object (with history[0] pointing to out/qux) or B. using one of the existing, or new dedicated, prepareStream options to modify the file object (or return a new basename if adding a dedicated option). Something like:

gulp.src('in/foo')
  .pipe(rename({basename: 'bar'}))
  .pipe(gulp.dest('out'))  // writes out/bar
  .pipe(rename({basename: 'qux'}))    
  .pipe(gulp.symlink('out', {basename: (file) => {return 'bar';}})) // write /out/qux -> out/bar

If basename is not passed, default to using history[0]. Pretty easy to implement IMO, don't know about BC though.

As for the other problem of symlinking to memory, it's a shame the buffer object doesn't emit events on write. It would make tracking changes a lot easier. I wonder if issuing a warning if file.isBuffer would be enough. The end user could then either use the buffer: false option or add on('warning', doSth). At least that way symlink is covering its bases.

@erikkemperman
Copy link
Member

erikkemperman commented Jun 16, 2017

With a dedicated option, the rename({basename: 'qux'}) stage would not be needed, right?

There might be valid reasons to subclass / patch the Buffer class, such as the mtime discussion I linked earlier.

But if the goal is just to prevent symlink from creating dangling links, I think we should just guard against that separately (possibly behind an option like strictLinks or something like that)?

@twifty
Copy link
Author

twifty commented Jun 17, 2017

In the example given, the second rename would still be required. The basename option (perhaps a better name would be linkTarget) is only to aid symlink in resolving the correct symlink target. The basename of the file object would remain the same. Currently, the target is derived from history[ length - 2], which cannot be guaranteed to be a real file, the basename (renamed to linkTarget) would allow a user to override this behaviour and default to history[0] if not given. (I'm saying drop the length - 2 since that's what is causing broken links).

How would the strictLinks option work? If the idea is to validate the target, and cause an error on broken links, how would the user handle such errors? They can't pipe to rename since that is what caused the error, besides it's symlink using the wrong history item where the real problem lies.

Subclassing / patching the Buffer class will likely lead to problems down the road. As in plug-ins will not be able to call file.contents = new Buffer(). Existing plugins would need to be updated to made aware of the new class. I don't know the Buffer internals well enough to offer a solution in this instance. I think it would need to instead be wrapped, inside the File class. But this doesn't catch modifications done outside of the class (on a stored buffer reference).

@erikkemperman
Copy link
Member

@twifty There is some work going on around symlinks at the moment -- I'm going to keep your suggestions in mind and hopefully we can end up with something that makes sense for most scenarios!

@erikkemperman
Copy link
Member

@twifty Yeah, if any patching of Buffer is done things would have to be wrapped in the vinyl setter, here:
https://github.com/gulpjs/vinyl/blob/master/index.js#L174
That way even if plugins do file.contents = new Buffer() they would end up with a patched instance afterward. But it's not pretty.

@phated
Copy link
Member

phated commented Nov 10, 2017

I guess I should chime in here:

  1. This is the correct behavior as I see it - Vinyl objects in-stream are supposed to represent a filesystem and if you change the file path (through basename), you are pointing to a new file in that filesystem.
  2. This is something we want to support but it's a massive overhaul and won't end up in 3.0

I'm going to mark this as "enhancement" because of item 2

@phated phated added this to the Future milestone Nov 10, 2017
@dargmuesli
Copy link

@phated what's your recommended way to set a different name for the symlink then, keeping the same target?

@phated
Copy link
Member

phated commented Jun 11, 2023

Going over this thread again, I don't believe there is a good way to detect/guess this. I'm going to change file.symlink = file.path; to file.symlink = file.symlink || file.path; so a user can set the file.symlink property at some point in the pipeline ahead of calling symlink() because they will know best that they plan to symlink and what the name should be.

This also means that this works

vfs.src('in/foo')
  .pipe(vfs.symlink('out')) // will link to in/foo
  .pipe(vfs.symlink('something')) // will link to out/foo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants