-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TAR: Implement extraction and archiving of hardlinks. #286
base: master
Are you sure you want to change the base?
Conversation
My intention with this is to be able to create TAR archives with hardlinks with maven-assembly-plugin. However, I am not sure if this is the best possible way to do this. I am not sure if it fits right into the PlexusIoResource framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing. Hard links is something we still missing to better support tar files.
This change does not cover some edge cases. Lets say file
and link
are file and hard link in a tar file. In the most straight forward case we are going to extract both files and everything will work fine. But there are other cases.
We can extract only link
. If file
is not present then I'm not sure the proper exception is caught in order to just create the file. But more importantly if file
do exist on the file system, a hard link to it would be created instead of creating new file.
Plexus Archiver have a feature named file mappers. So after extracting file
may be called someFile
. What we should do in such cases?
Also does the order of the entries matter. Is it possible that link
is extracted before file
and in such case would file
be hard link to link
?
if (entry.getType() == ArchiveEntry.SYMLINK) { | ||
final SymlinkDestinationSupplier plexusIoSymlinkResource = | ||
(SymlinkDestinationSupplier) entry.getResource(); | ||
final SymlinkDestinationSupplier plexusIoSymlinkResource = (SymlinkDestinationSupplier) ioResource; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to hard links? It seems that it is just refactoring to avoid calling entry.getResource()
multiple times. If this is the case I would rather have this in separate PR or at least separate commit as the change is already complex enough to understand on its own.
final Path file = fileResource.getFile().toPath(); | ||
if (Files.exists(file)) { | ||
final BasicFileAttributeView fileAttributeView = | ||
Files.getFileAttributeView(file, BasicFileAttributeView.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather use LinkOption.NOFOLLOW_LINKS
to be on the safe side, although it seems that symbolic links are already handled (haven't tested it).
@@ -233,18 +241,43 @@ protected void tarFile(ArchiveEntry entry, TarArchiveOutputStream tOut, String v | |||
} | |||
} | |||
|
|||
boolean doCopy = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this name confusing. isLink
, or something else, probably would be more appropriate as it seems that the flag indicates whether the entry is link (symbolic or hard) to some other entry.
if (Files.exists(file)) { | ||
final BasicFileAttributeView fileAttributeView = | ||
Files.getFileAttributeView(file, BasicFileAttributeView.class); | ||
if (fileAttributeView != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to extract this check to separate method to make this method easier to read.
} | ||
} | ||
} | ||
if (te == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I don't follow why this is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
te
is assigned either from the symlink branch or the hardlink branch. But the hardlink branch can also not assign value at all. This handles that case plus all other cases of non-symlink&non-hardlink.
@@ -278,8 +280,9 @@ protected void extractFile( | |||
String entryName, | |||
final Date entryDate, | |||
final boolean isDirectory, | |||
final boolean isSymlink, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is breaking change. I wonder if we can avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how.
Also we need to be extra careful when extracting multiple copies instead of creating hard links. If a tar file contains single 1 MB file and thousands of hard links, then extracted (if files are copied) could be 1 GB, possibly resulting in DoS attack. |
doCopy -> isLink
Use NOFOLLOW_LINKS to get file attributes for hardlink.
FYI, I will not be pursuing this pull request further. I was curious if it could be done. But it is unclear to me how to do this so that it matches all the requirements and design criteria of the library. Feel free to pick it up or close it. |
This implements hardlinks in TAR. There is also a test to prove it.