[MRESOLVER-372] Rework the FileUtils collocated temp file #364
I don't think this helps: as far as I know, ATOMIC_MOVE will already replace an existing file. The problem described in MRESOLVER-372 is that the file is opened by another process, so under Windows you can't write to that file, whatever you try (Linux is different!).
So what happens is this:
- Resolver/Maven/... has downloaded this file previously (or installed it with `mvn install`)
- Now some IDE, tool, Mojo, ... opens that file for reading
- While the file is open, another build requests that file, there is a new SNAPSHOT (from the snapshot repo), and resolver tries to download it and replace the SNAPSHOT file --> AccessDeniedException
So there are some ways to circumvent this:
- When resolver is asked for a file and can't move it, return the temp file from before the move -> may result in multiple downloads, but better than nothing
- Use time-stamped filenames instead of the -SNAPSHOT filename
- Use some kind of "link file" named -SNAPSHOT that points to the "real" file
I really hope "atomic on Windows" will NOT replace the file and then throw 😄 Otherwise, I am really unsure what that OS is able to guarantee at all.
If you check older resolvers (and let's assume "they worked"), they used pre-NIO2 copy+delete, so they also did this, but with a "window of possibility" that allowed other processes to read a partially written file. That was implemented using pre-NIO2 Java. All that resolver 1.9.x did was migrate the code to NIO2 AND use atomic moves. My gut is telling me that the "atomic" part is the problem here... Otherwise, the same problems would have been reported with resolver 1.8, as it also:
Really, 1.9 changed NOTHING in this respect; it merely rewrote FileProcessor to use NIO2 and atomic FS operations.
You can just look into the Windows file system implementation of the JDK to see what's going on.
"Old" code simply ignored it when the replace went wrong, so afterwards you may end up using the old file. I'm not saying the new way is "wrong", just that not using atomic move does not solve the problem reported in the issue: some other process (or the current one) currently has that file open, and in that case you can't overwrite/delete/replace/move... that file under Windows.
@slawekjaranowski @laeubi reworked fully... After a LOT of reading, seemingly we had several issues:
Fixes:
* move() call should NOT perform the move, as the writer stream to the tmp file may still be open
* move the file move logic to close(), so it happens only when closing the collocated temp file
* perform fsync before the atomic move to ensure there are no OS dirty buffers related to the newly written file
* on non-Windows OS, fsync the parent directory as well.

---

https://issues.apache.org/jira/browse/MRESOLVER-372
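The fixes listed in this commit message can be sketched as follows. This is a minimal, hypothetical stand-in (`CollocatedTempFileSketch` and its `fsync` helper are illustrative names, not Resolver's actual `FileUtils.CollocatedTempFile`): `move()` only records the intent, and `close()` performs the fsync, the atomic move and, off Windows, the parent-directory fsync. It assumes the platform's `ATOMIC_MOVE` replaces an existing target; the `Files.move` Javadoc leaves that implementation specific (on Linux it maps to `rename(2)`, which does replace). The directory fsync opens the directory through `FileChannel` for reading, a trick Lucene relies on that is expected to fail on Windows, hence the guard.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of the reworked collocated temp file; not the actual Resolver class. */
final class CollocatedTempFileSketch implements Closeable {
    private static final boolean IS_WINDOWS =
            System.getProperty("os.name", "unknown").startsWith("Windows");

    private final Path target;
    private final Path tempFile;
    private final AtomicBoolean wantsMove = new AtomicBoolean(false);

    CollocatedTempFileSketch(Path target) throws IOException {
        this.target = target.toAbsolutePath();
        Path parent = this.target.getParent();
        Files.createDirectories(parent);
        // Temp file lives next to the target, so the final move stays on one file system.
        this.tempFile = Files.createTempFile(parent, target.getFileName().toString(), ".tmp");
    }

    Path getPath() {
        return tempFile;
    }

    /** Only records the intent: the writer stream may still be open at this point. */
    void move() {
        wantsMove.set(true);
    }

    @Override
    public void close() throws IOException {
        try {
            if (wantsMove.get() && Files.isReadable(tempFile)) {
                fsync(tempFile, false); // flush OS buffers holding the new content
                Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE);
                if (!IS_WINDOWS) {
                    fsync(target.getParent(), true); // persist the updated directory entry
                }
            }
        } finally {
            // No-op after a successful move; cleans up if the move was not requested or failed.
            Files.deleteIfExists(tempFile);
        }
    }

    private static void fsync(Path path, boolean isDirectory) throws IOException {
        StandardOpenOption mode = isDirectory ? StandardOpenOption.READ : StandardOpenOption.WRITE;
        try (FileChannel channel = FileChannel.open(path, mode)) {
            channel.force(true);
        }
    }
}
```

Declaring the writer stream after the temp file in a try-with-resources statement makes the stream close first, so `close()` only ever moves fully flushed content.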
c7928cc to 9a73c1d
Frankly, I'd remove the fsync on the directory: unlike Lucene (which depends on the readability of a new commit marker), Maven uses a layout, so any other process will calculate the path and go directly for the artifact JAR... It does not need to "list" the directory to discover new artifacts...
```java
if (!IS_WINDOWS) {
    fsyncParent(tempFile);
}
```
Why here? Deleting tempFile can also change the directory content, so maybe the sync should happen after the delete.
If the fsync is not important, then this commit is largely wrong. It should simply be a matter of making sure the calls are ordered correctly?
Line 496 in c6b8a72:

```java
tempFile.move();
```
Shouldn't the call to move be performed after the commit that actually writes to the file?
maven-resolver-util/src/main/java/org/eclipse/aether/util/FileUtils.java
I tested this on Windows and it does not fix the problem. The way I tested it was:
I agree with @laeubi's assessment. If a file is open for any reason, it can't be replaced. At least, I've never heard of any way of doing it on Windows. Linux is a different story.
```java
}

@Override
public void close() throws IOException {
    if (wantsMove.get() && Files.isReadable(tempFile)) {
```
This is not safe. The `CollocatedTempFile` should provide a way to create an `OutputStream`, so that the stream can be closed before the `CollocatedTempFile` is. This is needed so that buffered streams are fully written to disk before the file is moved.
See lines 90 to 96 in e721b01:

```java
try (InputStream in = new BufferedInputStream(Files.newInputStream(source.toPath()));
        FileUtils.CollocatedTempFile tempTarget = FileUtils.newTempFile(target.toPath());
        OutputStream out = new BufferedOutputStream(Files.newOutputStream(tempTarget.getPath()))) {
    long result = copy(out, in, listener);
    tempTarget.move();
    return result;
}
```
The code is using a `BufferedOutputStream`, so the content may not be completely written to disk. I'm not sure whether there's any guarantee on the ordering of the `close()` calls in the try-with-resources block.
OK, so the order is specified as the reverse order of declaration, which makes sense of course, so this should work, because the `BufferedOutputStream` will be closed just before the `CollocatedTempFile`.
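The close ordering can be checked with a small standalone demo. The `Named` resource is hypothetical and only records its name when closed; the variable names merely mirror the snippet from lines 90 to 96. The Java Language Specification defines try-with-resources to close resources in the reverse order of their initialization, so `out` closes before `tempTarget`.

```java
import java.util.ArrayList;
import java.util.List;

/** Demonstrates that try-with-resources closes resources in reverse declaration order. */
final class CloseOrderDemo {
    /** Hypothetical resource that logs its name when closed. */
    static final class Named implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Named(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Named in = new Named("in", log);
                Named tempTarget = new Named("tempTarget", log);
                Named out = new Named("out", log)) {
            log.add("body");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [body, out, tempTarget, in]
    }
}
```

This is why declaring the buffered stream last is enough: it is flushed and closed before the temp file's own `close()` runs.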
The case of the output stream being buffered should be handled correctly, IMHO.
After a more in-depth look at the code, I think this is a good improvement that closes some holes where the writer could write to the file after the move. If this is not sufficient on its own, it will surely help.
There is one use case I have no idea what to do with, but I do know what I don't want to do:
In this case, if resolver were to "silently give up" (for example, on an installation attempt), it would be intentionally leaving your local repository in an inconsistent state. And this is unacceptable to me. Hence, I consider "ignoring IOExceptions" a non-option: no software should intentionally transition from a "consistent" to an "inconsistent" state. The least I can do is implement a trick similar to the one for the directory fsync, and make resolver broken on Windows. But this would imply that we have to state somewhere (release notes? site?) that "by design, resolver is not supported on Windows"? Who will then support errors happening on Windows, JIRAs like "I installed something with mvn install and the IDE/other mvn process does not see the change"?
Similarly, @laeubi, I don't see pre-1.9 resolver code as "forgiving" and swallowing IOExceptions; here is the same method from 1.6.0: If the target is open by something else (i.e. an IDE), line 166 would fail, no? So to me it seems that the "file is opened by something else" case failed with Resolver 1.6.0 as well. So, IMHO, this use case is a red herring: we cannot do anything here. @rdicroce ping also ^
So for shits and giggles, I decided to paste the 1.6.0 file copy code into the 1.9.x branch. Here's what it looks like:
Then I tried the same test as before. This code did NOT throw an exception, and it appears to have actually worked. The file on disk has different timestamps inside the JAR. Eclipse even seems to have noticed that it changed, although that's difficult to say for sure. Why does this work when the other code doesn't? I have no idea. It probably has something to do with the fact that this code actually copies the content of the file, byte by byte, whereas the MoveFile approach probably only tries to change the actual location on disk that the file points to.
Wow, @rdicroce, thanks for testing this! To me, this proves even more that the problem lies within WindowsFS... As I said, my intent with the use of "atomic moves" was actually to prevent other processes from ending up reading partially written file content... And the code @rdicroce tested, while it does work, does not prevent this... So, maybe IF win old_code ELSE new_code? Hm
I propose this last change as the final PR; I will ask for a re-review from everyone who has reviewed so far.
@cstamas I'm a bit late to the party, but as @rdicroce's test already showed, I can only tell that Maven 3.8.x ("old" resolver?) worked under Windows, while upgrading to 3.9.x ("new" resolver?) shows reproducible problems in this regard. I'm not enough of a Windows expert to tell, but I think @olamy has already done some analysis of the Move/Sync problem in the past. In general, I think it would really be beneficial if resolver simply used time-stamped SNAPSHOTs instead of "normalized" ones. Maybe, for a while, one could even write the SNAPSHOT as an additional file (on Windows) or a symlink (under Linux), but have resolver use/return the timestamped file to start a migration?
@laeubi, me having done some analysis of file locking on Windows? Uhmmm, LOL, you may be confusing me with someone else ;)
Hm... it should have been @michael-o, sorry for the confusion. No idea why GitHub suggested your name, and I didn't notice :-D
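The "IF win old_code ELSE new_code" idea could look roughly like the following. It is a hypothetical sketch: the 1.6.0 code is not reproduced here, so the Windows branch only assumes that its essence is a plain `java.io` stream copy into the existing target, as @rdicroce describes it; `PlatformAwareReplace` and `replace` are made-up names. The OS flag is a parameter so both branches can be exercised on any platform; a real caller would pass `IS_WINDOWS`.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of "IF win old_code ELSE new_code"; not the actual Resolver code. */
final class PlatformAwareReplace {
    static final boolean IS_WINDOWS =
            System.getProperty("os.name", "unknown").startsWith("Windows");

    /** Puts the fully written tempFile in place of target, choosing the strategy by OS. */
    static void replace(Path tempFile, Path target, boolean windows) throws IOException {
        if (windows) {
            // Old style: overwrite the target's content in place with java.io streams.
            // Concurrent readers may observe a partially written file while this runs.
            try (InputStream in = new FileInputStream(tempFile.toFile());
                    OutputStream out = new FileOutputStream(target.toFile())) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            } finally {
                Files.deleteIfExists(tempFile);
            }
        } else {
            // New style: readers see either the old or the new file, never a mix.
            Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE);
        }
    }
}
```

The trade-off is the one @cstamas points out: the copy branch gives up the guarantee that other processes never read a partially written file.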
```java
// Logic borrowed from Commons-Lang3: we really need only this, to decide whether we fsync directories or not
private static final boolean IS_WINDOWS =
        System.getProperty("os.name", "unknown").startsWith("Windows");
```
I believe we have plexus utils code for this, no?
And `SystemUtils` in commons-lang.
But if the external library is not already used, I would not like to add another dependency for one simple function.
> but if external library is not used I would not like add next dependency for one simple function
With this, I agree, but if we use it anyway, I wouldn't write the code myself.
```java
            Files.deleteIfExists(tempFile);
        }
    };
}

/**
 * On Windows we use pre-NIO2 way to copy files, as for some reason it works. Beat me why.
```
Beat, why?
So, to recap:
To explain this last bullet: there was https://issues.apache.org/jira/browse/MRESOLVER-325 with solution #259 that "seems similar", as this code was originally used by both, and in that case "high frequency atomic moves" was applicable. But this problem now seemingly happens on "install" and "cache" (download, place it in the local repo), which happen far less frequently than tracking file writes...
Fixes:
* move() call should NOT perform the move, as the writer stream to the tmp file may still be open
* move the file move logic to close(), so it happens only when closing the collocated temp file
* perform fsync before the atomic move to ensure there are no OS dirty buffers related to the newly written file
* on Windows, go with the old code that for some reason works (avoid NIO2)
* on non-Windows OS, fsync the parent directory as well.

---

https://issues.apache.org/jira/browse/MRESOLVER-372

Backport of #364 to the 1.9.x branch.
Fixes:
https://issues.apache.org/jira/browse/MRESOLVER-372