task remote init: fix strange tempfile issue #2976
Python tempfile module docs for So.. it seems surprising that "file not found" would occur for access via the filename, but not via the file handle, if the file actually exists?? (And if it doesn't exist, then the file handle ain't going to help!) Just wondering! |
No, I don't understand it either. The failure is very repeatable on that system without the fix, so this fix is not a coincidence. I'll polish it up slightly tomorrow, and may even add a unit test for the new logic. |
Had a look at https://bugs.python.org/ but found nothing that could explain it. Reading our code in
In the comment "time passes"... we have a reference to the name only, which would make the object eligible for GC. But we also pass an array with the reference to the file, so it shouldn't be collected... unless that array is not stored anywhere or if its variable is Note from their documentation for TemporaryFile: It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected). I think it would be applied to That's my best guess :-) but this fix looks good to me 👍 |
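The behaviour quoted from the documentation can be checked with a small sketch. It assumes the temporary file is a `tempfile.NamedTemporaryFile` with default arguments (the thread does not show the actual call), and the function name `name_outlives_handle` is made up for illustration. It shows that a bare file name goes stale once the handle is closed, whether explicitly or by garbage collection.

```python
import os
import tempfile


def name_outlives_handle():
    """Return (exists_while_open, exists_after_close) for a named temp file."""
    # Assumption: default arguments, so the file is deleted on close.
    handle = tempfile.NamedTemporaryFile()
    name = handle.name
    exists_while_open = os.path.exists(name)
    # Closing the handle deletes the file. The same happens implicitly if
    # the handle object is garbage collected while only the name is kept.
    handle.close()
    exists_after_close = os.path.exists(name)
    return exists_while_open, exists_after_close
```

This only demonstrates the documented close semantics; it does not prove that garbage collection was the cause of the failure discussed here.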
(and there are differences for GC between Python versions, as well as OS's... so it could be some special behaviour... Py 3.7 I think is getting a threaded GC... so that would be... 2 GC algorithms for Python I think, with one being able to run in a thread? I think the JVM has something like 10? anywho) |
It only happens on a platform that runs Python 2.6.9 (but it does not happen on our other platforms that run Python 2.6.6 and Python 2.7.5). Anyway, I'll not investigate further and just treat it as brain surgery - we poke at the system and hope to see a better result. (In this case, we have a working system on the platform + seemingly better code.) |
Garbage collection in the main process does sound like a plausible explanation here, as the temporary file is used in the subprocess??. If so, are you sure that use of the file handle instead of the file name is not in danger of suffering the same problem? (I still can't see why that would make any difference!). Python tempfile docs say the file... Perhaps we should be using non-temporary files and managing their deletion ourselves? (Apart from not understanding this, the code looks fine!) |
The file handle is not closed until the callback, so it should be safe. |
Ah, so your fix does make sense from a GC perspective? (Are references to the file object lost earlier on master?) |
No, in the master version, the file name is passed down to the command context, but the file handle should also be preserved as the callback argument. There should be no GC. |
I think this one should work. Assuming it was GC, now that we are explicitly keeping an instance of the file handle, I think it should not be collected. |
OK, let's merge this. |
Sorry @matthewrmshin - this just got conflicted by the Python 3 merge! (you'll also need a 7.8.x PR). |
Tests happy again. |
@wxtim please sanity check. |
Users are experiencing issues on one of the platforms we deploy Cylc where remote submission fails intermittently with a file not found error complaining that a temporary file cannot be found. (The file is used for piping service files via STDIN (to the SSH) command.) In this change, we pass the temporary file handle directly instead of passing the name of the temporary file to the process pool. This appears to fix the issue. (Note: Before cylc#2590, we were unable to pass the file handle because the context needs to be serialised for the multiprocessing.Pool. This is no longer a requirement in our own subprocess pool, so it is now safer to pass the handle.) Add basic unit tests for running command with STDIN in subprocess pool. Rename SuiteProcPool to SubProcPool.
Squashed, rebased, broke up tests into separate methods.. |
Looks pretty sane to me. 👍
Users are experiencing issues on one of the platforms we deploy Cylc
where remote submission fails intermittently with a file not found error
complaining that a temporary file cannot be found. (The file is used for
piping service files as a tar archive via STDIN (to the SSH) command.) This should not
be the case, as the temporary file handle should still be referenced,
but clearly there was an issue.
In this change, we pass the temporary file handle directly instead of
passing the name of the temporary file to the process pool. This appears
to fix the issue. (Note: Before #2590, we were unable to pass the file
handle because the context needs to be serialised for the
multiprocessing.Pool. This is no longer a requirement in our own
subprocess pool, so it is now safe to pass the handle.)
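As a rough illustration of the difference between the two approaches (not the actual Cylc code), the sketch below runs a child process with STDIN taken either from a file name or from an already-open handle. The names `CHILD`, `run_with_stdin_name`, `run_with_stdin_handle` and `demo` are hypothetical, and the child command is a stand-in for the SSH command that simply echoes its STDIN.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the SSH command: echo STDIN back to STDOUT.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read())"]


def run_with_stdin_name(path):
    """Name-based style: re-open the file by name when launching."""
    # Fails with a "file not found" error if the file has gone by now.
    with open(path, "rb") as stdin_file:
        return subprocess.run(
            CHILD, stdin=stdin_file, stdout=subprocess.PIPE, check=True
        ).stdout


def run_with_stdin_handle(handle):
    """Handle-based style: pass the already-open handle itself."""
    handle.seek(0)  # rewind so the child reads from the start
    return subprocess.run(
        CHILD, stdin=handle, stdout=subprocess.PIPE, check=True
    ).stdout


def demo():
    """Write to a temporary file, then feed it to the child by handle."""
    handle = tempfile.TemporaryFile()
    try:
        handle.write(b"service files")
        return run_with_stdin_handle(handle)
    finally:
        # Close only after the command has finished, much like closing
        # the handle in a callback.
        handle.close()
```

With the handle-based style, the open handle keeps the file usable for the child regardless of what happens to its name, which is consistent with the fix described above.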
Previously, the flow of logic roughly looked like this:
The new flow of logic roughly looks like this:
Also renamed SuiteProcPool to SubProcPool to match the module name and SubProcContext.