Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing temporary file creation for Windows #665

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

theunraveler
Copy link
Contributor

@theunraveler theunraveler commented Feb 5, 2020

We were having issues with the temporary files created by this library on Windows not being readable by the xmlsec process. I don't totally understand the problem, but it seems like IIS was retaining some kind of lock on the created files, so the xmlsec process couldn't open them for reading. This resulted in 2 different errors:

  • First, xmlsec --verify failed because it could not read the temporary PEM file that was created for the operation
  • After that was fixed, the xmlsec --verify hung indefinitely because it could not access the file created for --output to go into.

I also see that NamedTemporaryFile has some odd behavior on Windows as noted in the Python documentation: https://docs.python.org/3/library/tempfile.html#tempfile.NamedTemporaryFile

This PR changes the way tempfiles are written on Windows. It still writes them to the temp directory as determined by tempfile.gettempdir(), but just uses regular open instead of NamedTemporaryFile. uuid4() is used to generate a unique-ish filename. This fix seems to work well on our Windows server and locally on Windows machines. I'm not sure if this is the best way to do it though---open to other implementations if you don't like this one.

Thanks!

@codecov
Copy link

codecov bot commented Feb 5, 2020

Codecov Report

Merging #665 into master will increase coverage by <.01%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #665      +/-   ##
==========================================
+ Coverage   65.05%   65.06%   +<.01%     
==========================================
  Files         102      102              
  Lines       25670    25676       +6     
==========================================
+ Hits        16700    16705       +5     
- Misses       8970     8971       +1
Impacted Files Coverage Δ
src/saml2/sigver.py 70.32% <88.88%> (+0.08%) ⬆️
src/saml2/validate.py 73.64% <0%> (ø) ⬆️
src/saml2/request.py 70.71% <0%> (ø) ⬆️
src/saml2/s2repoze/plugins/sp.py 3.17% <0%> (ø) ⬆️
src/saml2/mdstore.py 70.92% <0%> (ø) ⬆️
src/saml2/config.py 84.09% <0%> (ø) ⬆️
src/saml2/assertion.py 86.8% <0%> (ø) ⬆️
src/saml2/response.py 74.62% <0%> (ø) ⬆️
src/saml2/entity.py 85.13% <0%> (ø) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e759a4...b489b43. Read the comment docs.

@theunraveler
Copy link
Contributor Author

Oh also, the method for creating tempfiles on Windows does not do anything with the delete_tmpfiles flag, but my though was that it's probably OK to just leave them and let IIS clean them up, no?

@peppelinux
Copy link
Member

You can even run a schedular to delete .xml files older than 10minutes

@ioparaskev
Copy link
Contributor

the delete_tmpfiles is simply a flag that is passed to Python's NamedTemporaryFile function in order to trigger the automatic deletion. In your case it seems like NamedTemporaryFile is not working at all the way it should be.

You mentioned two errors:

First, xmlsec --verify failed because it could not read the temporary PEM file that was created for the operation
After that was fixed, the xmlsec --verify hung indefinitely because it could not access the file created for --output to go into.

Can you identify what was that fixed the first error and led to the second error? The first issue sounds like file permissions and the second sound like a strange race condition with file locks.

On the diff per se, using uuid may sound like a fail proof solution but there is no mechanism to control what will happen if the uuid named file already exists. And this is even more possible since there is no automatic deletion mechanism so the files on the TEMP folder will be deleted when the server restarts (or some cron job deletion script runs).

Both of the above two scenarios are handled by NamedTemporaryFile. But there seems to be an issue with Windows NT and later versions that probably affects us mentioned in Python's docs:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

This might probably affect us since the file is opened, written, and then passed around. It seems from reading this issue that the real problem is that when the temp files are created, because the system is on windows the O_TEMPORARY flag must be set. This is mentioned in the issue and you can also see it in the source code of the NamedTemporaryFile:

# Setting O_TEMPORARY in the flags causes the OS to delete
# the file when it is closed.  This is only supported by Windows.
if _os.name == 'nt' and delete:
    flags |= _os.O_TEMPORARY

So I would suggest if possible to test if the issue still exists with the O_TEMPORARY flag set on Windows and modifying the _run_xmlsec function here to also include this flag.

If this works what we have to do is update the documentation and set the _run_xmlsec function when using Popen to have this env flag set to whatever the system has it set.
If this doesn't work, I suggest we use tempfile.mkstemp (if it works) instead of creating files with uuid. This would solve the name clashing possibility but we will still have to invent a way to delete these files. So my hopes are that by setting the O_TEMPORARY flag we could avoid intricate OS dependent mechanisms.

@theunraveler
Copy link
Contributor Author

Thanks for doing the research and the detailed response.

I don't see a way to pass the O_TEMPORARY flag anywhere in the code (I think you're suggesting to pass it to Popen()? I don't think that will work, and NamedTemporaryFile does not provide a way to pass flags).

If we use mkstemp, do we really need to delete the tempfiles? Most OSes have some mechanism for deleting tempfiles on a scheduled basis, right? It just seems really difficult to do with the way that open file objects are passed around and read from in the code. I guess I could write a context manager class that does this though?

@theunraveler
Copy link
Contributor Author

Can you identify what was that fixed the first error and led to the second error? The first issue sounds like file permissions and the second sound like a strange race condition with file locks.

The first error was fixed by removing NamedTemporaryFile from make_temp, which was used to make the temporary PEM and XML files. The second was fixed by using the new _make_temp function (the same mechanism) in _run_xmlsec, which was used to make the file that xmlsec uses for --output.

@ioparaskev
Copy link
Contributor

Can you identify what was that fixed the first error and led to the second error? The first issue sounds like file permissions and the second sound like a strange race condition with file locks.

The first error was fixed by removing NamedTemporaryFile from make_temp, which was used to make the temporary PEM and XML files. The second was fixed by using the new _make_temp function (the same mechanism) in _run_xmlsec, which was used to make the file that xmlsec uses for --output.

When you say "removing NamedTemporaryFile" you mean that you replaced it with some other file creation function?
Now I'm a bit confused as to what the first issue was about exactly. Was it that the file got automatically deleted (if so did you try setting the delete_tmpfiles option to False to see if this error would still exist?) or was it because Windows held a lock on the file and didn't allow it to be opened from another app (but if that was the case it would mean that your solution would also not work, right?)

Thanks for doing the research and the detailed response.

I don't see a way to pass the O_TEMPORARY flag anywhere in the code (I think you're suggesting to pass it to Popen()? I don't think that will work, and NamedTemporaryFile does not provide a way to pass flags).

Pass it to Popen by writing Popen(com_list, stderr=PIPE, stdout=PIPE,env={'O_TEMPORARY': 1}). It should also be set to the environment that is running/using pysaml

If we use mkstemp, do we really need to delete the tempfiles? Most OSes have some mechanism for deleting tempfiles on a scheduled basis, right? It just seems really difficult to do with the way that open file objects are passed around and read from in the code. I guess I could write a context manager class that does this though?

Since we provide a way to *nix systems to automatically delete these temp files (and don't rely to some third party script/cron job to do it) for the sake of consistency I think we should also provide a way to do it in Windows as well

@natehawkboss
Copy link

When you say "removing NamedTemporaryFile" you mean that you replaced it with some other file creation function?

This does not function as expected in Windows and this issue is referenced in the python documentation.

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
https://docs.python.org/3.8/library/tempfile.html

Essentially, we could not open this file again on our windows environment, which meant that xmlsec was unable to use the file for verifying the signature.

# `NamedTemporaryFile` is not very reliable on Windows, so we'll make a
# tempfile a different way.
if sys.platform == 'win32':
return open(os.path.join(gettempdir(), '%s.%s' % (uuid4(), suffix)), 'w+b')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this ever be cleaned up?

@c00kiemon5ter
Copy link
Member

c00kiemon5ter commented Feb 28, 2020

This is very nasty on Python's part. I am really sad for this state of things.


The option being discussed, is to implement "NamedTemporaryFiles" some other way. To be complete, this should support deleting/cleaning-up those files, when closed. Is there something like that out there? What do other projects do?


Another option is to stop using files and use actual bindings to the xmlsec lib. I would prefer we invest time in this option.

There is already a wrapper for xmlsec: https://github.com/mehcode/python-xmlsec
We must make sure the API coverage is enough to cover our use-cases. One thing that I notice is that it seems that it only works along lxml; this is fine with me - as part of the task to refactor all xml operations I had though of only supporting lxml, as it seems to provide way better tools to handle xml, than the builtin parsers.

Another lib we can look into is signxml: https://github.com/XML-Security/signxml

Would someone like to see how we can use this to sign/verify/encrypt/decrypt an xml document?

@ghost
Copy link

ghost commented Jul 24, 2020

I've encountered this issue as well, it seems to be entirely a Windows problem in so much that temporary files created with the O_TEMPORARY flag seem to be only accessible by the creating process and any subprocesses (i.e. xmlsec) are denied access. I ended up solving it by monkey patching tempfile._TemporaryFileCloser and tempfile.NamedTemporaryFile so that O_TEMPORARY was never set and that closing the file would manually delete it, instead.

Naturally, I wouldn't recommend that solution to everyone, though.

@ioparaskev
Copy link
Contributor

I've encountered this issue as well, it seems to be entirely a Windows problem in so much that temporary files created with the O_TEMPORARY flag seem to be only accessible by the creating process and any subprocesses (i.e. xmlsec) are denied access. I ended up solving it by monkey patching tempfile._TemporaryFileCloser and tempfile.NamedTemporaryFile so that O_TEMPORARY was never set and that closing the file would manually delete it, instead.

Naturally, I wouldn't recommend that solution to everyone, though.

thank you for the feedback since I didn't have a windows vm to try this solution out. As it seems, the fact that we use a subprocess call to xmlsec beats the windows flag workaround. So we'll have to find some other way to do this -not monkeypatching internal libraries of course but something different in terms of handling these files.
I'll have a look and see what else we can do

@talebi1
Copy link

talebi1 commented Sep 9, 2020

Having this same issue on windows, any ETA when this will be merged?

@ioparaskev
Copy link
Contributor

Having this same issue on windows, any ETA when this will be merged?

Not yet to be honest. This needs some discussion on how to handle and when to free up these files. We should come up (if possible) with a common solution for all OS instead of adding flags to check if it's windows or some other *nix OS

@earonesty
Copy link
Contributor

earonesty commented Jan 25, 2023

i ran into this problem. NamedTemporaryFile should never be used on windows or linux. it is fully bugged and can even result in an an infinite loop. uuid mechanism is the best mechanism. check out: python/cpython#66305 and https://stackoverflow.com/a/58955530/627042

@c00kiemon5ter
Copy link
Member

c00kiemon5ter commented Jan 31, 2023

hi @earonesty, and thanks for commenting back on this. I think using uuid is a viable way forward. The code needs to be reworked to clean up the files. But your gist is also interesting to hide the inner details behind a familiar interface.

The alternative choice would be to start using the xmlsec bindings and avoid creating files altogether, but that would go into a separate backend.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 65.06%. Comparing base (4e759a4) to head (b489b43).
Report is 482 commits behind head on master.

Files with missing lines Patch % Lines
src/saml2/sigver.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #665   +/-   ##
=======================================
  Coverage   65.05%   65.06%           
=======================================
  Files         102      102           
  Lines       25670    25676    +6     
=======================================
+ Hits        16700    16705    +5     
- Misses       8970     8971    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants