Unhandled OpenSSL.SSL.WantReadError and WantWriteError raised by PyOpenSSL #245
Comments
Try using the I think, we may need to implement some retry logic based on: |
I was getting WantWriteError on very large js files and we did this monkey patch
|
@jpeckham Thanks a lot for this! @webknjaz and anyone else: I was able to fix this by replacing the read call at
with
safe_call is specifically defined in pyopenssl.py for this reason. @webknjaz This fix looks good? |
Works for me. Is that Rohitbels's workaround safe for fixing the issue @webknjaz ? |
I suggest changing the description of this issue to "OpenSSL.SSL.WantReadError and WantWriteError", because both errors occur for similar reasons. (I encountered the one with WantWriteError, which is probably the more likely of the two because you're more likely to send big data than receive it, but I'm only guessing.) Thanks a lot @jpeckham, your workaround helped me. If anyone can't reproduce the issue, this indicates that the problem might be OS-dependent. And, by the way, pyca/pyopenssl#176 says that sendall() should not be used at all until fixed because the same exception can occur and will cause loss of state. @webknjaz, I was originally using the |
I'd guess it depends more on the OpenSSL backend. pyOpenSSL is a wrapper around cryptography and cryptography is a wrapper around OpenSSL. So it may be compiled against many different OpenSSL versions that may behave differently, or just have different compilation settings in general.
Interesting. We may want to integrate a tool like that into our test suite one day... As for why WantReadError and WantWriteError happen: it's usually because some buffers get full and the expectation is that the calling code will do retries. |
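The retry expectation described above can be sketched as a small helper. This is only an illustration: the two exception classes are local stand-ins for `OpenSSL.SSL.WantReadError` and `OpenSSL.SSL.WantWriteError` so the example runs without pyOpenSSL, and `call_with_retry` and `FlakySender` are hypothetical names, not cheroot or pyOpenSSL APIs.

```python
import time


class WantReadError(Exception):
    """Local stand-in for OpenSSL.SSL.WantReadError (assumption for this demo)."""


class WantWriteError(Exception):
    """Local stand-in for OpenSSL.SSL.WantWriteError (assumption for this demo)."""


def call_with_retry(func, *args, delay=0.01, timeout=1.0):
    """Call func(*args), retrying while the TLS layer asks to be called again."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return func(*args)
        except (WantReadError, WantWriteError):
            # The operation could not make progress yet; give up only
            # once the overall deadline has passed.
            if time.monotonic() >= deadline:
                raise TimeoutError('TLS operation did not complete in time')
            time.sleep(delay)


class FlakySender:
    """Hypothetical sender that asks for a retry twice before accepting data."""

    def __init__(self):
        self.calls = 0

    def send(self, data):
        self.calls += 1
        if self.calls <= 2:
            raise WantWriteError()
        return len(data)


sender = FlakySender()
sent = call_with_retry(sender.send, b'hello')
```

Bounding the retries with a deadline rather than looping forever is a design choice of this sketch; the thread does not settle on a particular limit.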
I ran into the above issue. The builtin module works perfectly, but when an audit recommended allowing only TLS 1.2 and TLS 1.3, I didn't know how to implement that with builtin, so I switched to pyopenssl with a context that allows only TLS 1.2 and TLS 1.3. Unhandled exceptions are raised when the server responds with big JS files; the architecture is React + Django. After applying the changes from @vashek everything works perfectly and there are no errors on the server side. When do you plan to accept the above change from @vashek and create a release? |
I suppose you could use the context setter on the adapter: https://github.com/cherrypy/cheroot/blob/473c546/cheroot/ssl/builtin.py#L258. Just make sure to repeat some logic from the initializer.
Good to know! |
The CherryPy documentation says that context is valid only for the pyopenssl module. Thanks for your advice and help, it works great with builtin ssl. |
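For readers with the same audit requirement, here is a minimal sketch of a standard-library context restricted to TLS 1.2 and newer. Only the `ssl` module calls are real; `make_tls12_plus_context` is a hypothetical helper name, and handing the result to the builtin adapter through its context setter follows the suggestion above but is not shown, since the adapter's initializer logic would need to be repeated for a real deployment.

```python
import ssl


def make_tls12_plus_context(certfile=None, keyfile=None):
    """Build a server-side context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # TLS 1.3 stays enabled automatically when the linked OpenSSL supports it.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        # Paths are supplied by the caller; none are assumed here.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


context = make_tls12_plus_context()
```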
I moved an old cherrypy web app to a new environment and started getting both WantReadError and WantWriteError errors and had to apply two different fixes to get things to work:
So far this seems to have fixed the problems. Am surprised these things haven't been fixed already considering the age of this issue. Am I missing something? cherrypy v18.8.0 |
I'm wondering about that, too. Perhaps a Tidelift subscription would help. (I actually checked that out but there's no pricing on the website.) |
Yeah I checked after you mentioned it, but Tidelift said their minimum subscription was for a team of 25 developers and even then they wouldn't tell me what that costs but I am guessing $$$$. So not really an option as far as I am concerned. And even then, what is the probability they have fixed this particular issue? |
Tidelift's customers are corporate, not individuals. Their model revolves around redistributing some of the funds acquired from subscribers across the participating projects' maintainers. AFAIK, they ask clients to run some software that collects what they use, and the distribution of funds to the maintainers depends heavily on the use of said OSS project. My impression is that most projects with low-to-middle popularity get under $50 or, more often, zero (there's some threshold before a project starts getting any cash). Those funds are per-project and if a project has more maintainers, they are split (usually equally). Say a project has 2 maintainers and has just crossed the threshold for being lifted: that means they'd be getting something like $25 a month each. This is to say that Tidelift is a way for businesses to give back to OSS, but (1) it doesn't replace people's salaries, nor (2) is it a good fit for individuals. If you want to support maintainers as a non-business, direct donations may be more impactful. Though, it's typical for people to treat them as little $1 tokens of appreciation rather than full-fledged support. Another thing that might not be obvious about Tidelift: they let the maintainers lead their projects without influencing them too much. For example, they don't file feature requests or stuff like that. Though, they ask us to verify some packaging metadata, licenses, the security of accesses to package indexes etc., as well as communicate some vulnerability details (we can optionally delegate accepting security reports to them). Note that both maintainers for the CherryPy projects have many other projects going on. Some get more attention than others, naturally. I also burn out occasionally, to various degrees. When this happens, I have much less (or zero) energy for FOSS.
Last year, it was rather intense (still is, really); you can see the activity drops in the contribution graph in my profile and how they correspond with pings that accumulate in the old PRs. I'm trying to get to every PR, but sometimes they get lost or stuck for different reasons.
Now, back to this particular issue. I don't remember exactly what blocked @vashek's #332, but I think that originally, it was introducing unrelated changes. They were reverted half a year later, but the notification got lost (maybe because the PR was marked as a draft). I'll take another look and maybe merge as is; I've just clicked "rebase" there to see if the CI explodes. Looks like there's a linting violation to fix at least. I recall that unrelated change sending me off trying to make some fixes upstream, but I didn't succeed in implementing tests for them: pyca/pyopenssl#954 / pyca/pyopenssl#955. It'd be nice if somebody could help out with those. I'm also aware of a few seemingly similar issues that would be great to fix. But testing TLS stuff automatically doesn't always work or behaves weirdly in CI. I've been mostly working on ways of making the CI and local tests as stable as possible. I think that we need more contributions improving testing of TLS. It's rather hard to make a full matrix since some components are coming from the GHA VMs and not PyPI. With that in mind, it's hard to judge if PRs like #332 would contribute to the flakiness or stability. Hence, this causes indecisiveness on my part, sometimes. |
So further to my previous post, I am still seeing "bad write retry" occasionally although not as often as I was. So, evidently, the fixes I applied have not solved all of the problems although they seem to be less severe than they were. One problem with debugging these errors is that they are hard to recreate reliably. Their occurrence is infrequent and somewhat random. Any suggestions? Here's a typical error trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cheroot/server.py", line 1313, in communicate
    req.respond()
  File "/usr/local/lib/python3.10/dist-packages/cheroot/server.py", line 1082, in respond
    --
  File "/usr/local/lib/python3.10/dist-packages/cheroot/wsgi.py", line 139, in respond
    self.write(chunk)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/wsgi.py", line 221, in write
    self.req.write(chunk)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/server.py", line 1140, in write
    self.conn.wfile.write(chunk)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/makefile.py", line 86, in write
    res = super().write(val, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cheroot/makefile.py", line 29, in write
    self._flush_unlocked()
  File "/usr/local/lib/python3.10/dist-packages/cheroot/makefile.py", line 38, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "/usr/lib/python3.10/socket.py", line 723, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1723, in send
    self._raise_ssl_error(self._ssl, result)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1637, in _raise_ssl_error
    _raise_current_error()
  File "/usr/lib/python3/dist-packages/OpenSSL/_util.py", line 57, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', '', 'bad write retry')] |
@julianz- it's useful to see a traceback with a different call stack. It would be helpful, though, if you could have a branch in a fork with the exact patches applied along with such traces. And if you make changes, maybe make branches with corresponding patches each time so that we know what each traceback corresponds to. It is very important to understand how to reproduce the problem because it would allow the maintainers to validate possible fixes / PRs, especially when there are no tests bundled. One of the things people here could help with is getting pyca/pyopenssl#955 / pyca/pyopenssl#954 working: switching to I'm not experienced with debugging TLS' internals deeply, unfortunately, but I recall @mxii-ca contributing related code and being helpful with debugging/pointers. I'm tagging them in hopes that they may be open to helping out here. |
Thanks webknjaz. Very helpful. Will try to set up a branch/fork with my changes. However, in the meantime I did want to let you know a couple of observations I made doing some experiments. In the modified version of _flush_unlocked() I am using (see below; based on jpeckham's suggestion in this thread), I have also added a short sleep before the retry, which borrows from the way safe_call() in pyopenssl.py handles this error. I should add, though, that the safe_call version doesn't include an instruction to restore the buffer, which is presumably why I was seeing bad write retry errors when I tried using it. In my version, I tried commenting out the line n=0 in the exception handler, and then I occasionally saw bad length errors coming from OpenSSL instead of bad write retry errors. Restoring that line, and increasing the sleep time to 0.1 seconds instead of 0.01 seconds, I see occasional retries but so far no errors. So it seems pretty clear that recreating these errors depends on some kind of stress testing, which is what I think you were doing in pyca/pyopenssl#955, although the title implies it might only be testing for WANT_READ errors? I see some WantWriteError exception handlers in your code so I wasn't sure.
|
my understanding is that when pyopenssl does not appear to do this. nor does it expose a constant for this value, however, it should be possible to enable it via for more details see:
|
Thanks for the pointers! I'll have to explore them further... |
Thank you mxii-ca! That's great to know. Contrary to my initial impressions, my change to the sleep time in _flush_unlocked() did not get rid of bad write retry errors. While it did seem to make them less frequent, that could have just been the placebo effect. Your explanation makes perfect sense so I have now implemented a change to see whether changing the mode to accept a moving buffer clears the problem once and for all. I wasn't sure the best way to do this but made the following change in OpenSSL/SSL.py:
which replaces the line:
So far this change looks promising (I see occasional retries with zero bad write retry errors) but maybe there is a better way or place to do this? |
Just wanted to follow up by saying that the fixes I described above have completely cured the bad write retry errors I was getting. Can these be incorporated into the official code somehow? |
This constant was removed in pyca@895de04 but it is still needed to deal with an issue in PyOpenSSL described here cherrypy/cheroot#245 and PR pyca/pyopenssl#1242.
...NG_WRITE_BUFFER In order to fix the issue described in cherrypy/cheroot#245, we need to use this constant, which was removed from https://github.com/pyca/cryptography but has now been restored
@Kodiologist @julianz- @mxii-ca
Yes, that is a PyOpenSSL file, but the suggestion that this problem lies with PyOpenSSL and not cheroot is absolutely, 100% misguided. It is not PyOpenSSL's job to expose a retry option-- it is cheroot's. PyOpenSSL just implements OpenSSL, which in turn exposes this error as described at https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html, which in turn states: "There is no fixed upper limit for the number of iterations that may be necessary until progress becomes visible at application protocol level" The correct solution is for cheroot to support a configuration option for retry-- possibly similar to how curl provides retry options-- that, as appropriate, implements a retry of the |
I've figured out a workable solution for handling SSL_WANT_WRITE errors which implements a retry-- similar to what ssl/pyopenssl.py does (note however that although the code in ssl/pyopenssl.py appears to address SSL_WANT_WRITE, it does not, at least in some cases, because it is entirely bypassed when the above self.raw.write() call is made) and consistent with the pyopenssl documentation. I'd be happy to share this, but it's not sufficient-- see more below. However, I have not been successful in handling SSL_WANT_READ for very large files. It would appear that the problem resides in either cheroot or pyopenssl, given that the issue does not arise when using the built-in SSL (rather than pyopenssl). So to summarize:
To address this, either of these should be done:
It seems like the first option above is easily done today. Any reason not to go ahead with that? Are there any features of using the 'pyopenssl' adapter that have no corresponding equivalent when 'builtin' is used? |
Don't think we are talking about exposing a new retry option in PyOpenSSL. The retry logic remains in Cheroot. The only change I proposed was to allow PyOpenSSL to call OpenSSL with a buffer that might have moved to a new location. Currently OpenSSL doesn't allow this although it used to. Getting Cheroot to retry sending the buffer is pointless once it's moved to a new location given the way PyOpenSSL currently talks to OpenSSL. Hence the need for small fixes in both PyOpenSSL and OpenSSL and not in Cheroot. |
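The "moved buffer" point can be illustrated without any TLS code. In CPython, each `bytes(...)` conversion of a bytearray (as in the `self.raw.write(bytes(self._write_buf))` call from the traceback above) creates a new object, so a retry hands the lower layer equal contents at a different memory location. The snippet below is a sketch of that observation only; the variable names are illustrative.

```python
# A bytearray standing in for cheroot's internal write buffer.
write_buf = bytearray(b'response body chunk')

# Each conversion allocates a fresh bytes object, as a retry of
# raw.write(bytes(write_buf)) would.
first_attempt = bytes(write_buf)
second_attempt = bytes(write_buf)

same_content = first_attempt == second_attempt    # equal data
same_object = first_attempt is second_attempt     # but not the same buffer
```

This is why a retry that is logically identical can still be rejected with "bad write retry" unless OpenSSL is told to accept a moving write buffer.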
@julianz- perhaps changes to pyOpenSSL are needed but without a doubt, changes to cheroot are needed to fix these bugs. The existing cheroot retry code does not even get touched when raw.write() is invoked and so the PyOpenSSL exception just gets immediately raised without any form of handling. In other words, it appears that the authors of the cheroot retry code might be under the incorrect impression that their retry code is covering all the places it should be covering. It isn't. |
@liquidaty I had forgotten that I also made some changes to cheroot/makefile.py along the lines suggested by @jpeckham and @Rohitbels earlier in this thread so yes you are right on this point. The bug has been around so long that it took me a while to figure out what I had done to fix it locally. Specifically, the changes in makefile.py I made are as follows:
def _flush_unlocked(self):
    self._checkClosed('flush of closed file')
    while self._write_buf:
        try:
            # ssl sockets only accept 'bytes', not bytearrays
            # so perhaps we should conditionally wrap this for perf?
            n = self.raw.write(bytes(self._write_buf))
        except io.BlockingIOError as e:
            n = e.characters_written
        # added retry logic: nothing was written, so keep the whole buffer
        except (SSL.WantReadError, SSL.WantWriteError, SSL.WantX509LookupError) as e:
            n = 0
        del self._write_buf[:n]

def read(self, *args, **kwargs):
    """Capture bytes read."""
    # val = super().read(*args, **kwargs)
    # according to https://github.com/cherrypy/cheroot/issues/245 to prevent WantReadError()
    # use the following alternative
    val = self._safe_call(True, super().read, *args, **kwargs)
    self.bytes_read += len(val)
    return val
Maybe the _safe_call method can be used for the write method too but this didn't work when I tried it originally. With the moving buffer fix it might, though I think we still need to reset the number of written bytes back to 0. The moving buffer fix depends on the release of version 42.0.0 of cryptography which is still under active development with no release date yet. So we are kind of stuck until this becomes available. |
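To make the behaviour of such a flush loop easier to reason about, here is a self-contained simulation. It is a sketch, not cheroot code: `WantWriteError` is a local stand-in for the pyOpenSSL exception, `FakeRaw` is a hypothetical stream that asks for a retry on every other call, and the short sleep and retry cap are assumptions added for the demo.

```python
import time


class WantWriteError(Exception):
    """Local stand-in for OpenSSL.SSL.WantWriteError (assumption for this demo)."""


class FakeRaw:
    """Hypothetical raw stream: odd-numbered calls ask for a retry,
    even-numbered calls accept at most 4 bytes."""

    def __init__(self):
        self.received = bytearray()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        if self.calls % 2 == 1:
            raise WantWriteError()
        chunk = data[:4]
        self.received += chunk
        return len(chunk)


def flush(raw, write_buf, delay=0.001, max_retries=50):
    """Drain write_buf into raw, resending unchanged data after each retry request."""
    retries = 0
    while write_buf:
        try:
            n = raw.write(bytes(write_buf))
        except WantWriteError:
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(delay)
            continue  # nothing was written, so the buffer is left untouched
        del write_buf[:n]
    return retries


raw = FakeRaw()
buf = bytearray(b'0123456789')
retries = flush(raw, buf)
```

Using `continue` in the handler has the same effect as setting `n = 0`: the full remaining buffer is offered again on the next attempt.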
@julianz- Got it. I can see where those changes are coming from but I think they need further adjustment to work correctly. I think I have half the puzzle solved, but not the whole thing. On the "write" side, using your changes under my tests:
Putting these together, here is what worked for me:
Re read:
|
@liquidaty It seems like we are experiencing very different phenomenology. The fixes I described completely solved the errors I was seeing. Under what circumstances do you see these WantReadErrors and WantWriteErrors? How often do you get these errors? For me, the errors occur seemingly randomly and not very often - which made this issue hard to troubleshoot. I haven't found a way to produce them on command. Are you able to reliably reproduce these errors? If so, maybe you could propose a test script? Regarding your write code, why did you remove the line that sets n to 0? I found this necessary as the retry needs to resend the whole buffer again. Also your retry is not handled by an exception, so if the retry fails your code is going to break no? And having a delay on the order of half a second seems way too long for the kind of retries I am experiencing - the default delay in safe_call is 0.01 seconds which seems more reasonable. |
@julianz- I can reliably reproduce both errors, but this is in the context of proprietary code so it may take me a while to come up with code to reproduce that I can share. That said, the conditions seem pretty simple so maybe that will be easy. The write error occurs when a request is made by a process that is consuming the response in chunks, and the caller (consumer) cannot quite keep up with the speed of the response. I'm using a file size of 170MB and I'm guessing there is some initial buffer that can absorb the consumer process speed variability up to a point, after which it has to throttle the SSL connection, which leads to the WantWrite errors. The read error occurs with a 170MB file that is POSTed by the client and then attempted to be read in chunks via Both of the above are running all processes (client and server) on the same machine, and occur on older and newer Macs, Intel and Silicon. And in all cases, the issues all go away (and no exception handling is required at all) when using 'builtin' instead of 'pyopenssl'. |
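The slow-consumer condition described here has a plain-socket analogue that is easy to reproduce. The sketch below uses no TLS at all: it fills the send buffer of a non-blocking socket whose peer never reads, until the operating system refuses more data. The function name and chunk sizes are illustrative assumptions; with pyOpenSSL the corresponding signal would be WantWriteError rather than BlockingIOError.

```python
import socket


def fill_send_buffer(chunk_size=65536, max_chunks=10000):
    """Send to a peer that never reads until the kernel buffer is full."""
    writer, reader = socket.socketpair()
    try:
        writer.setblocking(False)
        sent_total = 0
        for _ in range(max_chunks):
            try:
                # send() may accept only part of the chunk; count what it took.
                sent_total += writer.send(b'x' * chunk_size)
            except BlockingIOError:
                # The buffers are full: the non-TLS equivalent of "want write".
                return sent_total, True
        return sent_total, False
    finally:
        writer.close()
        reader.close()


sent_total, blocked = fill_send_buffer()
```

How many bytes fit before the send blocks depends on the operating system's socket buffer sizes, which may be one reason the errors in this thread appear at different rates in different environments.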
In the changes I made, that line became moot: either the loop is short-circuited via
I'm not following your logic-- in my code, the retry is achieved by simply executing
All the more reason for delay and retry parameters to be configurable |
Stepping back further: why bother supporting pyopenssl at all? Wouldn't it be both more stable, and less maintenance, to eliminate that altogether and only support 'builtin'? |
Maybe so. I haven't tried the builtin module actually so I should give that a try. What was the reason pyOpenSSL.py was added as an alternative? @webknjaz mentioned earlier in this thread he saw some problems with the builtin module - he said it failed some tlsfuzzer tests but I don't know enough about that to know its significance. |
@julianz- I'm not sure but it's not surprising that it was supported in earlier versions given that back in Py2 days the built-in SSL module didn't work well, and pyopenssl was more popular (and possibly in many cases the only working choice) |
It was released 16 hours ago FYI. |
Need v42.0.0 or later of Cryptography as this restored the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER constant which is required for possible fix per cherrypy/cheroot#245
IIRC over a decade ago, there was no good |
# This is the 1st commit message: Updated SSL.py to fix problem caused by SSL_WANT_READ or SSL_WANT_WRITE errors. When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail. By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success. See cherrypy/cheroot#245 for discussion. # This is the commit message pyca#2: fixed format for flake8/black # This is the commit message pyca#3: E721 errors raised by flake8
When SSL_WANT_READ or SSL_WANT_WRITE are encountered, it's typical to retry the call but this must be repeated with the exact same arguments. Without this change, openSSL requires that the address of the buffer passed is the same. However, buffers in python can change location in some circumstances which cause the retry to fail. By add the setting SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER, the requirement for the same buffer address is forgiven and the retry has a better chance of success. See cherrypy/cheroot#245 for discussion.
I'm submitting a ...
Describe the bug. What is the current behavior?
This issue is reproducible with huge POST requests, such as a form textarea containing many lines.
We are upgrading from Python 2.7 to 3.7 and hence also upgraded cheroot to 8.2.0.
CherryPy: 18+
Details
Environment
Additional context
I have gone through #113
but was still unable to figure out a solution