test_socket failing in solaris #73455
I started looking into this failure to see if I could figure out why but it looks like I'd have to spend more time than I have available to figure out the cause. Environment/setup:
When I run test_socket I see the following 4 failures (please note that I'm typing the results by hand, so I may have typoed something):
ERROR: testCount (test.test_socket.SendfileUsingSendfileTest)
ERROR: testCount (test.test_socket.SendfileUsingSendfileTest)
ERROR: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
ERROR: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
ERROR: testCountWithOffset (test.test_socket.SendfileUsingSendfileTest)
Ran 539 tests in 69.166s
FAILED (failures=1, errors=4, skipped=324) |
I'm having the same problem with gcc 6.2. Is there any update? |
Getting the same test_socket errors on Solaris 11 with Python 3.5.3.

======================================================================
Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5204, in testCount
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5107, in recv_data
MemoryError

======================================================================
Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 266, in _tearDown
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 278, in clientRun
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5197, in _testCount
  File "/usr/local/src/Python-3.5.3/Lib/socket.py", line 286, in _sendfile_use_sendfile
    raise _socket.timeout('timed out')
socket.timeout: timed out

======================================================================
Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5274, in testWithTimeout
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5107, in recv_data
MemoryError

======================================================================
Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 266, in _tearDown
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 278, in clientRun
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5269, in _testWithTimeout
  File "/usr/local/src/Python-3.5.3/Lib/socket.py", line 286, in _sendfile_use_sendfile
    raise _socket.timeout('timed out')
socket.timeout: timed out

Ran 530 tests in 54.577s

FAILED (errors=4, skipped=315) |
Since Solaris seems to be dying, I'm not sure it still makes sense to fix Python issues specific to Solaris. Here, I don't understand the issue, no patch has been proposed, and I'm not really interested in investigating :-/ |
Well, here we are in 2020 and Solaris systems are still running just fine. In fact, some big Fujitsu SPARC systems have been running in production for years and years and, no surprise, this test still fails horrifically on old stable Solaris 10. Python is turning into a piece of supposedly open source software with many commercial interests with their hands inside of it. I am not sure how to get this bug fixed, but I can certainly report that it is still broken in 3.7.8 on a very stable and reliable platform. |
Solaris will be around for at least another 10-15 years. The least you could do is look into it and offer some speculations. |
What do you expect us to do? No Python core dev has access to a Solaris machine. We cannot debug the issue and have to rely on external contributions. We have not declared Solaris as unsupported yet because people are still contributing fixes. If you are looking for wild speculations: I guess Solaris' sendfile() is either broken or does not behave like on other platforms. |
Someone has to write a fix. You may contact the Solaris vendor or a company using Solaris that wants to pay a developer to write a fix. |
Christian, you did exactly what I needed. Thank you. I don't have the means to do a git bisect to find where it broke. It wasn't a problem around the 3.3 timeframe and I'm not sure when this sendfile stuff was implemented. The man page for sendfile says "The sendfile() function does not modify the current file pointer of in_fd, (...)". In other words, the read pointer for the input descriptor won't be advanced. They expect you to use it like this:

offset = 0;
do {
    /* sendfile() takes the output descriptor first, then the input descriptor */
    ret = sendfile(out, in, &offset, len);
} while (ret < 0 && (errno == EAGAIN || errno == EINTR));

... though making that change in posixmodule.c would break this test severely, since the send and receive code is running on the same thread. In posixmodule.c I don't see anything that attempts to return the number of bytes successfully sent. Since the input file descriptor won't have its read pointer advanced, the variable "offset" must be set to the correct offset value; otherwise it just keeps reading the first 32k of the file that was generated for the test. |
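For anyone who wants to try the offset handling described above from Python rather than C, here is a minimal sketch. It assumes a platform where os.sendfile() is available and a blocking socket; the names send_whole_file and demo are made up for illustration and are not part of CPython or its test suite. The caller advances the offset by the number of bytes each call reports, and the descriptor's own file position is checked afterwards.

```python
import os
import socket
import tempfile
import threading


def send_whole_file(out_sock, in_file, chunk_size=32 * 1024):
    """Send in_file over out_sock, tracking the offset explicitly.

    Hypothetical helper: the offset is advanced by the caller because
    sendfile() is not relied upon to move the file pointer of in_fd.
    """
    in_fd = in_file.fileno()
    total = os.fstat(in_fd).st_size
    offset = 0
    while offset < total:
        count = min(chunk_size, total - offset)
        sent = os.sendfile(out_sock.fileno(), in_fd, offset, count)
        if sent == 0:
            break  # nothing more to send (EOF)
        offset += sent  # advance by what was actually sent
    return offset


def demo(payload=b"x" * 100_000):
    """Send payload through a socketpair; return (sent, received, fd position)."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        a, b = socket.socketpair()
        received = bytearray()

        def reader():
            # Drain the receiving end on another thread so the sender never blocks.
            while True:
                chunk = b.recv(65536)
                if not chunk:
                    break
                received.extend(chunk)

        t = threading.Thread(target=reader)
        t.start()
        sent = send_whole_file(a, f)
        # OS-level position of the input descriptor after all sendfile() calls.
        pos_after = os.lseek(f.fileno(), 0, os.SEEK_CUR)
        a.close()
        t.join()
        b.close()
        return sent, bytes(received), pos_after
```

Calling demo() and comparing the three returned values on Solaris against another platform might help narrow down where the behaviour diverges.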
I accidentally hit submit too early. I tried changing the code in posixmodule.c to use lseek(), something like the following:

offset = lseek(in, 0, SEEK_CUR);
do { ...

However, in addition to sendfile not advancing the file pointer, it also doesn't seem to cause an EOF condition. In my first attempt at the above I was doing this after the loop:

lseek(in, offset, SEEK_CUR);

... and it just kept advancing the file pointer well beyond the end of the file, and sendfile() had absolutely no qualms about reading beyond the end of the file. I even tried adding a read() after the second lseek() to see if I could force an EOF condition, but that didn't do it. |
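The EOF observation above could be checked with a small probe along these lines. This is only a sketch, assuming os.sendfile() is available; probe_eof is a hypothetical name. It seeks the input descriptor past the end of a small file (which lseek() permits) and then asks sendfile() for data at and beyond the end. On Linux both calls report 0 bytes; whether Solaris does the same is exactly what the comment above calls into question.

```python
import os
import socket
import tempfile


def probe_eof(size=4096, past=10_000):
    """Return (lseek position, bytes sent at EOF, bytes sent past EOF).

    Hypothetical diagnostic helper, not part of CPython.
    """
    with tempfile.TemporaryFile() as f:
        f.write(b"a" * size)
        f.flush()
        fd = f.fileno()
        # Seeking beyond the end of a regular file is allowed and is not an error.
        new_pos = os.lseek(fd, size + past, os.SEEK_SET)
        a, b = socket.socketpair()
        try:
            # Ask for data starting exactly at the end of the file.
            sent_at_eof = os.sendfile(a.fileno(), fd, size, 1024)
            # Ask for data starting well beyond the end of the file.
            sent_past_eof = os.sendfile(a.fileno(), fd, size + past, 1024)
        finally:
            a.close()
            b.close()
        return new_pos, sent_at_eof, sent_past_eof
```

A nonzero second or third value would point at sendfile() itself rather than at the test harness.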
I tried reproducing this with Python 3.5.3 on the current Oracle Solaris 11, and I couldn't. Since this report, there has been a sendfile-related fix (#85853, #22040, #22128), which fixed some issues, but the traces there looked different. Since then, we haven't seen any sendfile issues on either Oracle Solaris 11 or the OpenSolaris forks (AFAIK; others helped me test the fix with SmartOS and OpenIndiana). I am wondering whether this might be related to the test machine setup. The fix went into Python 3.9 and newer, so you can try those. |
Python 3.5 and 3.9 no longer accept bugfixes: https://devguide.python.org/versions/ If someone can reproduce the issue on recent Python versions, please open a new issue (with a link to this one). I'm closing this issue. |