test_socket failing in solaris #73455

Closed
phantal mannequin opened this issue Jan 13, 2017 · 12 comments
Labels
3.7 (EOL) end of life tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@phantal
Mannequin

phantal mannequin commented Jan 13, 2017

BPO 29269
Nosy @tiran, @blastwave

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-01-13.20:55:24.427>
labels = ['3.7', 'type-bug', 'tests']
title = 'test_socket failing in solaris'
updated_at = <Date 2020-08-04.16:01:56.392>
user = 'https://bugs.python.org/phantal'

bugs.python.org fields:

activity = <Date 2020-08-04.16:01:56.392>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Tests']
creation = <Date 2017-01-13.20:55:24.427>
creator = 'phantal'
dependencies = []
files = []
hgrepos = []
issue_num = 29269
keywords = []
message_count = 10.0
messages = ['285440', '296475', '296546', '301914', '374660', '374723', '374754', '374767', '374790', '374794']
nosy_count = 5.0
nosy_names = ['christian.heimes', 'petriborg', 'blastwave', 'phantal', 'Jim Crigler']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue29269'
versions = ['Python 3.7']

@phantal
Mannequin Author

phantal mannequin commented Jan 13, 2017

I started looking into this failure to see if I could figure out why but it looks like I'd have to spend more time than I have available to figure out the cause.

Environment/setup:

  • air-gapped network (no internet access)
  • sparc / Solaris 10
  • Built with gcc 6.3.0
  • Altered configure script to change -std=c99 to -std=gnu99 (see bpo-29264)
  • The only configure flags used were --prefix and --with-universal-archs=all

When I run test_socket I see the following 4 failures; please note, I'm hand typing the results so I may typo something:

ERROR: testCount (test.test_socket.SendfileUsingSendfileTest)
Traceback:
File "(...)/test_socket.py", line 5248, in testCount
File "(...)/test_socket.py", line 5151, in recv_data
MemoryError

Error: testCount (test.test_socket.SendfileUsingSendfileTest)
Traceback:
File "(...)/test_socket.py", line 277, in _tearDown
File "(...)/test_socket.py", line 289, in clientRun
File "(...)/test_socket.py", line 5241, in _testCount
File "(...)/Lib/socket.py", line 296, in _sendfile_use_sendfile
socket.timeout: timed out

Error: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
Traceback:
File "(...)/test_socket.py", line 5318, in testWithTimeout
data = self.recv_data(conn)
File "(...)/test_socket.py", line 5151, in recv_data
chunk = conn.recv(self.BUFSIZE)
MemoryError

Error: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
Traceback:
File "(...)/test_socket.py", line 277, in _tearDown
raise exc
File "(...)/test_socket.py", line 289, in clientRun
test_func()
File "(...)/test_socket.py", line 5313, in _testWithTimeout
sent = meth(file)
File "(...)/Lib/socket.py", line 296, in _sendfile_use_sendfile
socket.timeout: timed out

FAIL: testCountWithOffset (test.test_socket.SendfileUsingSendfileTest)
Traceback:
File "(...)/test_socket.py", line 5287, in testCountWithOffset
self.assertEqual(len(data), count)
AssertionError: 4376231 != 100007

Ran 539 tests in 69.166s

FAILED (failures=1, errors=4, skipped=324)
test test_socket failed

@phantal phantal mannequin added tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Jan 13, 2017
@JimCrigler
Mannequin

JimCrigler mannequin commented Jun 20, 2017

I'm having the same problem with gcc 6.2.

Is there any update?

@petriborg
Mannequin

petriborg mannequin commented Jun 21, 2017

Getting the same test_socket errors on Solaris 11 with Python 3.5.3.

======================================================================
ERROR: testCount (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5204, in testCount
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5107, in recv_data
MemoryError

======================================================================
ERROR: testCount (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 266, in _tearDown
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 278, in clientRun
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5197, in _testCount
  File "/usr/local/src/Python-3.5.3/Lib/socket.py", line 286, in _sendfile_use_sendfile
    raise _socket.timeout('timed out')
socket.timeout: timed out

======================================================================
ERROR: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5274, in testWithTimeout
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5107, in recv_data
MemoryError

======================================================================
ERROR: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 266, in _tearDown
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 278, in clientRun
  File "/usr/local/src/Python-3.5.3/Lib/test/test_socket.py", line 5269, in _testWithTimeout
  File "/usr/local/src/Python-3.5.3/Lib/socket.py", line 286, in _sendfile_use_sendfile
    raise _socket.timeout('timed out')
socket.timeout: timed out

Ran 530 tests in 54.577s

FAILED (errors=4, skipped=315)
test test_socket failed
1 test failed again:
test_socket

@vstinner
Member

Since it seems like Solaris is dying, I'm not sure that it still makes sense to fix Python issues specific to Solaris. Here, I don't understand the issue, no patch has been proposed, and I'm not really interested in investigating :-/

@blastwave
Mannequin

blastwave mannequin commented Aug 1, 2020

Well here we are in 2020 and Solaris systems are still running just fine. In fact, some big Fujitsu SPARC systems are running in production for years and years and also, no surprise, this test still fails horrifically on old stable Solaris 10. Python is turning into a piece of supposedly open source software with many commercial interests with their hands inside of it. I am not sure how to get this bug fixed but I can certainly report that it is still broken in 3.7.8 on a very stable and reliable platform.

@blastwave blastwave mannequin added the 3.7 (EOL) end of life label Aug 1, 2020
@phantal
Mannequin Author

phantal mannequin commented Aug 3, 2020

Solaris will be around for at least another 10-15 years.

The least you could do is look into it and offer some speculations.

@tiran
Member

tiran commented Aug 3, 2020

What do you expect us to do? No Python core dev has access to a Solaris machine. We cannot debug the issue and have to rely on external contributions. We have not declared Solaris as unsupported yet because people are still contributing fixes.

If you are looking for wild speculation: I guess Solaris' sendfile() is either broken or does not behave the way it does on other platforms.

@vstinner
Member

vstinner commented Aug 3, 2020

"I am not sure how to get this bug fixed (...)"

Someone has to write a fix. You may contact the Solaris vendor, or a company using Solaris that is willing to pay a developer to write a fix.

@phantal
Mannequin Author

phantal mannequin commented Aug 4, 2020

Christian, you did exactly what I needed. Thank you.

I don't have the means to do a git bisect to find where it broke. It wasn't a problem around the 3.3 timeframe, and I'm not sure when this sendfile stuff was implemented.

The man page for sendfile says "The sendfile() function does not modify the current file pointer of in_fd, (...)". In other words the read pointer for the input descriptor won't be advanced. They expect you to use it like this:

off_t offset = 0;
ssize_t ret;
do {
  /* Argument order is sendfile(out_fd, in_fd, &offset, len). */
  ret = sendfile(out, in, &offset, len);
} while (ret < 0 && (errno == EAGAIN || errno == EINTR));

... though making that change in posixmodule.c would break this test severely since the send & receive code is running on the same thread.

In posixmodule.c I don't see anything that attempts to return the number of bytes successfully sent. Since the input file descriptor won't have its read pointer advanced, the variable "offset" must be set to the correct offset value, otherwise it just keeps reading the first 32k of the file that was generated for the test.
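The offset bookkeeping described above can be sketched from the Python side with os.sendfile(), which takes the offset explicitly and returns the number of bytes sent. This is only an illustration under assumptions: `send_range` and `demo` are hypothetical names, the socket is assumed to be blocking, and the payload is kept small so that sending and receiving on one thread cannot fill the socket buffer. It is not the code in posixmodule.c or socket.py.

```python
import os
import socket
import tempfile


def send_range(out_sock, in_fd, offset, count, blocksize=8192):
    """Send up to `count` bytes of in_fd starting at `offset`.

    The caller-side offset is advanced by the return value of each
    os.sendfile() call, because the file pointer of in_fd is not moved.
    """
    total = 0
    while total < count:
        want = min(blocksize, count - total)
        sent = os.sendfile(out_sock.fileno(), in_fd, offset, want)
        if sent == 0:
            break  # end of file reached before `count` bytes were sent
        offset += sent
        total += sent
    return total


def demo(offset, count):
    # 16 KiB payload: small enough to fit in the socket buffer, so the
    # single-threaded send-then-receive below does not block.
    payload = bytes(range(256)) * 64
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        sender, receiver = socket.socketpair()
        with sender, receiver:
            sent = send_range(sender, f.fileno(), offset, count)
            sender.shutdown(socket.SHUT_WR)
            received = bytearray()
            while True:
                chunk = receiver.recv(65536)
                if not chunk:
                    break
                received.extend(chunk)
    return sent, bytes(received), payload
```

Calling `demo(1000, 5000)` should deliver exactly the slice `payload[1000:6000]`. If the offset were never advanced, every iteration would resend the same first block, which matches the symptom described in this comment.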

@phantal
Mannequin Author

phantal mannequin commented Aug 4, 2020

I accidentally hit submit too early.

I tried changing the code in posixmodule.c to use lseek(), something like the following:

offset = lseek( in, 0, SEEK_CUR );

do {
ret = sendfile(...);
} while( ... );
lseek( in, offset, SEEK_SET );

... however, in addition to sendfile() not advancing the file pointer, it also doesn't seem to cause an EOF condition. In my first attempt at the above I was doing this after the loop:

lseek( in, offset, SEEK_CUR );

... and it just kept advancing the file pointer well beyond the end of the file and sendfile() had absolutely no qualms about reading beyond the end of the file.

I even tried adding a read() after the 2nd lseek to see if I could force an EOF condition but that didn't do it.
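One way to avoid asking sendfile() for bytes beyond the end of the file is to clamp each request against the size reported by fstat(). The sketch below is a hypothetical guard written in Python for illustration; `clamp_count` and `demo` are invented names, and this is not presented as the change that was made in CPython.

```python
import os
import tempfile


def clamp_count(fd, offset, count):
    """Return how many of the requested `count` bytes exist at `offset`."""
    size = os.fstat(fd).st_size
    if offset >= size:
        return 0  # nothing left: treat as EOF instead of calling sendfile()
    return min(count, size - offset)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 100)
        f.flush()
        fd = f.fileno()
        return [
            clamp_count(fd, 0, 50),    # fully inside the file
            clamp_count(fd, 90, 50),   # request runs past the end
            clamp_count(fd, 100, 50),  # offset already at the end
        ]
```

A sending loop that calls such a guard first can stop as soon as it returns 0, rather than relying on sendfile() itself to signal EOF.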

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@kulikjak
Contributor

kulikjak commented Dec 6, 2023

I tried reproducing this with Python 3.5.3 on the current Oracle Solaris 11, and I couldn't do it.

Since this report, there has been a sendfile-related fix (#85853, #22040, #22128), which fixed some issues, although the traces there looked different. Since then, we haven't seen any sendfile issues on either Oracle Solaris 11 or the OpenSolaris forks (AFAIK; others helped me test the fix with SmartOS and OpenIndiana). I am wondering whether this might be related to the test machine setup?

The fix went into Python 3.9 and newer so you can try those.
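For anyone retesting on a newer Python, a standalone check loosely modelled on the failing testCountWithOffset case might look like the sketch below. It is an assumption-laden approximation rather than the test from Lib/test/test_socket.py: `check_sendfile`, the default file size, and the default offset are made up for illustration, and only the count of 100007 is taken from the traceback above.

```python
import socket
import tempfile
import threading


def check_sendfile(file_size=1024 * 1024, offset=2001, count=100007):
    """Send `count` bytes from `offset` with socket.sendfile() and verify them."""
    pattern = bytes(range(251))
    payload = (pattern * (file_size // len(pattern) + 1))[:file_size]
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        sender, receiver = socket.socketpair()
        received = bytearray()

        def drain():
            # Read on a separate thread so the sender never blocks on a
            # full socket buffer.
            while True:
                chunk = receiver.recv(65536)
                if not chunk:
                    break
                received.extend(chunk)

        reader = threading.Thread(target=drain)
        reader.start()
        with sender:
            sent = sender.sendfile(f, offset=offset, count=count)
        reader.join()
        receiver.close()
    return sent, bytes(received) == payload[offset:offset + count]
```

On a platform where sendfile() behaves as expected, `check_sendfile()` should report 100007 bytes sent and matching data; a length far above the count, as in the original report, would point to the offset or count not being honoured.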

@vstinner
Member

vstinner commented Dec 6, 2023

Python 3.5 and 3.9 no longer accept bugfixes: https://devguide.python.org/versions/

If someone can reproduce the issue on recent Python versions, please open a new issue (with a link to this one). I'm closing this issue.

@vstinner vstinner closed this as completed Dec 6, 2023