Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix EINPROGRESS very rarely occurring on synchronous php_network_connect_socket() #13252

Open
wants to merge 1 commit into
base: PHP-8.2
Choose a base branch
from

Conversation

mauritsd
Copy link

php_network_connect_socket() in blocking mode does a select() to block on a connect() call that reported EINPROGRESS. However, it then checks for select() errors using getsockopt(..., SO_ERROR, ...), which will return the earlier EINPROGRESS since it wasn't cleared by php_socket_errno(). I believe this is the cause of phpredis/phpredis#1881 and possibly other issues.

Note that, since the issue only happens very sporadically, I'm unable to write a test that reproduces it, or to test that my fix works in practice. Also new to contributing to PHP, so apologies in advance if I've done something wrong. :-)

@devnexen
Copy link
Member

Thanks for your PR. However, PR needs to be from the lowest release branch when the fix apply, thus here PHP-8.2, PHP-8.0 is unsupported and PHP-8.1 is only for careful security fixes.

@mauritsd mauritsd changed the base branch from PHP-8.0 to PHP-8.2 January 26, 2024 15:54
@mauritsd
Copy link
Author

Thanks, I think I've updated it to now be against the correct base.

@bukka
Copy link
Member

bukka commented Feb 10, 2024

So this was quite tricky to recreate actually. It's just not easy to recreate EINPROGRESS on TCP. I initially thought that I could just add sleep between listen and accept on the server but that's not how it works. The connect returns once the 3 way TCP handshake is done and accepting is irrelevant. So I had to find another way which is delaying SYN-ACK packet which I had to do using iptable and tc. I wrote a testing C client and server and once I recreated it in C, I did the same in PHP. You can see all my test files for this here: bukka/php-util@5fdc4df .

Basically I wanted to confirm what I thought from the code which is that this is not correct change. You are right that errno is not cleared but getsockopt does not care about errno - it just returns the actual socket error which is 0 after the poll. I verified that this is exactly what is happening (specifically added a printf to the logic which showed that). So this PR does not seem correct or at least I can't think how this could resolve the issue. Have you actually deployed this to production and did it fix your issue for you?

The only other thing that came to my mind is that the errno would be checked but I did a bit of search and don't see any obvious place where it could be problematic. Might need to search more but guess we can reset it - you can test it with errno = 0. Otherwise I might need to investigate the Redis issue more. Maybe if you could test it with the SYN-ACK delay, that would be great.

@mauritsd
Copy link
Author

Ah, that's unfortunate that it doesn't seem to fix it. I haven't been able to verify that my fix works for us in production since the issue both crops up quite infrequently (in bursts every 3 weeks or so) and we don't compile our own PHP, so pushing this to (one of) our workers would be quite difficult to do.

Thanks for figuring out a way to reproduce the EINPROGRESS! One thing I noticed in your test code is that you (correctly, I think) test fds[0].revents & POLLOUT, whereas the PHP implementation does not. While you only poll for POLLOUT, the PHP implementation actually polls PHP_POLLREADABLE|POLLOUT. Could it be that the return value of php_pollfd_for is one of the bits in PHP_POLLREADABLE and not POLLOUT, and it simply reports EINPROGRESS again because the connection is still in progress? It should definitely test for & POLLOUT in the PHP implementation, right?

Regardless, I will see if I can reproduce it now on a test setup with PHPRedis and your SYN+ACK delay. Thanks for the work you've put in so far!

@bukka
Copy link
Member

bukka commented Feb 17, 2024

Thanks for figuring out a way to reproduce the EINPROGRESS! One thing I noticed in your test code is that you (correctly, I think) test fds[0].revents & POLLOUT, whereas the PHP implementation does not. While you only poll for POLLOUT, the PHP implementation actually polls PHP_POLLREADABLE|POLLOUT. Could it be that the return value of php_pollfd_for is one of the bits in PHP_POLLREADABLE and not POLLOUT, and it simply reports EINPROGRESS again because the connection is still in progress? It should definitely test for & POLLOUT in the PHP implementation, right?

Think it's mainly to catch error case but I wouldn't think that EINPROGRESS should be ever such case because it should timeout in such case. But might be wrong so it would be good to test it. I will try to do a bit more research / testing and see.

@georaldc
Copy link

Could this be what causes intermittent "Operation Now In Progress" errors with various PHP functions that make network calls? For a long time now, I've seen it occur with FTP functions, stream_socket_client(), PDO trying to connect to a MySQL server, and CURL (specifically using Guzzle, that would use PHP curl) and while it is difficult to replicate, it seems to occur more often during high load on an affected server. One example we used to have was a server that ran a letsencrypt cron job every night to check for certificate renewals. That specific server had a bunch of configurations that pointed to domains that were no longer active, and I believe this caused letsencrypt's certbot to take longer than normal to complete its renewal process. It's also during this period where we would almost always see an uptick in "Operation Now In Progress" errors from various processes that could have been doing any of the actions I just described.

@jeffdafoe
Copy link

I've become very interested in this as we now see both PHPRedis and PRedis encountering EINPROGRESS, and PHPRedis uses a non-blocking socket.

My question is when php_network_connect_socket is called with asynchronous=0, can the caller assume that EINPROGRESS will not be returned, since that error is specific to nonblocking connects?

I ask this since most callers may think that asynchronous = 0 means "blocking socket". I believe there's a code path where php_network_connect_socket receives EINPROGRESS on the connect, however the socket finishes connecting and is ready to read during the polling. The error variable never gets cleared so callers who are expecting blocking operations receive an unexpected EINPROGRESS return.

@bukka
Copy link
Member

bukka commented Oct 25, 2024

So first of all there's non blocking switch as I see you noticed in that linked Redis issue that I went through as well. When going through it I came across redis/hiredis#1213 which showed the interrupted poll (errno == EINTR) can stop the poll and return initial error. After a while of looking to the stream (PHP network) code I realised that this is what is probably happening. If such situation happens, it goes to the else condition where it set err to -1 but it then uses the old error which is set to EINPROGRESS . I put together an initial (untested) code change that might potentially fix it: #16606 . I will do more testing next week and will see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants