-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue where readSource may not be resumed #599
Conversation
When using GCDAsyncSocket, we have noticed that our reads can sometimes hange before all of the data is read. The end result is that our read times out and the connection is closed. This issue seems to be similar to ones that others have reported (e.g. robbiehanson#89, robbiehanson#185, robbiehanson#225, and possibly The problem seems to be that `doReadData` can get into a state where the method finishes, we have not read all of the bytes for the current read, but we do not resume the `readSource`, preventing `doReadData` from being triggered again. Here is a confusing but valient attempt to explain the issue: When we enter `doReadData` we calculate the `estimatedBytesAvailable`. This value includes how much data we think is available on the socket plus how much data is available on some of our internal buffers. If that number is greater than 0, we will initialize a `waiting` flag to false. We then proceed to read from the socket, since there are bytes available. This is where it gets tricky. We calculate the number of `bytesToRead` from the socket by taking the minimum of A.) The `estimatedBytesAvailable` plus 16 kb or B.) The remaining bytes of the current read. The trickiness comes from the fact that when `B` is larger than `A` and the socket has the extra 16kb all ready to be read, then the call to `SSLRead` will return with no error. This is problematic because we rely on a result of `errSSLWouldBlock` in order to toggle the `waiting` flag to true. Finally, when we reach the end of the `doReadData` method and we have not completed the current read and `waiting` is not set to true. Because `waiting` is false, we do not attempt to resume the readSource. Since the `readSource` may be in a suspended state, there is nothing to trigger another call to `doReadData`. This results in what looks like a "hung" or "frozen" state from the client. Even though data is flowing to our socket, we never finish reading it out. This fix in this commit, as simple as it is, just sets `waiting` to true in the case of the "read to length" and "read to term" cases where we are expecting to see more data. Another possible fix could be pass the remaining bytes to read to SSLRead to trigger the `errSSLWouldBlock`. The issue with this is that I don't fully understand the logic around `optimalReadLengthWithDefault:shouldPreBuffer:`. It may be more desirable to not trigger that error and just wait until the readSource says that there is more available from the file descriptor. Another possible fix would be in how we initially set `waiting`. Instead of setting it to `!hasAvailableBytes`, you could change it to `(estimatedBytesAvailable + [preBuffer availableBytes]) < (currentRead->readLength - currentRead->bytesRead). I am not sure if this is desireable and I would neet some input from the maintainers.
Thanks for the investigation! Is it possible to reproduce this via a failing test case? |
@chrisballinger I can give it a shot. We've been able to reproduce this scenario pretty frequently in our iOS app. I'll take a look at the other tests and see if I can come up with something. |
Added GCDAsyncSocketReadTests with a single test that reproduces the issue. Being async by design, the test itself is hard to write. I created a couple of wrappers to provide a synchronous API that cleaned up the test. We could potentially use the wrappers for other tests in the future, if we wanted. Since this issue requires TLS to reproduce, I've added a p12 file containing a test certificate / private key pair. We can read that in to simiulate an actual SSL server, and just choose not to validate it on the client end. To see the test fail, you can comment out the "waiting = YES;" on line 5500 of GCDAsyncSocket.m. Tested this on the iPhone 8+ (11.2) simulator.
@chrisballinger I put up the test. Let me know what you think. The test is currently only added to the project in Tests/iOS. If you comment out line 5500 in GCDAsyncSocket.m then you can see it fail. Here is the output of a failed run on my machine:
|
Looks like the test passed on Travis:
|
Wow awesome work! That test scaffolding will definitely be helpful going forward as well. |
Issue
When using GCDAsyncSocket, we have noticed that our reads can sometimes
hangs before all of the data is read. The end result is that our read
times out and the connection is closed. This issue seems to be similar
to ones that others have reported (e.g. #89, #185, #225, and possibly #456)
The problem seems to be that
doReadData
can get into a state where themethod finishes, we have not read all of the bytes for the current read,
but we do not resume the
readSource
, preventingdoReadData
frombeing triggered again.
Example
Here is a confusing but valiant attempt to explain the issue: When we
enter
doReadData
we calculate theestimatedBytesAvailable
. Thisvalue includes how much data we think is available on the socket plus
how much data is available on some of our internal buffers. If that
number is greater than 0, we will initialize a
waiting
flag to false.We then proceed to read from the socket, since there are bytes
available. This is where it gets tricky. We calculate the number of
bytesToRead
from the socket by taking the minimum of A.) TheestimatedBytesAvailable
plus 16 kb or B.) The remaining bytes of thecurrent read. The trickiness comes from the fact that when
B
is largerthan
A
and the socket has the extra 16kb all ready to be read, thenthe call to
SSLRead
will return with no error. This is problematicbecause we rely on a result of
errSSLWouldBlock
in order to toggle thewaiting
flag to true. Finally, we reach the end of thedoReadData
method and we have not completed the current read andwaiting
is not set to true. Becausewaiting
is false, we do notattempt to resume the readSource. Since the
readSource
may be in asuspended state, there is nothing to trigger another call to
doReadData
. This results in what looks like a "hung" or "frozen" statefrom the client. Even though data is flowing to our socket, we never
finish reading it out.
This fix in this commit, as simple as it is, just sets
waiting
to truein the case of "read to length" and "read to term" where we
are expecting to see more data.
Alternatives
Another possible fix could be to pass the remaining bytes of the current read to
SSLRead
. This would then trigger theerrSSLWouldBlock
result. The issue with this is that Idon't fully understand the logic around
optimalReadLengthWithDefault:shouldPreBuffer:
. It may be moredesirable to not trigger that error and just wait until the readSource
says that there is more available from the file descriptor.
Another possible fix would be in how we initially set
waiting
.Instead of setting it to
!hasAvailableBytes
, you could change it to`(estimatedBytesAvailable + [preBuffer availableBytes]) < (currentRead->readLength -
currentRead->bytesRead). I am not sure if this is desireable and I
would need some input from the maintainers.