Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue where readSource may not be resumed #599

Merged
merged 2 commits into from
Feb 24, 2018

Conversation

jdeff
Copy link
Contributor

@jdeff jdeff commented Feb 23, 2018

Issue

When using GCDAsyncSocket, we have noticed that our reads can sometimes
hangs before all of the data is read. The end result is that our read
times out and the connection is closed. This issue seems to be similar
to ones that others have reported (e.g. #89, #185, #225, and possibly #456)

The problem seems to be that doReadData can get into a state where the
method finishes, we have not read all of the bytes for the current read,
but we do not resume the readSource, preventing doReadData from
being triggered again.

Example

Here is a confusing but valiant attempt to explain the issue: When we
enter doReadData we calculate the estimatedBytesAvailable. This
value includes how much data we think is available on the socket plus
how much data is available on some of our internal buffers. If that
number is greater than 0, we will initialize a waiting flag to false.
We then proceed to read from the socket, since there are bytes
available. This is where it gets tricky. We calculate the number of
bytesToRead from the socket by taking the minimum of A.) The
estimatedBytesAvailable plus 16 kb or B.) The remaining bytes of the
current read. The trickiness comes from the fact that when B is larger
than A and the socket has the extra 16kb all ready to be read, then
the call to SSLRead will return with no error. This is problematic
because we rely on a result of errSSLWouldBlock in order to toggle the
waiting flag to true. Finally, we reach the end of the
doReadData method and we have not completed the current read and
waiting is not set to true. Because waiting is false, we do not
attempt to resume the readSource. Since the readSource may be in a
suspended state, there is nothing to trigger another call to
doReadData. This results in what looks like a "hung" or "frozen" state
from the client. Even though data is flowing to our socket, we never
finish reading it out.

This fix in this commit, as simple as it is, just sets waiting to true
in the case of "read to length" and "read to term" where we
are expecting to see more data.

Alternatives

Another possible fix could be to pass the remaining bytes of the current read to
SSLRead. This would then trigger the errSSLWouldBlock result. The issue with this is that I
don't fully understand the logic around
optimalReadLengthWithDefault:shouldPreBuffer:. It may be more
desirable to not trigger that error and just wait until the readSource
says that there is more available from the file descriptor.

Another possible fix would be in how we initially set waiting.
Instead of setting it to !hasAvailableBytes, you could change it to
`(estimatedBytesAvailable + [preBuffer availableBytes]) < (currentRead->readLength -
currentRead->bytesRead). I am not sure if this is desireable and I
would need some input from the maintainers.

When using GCDAsyncSocket, we have noticed that our reads can sometimes
hange before all of the data is read. The end result is that our read
times out and the connection is closed. This issue seems to be similar
to ones that others have reported (e.g. robbiehanson#89, robbiehanson#185, robbiehanson#225, and possibly

The problem seems to be that `doReadData` can get into a state where the
method finishes, we have not read all of the bytes for the current read,
but we do not resume the `readSource`, preventing `doReadData` from
being triggered again.

Here is a confusing but valient attempt to explain the issue: When we
enter `doReadData` we calculate the `estimatedBytesAvailable`. This
value includes how much data we think is available on the socket plus
how much data is available on some of our internal buffers. If that
number is greater than 0, we will initialize a `waiting` flag to false.
We then proceed to read from the socket, since there are bytes
available.  This is where it gets tricky. We calculate the number of
`bytesToRead` from the socket by taking the minimum of A.) The
`estimatedBytesAvailable` plus 16 kb or B.) The remaining bytes of the
current read. The trickiness comes from the fact that when `B` is larger
than `A` and the socket has the extra 16kb all ready to be read, then
the call to `SSLRead` will return with no error. This is problematic
because we rely on a result of `errSSLWouldBlock` in order to toggle the
`waiting` flag to true. Finally, when we reach the end of the
`doReadData` method and we have not completed the current read and
`waiting` is not set to true.  Because `waiting` is false, we do not
attempt to resume the readSource.  Since the `readSource` may be in a
suspended state, there is nothing to trigger another call to
`doReadData`. This results in what looks like a "hung" or "frozen" state
from the client. Even though data is flowing to our socket, we never
finish reading it out.

This fix in this commit, as simple as it is, just sets `waiting` to true
in the case of the "read to length" and "read to term" cases where we
are expecting to see more data.

Another possible fix could be pass the remaining bytes to read to
SSLRead to trigger the `errSSLWouldBlock`. The issue with this is that I
don't fully understand the logic around
`optimalReadLengthWithDefault:shouldPreBuffer:`. It may be more
desirable to not trigger that error and just wait until the readSource
says that there is more available from the file descriptor.

Another possible fix would be in how we initially set `waiting`.
Instead of setting it to `!hasAvailableBytes`, you could change it to
`(estimatedBytesAvailable + [preBuffer availableBytes]) < (currentRead->readLength -
currentRead->bytesRead). I am not sure if this is desireable and I
would neet some input from the maintainers.
@chrisballinger
Copy link
Collaborator

Thanks for the investigation! Is it possible to reproduce this via a failing test case?

@jdeff
Copy link
Contributor Author

jdeff commented Feb 23, 2018

@chrisballinger I can give it a shot. We've been able to reproduce this scenario pretty frequently in our iOS app. I'll take a look at the other tests and see if I can come up with something.

Added GCDAsyncSocketReadTests with a single test that reproduces the
issue. Being async by design, the test itself is hard to write. I
created a couple of wrappers to provide a synchronous API that cleaned
up the test. We could potentially use the wrappers for other tests in
the future, if we wanted.

Since this issue requires TLS to reproduce, I've added a p12 file
containing a test certificate / private key pair. We can read that in to
simiulate an actual SSL server, and just choose not to validate it on
the client end.

To see the test fail, you can comment out the "waiting = YES;" on line
5500 of GCDAsyncSocket.m.

Tested this on the iPhone 8+ (11.2) simulator.
@jdeff
Copy link
Contributor Author

jdeff commented Feb 24, 2018

@chrisballinger I put up the test. Let me know what you think. The test is currently only added to the project in Tests/iOS.

If you comment out line 5500 in GCDAsyncSocket.m then you can see it fail. Here is the output of a failed run on my machine:

Test Case '-[CocoaAsyncSocketTestsiOS.GCDAsyncSocketReadTests test_whenBytesAvailableIsLessThanReadLength_readDoesNotTimeout]' started.
/Users/jdeff/Source/Personal/CocoaAsyncSocket/Tests/Shared/GCDAsyncSocketReadTests.swift:20: error: -[CocoaAsyncSocketTestsiOS.GCDAsyncSocketReadTests test_whenBytesAvailableIsLessThanReadLength_readDoesNotTimeout] : failed - Socket was disconnected
<unknown>:0: error: -[CocoaAsyncSocketTestsiOS.GCDAsyncSocketReadTests test_whenBytesAvailableIsLessThanReadLength_readDoesNotTimeout] : Asynchronous wait failed: Exceeded timeout of 0.5 seconds, with unfulfilled expectations: "Read data".
/Users/jdeff/Source/Personal/CocoaAsyncSocket/Tests/Shared/GCDAsyncSocketReadTests.swift:26: error: -[CocoaAsyncSocketTestsiOS.GCDAsyncSocketReadTests test_whenBytesAvailableIsLessThanReadLength_readDoesNotTimeout] : XCTAssertEqual failed: ("0") is not equal to ("102400") - 
Test Case '-[CocoaAsyncSocketTestsiOS.GCDAsyncSocketReadTests test_whenBytesAvailableIsLessThanReadLength_readDoesNotTimeout]' failed (0.603 seconds).

@jdeff
Copy link
Contributor Author

jdeff commented Feb 24, 2018

Looks like the test passed on Travis:

GCDAsyncSocketReadTests
    ✓ test_whenBytesAvailableIsLessThanReadLength_readDoesNotTimeout (0.038 seconds)

@chrisballinger
Copy link
Collaborator

Wow awesome work! That test scaffolding will definitely be helpful going forward as well.

@chrisballinger chrisballinger merged commit f7df4b9 into robbiehanson:master Feb 24, 2018
@chrisballinger
Copy link
Collaborator

@jdeff Please take a look at #600 and see if you can resolve these the test failures. Fortunately the failure isn't related to your waiting = YES patch itself, just the tests you added.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants