-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tls-eio: allow cancelling reads #459
Conversation
- The mock_socket has changed slightly. Previously, it would request a size to read, do the read, and then wait for another size for the next read, which would typically just be the EOF for that request. This made it harder for the fuzzer to reach useful states. Now you can do a single transmit with a single test action. - Simplified the write in Tls_eio slightly by using the new `Flow.write` operation. - Only log a send when we actually send some data. - Allow the fuzzer to perform all the actions needed for the Tls handshake in one go instead of requiring it to find them by searching. This provides it a way to get to the interesting cases quickly.
Previously, Tls would treat any exception from a read as a permanent error, causing all future operations to fail. However, it is reasonable to cancel a read and then try again later. Note that the Lwt version appears to have the same problem. Possibly we should only be catching `Eio.Io` exceptions in the first place, or maybe none at all. I also extended the fuzz tests to check for this case.
afl-fuzz finds that this can fail, because a TLS read operation may need to perform a write. So when the application cancels a TLS read, it may actually end up cancelled a write, which we can't recover from. |
| exn -> | ||
(match t.state with | ||
| `Error _ | `Eof -> () | ||
| `Active _ -> t.state <- `Error exn) ; | ||
raise exn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to completely remove this bit instead? I believe this is trapping other exceptions in addition to Cancelled
, such as 'Poisioned' and intermittently this https://github.com/ocaml/ocaml/blob/trunk/stdlib/effect.ml#L59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the idea is that the flow should stop working on any unexpected error. Otherwise, you might fail to read a packet, retry, and then hang (because the sender thinks you already have the data). Possibly we should call Printexc.get_raw_backtrace
here and store that in t.state
too. Then you'd know where the original error came from.
But note that the original exception is still raised, with the original backtrace, the first time. So the first error you get should include the correct location in all cases. It's just if you keep trying to use it after an error then future errors don't give the exact location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But note that the original exception is still raised, with the original backtrace, the first time. So the first error you get should include the correct location in all cases. It's just if you keep trying to use it after an error then future errors don't give the exact location.
Indeed, that's what I expected. However, I was getting other exceptions being reported - as mentioned in the comment above - as being the cause of error. As a user of tls-eio
, that led me to unproductive debugging sessions since the actual error was quite something else. I have a feeling that the interaction of Eio.Fiber.any/first
, tls_eio exception handling here and the like is giving unpredictable exceptions. I was only able to move forward with identifying the correct issue after I removed this catch all exception handling. Removing the catch all exception handling also don't seem to impact any observable tls-eio behaviours, i.e. the tests pass and End_of_file exception is correctly reported to users of tls-eio.
Closing, as we decided not to do this (#464 (comment)):
|
Previously, Tls would treat any exception from a read as a permanent error, causing all future operations to fail. However, it is reasonable to cancel a read and then try again later. I extended the fuzz tests to check for this case.
Note that the Lwt version appears to have the same problem. Possibly we should only be catching
Eio.Io
exceptions in the first place, or maybe none at all.The first commit cleans up the fuzz testing a bit (see the commit comment for details) and the second one adds support for cancellation.
One place I'm not sure about is when
drain_handshake
is called fromreneg
. Is it OK to abort there? I don't know what the function does and it's already marked withXXX
. In the normal case (whendrain_handshake
is called during flow creation) it's safe because the caller never gets theTls_eio.t
.I'm not sure whether we really want to encourage users to cancel reads and resume later, but this should at least fix the confusing
Cancelled: Eio__core__Fiber.Not_first
errors reported by @bikallem in #458.