tls-eio: allow cancelling reads #459

talex5 · 2022-12-15T11:30:06Z

Previously, Tls would treat any exception from a read as a permanent error, causing all future operations to fail. However, it is reasonable to cancel a read and then try again later. I extended the fuzz tests to check for this case.

Note that the Lwt version appears to have the same problem. Possibly we should only be catching Eio.Io exceptions in the first place, or maybe none at all.

The first commit cleans up the fuzz testing a bit (see the commit comment for details) and the second one adds support for cancellation.

One place I'm not sure about is when drain_handshake is called from reneg. Is it OK to abort there? I don't know what the function does and it's already marked with XXX. In the normal case (when drain_handshake is called during flow creation) it's safe because the caller never gets the Tls_eio.t.

I'm not sure whether we really want to encourage users to cancel reads and resume later, but this should at least fix the confusing Cancelled: Eio__core__Fiber.Not_first errors reported by @bikallem in #458.

- The mock_socket has changed slightly. Previously, it would request a size to read, do the read, and then wait for another size for the next read, which would typically just be the EOF for that request. This made it harder for the fuzzer to reach useful states. Now you can do a single transmit with a single test action. - Simplified the write in Tls_eio slightly by using the new `Flow.write` operation. - Only log a send when we actually send some data. - Allow the fuzzer to perform all the actions needed for the Tls handshake in one go instead of requiring it to find them by searching. This provides it a way to get to the interesting cases quickly.

Previously, Tls would treat any exception from a read as a permanent error, causing all future operations to fail. However, it is reasonable to cancel a read and then try again later. Note that the Lwt version appears to have the same problem. Possibly we should only be catching `Eio.Io` exceptions in the first place, or maybe none at all. I also extended the fuzz tests to check for this case.

talex5 · 2022-12-15T12:31:10Z

afl-fuzz finds that this can fail, because a TLS read operation may need to perform a write. So when the application cancels a TLS read, it may actually end up cancelled a write, which we can't recover from.

bikallem · 2022-12-16T12:31:42Z

eio/tls_eio.ml

    | exn ->
      (match t.state with
       | `Error _ | `Eof -> ()
       | `Active _ -> t.state <- `Error exn) ;
      raise exn


Wouldn't it make sense to completely remove this bit instead? I believe this is trapping other exceptions in addition to Cancelled, such as 'Poisioned' and intermittently this https://github.com/ocaml/ocaml/blob/trunk/stdlib/effect.ml#L59

I assume the idea is that the flow should stop working on any unexpected error. Otherwise, you might fail to read a packet, retry, and then hang (because the sender thinks you already have the data). Possibly we should call Printexc.get_raw_backtrace here and store that in t.state too. Then you'd know where the original error came from.

But note that the original exception is still raised, with the original backtrace, the first time. So the first error you get should include the correct location in all cases. It's just if you keep trying to use it after an error then future errors don't give the exact location.

But note that the original exception is still raised, with the original backtrace, the first time. So the first error you get should include the correct location in all cases. It's just if you keep trying to use it after an error then future errors don't give the exact location.

Indeed, that's what I expected. However, I was getting other exceptions being reported - as mentioned in the comment above - as being the cause of error. As a user of tls-eio, that led me to unproductive debugging sessions since the actual error was quite something else. I have a feeling that the interaction of Eio.Fiber.any/first, tls_eio exception handling here and the like is giving unpredictable exceptions. I was only able to move forward with identifying the correct issue after I removed this catch all exception handling. Removing the catch all exception handling also don't seem to impact any observable tls-eio behaviours, i.e. the tests pass and End_of_file exception is correctly reported to users of tls-eio.

talex5 · 2023-02-16T12:27:04Z

Closing, as we decided not to do this (#464 (comment)):

I agree, we can stick (at least for now) to "cancelling an operation on a flow cancels that flow". Again, I don't see a concrete use case to allow cancelling a read/write and afterwards using the flow for a different (or the same) purpose.

talex5 added 2 commits December 15, 2022 11:17

talex5 marked this pull request as draft December 15, 2022 11:49

bikallem reviewed Dec 16, 2022

View reviewed changes

hannesm mentioned this pull request Dec 19, 2022

tls-eio: fix shutdown semantics of tls_eio flow #460

Closed

talex5 mentioned this pull request Dec 21, 2022

tls-eio: improve fuzz tests #463

Merged

talex5 mentioned this pull request Jan 4, 2023

tls-eio: decide on improvements needed #464

Open

talex5 closed this Feb 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tls-eio: allow cancelling reads #459

tls-eio: allow cancelling reads #459

talex5 commented Dec 15, 2022 •

edited

Loading

talex5 commented Dec 15, 2022

bikallem Dec 16, 2022

talex5 Dec 21, 2022

bikallem Dec 21, 2022

talex5 commented Feb 16, 2023

tls-eio: allow cancelling reads #459

tls-eio: allow cancelling reads #459

Conversation

talex5 commented Dec 15, 2022 • edited Loading

talex5 commented Dec 15, 2022

bikallem Dec 16, 2022

Choose a reason for hiding this comment

talex5 Dec 21, 2022

Choose a reason for hiding this comment

bikallem Dec 21, 2022

Choose a reason for hiding this comment

talex5 commented Feb 16, 2023

talex5 commented Dec 15, 2022 •

edited

Loading