-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dns client eio #458
Dns client eio #458
Conversation
write_t t resp ; | ||
raise (Tls_failure failure) | ||
with | ||
| ( End_of_file | Tls_alert _ | Tls_failure _ ) as ex -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure but it seems when handling exceptions in effects/eio, we need to be more explicit as to what we are handling. At one time exceptions such as this https://github.com/ocaml/ocaml/blob/trunk/stdlib/effect.ml#L59
was being trapped in here. Perhaps this should go in the eio
error handling guidance ?
|
||
let write t cs = writev t [cs] | ||
|
||
(* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer applicable since the only place we need to worry about interleaving read/write is when reading from the tls_eio flow.
blit res 0 buf 0 n ; | ||
let rec read t flow_buf : int = | ||
|
||
let write_application_data data = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the name of this function, this sounds like "only application data will ever be written here", but AFAICT there could as well be handshake data that is being written. how did you choose the name?
eio/tls_eio.ml
Outdated
| `Send -> close_tls t ; Flow.shutdown t.flow `Send | ||
| `All -> close_tls t ; Flow.shutdown t.flow `All | ||
| `Receive -> close_tls t ; Flow.shutdown t.flow `Receive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this one but I have added a corresponding close on the t.flow
believing this version to be more semantically correct than the original. However, please advise if this is not the case.
push_linger t cs; drain_handshake t | ||
if not (Tls.Engine.handshake_in_progress t.tls) then t | ||
else | ||
let application_data = read_t t in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't think the naming is correct here. the buffer returned by read_t
is some encrypted TLS data, no? and this may be handshake, application data, alerts, heartbeat, ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, sorry, read_t
now indeed returns application data.
eio/tls_eio.ml
Outdated
|
||
let copy_from t src = | ||
try | ||
while true do | ||
let buf = Cstruct.create 4096 in | ||
let got = Flow.read src buf in | ||
let got = Flow.single_read src buf in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just replaces deprecated Flow.read
Thanks for your PR, unfortunately this is pretty complex, and since I don't know any eio guarantees, I'm not sure when I'll find time to properly review it (unlikely this year). But maybe @talex5 has an opinion about this PR, and if he approves the code, I can merge and release. In general:
|
I think it would be good to merge the fuzz tests (#456) before making any further changes to tls-eio. It would also be good to modify the fuzz tests (in this PR) to show the problem being fixed (e.g. by having the fuzz tester randomly cancel reads).
That's interesting. Since the fuzz tests don't detect this, I guess it works as long as the underlying flow orders writes correctly, which perhaps the real flow does not. Possibly we should fix it there instead. |
What @talex5 says, but even more convincing would be a regression test (rather than a fuzz test) of the bad record mac behaviour -- if I understand both of you correctly, this should be easy since with eio everything can be mocked (including the scheduler) |
Yes, a test case would indeed be nice. However, I am not too familiar with crowbar or property based testing in general; so it will probably be a while before I get to it. Alternately, I was thinking about using As an aside, do either of you know if tla+ (https://learntla.com/) is a good candidate for uncovering concurrency issues such as what this PR is trying to address? |
Dear @bikallem, I think the underlying issue is some sort of interleaving of fibers. I don't think the hammer of property-based testing or fuzz-testing is needed for reproduction. A normal unit test should suffice, it may need a custom "scheduler" (but here I don't know enough of eio and OCaml 5 to guide how the test could be setup). As mentioned earlier, I'm not too convinced that "let's acquire a mutex for each flow, and use it for each read and write" is the way to go. I'd take a step back and ask: why would a tls flow be shared across fibers in the first place? But as mentioned, I tried to read the API docs of eio and couldn't find any guarantees / mentions of what is safe to do in parallel. |
eac6177
to
d4db677
Compare
Dear @bikallem, thanks again for this PR. It would really benefit if you could focus on a single issue (and best option to provide a test case and a minimal change that fixes it), and not have lots of other changes in the same PR - esp. the I'd appreciate to first see some discussion about "why a mutex is needed" and what the desired parallelism story of eio is. |
Also, please feel free to reach out via eMail if you'd like to schedule a more instant chat (i.e. a video meeting), where we can discuss your goals and the road towards them. :) |
Thanks for the feedback. Indeed, this PR was supposed to only fix issue #457. However, while attempting to write a test for the issue, I noticed another issue with my PR of not handling I have converted the PR into draft for now. Many thanks to @talex5 for the fuzz test. I feel I am much more productive with such test harness. |
5580b36
to
a57fa12
Compare
327b0f8
to
2bbc912
Compare
Re-implement `read_into` to make read/write of tls as required an atomic operation. A read_into call to tlw flow can't be interleaved with other read/write on the underlying flow. This commit also cleans up error/exception handling. Without this commit, the exn raised by tls-eio were obfuscating and not helpful. For example, I was getting `Eio_core.Not_first` error. This error is purely internal to `eio` and should never leak to users of the lib.
- The mock_socket has changed slightly. Previously, it would request a size to read, do the read, and then wait for another size for the next read, which would typically just be the EOF for that request. This made it harder for the fuzzer to reach useful states. Now you can do a single transmit with a single test action. - Simplified the write in Tls_eio slightly by using the new `Flow.write` operation. - Only log a send when we actually send some data. - Allow the fuzzer to perform all the actions needed for the Tls handshake in one go instead of requiring it to find them by searching. This provides it a way to get to the interesting cases quickly.
2bbc912
to
22563de
Compare
Closing this PR for now in support of #464 |
Fixes #457
Summary of Changes:
Simplified
t.state
to just beTls.Engine.state
. Previously handling and re-raising exceptions was giving obfuscating errors in tls-eio, such asCancelled: Eio__core__Fiber.Not_first
ormutex poisioned
and so forth. This and using them together withEio.Fiber.any/first
seemingly give wrong, unhelpful and unproductive errors. Therefore, the code deviates fromtls_lwt
style of state management. I believe the new state handling is much more direct and straightforward as preferred by eio and OCaml effects.I was also getting weird intermittent errors such as
MACMismatch
orRecordOverflow
. From investigation, I concluded that the errors were due to the interleaving of read/write operations of the concurrent fibers, i.e. a read on tls flow means read/write on the underlying flowt.flow
. Therefore the underlying read/write ont.flow
should be protected with a mutex such that the read/write ont.flow
is atomic.I have also updated some deprecated function usage such as
Eio.Flow.read
toEio.Flow.single_read
./cc @talex5 @hannesm