Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate socket sendall method in favor of SocketStream.send_all #291

Closed
parity3 opened this issue Aug 17, 2017 · 4 comments
Closed

Deprecate socket sendall method in favor of SocketStream.send_all #291

parity3 opened this issue Aug 17, 2017 · 4 comments

Comments

@parity3
Copy link

parity3 commented Aug 17, 2017

I looked at the high level async socket.sendall() trio 0.1.0 code and realized nothing would stop a user from calling sendall() from 2 tasks on the same socket, and having the individual lowlevel send()'s being intertwined, which is probably not what the user wants (ie fragmented/mashed sendall() data ordering).

Some options are:

  • write some concurrency protection, raise an error
  • write some documentation saying sendall() is not atomic
  • make sendall atomic (via locking/parking lots, etc because I don't think we can safely disable checkpoints)
@njsmith
Copy link
Member

njsmith commented Aug 18, 2017

The high-level SocketStream wrapper does do this check. (Though that code is a bit confusing -- it may say lock, but it actually does raise an error if multiple tasks try to call it concurrently.) So does trio.hazmat.wait_socket_writable, which is the low-level primitive that send calls when it needs to block. So the situation where this could go wrong is when someone is using the socket sendall method directly (hopefully rare once the high-level API is released), and a somewhat tricky race happens like:

  • Task A's sendall calls send, which does a partial send (meaning the output buffer is full)
  • Task B's sendall gets scheduled immediately afterwards, before Task A can run again, and its send succeeds, because those extra few milliseconds were enough for the kernel to drain the send buffer
  • Task A gets rescheduled and its sendall calls send again.

This is probably rare but I bet it can happen in real life.

The high-level layer can't just delegate worrying about sendall to the socket layer, because it needs to also catch conflicts between send_all and wait_send_all_might_not_block. I think the only way to make this all work sensibly without tons of duplicate work would be if we made "I am using this resource please make everyone else error out" explicitly part of the lower-level APIs, either at the socket or I/O loop level. And this is tricky because for UDP I don't think you necessarily want this checking on sendto, because there sendto actually is atomic. Well, but right now you can't necessarily just blithely call sendto from multiple tasks on UDP and assume it will work, because I think it can return EWOULDBLOCK in some weird cases, and then we'll call wait_socket_writable and that will raise ResourceBusyError if multiple tasks do it at the same time. So you would need a sendto_nowait to really take advantage of this. But maybe we should have sendto_nowait; I don't think we want to rule that out, anyway.

So I think I've argued myself into saying that for now, we should add a note to the documentation warning that if you try calling the socket-level sendall simultaneously from multiple tasks then you get undefined behavior (and also have you considered using our nice higher-level APIs?).

I guess there's even an argument for deprecating and removing the socket-level sendall, on the grounds that it's a convenience API and if you want convenience than our Stream layer is the thing to use. Sort of like how we don't bother providing a trio.socket.create_connection emulation of socket.create_connection, because people should just use the much fancier trio.open_tcp_stream.

(Is there ever any reason to call sock.sendall(...) on a non-SOCK_STREAM socket?)

@njsmith njsmith added the docs label Aug 18, 2017
@njsmith
Copy link
Member

njsmith commented Aug 18, 2017

(Is there ever any reason to call sock.sendall(...) on a non-SOCK_STREAM socket?)

I looked at this a little. Definitely not SOCK_DGRAM or SOCK_RAW. For SOCK_SEQPACKET I... can't tell, exactly. The docs are terrible. But I think the only thing that could even potentially make sense would be sendall with MSG_MORE followed by a call to send without MSG_MORE, but this seems like a weird way to do things. I don't think you can get a partial result back from send.

@njsmith
Copy link
Member

njsmith commented Aug 18, 2017

On even more investigation: POSIX says, regarding SOCK_SEQPACKET: "A record can be sent using one or more output operations and received using one or more input operations, but a single operation never transfers parts of more than one record. Record boundaries are visible to the receiver via the MSG_EOR flag in the received message flags returned by the recvmsg() function." Buuuut apparently Linux's AF_UNIX SOCK_SEQPACKET implementation just totally ignores all of this except for "a single operation never transfers parts of more than one record". Specifically, there doesn't seem to be any way to send a message over multiple calls (MSG_MORE does nothing), and there doesn't seem to be a way to receive one either (it doesn't actually give MSG_EOR, and it's like UDP: you get one call to receive each message, and you either get the whole thing or a truncated version, that's it). So yeah, no reason to use sendall here at all.

@njsmith
Copy link
Member

njsmith commented Aug 18, 2017

#292 for the documentation part.

@njsmith njsmith changed the title Think about socket/stream sendall() high level concurrency protection Deprecate socket sendall method in favor of SocketStream.send_all Aug 18, 2017
@njsmith njsmith mentioned this issue Aug 21, 2017
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants