Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sources): Shutdown starting from tcp and unix sockets #2618

Merged
merged 10 commits into from
May 24, 2020

Conversation

ktff
Copy link
Contributor

@ktff ktff commented May 16, 2020

Closes #2604.

Adds ReadUntil for AsyncRead similar to TakeUntil for Stream.

This wraps tcp and unix sockets with ReadUntil and ShutdownSignal. So once the shutdown starts there will be no more reads from the sockets, but all that have been read will be processed.

This also fixes mock source.

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@ktff ktff added the domain: sources Anything related to the Vector's sources label May 16, 2020
@ktff ktff self-assigned this May 16, 2020
@ktff ktff requested a review from bruceg May 16, 2020 13:50
@ktff ktff requested a review from MOZGIII as a code owner May 16, 2020 14:12
@MOZGIII
Copy link
Contributor

MOZGIII commented May 16, 2020

I think there's an issue: users will be able to submit data to the TCP connections, get a TCP ACK for their data, and when shutdown signal comes and we discard all the data that's still left in the kernel buffers.
What if we handle it differently: upon shutdown, shutdown(2) the underlying socket (so that the other end knows it shouldn't send any more data), and read all the data until the ends from the descriptors before closing. This way would be more graceful than just giving up reading from the underlying Read.

@MOZGIII
Copy link
Contributor

MOZGIII commented May 16, 2020

Did some digging, and it seems like shutdown with SHUT_RD is a noop:
https://github.com/torvalds/linux/blob/f85c1598ddfe83f61d0656bd1d2025fa3b148b99/net/ipv4/tcp.c#L2317-L2335

But still, we should even without involving shutdown - I think we should at least try to attempt to the whole kernel-buffered data up until the underlying socket returns WouldBlock (or, since that can actually never happen under certain conditions, up to some force-stop timeout - but the important part is to also read till WouldBlock).

Copy link
Contributor

@MOZGIII MOZGIII left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I noticed a few issues though, let's discuss!

src/async_read.rs Show resolved Hide resolved
src/async_read.rs Outdated Show resolved Hide resolved
tests/buffering.rs Show resolved Hide resolved
tests/support/mod.rs Show resolved Hide resolved
src/async_read.rs Show resolved Hide resolved
@ktff
Copy link
Contributor Author

ktff commented May 16, 2020

whole kernel-buffered data up until the underlying socket returns WouldBlock

That would just delay the problem, there is always a window of time between receiving WouldBlock and shutting down the connection, in which OS could receive more data which we would lose.

The only way that I see to avoid any loss is to close the connection without dropping it so we can read until the end. But TcpStream::shutdown is no good for the reason

Did some digging, and it seems like shutdown with SHUT_RD is a noop:
https://github.com/torvalds/linux/blob/f85c1598ddfe83f61d0656bd1d2025fa3b148b99/net/ipv4/tcp.c#L2317-L2335

@MOZGIII do you have some other idea?

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@MOZGIII
Copy link
Contributor

MOZGIII commented May 16, 2020

The only way that I see to avoid any loss is to close the connection without dropping it so we can read until the end. But TcpStream::shutdown is no good for the reason

Right, but when socket is closed - we can't read anymore...

@MOZGIII do you have some other idea?

In essence, we want to 1) consume the stuff that's at the receive queue for a socket, but 2) without letting the kernel accepting any new data. We could easily do (1), but not the (2).

Yeah, I'm not sure it's something we can achieve with raw TCP... So, I think what we have currently is good! 👍

ktff added 2 commits May 16, 2020 17:55
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@ktff ktff force-pushed the ktff/empty_buffers branch from 9ac8e4b to b5fc71e Compare May 16, 2020 15:57
@ktff ktff requested a review from MOZGIII May 16, 2020 16:01
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Copy link
Contributor

@MOZGIII MOZGIII left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blockers are addressed. This is now mergable IMO, addressing the rest is optional.

@ktff
Copy link
Contributor Author

ktff commented May 16, 2020

Let's wait for @bruceg before merging, maybe he has an idea on how to get those bytes in OS buffers.

@binarylogic
Copy link
Contributor

Nice work @ktff, thanks for the review @MOZGIII.

tests/buffering.rs Outdated Show resolved Hide resolved
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@bruceg
Copy link
Member

bruceg commented May 18, 2020

Did some digging, and it seems like shutdown with SHUT_RD is a noop:

Can we do shutdown(socket, SHUT_RDWR)? Vector won't be writing to the socket, and this will mark its state properly in the kernel.

Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this allow_read_until trait approach, looks good.

@MOZGIII
Copy link
Contributor

MOZGIII commented May 18, 2020

Did some digging, and it seems like shutdown with SHUT_RD is a noop:

Can we do shutdown(socket, SHUT_RDWR)? Vector won't be writing to the socket, and this will mark its state properly in the kernel.

Technically, we can do shutdown(socket, SHUT_WR) immediately when we obtain the socket. And we probably should. This won't help us with the issue at hand though. ☹️

Copy link
Member

@lukesteensen lukesteensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is always a window of time between receiving WouldBlock and shutting down the connection, in which OS could receive more data which we would lose.

The only way that I see to avoid any loss is to close the connection without dropping it so we can read until the end.

Yes, we need to do the inverse of #2429 here. Basically, the receiver can signal to the sender that it should stop writing by closing its own write half of the socket. The sender detects this when read returns EOF and it can then finish sending and close its own end of the socket. The server can then read until EOF on its end and no data is lost.

@ktff
Copy link
Contributor Author

ktff commented May 19, 2020

@lukesteensen Didn't know that was the convention. Do you have an idea on how common it is? I ask so to determine should we add a timer for such connections with duration shorter than the shutdown timer, something like a few seconds, or just to rely on the shutdown timer for those that don't follow it.

@MOZGIII
Copy link
Contributor

MOZGIII commented May 19, 2020

Basically, the receiver can signal to the sender that it should stop writing by closing its own write half of the socket.

This looks like it’s a protocol layer atop of TCP. I would expect the receiver closing the write end asap, rather than when it’s done reading. It’s a clever way to implement the required signalling atop of a protocol that doesn’t implement it, but I think it’s a non-standard contract. The downside is we can’t really automatically detect if that contract is respected by a particular sender. Looks like we can use it by default though, and if it works - it works, if it doesn’t - well, nothing breaks, and we just fallback to the scenario equivalent to the current situation.

@lukesteensen
Copy link
Member

I'm not sure I'd call it an entire protocol, but I agree it is at least a simple convention on top of TCP. Depending on the socket implementation, it does give a much more graceful closure within the TCP protocol itself.

You are right that we can't ensure the client follows that convention, but there is no harm in trying and it degrades to exactly the same situation as before if they do not.

@lukesteensen
Copy link
Member

Do you have an idea on how common it is? I ask so to determine should we add a timer for such connections with duration shorter than the shutdown timer, something like a few seconds, or just to rely on the shutdown timer for those that don't follow it.

Good question, I am not entirely sure 😄

I would think that something on the order of 5 seconds should be long enough to wait. It's probably not wise to assume most clients will follow that convention.

@Hoverbear
Copy link
Contributor

I think a shutdown timer is reasonable. :) 5 seconds is plenty generous.

ktff added 2 commits May 22, 2020 21:54
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@ktff
Copy link
Contributor Author

ktff commented May 22, 2020

A timeout was already added in #278. It's configurable through toml so I've just reused it and left the default 30sec.

Latest commit contains a test and a fix for TcpSink. The test checks that we aren't loosing any events in TcpSink -> Socket::Tcp topology because of a source shutdown. It showed that TcpSink wasn't flushing it's buffers, when we signaled it to disconnect by closing write part of the channel from the source side.

@ktff ktff requested a review from MOZGIII May 22, 2020 21:41
src/sources/socket/mod.rs Outdated Show resolved Hide resolved
ktff added 2 commits May 24, 2020 18:03
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
@ktff ktff merged commit e0761a9 into master May 24, 2020
@ktff ktff deleted the ktff/empty_buffers branch May 24, 2020 18:28
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
…dotdev#2618)

* Add ReadUntil

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Fix mock source

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Fix mock source

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Rename

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Rename struct

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Simplify assert

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Trigger write shutdown

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Add test & fix TcpSink

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Remove stray comment

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

* Bump

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>
Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: sources Anything related to the Vector's sources
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_unix_stream_syslog fails at CI
6 participants