Skip to content

Conversation

@masaori335
Copy link
Contributor

Last few years, we're seeing inactivity_timeout doesn't work as expected in some cases.
Below is some debug logs when proxy.config.http2.no_activity_timeout_in is set 3.

# 1. connection initialized
[Apr 21 09:42:27.994] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=0, for NetVC=0xaabb70
[Apr 21 09:42:27.994] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=86400000000000, for NetVC=0xaabb70
[Apr 21 09:42:27.994] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=0, for NetVC=0xaabb70

# 2. accept connection and set proxy.config.http2.accept_no_activity_timeout
[Apr 21 09:42:28.002] Server {0xb0012000} DEBUG: <Http2SessionAccept.cc:54 (accept)> (http2_seq) [HttpSessionAccept2:mainEvent 0xaabb70] accepted connection from 127.0.0.1:57528 transport type = 4
[Apr 21 09:42:28.003] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=120000000000, for NetVC=0xaabb70
[Apr 21 09:42:28.003] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:185 (new_connection)> (http2_cs) [0] session born, netvc 0xaabb70

# 3. read and update inactivity timeout
[Apr 21 09:42:28.004] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:351 (state_read_connection_preface)> (http2_cs) [0] [&Http2ClientSession::state_read_connection_preface, VC_EVENT_READ_READY]
[Apr 21 09:42:28.004] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:64 (net_activity)> (socket) net_activity updating inactivity 86400000000000, NetVC=0xaabb70    <---- !!!!!!

# 4. set proxy.config.http2.no_activity_timeout_in
[Apr 21 09:42:28.004] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:367 (state_read_connection_preface)> (http2_cs) [0] received connection preface
[Apr 21 09:42:28.004] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=3000000000, for NetVC=0xaabb70

# 5. read and update inactivity timeout
[Apr 21 09:42:28.004] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:396 (state_start_frame_read)> (http2_cs) [0] [&Http2ClientSession::state_start_frame_read, VC_EVENT_READ_READY]
[Apr 21 09:42:28.018] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:64 (net_activity)> (socket) net_activity updating inactivity 86400000000000, NetVC=0xaabb70    <---- !!!!!!
[Apr 21 09:42:28.018] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:64 (net_activity)> (socket) net_activity updating inactivity 86400000000000, NetVC=0xaabb70    <---- !!!!!!

It looks like when some read or write is happened, inactivity timeout is updated by a value of proxy.config.net.default_inactivity_timeout (86400000000000). Is this expected behavior?
IMO, a value should be used which is passed by UnixNetVConnection::set_inactivity_timeout(ink_hrtime timeout_in).

With this patch, it works as expected.

# connection initialized
[Apr 21 09:35:36.564] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=0, for NetVC=0x2d50b70
[Apr 21 09:35:36.564] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=86400000000000, for NetVC=0x2d50b70
[Apr 21 09:35:36.564] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=0, for NetVC=0x2d50b70

# accept connection and set proxy.config.http2.accept_no_activity_timeout
[Apr 21 09:35:36.572] Server {0xb0012000} DEBUG: <Http2SessionAccept.cc:54 (accept)> (http2_seq) [HttpSessionAccept2:mainEvent 0x2d50b70] accepted connection from 127.0.0.1:57512 transport type = 4
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=120000000000, for NetVC=0x2d50b70
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:185 (new_connection)> (http2_cs) [0] session born, netvc 0x2d50b70

# read and update inactivity timeout
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:351 (state_read_connection_preface)> (http2_cs) [0] [&Http2ClientSession::state_read_connection_preface, VC_EVENT_READ_READY]
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:64 (net_activity)> (socket) net_activity updating inactivity 120000000000, NetVC=0x2d50b70

#  set proxy.config.http2.no_activity_timeout_in
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:367 (state_read_connection_preface)> (http2_cs) [0] received connection preface
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:1492 (set_inactivity_timeout)> (socket) Set inactive timeout=3000000000, for NetVC=0x2d50b70

# read and update inactivity timeout
[Apr 21 09:35:36.573] Server {0xb0012000} DEBUG: <Http2ClientSession.cc:396 (state_start_frame_read)> (http2_cs) [0] [&Http2ClientSession::state_start_frame_read, VC_EVENT_READ_READY]
[Apr 21 09:35:36.588] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:64 (net_activity)> (socket) net_activity updating inactivity 3000000000, NetVC=0x2d50b70
[Apr 21 09:35:36.588] Server {0xb0012000} DEBUG: <UnixNetVConnection.cc:64 (net_activity)> (socket) net_activity updating inactivity 3000000000, NetVC=0x2d50b70

@masaori335 masaori335 added this to the 7.1.0 milestone Apr 21, 2017
@masaori335 masaori335 self-assigned this Apr 21, 2017
@atsci
Copy link

atsci commented Apr 21, 2017

@atsci
Copy link

atsci commented Apr 21, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/281/

@atsci
Copy link

atsci commented Apr 21, 2017

@atsci
Copy link

atsci commented Apr 21, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/1962/

@atsci
Copy link

atsci commented Apr 21, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/392/

@atsci
Copy link

atsci commented Apr 21, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1854/

@atsci
Copy link

atsci commented Apr 21, 2017

clang-analyzer build failed! https://ci.trafficserver.apache.org/job/clang-analyzer-github/525/

@oknet
Copy link
Member

oknet commented Apr 21, 2017

this keeps next_inactivity_timeout_at = NOW + inactivity_timeout_in.

@zwoop
Copy link
Contributor

zwoop commented Apr 21, 2017

@oknet I assume your comment still means that you approve this PR?

@oknet
Copy link
Member

oknet commented Apr 22, 2017

@zwoop yes, I have marked "approved" on this PR and this is a candidate for v7.1.x.

@zwoop zwoop merged commit b34de98 into apache:master Apr 22, 2017
@zwoop
Copy link
Contributor

zwoop commented Apr 22, 2017

This has merge conflicts on 7.1.x, maybe because I backed out that other change from @oknet ? Oknet, does this fix the performance issue we saw with that patch? If so, should we redo that on 7.1.x? If not, we'll need a separate 7.1.x PR for this.

@zwoop zwoop modified the milestones: 7.2.0, 7.1.0 Apr 22, 2017
@zwoop
Copy link
Contributor

zwoop commented Apr 22, 2017

I'm marking this for 7.2.0 for now, since I can't safely land this on 7.1.x right now. Please advice :)

@masaori335
Copy link
Contributor Author

IMO, we don't need this patch without the changes of #771. Before #771, inactivity_timeout_in is not updated by default_inactivity_timeout. ( L355 ) As for performance issue, this patch might be mitigation.

} else
inactivity_timeout = 0;
#else
if (timeout_in == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oknet How about dividing this block to new function like set_default_inactivity_timeout()?
Because, in some cases people want to set inactivity_timeout_in 0. ( One of use cases is drain traffic )
And, currently, even if set_inactivity_timeout() is called from cancel_inactivity_timeout(), the inactivity_timeout_in will be default_inactivity_timeout. This could accidentally extend inactivity timeout and increase connections, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the design of InactivityCop ( before #771 , 6.0.x branch ), any netvc should have a default value for next_inactivity_timeout_at.

InactivityCop::check_inactivity() will call vc->set_inactivity_timeout(HRTIME_SECONDS(default_inactivity_timeout)) to set it if next_inactivity_timeout_at == 0.

And the inactivity_timeout_in will be set to HRTIME_SECONDS(default_inactivity_timeout) in the UnixNetVConnection::set_inactivity_timeout.

The cancel_inactivity_timeout() is useless because the InactivityCop::check_inactivity() will reset next_inactivity_timeout_at to default vaule in the next second.

In the #771 , I do not change the behavior and the design of InactivityCop. But I miss the code in this PR.

@masaori335 Can you create a separate PR for 7.1.x ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Thanks for describing the inactivity cop behaviors. I understood my proposal doesn't make sense.

BTW, #771 is reverted in 7.1.x branch, because of performance issue (see db596a1). Do we still need separate PR for 7.1.x?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not need it, since #771 was reverted.
The code changes by this PR is a part of #771 , It was lost during code merge from our internal branch.

Because our internal branch is saved in a separate network, I copy the code from "another screen" to "my local screen".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Do you see any performance issue with your internal branch like Miles said on #771? (Half throughputs or doubled connection) If not, I think we should land #771 with this patch in 7.1.0 release. WDYT?

@zwoop
Copy link
Contributor

zwoop commented Apr 24, 2017

If we undo the revert of 771 on 7.1.x (I'm testing now on a prod box, and it's looking better), then we need this right? Lets not forget this :).

@zwoop zwoop modified the milestones: 7.2.0, 8.0.0 Apr 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants