Skip to content

Conversation

@keesspoelstra
Copy link
Contributor

Under certain conditions it is possible that read_disable or write_disable will set vc->next_inactivity_timeout to 0 while inactivity_timeout_in still has a value.
In manage_active_queue this will lead to a close of the vc, resulting in interrupted service for the client.
Checking the source code the best fix seems to be to check for next_inactivity_timeout_at!=0 in manage_active_queue.
The value 0 does not seem to be used to force an early close/cleanup. The current time is used for that in certain cases.

The active timeout does not have this problem. If it is set to 0 the active_timeout_in is also set to 0. This cannot be done for inactive timeout because the net_activity call needs inactivity_timeout_in.

…inactivity_timeout_in!=0

Under certain conditions it is possible that read_disable or write_disable will set vc->next_inactivity_timeout to 0
while inactivity_timeout_in still has a value.
In manage_active_queue this will lead to a close of the vc, resulting in interrupted service for the client.
Checking the source code the best fix seems to be to check for next_inactivity_timeout_at!=0 in manage_active_queue.
The value 0 does not seem to be used to force an early close/cleanup. The current time is used for that in certain cases.

The active timeout does not have this problem. If it is set to 0 the active_timeout_in is also set to 0. This cannot
be done for inactive timeout because the net_activity call needs inactivity_timeout_in.
@PSUdaemon PSUdaemon self-assigned this Apr 17, 2017
@PSUdaemon PSUdaemon added Backport Marked for backport for an LTS patch release Network labels Apr 17, 2017
@PSUdaemon PSUdaemon added this to the 6.2.2 milestone Apr 17, 2017
@PSUdaemon
Copy link
Contributor

The commit message is missing the note about which commit on master this was cherry-picked from. Please use the -x option when cherry picking. Also, don't forget to uncomment any conflicts.

@zwoop
Copy link
Contributor

zwoop commented Apr 18, 2017

How does this apply to master now? Is this an issue there? I saw your comment on my other Issue, maybe this is now a candidate for 7.1.x as well ?

@keesspoelstra
Copy link
Contributor Author

@zwoop this applies to 7,1.x also,It can be reliably applied to master also,

However for master the 425b696 commit mitigates the behavior by calling the vc->set_inactivity_timeout(0); instead of vc->next_inactivity_timeout_at = 0;
This will prevent the condition to clean up the connection to be satisfied (inactivity_timeout_in will be 0).

425b696 causes other problems though as you've found out.

Do I need to do anything to ready it for master/7.1?

@oknet
Copy link
Member

oknet commented Apr 19, 2017

In the master branch,

  • Call vc->set_inactivity_timeout(0) instead of set next_inactivity_timeout_at to 0.
  • The inactivity_timeout_in is always greater than 0, it will be nh->default_inactivity_timeout if you set_inactivity_timeout(0).

Therefore, the patch for 7.0.x and later is below,

diff --git a/iocore/net/UnixNetVConnection.cc b/iocore/net/UnixNetVConnection.cc
index b73bcb3..d850e69 100644
--- a/iocore/net/UnixNetVConnection.cc
+++ b/iocore/net/UnixNetVConnection.cc
@@ -75,10 +75,7 @@ net_activity(UnixNetVConnection *vc, EThread *thread)
       vc->inactivity_timeout = 0;
   }
 #else
-  if (vc->inactivity_timeout_in)
-    vc->next_inactivity_timeout_at = Thread::get_hrtime() + vc->inactivity_timeout_in;
-  else
-    vc->next_inactivity_timeout_at = 0;
+  vc->next_inactivity_timeout_at = Thread::get_hrtime() + vc->inactivity_timeout_in;
 #endif
 }

@keesspoelstra Thanks for helping us figure it out. @zwoop

@keesspoelstra
Copy link
Contributor Author

@oknet , seems reasonable, but I don't know if reverting back to default inactivity timeout is the way to go, in the old code
next_inactivity_timeout_at=0 seems to be used to stall the inactivity timeout during write_disable, read_disable, not resetting it to default. Default might even be higher than the current inactivity timeout causing more connections for some servers.

To make it more explicit I would opt for an extra function pause_inactivity_timeout or something like that.

@oknet
Copy link
Member

oknet commented Apr 19, 2017

@keesspoelstra in the old InactivityCop::check_inactivity() code, it will set next_inactivity_timeout_at to default if it is 0.

 83       // set a default inactivity timeout if one is not set
 84       if (vc->next_inactivity_timeout_at == 0 && default_inactivity_timeout > 0) {
 85         Debug("inactivity_cop", "vc: %p inactivity timeout not set, setting a default of %d", vc, default_inactivity_timeout);
 86         vc->set_inactivity_timeout(HRTIME_SECONDS(default_inactivity_timeout));
 87         NET_INCREMENT_DYN_STAT(default_inactivity_timeout_stat);
 88       } else {
 89         Debug("inactivity_cop_verbose", "vc: %p now: %" PRId64 " timeout at: %" PRId64 " timeout in: %" PRId64, vc, now,
 90               ink_hrtime_to_sec(vc->next_inactivity_timeout_at), ink_hrtime_to_sec(vc->inactivity_timeout_in));
 91       }

@oknet
Copy link
Member

oknet commented Apr 19, 2017

"set next_inactivity_timeout_at to default" is a safety mechanism that enforce the netvc is closed in a expected time.

The manage_active_queue() try to close the netvc that has timeouted.
The netvc won't be closed by manage_active_queue() if the netvc has read_disable() and write_disable().

@keesspoelstra
Copy link
Contributor Author

Ok, now we're on to something. Because the VCs return from disable very quickly (we hope so) the inactivity cop does not see them often and does not set the default, thus not introducing the lengthening or shortening of inactivity timeout.

But by setting the inactivity_timeout_in explicitly to default when we're disabling and on other places we're making matters worse because we're introducing longer (or shorter) timeouts in some cases.

No access to source code , but I can imagine SSL handshakes being vulnerable to this.

@zwoop
Copy link
Contributor

zwoop commented Apr 22, 2017

What's the consensus here? Note that I had to revert @oknet's patch for inactivity timeout on 7.1.x, we likely need to do it on master too unless we can resolve the performance issues (they are severe). How does this PR apply to master? I'm confused with the fact that this is being developed on the 6.2.x branch.

@keesspoelstra
Copy link
Contributor Author

If you reverted you will need this patch. @oknet , agree?

Without it every once in a few 1000-100.000 requests connections can be dropped giving partial content depending on plugins and timing of the system.

It was developed against 6.2.x because at the time it was only applicable to 6.2.x and earlier. What is the procedure to land this on master or 7.1.x?

@oknet
Copy link
Member

oknet commented Apr 24, 2017

@keesspoelstra agree with you, @zwoop please apply this PR if you have revert #771 .
Because next_inactivity_timeout_at=0 for a new netvc.

ATS maybe close netvc in active queue if add_to_active_queue() called.

@oknet
Copy link
Member

oknet commented Apr 24, 2017

@keesspoelstra @zwoop

Without #771 , the cancel_inactivity_timeout() is useless because the InactivityCop::check_inactivity() will reset next_inactivity_timeout_at to default value in the next second.

codes from 7.1.x branch:

437 TS_INLINE void
438 UnixNetVConnection::cancel_inactivity_timeout()
439 {
440   Debug("socket", "Cancel inactive timeout for NetVC=%p", this);
441   inactivity_timeout_in = 0;
442 #ifdef INACTIVITY_TIMEOUT
443   if (inactivity_timeout) {
444     Debug("socket", "Cancel inactive timeout for NetVC=%p", this);
445     inactivity_timeout->cancel_action(this);
446     inactivity_timeout = nullptr;
447   }
448 #else
449   next_inactivity_timeout_at = 0;   // set to 0
450 #endif
451 }

Set next_inactivity_timeout_at by set_inactivity_timeout() if it is 0.

 83       // set a default inactivity timeout if one is not set
 84       if (vc->next_inactivity_timeout_at == 0 && default_inactivity_timeout > 0) {
 85         Debug("inactivity_cop", "vc: %p inactivity timeout not set, setting a default of %d", vc, default_inactivity_timeout);
 86         vc->set_inactivity_timeout(HRTIME_SECONDS(default_inactivity_timeout));
 87         NET_INCREMENT_DYN_STAT(default_inactivity_timeout_stat);
 88       } else {

@PSUdaemon
Copy link
Contributor

[approve ci]

1 similar comment
@PSUdaemon
Copy link
Contributor

[approve ci]

@zwoop
Copy link
Contributor

zwoop commented May 31, 2017

Where are we with this PR? Do we need this on master and/or 7.1.x as well? Or is it already in either (or both) ?

@oknet
Copy link
Member

oknet commented Jun 1, 2017

@zwoop I think this is only to 6.2.x.

@PSUdaemon
Copy link
Contributor

This PR seems a little confused. It seems like a legitimate bug, but this is not set up as a backport of an upstream fix. I'd like to see that fixed so I can merge, but I cannot merge this as is. I am going to close it, but please reopen it if it can be fixed, or open a new one as a proper backport of an upstream fix.

@PSUdaemon PSUdaemon closed this Jul 26, 2017
@PSUdaemon PSUdaemon removed this from the 6.2.2 milestone Aug 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backport Marked for backport for an LTS patch release Network

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants