TS-4796 Change UnixNetHandler to always bubble up epoll errors to the VConnection #947

jacksontj · 2016-08-31T20:05:56Z

Before if the vcon wasn't read or write enabled errors would be swallowed. This leads to a variety of issues where the socket errors aren't dealt with immediately.

atsci · 2016-09-01T01:55:02Z

FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/659/ for details.

atsci · 2016-09-01T01:59:33Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/554/ for details.

atsci · 2016-09-01T05:23:11Z

FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/661/ for details.

atsci · 2016-09-01T05:24:24Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/556/ for details.

SolidWallOfCode · 2016-09-01T15:17:07Z

Seems reasonable. I can't think why the errors shouldn't bubble up. We may need to be a bit careful on handling them though.

jpeach · 2016-09-01T21:26:32Z

I don't think this is the right approach.

First, read_signal_error and write_signal_error don't need to be made non-static since you already have UnixNetVConnection::readSignalError().

Second, the err argument to read_signal_error is supposed to be an errno, but you are passing and eventio bitmask. That's not really what is supposed to happen here.

Third, the logic in NetHandler::mainNetEvent is now weird. There is a subsequent check for get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR), but the error case would never be true now.

Finally, does this change mean that VC_EVENT_EOS is now delivered as VC_EVENT_ERROR? The EVENTIO_ERROR bitmask includes the HUP event, which I suspect means that the socket was closed. I think the expectation is that this would be detected in read_from_net, but now it looks like you would get an error event.

jpeach · 2016-09-01T21:26:40Z

/cc @oknet

oknet · 2016-09-02T02:09:40Z

EVENTIO_ERROR means EPOLLHUP | EPOLLERR | EPOLLPRI.

EPOLLPRI means OOB or TCP URG is set. You will always receive EPOLLPRI with EPOLLIN.
To EPOLLPRI, we need handle READ first and ignore EPOLLPRI, this is what NetHandler does.

EPOLLHUP & EPOLLRDHUP
reference : http://stackoverflow.com/questions/8707458/epoll-and-remote-1-way-shutdown

A socket listening for epoll events will typically receive an EPOLLRDHUP (in addition to EPOLLIN)
event flag upon the remote peer calling close or shutdown(SHUT_WR). This does not neccessarily
mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket and
eventually "0" will be returned to indicate EOF. It may even be possible to send data back if the remote
peer only did a half-close of its socket.

The one notable exception is if the remote peer is using the SO_LINGER option enabled on its socket
with a linger value of "0". The result of closing such a socket may result in a TCP RST getting sent
instead of a FIN. From what I've read, a connection reset event will generate either a EPOLLHUP or
EPOLLERR. (I haven't had time to confirm, but it makes sense).

There is some documentation to suggest there are older Linux implementations that don't support
EPOLLRDHUP, as such EPOLLHUP gets generated instead.

And for what it is worth, in my particular case, I found that it is not too interesting to have code that
special cases EPOLLHUP or EPOLLRDHUP events. Instead, just treat these events the same as
EPOLLIN/EPOLLOUT and call recv() (or send() as appropriate). But pay close attention to return codes
returned back from recv() and send().

EPOLLERR means the possible non-fatal errors on socket fd such as EAGAIN, EINTR, EWOULDBLOCK and fatal errors.

When you receive EPLLERR, it means an error of socketfd and also there may be data before this error. Therefore we should call read() and write() to figure out the actual meanning of this error

Currently, NetHandler try to perform read() & write() on the socket fd first.
We will get non-fatal errors or fatal erros from read() or write().
if it is non-fatal error, just put socket fd into wait list.
if it is fatal error, signal SM to close socket fd.
e.g.
There is a Fatal ERROR if EPIPE is returned from write().
If "0" is returned from read() there is EOF.

So the currently implement of NetHandler is enough to handle all of this and doesn't need to change.

oknet · 2016-09-02T06:38:16Z

The InactivityCop will handle those NetVCs that read or write disabled.
InactivityCop will send TIMEOUT to SM or call close_UnixNetVConnection() to close NetVC directly depend on the last call to do_io_read and do_io_write.

To disable read and write but still handle Timeout Event:
do_io_read(sm, 0, NULL) and do_io_write(sm, 0, NULL)
In this way, SM is able to receive the TIMEOUT from InactitiryCop then re-set timeout or close NetVC.

To disable read and write and ignore Timeout Event:
do_io_read(NULL, 0, NULL) and do_io_write(NULL, 0, NULL)
The SM will not receive any Event related to the NetVC and InactivityCop will close NetVC while timeout.

Note:
The InactivityCop try to send TIMEOUT to read.vio._cont first then write.vio._cont if NetVC not closed. The NetVC closed immediately if vio._cont is NULL, otherwise depend on the result of SM

jacksontj · 2016-09-02T14:53:29Z

The behavior I see on master (without this patch) is that ATS doesn't close the session when getting the RST. From digging that UnixNetHandler gets an EPOLLERR -- which attempts to add it to the write_ready queue, but since the vc isn't enabled for writing the HTTPTunnel isn't ever called to do the write.

So from your last 2 comments it sounds like a more correct approach would be to immediately schedule a read/write to the socket -- to determine if there was in fact an error.

It also sounds like you are suggesting that inactivity cop should get these sessions? In my tests I see that these sessions are either killed by the next attempt to write to the closed socket (when the origin sends more bytes later) or when the transaction hits the max inactivity timeout. Neither of these behaviors are what we want-- since I already got an RST from the client. So it seems that we need to somehow force-schedule a read/write in these error conditions-- do you have any pointers on how to do that?

oknet · 2016-09-02T17:37:34Z

according your description, the HttpTunnel transfers the data from server session to client session, ATS received a RST from client and the connection between ATS and origin server is still alive.

In your scenario, server session is the producer of HttpTunnel and client session is a consumer of HttpTunnel and cache session is the 2nd consumer if cache enabled.

The HttpTunnel will not break if one consumer failed.

The producer will re-enable all consumers if received READ_READY event.
The consumer will re-enable producer if received WRITE_READY event.

The producer is master and all consumers are slave. let the master trigger slaves and isolate the broken slave.

Add a netvc into write_ready queue means there is non-fatal error(ex EAGAIN) at last write() call. It means write is enabled on the netvc.

Only the returned errno from read() and write() is trustable.

To close a netvc immediately if vc->read.vio._cont and vc->write.vio._cont both are NULL.
Otherwise should callback to SM.

oknet · 2016-09-03T06:34:54Z

comments for codes:

      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
        // ** set read.triggered if a epoll event of net has EVENTIO_READ or EVENTIO_ERROR bit set.
        vc->read.triggered = 1;
        // ** and put vc in read_ready_list
        if (!read_ready_list.in(vc)) {
          read_ready_list.enqueue(vc);
        } else if (get_ev_events(pd, x) & EVENTIO_ERROR) {
          // ** output an error message if a netvc already in ready_list got EVENTIO_ERROR.
          // check for unhandled epoll events that should be handled
          Debug("iocore_net_main", "Unhandled epoll event on read: 0x%04x read.enabled=%d closed=%d read.netready_queue=%d",
                get_ev_events(pd, x), vc->read.enabled, vc->closed, read_ready_list.in(vc));
        }
      }

The netvc always put in read_ready_list if it has EVENTIO_READ or EVENTIO_ERROR.

#if defined(USE_EDGE_TRIGGER)
  // UnixNetVConnection *
  // ** for each netvc in read_ready_list
  while ((vc = read_ready_list.dequeue())) {
    if (vc->closed) // ** if the netvc mark closed
      close_UnixNetVConnection(vc, trigger_event->ethread);
    else if (vc->read.enabled && vc->read.triggered) //** if the netvc is enabled and triggered
      vc->net_read_io(this, trigger_event->ethread);
    else if (!vc->read.enabled) { //** if the netvc is not enabled
      read_ready_list.remove(vc);
#if defined(solaris)
      if (vc->read.triggered && vc->write.enabled) {
        vc->ep.modify(-EVENTIO_READ);
        vc->ep.refresh(EVENTIO_WRITE);
        vc->writeReschedule(this);
      }   
#endif
    }   
  }

for your case, read.enabled is 0

    else if (!vc->read.enabled) { // if the netvc is not enabled
      read_ready_list.remove(vc);

The vc is removed from read_ready_list.

The below is my suggest:

      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
        vc->read.triggered = 1;
+      if (get_ev_events(pd, x) & EVENTIO_ERROR) {
+        vc->read.error = 1;
+      }
        if (!read_ready_list.in(vc)) {
          read_ready_list.enqueue(vc);
        } else if (get_ev_events(pd, x) & EVENTIO_ERROR) {

and

    else if (vc->read.enabled && vc->read.triggered) //** if the netvc is enabled and triggered
      vc->net_read_io(this, trigger_event->ethread);
+  else if (vc->read.error) {
+    int err = 0, errlen = sizeof(int);
+    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1) {
+      err = errno;
+    }
+    if (err != EAGAIN && err != EINTR)
+      vc->readSignalError(this, err);
+  }
    else if (!vc->read.enabled) { //** if the netvc is not enabled
      read_ready_list.remove(vc);
#if defined(solaris)
      if (vc->read.triggered && vc->write.enabled) {
        vc->ep.modify(-EVENTIO_READ);
        vc->ep.refresh(EVENTIO_WRITE);

jacksontj · 2016-09-03T17:52:03Z

That all sounds reasonable :) I just pushed a new commit here which does effectively what you are suggesting-- just both on the read and write side (as well as the few other little changes to make it work). I tested it and this covers my use-case (since we are still bubbling the error up when we aren't enabled) and should continue functioning the same for all the rest of the cases (since left them alone).

atsci · 2016-09-03T18:02:05Z

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/691/ for details.

atsci · 2016-09-03T18:05:44Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/587/ for details.

oknet · 2016-09-05T10:08:12Z

iocore/net/UnixNet.cc

+      }
+      if (err != EAGAIN && err != EINTR)
+        vc->writeSignalError(this, err);
+    } else if (!vc->write.enabled) {


reset read.error or write.error if non-fatal error.
keep read.enabled or write.enabled check.
suggest code:

else { if (vc->write.error) { ... if (err && err != EAGAIN && err != EINTR) vc->writeSignalError(this, err); else vc->write.error = 0; } if (!vc->write.enabled) { ... } }

the same to READ.

atsci · 2016-09-06T19:15:10Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/606/ for details.

atsci · 2016-09-06T19:20:04Z

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/711/ for details.

atsci · 2016-09-06T19:26:45Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/607/ for details.

oknet · 2016-09-07T06:10:12Z

iocore/net/UnixNet.cc

+        err = errno;
+      }
+      if (err != EAGAIN && err != EINTR) {
+        vc->readSignalError(this, err);


Sorry, my mistaken here. We should get vio mutex locked first before callback to SM. the new suggest :

else if (vc->read.enabled && vc->read.triggered || vc->read.error)

move "else if (vc->read.error) { ... }" into "vc->net_read_io()"

we can not access any member of NetVC ( e.g. vc->con.fd ) before get vio mutex locked.

static void read_from_net(NetHandler *nh, UnixNetVConnection *vc, EThread *thread) { ... if (vc->closed) { close_UnixNetVConnection(vc, thread); return; } + // if it is not enabled and got error from polling + if (!s->enabled && s->error) { + int err = 0, errlen = sizeof(int); + if (getsockopt(vc->con.fd, SOL_SOCKET, SO_ERROR, &err, (socklen_t *)&errlen) == -1) { + err = errno; + } + if (err != EAGAIN && err != EINTR) { + read_signal_error(this, vc, err); + } + return; + } // if it is not enabled. if (!s->enabled || s->vio.op != VIO::READ) { read_disable(nh, vc); return; } ... }

That makes sense :) I've cleaned that up (left it as a separate commit for review-- although before merging I'll squash it). I did test that its still fixing my bug-- and it is, so I think we are set :)

atsci · 2016-10-26T20:07:51Z

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1089/ for details.

atsci · 2016-10-26T20:08:59Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/981/ for details.

… VConnection Before if the vcon wasn't read or write enabled errors would be swallowed. This leads to a variety of issues where the socket errors aren't dealt with immediately.

atsci · 2016-10-26T20:27:49Z

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1090/ for details.

atsci · 2016-10-26T20:28:49Z

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/982/ for details.

jacksontj · 2016-10-26T22:41:50Z

An update to summarize updates from today:

After looking into it the core issue with the crash I was seeing is that the read/write side of the VIOs where being called regardless of which side the error came in on. Really we should handle the error on the appropriate side (in or out)-- so instead of allowing us to do the read (for example) when read OR error, this PR changes it so that we only call that routine if it was on the read side. Then within that read handler we can check if there was an error.

Secondly, instead of trying to unset the error state in the handler, I'm simply just setting it every time we get into the read/write blocks to the appropriate values.

So, the PR is now updated (with the patch I've tested) and ready for merge!

shinrich · 2016-10-26T23:46:22Z

Looks good to me!

After apache#947 (c1ac5f) and apache#1522 (a128d5) , the EVENT_ERROR leads by EPOLLERR will be sent to read.vio._cont first and then write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check these operations before we callback write.vio._cont. The SM would received EVENT_ERROR twice if write.vio._cont == read.vio._cont.

After apache#947 (c1ac5f) and apache#1522 (a128d5) , the EVENT_ERROR which caused by EPOLLERR will be sent to read.vio._cont first and then write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check these operations before we callback write.vio._cont. The SM would received EVENT_ERROR twice if write.vio._cont == read.vio._cont.

After #947 (c1ac5f) and #1522 (a128d5) , the EVENT_ERROR which caused by EPOLLERR will be sent to read.vio._cont first and then write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check these operations before we callback write.vio._cont. The SM would received EVENT_ERROR twice if write.vio._cont == read.vio._cont.

After #947 (c1ac5f) and #1522 (a128d5) , the EVENT_ERROR which caused by EPOLLERR will be sent to read.vio._cont first and then write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check these operations before we callback write.vio._cont. The SM would received EVENT_ERROR twice if write.vio._cont == read.vio._cont. (cherry picked from commit aee3f3b)

…s to the VConnection" this reverts PRs apache#1559, apache#1522 and apache#947 This reverts commit c1ac5f8.

…s to the VConnection" this reverts PRs #1559, #1522 and #947 This reverts commit c1ac5f8.

This reverts PRs apache#1559, apache#1522 and apache#947 PR apache#947 made the HTTP state machine unstable and lead to crashes in production like apache#1930 apache#1559 apache#1522 apache#1531 apache#1629 This reverts commit c1ac5f8.

This reverts PRs #1559, #1522 and #947 PR #947 made the HTTP state machine unstable and lead to crashes in production like #1930 #1559 #1522 #1531 #1629 This reverts commit c1ac5f8.

jacksontj force-pushed the TS-4796 branch from 5b032ba to ba77985 Compare August 31, 2016 20:09

zwoop added the Network label Aug 31, 2016

zwoop added this to the 7.0.0 milestone Aug 31, 2016

zwoop assigned shinrich Aug 31, 2016

jacksontj force-pushed the TS-4796 branch 2 times, most recently from 80156d5 to 2c76cd2 Compare September 1, 2016 01:42

jacksontj force-pushed the TS-4796 branch from 2c76cd2 to 8038f8e Compare September 1, 2016 05:11

jacksontj force-pushed the TS-4796 branch from 8038f8e to 3bd0fff Compare September 3, 2016 17:50

oknet reviewed Sep 5, 2016
View reviewed changes

jacksontj force-pushed the TS-4796 branch 2 times, most recently from 73933e9 to 84934a8 Compare September 6, 2016 19:04

oknet reviewed Sep 7, 2016
View reviewed changes

jacksontj force-pushed the TS-4796 branch from 8d51ba3 to 9b119c2 Compare October 26, 2016 19:55

TS-4796 Change UnixNetHandler to always bubble up epoll errors to the…

a46c246

… VConnection Before if the vcon wasn't read or write enabled errors would be swallowed. This leads to a variety of issues where the socket errors aren't dealt with immediately.

jacksontj force-pushed the TS-4796 branch from 9b119c2 to a46c246 Compare October 26, 2016 20:12

jacksontj merged commit c1ac5f8 into apache:master Oct 26, 2016

This was referenced Mar 3, 2017

Ignore read and write errors if vio has been cleared #1522

Merged

Assertion in state_read_server_response_header (v7.1.0) #1531

Closed

oknet mentioned this pull request Mar 9, 2017

Do not callback one SM EVENT_ERROR twice. #1559

Merged

oknet mentioned this pull request Apr 5, 2017

Assertion in state_read_server_response_header (v7.1.0) #1629

Closed

oknet mentioned this pull request Jun 1, 2017

WRITE_COMPLETE event in http server open causes traffic server to abort #1930

Closed

zwoop mentioned this pull request Jun 6, 2017

Fix post and post transform retry #2077

Closed

vmamidi added a commit to vmamidi/trafficserver that referenced this pull request Jun 8, 2017

Revert "TS-4796 Change UnixNetHandler to always bubble up epoll error…

92102b7

…s to the VConnection" this reverts PRs apache#1559, apache#1522 and apache#947 This reverts commit c1ac5f8.

vmamidi mentioned this pull request Jun 8, 2017

Revert PR #947 #2093

Merged

vmamidi added a commit that referenced this pull request Jun 8, 2017

Revert "TS-4796 Change UnixNetHandler to always bubble up epoll error…

4d9279c

…s to the VConnection" this reverts PRs #1559, #1522 and #947 This reverts commit c1ac5f8.

vmamidi mentioned this pull request Jun 8, 2017

Revert PR #947 - merge to 7.1.x #2103

Merged

zwoop pushed a commit that referenced this pull request Jun 8, 2017

Revert PR #947 - merge to 7.1.x

65a001b

This reverts PRs #1559, #1522 and #947 PR #947 made the HTTP state machine unstable and lead to crashes in production like #1930 #1559 #1522 #1531 #1629 This reverts commit c1ac5f8.

zizhong mentioned this pull request Mar 14, 2018

Reopen TS-4796 #3288

Closed

zizhong mentioned this pull request Mar 22, 2018

Adding proxy.config.http.allow_half_open #3325

Merged

TS-4796 Change UnixNetHandler to always bubble up epoll errors to the VConnection #947

TS-4796 Change UnixNetHandler to always bubble up epoll errors to the VConnection #947

Uh oh!

Conversation

jacksontj commented Aug 31, 2016

Uh oh!

atsci commented Sep 1, 2016

Uh oh!

atsci commented Sep 1, 2016

Uh oh!

atsci commented Sep 1, 2016

Uh oh!

atsci commented Sep 1, 2016

Uh oh!

SolidWallOfCode commented Sep 1, 2016

Uh oh!

jpeach commented Sep 1, 2016

Uh oh!

jpeach commented Sep 1, 2016

Uh oh!

oknet commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oknet commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacksontj commented Sep 2, 2016

Uh oh!

oknet commented Sep 2, 2016

Uh oh!

oknet commented Sep 3, 2016

Uh oh!

jacksontj commented Sep 3, 2016

Uh oh!

atsci commented Sep 3, 2016

Uh oh!

atsci commented Sep 3, 2016

Uh oh!

oknet Sep 5, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jacksontj Sep 6, 2016

Choose a reason for hiding this comment

Uh oh!

atsci commented Sep 6, 2016

Uh oh!

atsci commented Sep 6, 2016

Uh oh!

atsci commented Sep 6, 2016

Uh oh!

oknet Sep 7, 2016

Choose a reason for hiding this comment

Uh oh!

oknet Sep 7, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jacksontj Sep 7, 2016

Choose a reason for hiding this comment

Uh oh!

atsci commented Oct 26, 2016

Uh oh!

atsci commented Oct 26, 2016

Uh oh!

atsci commented Oct 26, 2016

Uh oh!

atsci commented Oct 26, 2016

Uh oh!

jacksontj commented Oct 26, 2016

Uh oh!

shinrich commented Oct 26, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

oknet commented Sep 2, 2016 •

edited

Loading

oknet commented Sep 2, 2016 •

edited

Loading

oknet Sep 5, 2016 •

edited

Loading

oknet Sep 7, 2016 •

edited

Loading