Skip to content

Conversation

@jacksontj
Copy link
Contributor

Before if the vcon wasn't read or write enabled errors would be swallowed. This leads to a variety of issues where the socket errors aren't dealt with immediately.

@zwoop zwoop added the Network label Aug 31, 2016
@zwoop zwoop added this to the 7.0.0 milestone Aug 31, 2016
@jacksontj jacksontj force-pushed the TS-4796 branch 2 times, most recently from 80156d5 to 2c76cd2 Compare September 1, 2016 01:42
@atsci
Copy link

atsci commented Sep 1, 2016

FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/659/ for details.

@atsci
Copy link

atsci commented Sep 1, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/554/ for details.

@atsci
Copy link

atsci commented Sep 1, 2016

FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/661/ for details.

@atsci
Copy link

atsci commented Sep 1, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/556/ for details.

@SolidWallOfCode
Copy link
Member

Seems reasonable. I can't think why the errors shouldn't bubble up. We may need to be a bit careful on handling them though.

@jpeach
Copy link
Contributor

jpeach commented Sep 1, 2016

I don't think this is the right approach.

First, read_signal_error and write_signal_error don't need to be made non-static since you already have UnixNetVConnection::readSignalError().

Second, the err argument to read_signal_error is supposed to be an errno, but you are passing and eventio bitmask. That's not really what is supposed to happen here.

Third, the logic in NetHandler::mainNetEvent is now weird. There is a subsequent check for get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR), but the error case would never be true now.

Finally, does this change mean that VC_EVENT_EOS is now delivered as VC_EVENT_ERROR? The EVENTIO_ERROR bitmask includes the HUP event, which I suspect means that the socket was closed. I think the expectation is that this would be detected in read_from_net, but now it looks like you would get an error event.

@jpeach
Copy link
Contributor

jpeach commented Sep 1, 2016

/cc @oknet

@oknet
Copy link
Member

oknet commented Sep 2, 2016

EVENTIO_ERROR means EPOLLHUP | EPOLLERR | EPOLLPRI.

EPOLLPRI means OOB or TCP URG is set. You will always receive EPOLLPRI with EPOLLIN.
To EPOLLPRI, we need handle READ first and ignore EPOLLPRI, this is what NetHandler does.

EPOLLHUP & EPOLLRDHUP
reference : http://stackoverflow.com/questions/8707458/epoll-and-remote-1-way-shutdown

A socket listening for epoll events will typically receive an EPOLLRDHUP (in addition to EPOLLIN)
event flag upon the remote peer calling close or shutdown(SHUT_WR). This does not neccessarily
mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket and
eventually "0" will be returned to indicate EOF. It may even be possible to send data back if the remote
peer only did a half-close of its socket.

The one notable exception is if the remote peer is using the SO_LINGER option enabled on its socket
with a linger value of "0". The result of closing such a socket may result in a TCP RST getting sent
instead of a FIN. From what I've read, a connection reset event will generate either a EPOLLHUP or
EPOLLERR. (I haven't had time to confirm, but it makes sense).

There is some documentation to suggest there are older Linux implementations that don't support
EPOLLRDHUP, as such EPOLLHUP gets generated instead.

And for what it is worth, in my particular case, I found that it is not too interesting to have code that
special cases EPOLLHUP or EPOLLRDHUP events. Instead, just treat these events the same as
EPOLLIN/EPOLLOUT and call recv() (or send() as appropriate). But pay close attention to return codes
returned back from recv() and send().

EPOLLERR means the possible non-fatal errors on socket fd such as EAGAIN, EINTR, EWOULDBLOCK and fatal errors.

When you receive EPLLERR, it means an error of socketfd and also there may be data before this error. Therefore we should call read() and write() to figure out the actual meanning of this error

Currently, NetHandler try to perform read() & write() on the socket fd first.
We will get non-fatal errors or fatal erros from read() or write().
if it is non-fatal error, just put socket fd into wait list.
if it is fatal error, signal SM to close socket fd.
e.g.
There is a Fatal ERROR if EPIPE is returned from write().
If "0" is returned from read() there is EOF.

So the currently implement of NetHandler is enough to handle all of this and doesn't need to change.

@oknet
Copy link
Member

oknet commented Sep 2, 2016

The InactivityCop will handle those NetVCs that read or write disabled.
InactivityCop will send TIMEOUT to SM or call close_UnixNetVConnection() to close NetVC directly depend on the last call to do_io_read and do_io_write.

To disable read and write but still handle Timeout Event:
do_io_read(sm, 0, NULL) and do_io_write(sm, 0, NULL)
In this way, SM is able to receive the TIMEOUT from InactitiryCop then re-set timeout or close NetVC.

To disable read and write and ignore Timeout Event:
do_io_read(NULL, 0, NULL) and do_io_write(NULL, 0, NULL)
The SM will not receive any Event related to the NetVC and InactivityCop will close NetVC while timeout.

Note:
The InactivityCop try to send TIMEOUT to read.vio._cont first then write.vio._cont if NetVC not closed. The NetVC closed immediately if vio._cont is NULL, otherwise depend on the result of SM

@jacksontj
Copy link
Contributor Author

The behavior I see on master (without this patch) is that ATS doesn't close the session when getting the RST. From digging that UnixNetHandler gets an EPOLLERR -- which attempts to add it to the write_ready queue, but since the vc isn't enabled for writing the HTTPTunnel isn't ever called to do the write.

So from your last 2 comments it sounds like a more correct approach would be to immediately schedule a read/write to the socket -- to determine if there was in fact an error.

It also sounds like you are suggesting that inactivity cop should get these sessions? In my tests I see that these sessions are either killed by the next attempt to write to the closed socket (when the origin sends more bytes later) or when the transaction hits the max inactivity timeout. Neither of these behaviors are what we want-- since I already got an RST from the client. So it seems that we need to somehow force-schedule a read/write in these error conditions-- do you have any pointers on how to do that?

@oknet
Copy link
Member

oknet commented Sep 2, 2016

according your description, the HttpTunnel transfers the data from server session to client session, ATS received a RST from client and the connection between ATS and origin server is still alive.

In your scenario, server session is the producer of HttpTunnel and client session is a consumer of HttpTunnel and cache session is the 2nd consumer if cache enabled.

The HttpTunnel will not break if one consumer failed.

The producer will re-enable all consumers if received READ_READY event.
The consumer will re-enable producer if received WRITE_READY event.

The producer is master and all consumers are slave. let the master trigger slaves and isolate the broken slave.

Add a netvc into write_ready queue means there is non-fatal error(ex EAGAIN) at last write() call. It means write is enabled on the netvc.

Only the returned errno from read() and write() is trustable.

To close a netvc immediately if vc->read.vio._cont and vc->write.vio._cont both are NULL.
Otherwise should callback to SM.

@oknet
Copy link
Member

oknet commented Sep 3, 2016

comments for codes:

      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
        // ** set read.triggered if a epoll event of net has EVENTIO_READ or EVENTIO_ERROR bit set.
        vc->read.triggered = 1;
        // ** and put vc in read_ready_list
        if (!read_ready_list.in(vc)) {
          read_ready_list.enqueue(vc);
        } else if (get_ev_events(pd, x) & EVENTIO_ERROR) {
          // ** output an error message if a netvc already in ready_list got EVENTIO_ERROR.
          // check for unhandled epoll events that should be handled
          Debug("iocore_net_main", "Unhandled epoll event on read: 0x%04x read.enabled=%d closed=%d read.netready_queue=%d",
                get_ev_events(pd, x), vc->read.enabled, vc->closed, read_ready_list.in(vc));
        }
      }

The netvc always put in read_ready_list if it has EVENTIO_READ or EVENTIO_ERROR.

#if defined(USE_EDGE_TRIGGER)
  // UnixNetVConnection *
  // ** for each netvc in read_ready_list
  while ((vc = read_ready_list.dequeue())) {
    if (vc->closed) // ** if the netvc mark closed
      close_UnixNetVConnection(vc, trigger_event->ethread);
    else if (vc->read.enabled && vc->read.triggered) //** if the netvc is enabled and triggered
      vc->net_read_io(this, trigger_event->ethread);
    else if (!vc->read.enabled) { //** if the netvc is not enabled
      read_ready_list.remove(vc);
#if defined(solaris)
      if (vc->read.triggered && vc->write.enabled) {
        vc->ep.modify(-EVENTIO_READ);
        vc->ep.refresh(EVENTIO_WRITE);
        vc->writeReschedule(this);
      }   
#endif
    }   
  }

for your case, read.enabled is 0

    else if (!vc->read.enabled) { // if the netvc is not enabled
      read_ready_list.remove(vc);

The vc is removed from read_ready_list.

The below is my suggest:

      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
        vc->read.triggered = 1;
+      if (get_ev_events(pd, x) & EVENTIO_ERROR) {
+        vc->read.error = 1;
+      }
        if (!read_ready_list.in(vc)) {
          read_ready_list.enqueue(vc);
        } else if (get_ev_events(pd, x) & EVENTIO_ERROR) {

and

    else if (vc->read.enabled && vc->read.triggered) //** if the netvc is enabled and triggered
      vc->net_read_io(this, trigger_event->ethread);
+  else if (vc->read.error) {
+    int err = 0, errlen = sizeof(int);
+    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1) {
+      err = errno;
+    }
+    if (err != EAGAIN && err != EINTR)
+      vc->readSignalError(this, err);
+  }
    else if (!vc->read.enabled) { //** if the netvc is not enabled
      read_ready_list.remove(vc);
#if defined(solaris)
      if (vc->read.triggered && vc->write.enabled) {
        vc->ep.modify(-EVENTIO_READ);
        vc->ep.refresh(EVENTIO_WRITE);

@jacksontj
Copy link
Contributor Author

That all sounds reasonable :) I just pushed a new commit here which does effectively what you are suggesting-- just both on the read and write side (as well as the few other little changes to make it work). I tested it and this covers my use-case (since we are still bubbling the error up when we aren't enabled) and should continue functioning the same for all the rest of the cases (since left them alone).

@atsci
Copy link

atsci commented Sep 3, 2016

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/691/ for details.

@atsci
Copy link

atsci commented Sep 3, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/587/ for details.

}
if (err != EAGAIN && err != EINTR)
vc->writeSignalError(this, err);
} else if (!vc->write.enabled) {
Copy link
Member

@oknet oknet Sep 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset read.error or write.error if non-fatal error.
keep read.enabled or write.enabled check.
suggest code:

    else {
      if (vc->write.error) {
      ...
        if (err && err != EAGAIN && err != EINTR)
          vc->writeSignalError(this, err);
        else
          vc->write.error = 0;
      }
      if (!vc->write.enabled) {
      ...
      }
    }

the same to READ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

@jacksontj jacksontj force-pushed the TS-4796 branch 2 times, most recently from 73933e9 to 84934a8 Compare September 6, 2016 19:04
@atsci
Copy link

atsci commented Sep 6, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/606/ for details.

@atsci
Copy link

atsci commented Sep 6, 2016

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/711/ for details.

@atsci
Copy link

atsci commented Sep 6, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/607/ for details.

err = errno;
}
if (err != EAGAIN && err != EINTR) {
vc->readSignalError(this, err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my mistaken here. We should get vio mutex locked first before callback to SM. the new suggest :

else if (vc->read.enabled && vc->read.triggered || vc->read.error)

move "else if (vc->read.error) { ... }" into "vc->net_read_io()"

we can not access any member of NetVC ( e.g. vc->con.fd ) before get vio mutex locked.

Copy link
Member

@oknet oknet Sep 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static void
read_from_net(NetHandler *nh, UnixNetVConnection *vc, EThread *thread)
{
...
  if (vc->closed) {
    close_UnixNetVConnection(vc, thread);
    return;
  }
+  // if it is not enabled and got error from polling
+  if (!s->enabled && s->error) {
+    int err = 0, errlen = sizeof(int);
+    if (getsockopt(vc->con.fd, SOL_SOCKET, SO_ERROR, &err, (socklen_t *)&errlen) == -1) {
+      err = errno;
+    }
+    if (err != EAGAIN && err != EINTR) {
+      read_signal_error(this, vc, err);
+    }
+    return;
+  }
  // if it is not enabled.
  if (!s->enabled || s->vio.op != VIO::READ) {
    read_disable(nh, vc);
    return;
  }
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense :) I've cleaned that up (left it as a separate commit for review-- although before merging I'll squash it). I did test that its still fixing my bug-- and it is, so I think we are set :)

@atsci
Copy link

atsci commented Oct 26, 2016

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1089/ for details.

@atsci
Copy link

atsci commented Oct 26, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/981/ for details.

… VConnection

Before if the vcon wasn't read or write enabled errors would be swallowed. This leads to a variety of issues where the socket errors aren't dealt with immediately.
@atsci
Copy link

atsci commented Oct 26, 2016

FreeBSD build successful! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1090/ for details.

@atsci
Copy link

atsci commented Oct 26, 2016

Linux build successful! See https://ci.trafficserver.apache.org/job/Github-Linux/982/ for details.

@jacksontj
Copy link
Contributor Author

An update to summarize updates from today:

After looking into it the core issue with the crash I was seeing is that the read/write side of the VIOs where being called regardless of which side the error came in on. Really we should handle the error on the appropriate side (in or out)-- so instead of allowing us to do the read (for example) when read OR error, this PR changes it so that we only call that routine if it was on the read side. Then within that read handler we can check if there was an error.

Secondly, instead of trying to unset the error state in the handler, I'm simply just setting it every time we get into the read/write blocks to the appropriate values.

So, the PR is now updated (with the patch I've tested) and ready for merge!

@shinrich
Copy link
Member

Looks good to me!

@jacksontj jacksontj merged commit c1ac5f8 into apache:master Oct 26, 2016
oknet added a commit to oknet/trafficserver that referenced this pull request Mar 9, 2017
After apache#947 (c1ac5f) and apache#1522 (a128d5) , the EVENT_ERROR leads by
EPOLLERR will be sent to read.vio._cont first and then write.vio._cont.

The reader SM could close or shutdown(WRITE) the VC, but we do not check
these operations before we callback write.vio._cont. The SM would
received EVENT_ERROR twice if write.vio._cont == read.vio._cont.
oknet added a commit to oknet/trafficserver that referenced this pull request Mar 9, 2017
After apache#947 (c1ac5f) and apache#1522 (a128d5) , the EVENT_ERROR leads by
EPOLLERR will be sent to read.vio._cont first and then write.vio._cont.

The reader SM could close or shutdown(WRITE) the VC, but we do not check
these operations before we callback write.vio._cont. The SM would
received EVENT_ERROR twice if write.vio._cont == read.vio._cont.
oknet added a commit to oknet/trafficserver that referenced this pull request Mar 9, 2017
After apache#947 (c1ac5f) and apache#1522 (a128d5) , the EVENT_ERROR leads by
EPOLLERR will be sent to read.vio._cont first and then write.vio._cont.

The reader SM could close or shutdown(WRITE) the VC, but we do not check
these operations before we callback write.vio._cont. The SM would
received EVENT_ERROR twice if write.vio._cont == read.vio._cont.
oknet added a commit to oknet/trafficserver that referenced this pull request Mar 9, 2017
After apache#947 (c1ac5f) and apache#1522 (a128d5) , the EVENT_ERROR which caused by
EPOLLERR will be sent to read.vio._cont first and then write.vio._cont.

The reader SM could close or shutdown(WRITE) the VC, but we do not check
these operations before we callback write.vio._cont. The SM would
received EVENT_ERROR twice if write.vio._cont == read.vio._cont.
zwoop pushed a commit that referenced this pull request Mar 9, 2017
After #947 (c1ac5f) and #1522 (a128d5) , the EVENT_ERROR which caused by
EPOLLERR will be sent to read.vio._cont first and then write.vio._cont.

The reader SM could close or shutdown(WRITE) the VC, but we do not check
these operations before we callback write.vio._cont. The SM would
received EVENT_ERROR twice if write.vio._cont == read.vio._cont.
zwoop pushed a commit that referenced this pull request Mar 9, 2017
After #947 (c1ac5f) and #1522 (a128d5) , the EVENT_ERROR which caused by
EPOLLERR will be sent to read.vio._cont first and then write.vio._cont.

The reader SM could close or shutdown(WRITE) the VC, but we do not check
these operations before we callback write.vio._cont. The SM would
received EVENT_ERROR twice if write.vio._cont == read.vio._cont.

(cherry picked from commit aee3f3b)
vmamidi added a commit to vmamidi/trafficserver that referenced this pull request Jun 8, 2017
@vmamidi vmamidi mentioned this pull request Jun 8, 2017
vmamidi added a commit that referenced this pull request Jun 8, 2017
…s to the VConnection"

this reverts PRs #1559, #1522 and #947

This reverts commit c1ac5f8.
vmamidi added a commit to vmamidi/trafficserver that referenced this pull request Jun 8, 2017
This reverts PRs apache#1559, apache#1522 and apache#947

PR apache#947 made the HTTP state machine unstable and lead to crashes in production like apache#1930 apache#1559 apache#1522 apache#1531 apache#1629

This reverts commit c1ac5f8.
zwoop pushed a commit that referenced this pull request Jun 8, 2017
This reverts PRs #1559, #1522 and #947

PR #947 made the HTTP state machine unstable and lead to crashes in production like #1930 #1559 #1522 #1531 #1629

This reverts commit c1ac5f8.
@zizhong zizhong mentioned this pull request Mar 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants