@oknet
Member

@oknet oknet commented Jan 7, 2019

In general, `NetAccept::action_->continuation` is a `ProtocolProbeSessionAccept` object.

The mutex of `ProtocolProbeSessionAccept` is NULL to allow parallel accepts.

Resolve issue #4726.

…ENT_ERROR event is called back.
@scw00
Member

scw00 commented Jan 7, 2019

How is `EVENT_ERROR` processed in `ProtocolProbeSessionAccept`?

It still crashes.

@oknet
Member Author

oknet commented Jan 7, 2019

@scw00

If the errno returned from accept(2) is neither `EAGAIN` nor `EINTR`, it indicates an error that cannot be recovered from.

In that situation, we must abort the ATS process and write the errno to the log file to help the system administrator find the cause of the problem.
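
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the ATS implementation: `AcceptErrorKind` and `classify_accept_errno` are hypothetical names introduced only for this example, and the sketch assumes that only `EAGAIN` and `EINTR` are treated as transient, as stated in the comment.

```cpp
#include <cerrno>

// Hypothetical classification, not part of ATS: how a caller might treat
// the errno left behind by a failed accept(2).
enum class AcceptErrorKind { Retry, Fatal };

inline AcceptErrorKind classify_accept_errno(int err)
{
  // EAGAIN: no connection is pending on a non-blocking listen socket.
  // EINTR:  the call was interrupted by a signal before a connection arrived.
  // Both are transient, so the accept can simply be attempted again later.
  if (err == EAGAIN || err == EINTR) {
    return AcceptErrorKind::Retry;
  }
  // Anything else is treated as unrecoverable in this sketch; the caller
  // would log the errno and abort the process.
  return AcceptErrorKind::Fatal;
}
```

A caller would branch on the result: retry on `Retry`, and log the errno before aborting on `Fatal`.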

@oknet
Member Author

oknet commented Jan 7, 2019

@scw00 @masaori335

I guess the value of errno might be:

  • EMFILE (24)
    • The per-process limit of open file descriptors has been reached.
  • ENFILE (23)
    • The system limit on the total number of open files has been reached.
  • ENOMEM (12)
    • Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory.
  • EBADF (9)
    • The descriptor is invalid.
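
Since an abort message would typically carry only the numeric errno, a small helper like the one below could map it back to a symbolic name for the four candidates listed above. This is only a sketch: `describe_accept_errno` is a hypothetical function, not an ATS API, and the numeric values in the list (24, 23, 12, 9) are the Linux ones, so the code uses the `<cerrno>` macros rather than hard-coded numbers.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper, not part of ATS: turn a numeric errno into
// "NAME: system description" for the candidates discussed in this thread.
inline std::string describe_accept_errno(int err)
{
  const char *name = "UNKNOWN";
  switch (err) {
  case EMFILE: name = "EMFILE"; break; // per-process descriptor limit reached
  case ENFILE: name = "ENFILE"; break; // system-wide open file limit reached
  case ENOMEM: name = "ENOMEM"; break; // not enough memory / socket buffer limits
  case EBADF:  name = "EBADF";  break; // invalid descriptor
  default: break;
  }
  // std::strerror supplies the platform's own description text.
  return std::string(name) + ": " + std::strerror(err);
}
```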

Member

@scw00 scw00 left a comment

+1

@scw00
Member

scw00 commented Jan 7, 2019

Thanks for your explanation.

@masaori335 masaori335 requested a review from vmamidi January 7, 2019 10:29
@zwoop
Contributor

zwoop commented Jan 7, 2019

So, do we think this still fixes the crashes in #4726?

@scw00
Member

scw00 commented Jan 8, 2019

Yep, we need to capture the errno to figure out what happens at the time of the crash. Some errors cannot be recovered from, such as ENFILE, EMFILE, and ENOMEM, because they mean something is leaking memory.

@oknet
Member Author

oknet commented Jan 8, 2019

@zwoop With this PR, ATS still crashes (`ink_abort`) at L190 of ProtocolProbeSessionAccept.cc.

190   ink_abort("Protocol probe received a fatal error: errno = %d", -((int)(intptr_t)data));

We will get errno from the abort message and find out the cause of the problem.
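
The quoted line recovers the errno by casting the event's `data` pointer to an integer and negating it, which suggests the sender packs the negated errno into the pointer. A minimal sketch of that round trip, under that assumption, is shown below; `encode_errno` and `decode_errno` are hypothetical helper names and not ATS functions.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical sender side: pack a negated errno into a void* payload.
// Assumes the platform round-trips intptr_t through void* unchanged,
// which holds on common platforms.
inline void *encode_errno(int err)
{
  return reinterpret_cast<void *>(static_cast<intptr_t>(-err));
}

// Receiver side: mirrors the expression -((int)(intptr_t)data) from the
// quoted ink_abort call to get the positive errno back.
inline int decode_errno(void *data)
{
  return -static_cast<int>(reinterpret_cast<intptr_t>(data));
}
```

With this encoding, `decode_errno(encode_errno(EMFILE))` yields `EMFILE` again, which is the value that would appear in the abort message.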

@oknet oknet merged commit 732dd20 into apache:master Jan 9, 2019
@bryancall bryancall modified the milestones: 9.0.0, 8.1.0 Mar 27, 2019
@bryancall
Contributor

Cherry picked to 8.1.0

@zwoop zwoop modified the milestones: 8.1.0, 8.1.0-nogo Mar 18, 2020