
@oknet (Member) commented Apr 15, 2017

We hit a crash in PluginVC:

(gdb) bt full
#0  0x00002b29f722cdf0 in ?? ()
No symbol table info available.
#1  0x00002b29ef4ac3fb in PluginVC::process_close (this=0x2b2a19def180) at PluginVC.cc:700
        __FUNCTION__ = "process_close"
#2  0x00002b29ef4acc10 in PluginVC::main_handler (this=0x2b2a19def180, event=<optimized out>, data=0x2b2a1c13ecc0) at PluginVC.cc:216
        __FUNCTION__ = "main_handler"
        my_ethread = <optimized out>
        call_event = 0x2b2a1c13ecc0
        read_mutex_held = true
        write_mutex_held = true
        read_side_mutex = {m_ptr = 0x2b2a1c1449a0}
        write_side_mutex = {m_ptr = 0x2b2a1c1449a0}
#3  0x00002b29ef792680 in handleEvent (data=0x2b2a1c13ecc0, event=2, this=<optimized out>) at I_Continuation.h:146
No locals.
#4  EThread::process_event (this=this@entry=0x2b29f7132050, e=e@entry=0x2b2a1c13ecc0, calling_code=2) at UnixEThread.cc:130
        c_temp = <optimized out>
        lock = {m = {m_ptr = 0x2b2a1c1449f0}, lock_acquired = true}
#5  0x00002b29ef793283 in EThread::execute (this=0x2b29f7132050) at UnixEThread.cc:212
        done_one = true
        e = 0x2b2a1c13ecc0
        NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x2b29f722dd50}, tail = 0x2b29f722d090}
        next_time = 1492131740118840109
#6  0x00002b29ef79196a in spawn_thread_internal (a=0x2b29f711efd0) at Thread.cc:85
        p = 0x2b29f711efd0
#7  0x00002b29f37cfb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00002b29f351a30d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

The local side and the other side share a single sm_lock_retry_event:

(gdb) p this
$1 = (PluginVC * const) 0x2b2a19def180
(gdb) p this->sm_lock_retry_event
$2 = (Event *) 0x2b29f722c7f0
(gdb) p other_side->sm_lock_retry_event
$3 = (Event *) 0x2b29f722c7f0

That shared event has already been cancelled:

(gdb) p *(Event *) 0x2b29f722c7f0
$4 = {<Action> = {_vptr.Action = 0x2b29f722c1f0, continuation = 0x2b2a19def180, mutex = {m_ptr = 0x0}, cancelled = 1}, 
  ethread = 0x2b29f7132050, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 0, in_heap = 0, 
  callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}

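The event being dispatched is inactive_event. The inactive_event pointer has already been cleared to NULL, and call_event is marked cancelled: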

(gdb) p inactive_event
$5 = (Event *) 0x0
(gdb) p *call_event
$6 = {<Action> = {_vptr.Action = 0x2b29efabd2b0, continuation = 0x2b2a19def180, mutex = {m_ptr = 0x2b2a1c1449f0}, cancelled = 1}, 
  ethread = 0x2b29f7132050, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 0, globally_allocated = 1, in_heap = 0, 
  callback_event = 2, timeout_at = 1492131902133937916, period = 1000000000, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, 
    prev = 0x0}}

To reproduce: put the server under heavy load, send requests to the stats_over_http plugin, then wait for inactive_timeout to fire.

@oknet self-assigned this Apr 15, 2017

@mingzym (Member) commented Apr 15, 2017

This pull request may or may not fix the very old bug https://issues.apache.org/jira/browse/TS-2462. Hopefully it leads PluginVC to a better future.

@oknet changed the title from "Fix a crash in PluginVC, reschedule other_side with core_lock_retry_event instead sm_lock" to "Fix a crash in PluginVC, reschedule other_side with core_lock_retry_event instead of sm_lock" Apr 15, 2017

@atsci commented Apr 15, 2017

RAT check successful! https://ci.trafficserver.apache.org/job/RAT-github/215/

@atsci commented Apr 15, 2017

clang-analyzer build successful! https://ci.trafficserver.apache.org/job/clang-analyzer-github/458/

@atsci commented Apr 15, 2017

FreeBSD11 build successful! https://ci.trafficserver.apache.org/job/freebsd-github/1896/

@atsci commented Apr 15, 2017

Intel CC build successful! https://ci.trafficserver.apache.org/job/icc-github/326/

@atsci commented Apr 15, 2017

Linux build successful! https://ci.trafficserver.apache.org/job/linux-github/1787/

@atsci commented Apr 15, 2017

clang-analyzer build successful! https://ci.trafficserver.apache.org/job/clang-analyzer-github/459/

@oknet (Member, Author) commented Apr 15, 2017

I consider this a better fix for TS-3235 (PR #164).

@oknet requested a review from bryancall April 15, 2017 05:28

@zwoop (Contributor) commented Apr 18, 2017

For relevant Jira issues, it would be super helpful if you could open new GitHub Issues instead. It's not required, but it makes things much easier to track, since Jira is now read-only.

@zwoop added this to the 7.2.0 milestone Apr 18, 2017

@zwoop (Contributor) commented Apr 18, 2017

@oknet Is this a candidate for v7.1.x ?

@oknet (Member, Author) commented Apr 19, 2017

@zwoop Have there been any complaints about Interception? Please backport this to 7.1.x if Interception is widely used.

@zwoop modified the milestones: 7.2.0, 8.0.0 Apr 25, 2017

@zwoop (Contributor) commented Apr 27, 2017

Shall we land this?

@zwoop modified the milestones: 8.0.0, 7.1.0 Apr 28, 2017

@oknet (Member, Author) commented Apr 28, 2017 via email

@zwoop added the WIP label Apr 28, 2017

@zwoop (Contributor) commented Apr 28, 2017

Marking as WIP, please keep this updated, so we know where we're taking this.

@zwoop modified the milestones: 7.1.0, 7.1.1 May 4, 2017

@PSUdaemon (Contributor)

[approve ci]

@oknet (Member, Author) commented Jun 1, 2017

This is not a bug: other_side_call == true means the function was called from the other side, which already holds the mutex.

@oknet closed this Jun 1, 2017
@zwoop modified the milestone: 7.1.1 Jul 7, 2017