Skip to content

Conversation

@shinrich
Copy link
Member

This stack as been showing up in our environment intermittently for a while.

#0  0x00000000006ae0c6 in HostDBContinuation::do_dns() () at HostDB.cc:1835
#1  0x00000000006b09e0 in HostDBContinuation::probeEvent(int, Event*) () at HostDB.cc:1757
#2  0x00000000006b0c08 in HostDBContinuation::dnsPendingEvent(int, Event*) () at HostDB.cc:1183
#3  0x00000000007c3f3c in handleEvent (data=0x2b36790007e0, event=1, this=0x2b3b17d8c800) at I_Continuation.h:182
#4  EThread::process_event(Event*, int) () at UnixEThread.cc:136
#5  0x00000000007c50ac in EThread::execute_regular (this=0x2b3669d95440) at UnixEThread.cc:250
#6  0x00000000007c3932 in spawn_thread_internal (a=0x2b366801efc0) at Thread.cc:85
#7  0x00002b3666627dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00002b366735cead in clone () from /lib64/libc.so.6

In this case, the HostDBContinuation->mutex is NULL this causing the crash in do_dns. The handler is set to probeEvent(). I see no place were the hostDBContinuation is created but init() is not called (which sets the mutex). However when the HostDBContinuation is freed, the mutex is nulled.

Ultimately, I think the use-after-free issue is caused by the combination of these two lines.

remove_trigger_pending_dns();
hostdb_cont_free(this);

remove_trigger_pending_dns, walks the pending list and sends a schedule_imm so it can process the result on the invoking thread. However, based on my search of the code, particularly set_check_pending_dns, the current object (this) is on the list. So we just scheduled an event to process that object and then delete the object before the immediate event can be processed (most likely).

The code change removes the extra host_db_cont_free and schedule.

@shinrich shinrich added this to the 10.0.0 milestone Sep 20, 2019
@shinrich shinrich self-assigned this Sep 20, 2019
@shinrich
Copy link
Member Author

[approve ci autest]

@zwoop zwoop added the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Oct 17, 2019
Copy link
Contributor

@calavera calavera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been onDocs for three months, anything missing to merge it @zwoop ?

@zwoop
Copy link
Contributor

zwoop commented Feb 26, 2020

Yeh, I'm ok with this. @shinrich Should we merge this ?

@shinrich
Copy link
Member Author

I have a few more tweaks in the hostdb area, but as I recall this part is fine. Will make another PR later with a fuller cross reference of various versions.

@shinrich shinrich merged commit f27f90d into apache:master Feb 27, 2020
@zwoop
Copy link
Contributor

zwoop commented Feb 27, 2020

Cherry-picked to v9.0.x branch.

@zwoop zwoop removed the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Feb 27, 2020
@zwoop zwoop modified the milestones: 10.0.0, 9.0.0 Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants