mv-iter-close-fix
- merged to master
- code complete - June 18, 2014
- development started - June 17, 2014
This branch fixes problems in the basho/eleveldb repository.
The primary goal of the mv-tuning7 branch in the basho/eleveldb repository was to fix a segfault caused by keeping a C++ class object on the Erlang heap. That branch achieves its primary goal. A secondary goal was to make the database close and iterator close APIs asynchronous. The code for the secondary goal was nearly complete, but contained a race condition that could lead to new segfaults. The mv-iter-close-fix branch corrects the asynchronous race conditions and adds a few defensive measures.
The essential changes are in two functions of c_src/refobjects.cc: ErlRefObject::InitiateCloseRequest() and ErlRefObject::RefDec(). ErlRefObject is a base class of the key DbObject and ItrObject classes. These two functions coordinate the destruction of those objects, knowing that two threads are often releasing the object for destruction simultaneously. InitiateCloseRequest() is called by the thread that must wait until all other threads/users of the object are done. It is typically called by a "C" thread, but it can also be called by the Erlang garbage collection thread. The last user of the object calls RefDec(), which detects that it is the last user and executes special logic to inform InitiateCloseRequest() that all users of the object are done.
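For orientation, here is a minimal sketch of that coordination pattern. The member names are illustrative, not the branch's actual declarations, and the real class in c_src/refobjects.h carries considerably more state:

```cpp
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

// Illustrative base class: every user thread brackets its work with
// RefInc()/RefDec(); the thread that wants the object destroyed calls
// InitiateCloseRequest(), which blocks until it holds the only
// remaining reference, then drops that reference via RefDec().
class ErlRefObject
{
public:
    volatile uint32_t m_RefCount;        // active references (atomic ops)
    volatile uint32_t m_PendingReleases; // RefDec() calls that have
                                         //  decremented but not yet signaled
    pthread_mutex_t   m_CloseMutex;      // guards the condition variable
    pthread_cond_t    m_CloseCond;       // wakes InitiateCloseRequest()

    ErlRefObject() : m_RefCount(1), m_PendingReleases(0)
    {
        pthread_mutex_init(&m_CloseMutex, NULL);
        pthread_cond_init(&m_CloseCond, NULL);
    }

    virtual ~ErlRefObject()
    {
        pthread_cond_destroy(&m_CloseCond);
        pthread_mutex_destroy(&m_CloseMutex);
    }

    void RefInc() { __sync_add_and_fetch(&m_RefCount, 1); }

    uint32_t RefDec();           // corrected version sketched further below
    void InitiateCloseRequest(); // corrected version sketched further below
};
```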
The underlying concept of InitiateCloseRequest() and RefDec() was correct. There were some details that were wrong and could lead to two delete calls for the same object. Bad!
- InitiateCloseRequest() could skip its pthread_cond_wait() call if the thread in RefDec() had decremented the reference counter but not yet entered the mutex-protected zone. The result could be InitiateCloseRequest() calling RefDec() at the end of the routine AND the other thread causing another delete through its RefInc()/RefDec() sequence. The changed "if" statement now verifies both the proper outstanding reference count AND whether the last user thread has completed its pthread_cond_broadcast() call.
- RefDec() saves the count of references from its atomic decrement, then enters a mutex-protected zone to send the pthread_cond_broadcast(). But the reference count needed within the mutex-protected zone is the count at that moment, not the count from sometime in the past (before acquiring the mutex), so RefDec() now rereads the counter inside the lock. A sketch of the corrected pair follows this list.
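Continuing the illustrative class above, this is one way to express both fixes, not the branch's verbatim code. The wait condition also covers any RefDec() caller still sitting between its decrement and its broadcast, and the broadcast decision uses the live counter read inside the mutex:

```cpp
// Corrected release: the in-flight marker is raised BEFORE the
// decrement, so the waiter can never observe (count == 1,
// in-flight == 0) while this thread sits between the two steps.
uint32_t ErlRefObject::RefDec()
{
    __sync_add_and_fetch(&m_PendingReleases, 1);
    uint32_t ret = __sync_sub_and_fetch(&m_RefCount, 1);

    pthread_mutex_lock(&m_CloseMutex);
    __sync_sub_and_fetch(&m_PendingReleases, 1);
    if (1 >= m_RefCount)                  // live count, not the stale 'ret'
        pthread_cond_broadcast(&m_CloseCond);
    pthread_mutex_unlock(&m_CloseMutex);

    if (0 == ret)
        delete this;                      // only the closing thread hits 0
    return ret;
}

// Corrected wait: the count alone is not enough; a user that has
// decremented to 1 but not yet broadcast would let this thread skip
// the wait, call RefDec(), and delete while the user is still running.
void ErlRefObject::InitiateCloseRequest()
{
    pthread_mutex_lock(&m_CloseMutex);
    while (1 < m_RefCount || 0 != m_PendingReleases)
        pthread_cond_wait(&m_CloseCond, &m_CloseMutex);
    pthread_mutex_unlock(&m_CloseMutex);

    RefDec();                             // drop our own, final reference
}
```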
The following changes reflect incremental work that hardened the code, but did not address the primary problem.
ErlRefObject::ClaimCloseFromCThread() tightened the window for a known race condition that might lead to a segfaulting read. The condition still exists, but it is now even harder to reach. The race condition is discussed in detail here:
This routine limits the number of threads that could possibly attempt to access the Erlang memory. Only the first C thread to call this routine will also test the Erlang memory. Previously, all C threads vying to close the object could test the Erlang memory, and some might test quite late, after Erlang had already garbage collected it. Now only the first thread tests, and it should always precede Erlang unless this is a process cleanup; in that case Erlang's garbage collection thread is likely blocked waiting for all user threads anyway.
All code locations in refobjects.cc and eleveldb.cc that previously attempted the atomic swap within Erlang memory now use this routine instead.
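A sketch of the claiming idea follows. The structure and parameter names are hypothetical (the real routine is a member of ErlRefObject and works against the actual DbObject/ItrObject layout); the point is that the first gate is a flag in C-owned memory that every thread may touch, and only the single winner proceeds to the pointer living in Erlang-managed resource memory:

```cpp
#include <cstddef>
#include <stdint.h>

class ItrObject;                        // the real class from c_src

// Hypothetical resource layout: Erlang-owned memory that stores a
// pointer back to the C++ object. Erlang may garbage collect this
// block at any time once the VM is done with the handle.
struct ItrResource
{
    ItrObject * volatile m_ItrPtr;
};

// Flag living in C-owned memory; safe for any thread at any time.
struct CloseFlag
{
    volatile uint32_t m_Claimed;        // 0 = open, 1 = close claimed
};

// Only the first caller wins the claim, and only the winner is allowed
// to read and clear the Erlang-side pointer. Late callers return NULL
// without ever touching memory Erlang may already have reclaimed.
static ItrObject * ClaimCloseFromCThread(CloseFlag & Flag,
                                         ItrResource * ErlPtr)
{
    if (!__sync_bool_compare_and_swap(&Flag.m_Claimed, 0u, 1u))
        return NULL;

    // atomic exchange: take the pointer, leave NULL for Erlang to see
    return __sync_lock_test_and_set(&ErlPtr->m_ItrPtr, (ItrObject *)NULL);
}
```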
enif_free_env(itr_ref_env) was one of the locations that would segfault due to the real problem of objects being deleted twice. itr_ref_env's usage was hardened during the problem research: it is now properly initialized to NULL at object creation, and it is now released earlier in its usage cycle for safety. It is doubtful that either itr_ref_env change is essential.
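A sketch of the hardened lifecycle, assuming a simplified holder structure (enif_alloc_env()/enif_free_env() are the real NIF calls; the surrounding structure is illustrative, as the real ItrObject carries much more):

```cpp
#include "erl_nif.h"
#include <cstddef>

// Simplified holder for the iterator's private environment.
struct ItrEnvHolder
{
    ErlNifEnv * itr_ref_env;

    ItrEnvHolder() : itr_ref_env(NULL) {}   // initialized at creation

    void AllocEnv() { itr_ref_env = enif_alloc_env(); }

    // Called as early in the close sequence as is safe. Resetting the
    // pointer to NULL makes a second release harmless rather than a
    // double free.
    void ReleaseEnv()
    {
        if (NULL != itr_ref_env)
        {
            enif_free_env(itr_ref_env);
            itr_ref_env = NULL;
        }
    }
};
```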
The asynchronous close of an iterator object previously awaited direct confirmation that leveldb had finished the operation (and that any pending read-ahead iterations had completed). There is actually no benefit to waiting for leveldb to finish. The worst case scenario is that the iterator close is still pending when the database close operation starts. Here, the asynchronous database close code knows to let all pending iterators close before starting the leveldb actions that close the database. The worst case scenario is covered, and all other scenarios are irrelevant.
The two source files were changed to let Erlang continue processing as soon as the iterator close request is posted. No message is sent when leveldb actually completes the iterator close.
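In sketch form (the queueing helper below is hypothetical; the branch routes this through eleveldb's existing asynchronous work machinery):

```cpp
#include "erl_nif.h"

class ItrObject;                        // from c_src/refobjects.h

// Hypothetical stand-in for posting work to eleveldb's background
// threads; named here only for illustration.
extern void SubmitIteratorCloseWork(ItrObject * Itr);

// Fire-and-forget close: queue the work and answer 'ok' immediately.
// No message is sent later when leveldb actually finishes the close;
// the database-close path instead waits for pending iterators itself.
static ERL_NIF_TERM
async_iterator_close_sketch(ErlNifEnv * env, ItrObject * Itr)
{
    SubmitIteratorCloseWork(Itr);
    return enif_make_atom(env, "ok");
}
```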