-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LMDB - fix integration, restoring ability of use lmdb with nginx-modsecurity #2688
LMDB - fix integration, restoring ability of use lmdb with nginx-modsecurity #2688
Conversation
Thanks, @ziollek , for the submission (and others who have reviewed it). I'll add this to v3.0.7 project. |
Thanks @martinhsv, I saw that static-check fails because of some warning for modsecurity.cc. |
src/collection/backend/lmdb.cc
Outdated
|
||
MDBEnvProvider::MDBEnvProvider() : | ||
m_env(NULL), initialized(false) { | ||
pthread_mutex_init(&m_lock, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if the call to pthread_mutex_init might be better inside the if-block at line 538 in init().
As it is, it looks like it might at least be theoretically possible to have a thread-safety issue here where pthread_mutex_init could get called on an already-initialized m_lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, m_lock mutex is responsible for controlling access to flag: initialized and preventing calling mdb_env_create multiple times. If we moved it into if-block it wouldn't fulfill its aim. It's worth noticing - that the init() function is designed to be called from multiple threads simultaneously.
To my mind, it's not possible to call multiple pthread_mutex_init on the same structure via current implementation (but of course, I'm not an expert ;) ). Even if we create a few new instances directly through the constructor: new MDBEnvProvider(), there will be an independent m_lock structure in each instance, am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. My earlier comments weren't fully thought through.
However, I do still wonder about a sequence of events like this:
Thread 1:
- calls GetInstance and runs to line 508
- line 508 is executed, the boolean expression is evaluated to true
- execution advances to the beginning of 509.
Thread 2:
- calls GetInstance and runs to line 508
- line 508 is executed, the boolean expression is again evaluated to true
- execution advances all the way to the end of GetInstance(); an MDBEnvProvider has been constructed -- let's call it MDBEnvProvider object A
- MDBEnvProvider object A is returned to the caller of GetInstance
Thread 1:
- completes executing GetInstance; a different new MDBEnvProvider is constructed ('MDBEnvProvider object B')
- MDBEnvProvider object B is returned to the caller of GetInstance
And we have two objects instead of 1. (And now both of these separate MDBEnvProvider objects can have init() called. That would include having mdb_dbi_open called both times -- and this possibility is part of what prompted the original #2601 fix.)
It's been a while since I've done anything with Singletons, so I've had to do a bit of review.
Your mechanism (static member pointer variable, check for NULL, then create the object via new) is the Gang-of-Four mechanism, but it seems not fully thread-safe.
Since the code base is firmly in C++11, might a Meyer's Singleton be a better implementation?
(Apologies if I've mucked something up, as I said, it's been a while since I've used a this Design Pattern.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your detailed clarification, I agree entirely. I've made a silent assumption that GetInstance() is called only by the main thread. I've been biased by focusing on nginx-connector use case - where this assumption is fulfilled indeed.
I'll replace this implementation with one that is thread-safe, as You suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martinhsv I reimplemented the singleton pattern to a thread-safe version (Meyer's Singleton). Thanks a lot for all suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably get merged as it stands, but let me know what you think about my two obsevations -- especially the one at line 522. Thx.
src/collection/backend/lmdb.cc
Outdated
void MDBEnvProvider::init() { | ||
pthread_mutex_lock(&m_lock); | ||
if (!initialized) { | ||
MDB_txn *txn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have guaranteed that only one MDBEnvProvider can be constructed in a given process, is it perhaps worth considering moving the contents of this initialization block to the MDBEnvProvider constructor?
There's nothing unsafe about the structure right now, but making that change (I think) would allow us to omit using m_lock every time GetEnv and GetDBI are called. (Indeed we could get rid of the init() function entirely then. And, I think, get rid of the m_lock member variable entirely.)
(I'm mostly interested in the compute cycles saved by omitting this locking/unlocking on each of those calls if it's not truly necessary. It's not a huge savings, but still.)
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree. It could be significantly simplified and any locking is not necessary.
src/collection/backend/lmdb.cc
Outdated
pthread_mutex_unlock(&m_lock); | ||
} | ||
|
||
void MDBEnvProvider::close() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This close function isn't currently called anywhere.
I'm thinking we'd want an MDBEnvProvider destructor to call this?
(I suppose in practice it doesn't matter that much since the object won't be destructed until the process shuts down.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, now it should be moved to destructor. I will fix that
@martinhsv I've adjusted the code to optimizations that You suggested. |
Looks, I think. I'll merge it today. Thanks for the work on this. |
Hi, I would like expand works started in issue: #2601. It turns out that these fixes cause significant problems with the ModSecurity-nginx stack.
Long story short - You cannot use readonly transactions in a child process if the lmdb environment (MDB_env) has been opened in a process which is forked afterwards (Use an MDB_env* in the process which opened it, without fork()ing ).
I prepared changes which improve that behavior by delaying opening the environment until the first use of any collection. Additionally, there should be only one open environment per process - that's why I introduced dedicated singleton wrapper responsible for creating the lmdb environment and providing it to each lmdb collection.