Auto-optimize #93
Conversation
@@ -1419,6 +1419,16 @@ void Table::insert_done()
{
    ++m_size;

    // We will do an automatic optimize after 1000
This change interferes negatively with replication.
As is revealed by the following FIXME from "replication.hpp", the replication feature is implemented under the assumption that public modifying table methods will not call each other:
// FIXME: Be careful about the possibility of one modification
// functions being called by another where both do transaction
// logging.
(note, this is not limited to table methods only!)
The reason is that each public modifying method adds a corresponding instruction to the transaction log, and on the remote side, each instruction causes a call to the same public modifying method that generated it.
Therefore, if one public modifying method calls another one, the other one will end up being called twice on the remote side.
For example, if optimize() is called from the code below, it will add an 'optimize' instruction to the transaction log, followed by an 'insert_done' instruction. On the remote side, where the transaction log is applied, the 'optimize' operation is carried out first, then the 'insert_done'. The problem is that, since the 'insert_done' instruction causes a call to Table::insert_done(), the optimization process will run twice.
Although, by itself, this is not a big issue, we must strive to adhere to rules like the mentioned one (no blame intended, I know you didn't know). If we don't, code complexity will quickly rise to a level where nobody can maintain enough insight to spot an issue like this one.
The trivial solution in this case is to introduce a new function that actually carries out the optimization, but does not add anything to the transaction log. This new function is then called both from Table::optimize() and from Table::insert_done(). The only hard problem is finding a suitable name...
The pathological scenario might not be that unlikely, which makes me a bit uneasy about this way of doing it.
I believe a simple and good behaviour would be to prevent the automatic optimization from occurring more than once per table. The good part is that the worst-case performance impact is then limited in a simple, understandable way.
Unfortunately, I don't think this can be achieved without allocating an extra bit of information in the underlying array structure, for example a flag that records whether the optimization has already been carried out.
Alexander, do you have any other ideas of how to achieve 'once per table' behaviour?
Done reviewing.
Hmm, I just wondered if this would get triggered for sub-tables as well. That would not be intended behavior (yet).
@kspangsege @astigsen any news on the status of this one would be welcome as well. Have we considered other ways to smooth performance without developer input (e.g. automatic index creation)?
My considerations in this area are that I think it's dangerous or at least

// Brian

On Tue, Feb 4, 2014 at 7:01 AM, Tim Anglade <notifications@github.com> wrote:
Test FAILed.
Test FAILed.
@astigsen @bmunkholm Should this be closed or parked or…?
Abandoned
I have noticed that one of the things that confuses new users most is when and why they should call optimize(). So I have made a change that calls optimize() automatically after 1000 inserts. That should in most cases be enough for any patterns to be detectable.
There are some cases where this will cause unintended behavior. If you are working with a table where you constantly delete everything and then fill it up again, you might not want the optimize step every time (but who knows, it might still be useful, depending on what you are putting in). And if you use it as a stack, adding and popping from the bottom, you could get pathological behavior if you stayed around the 1000 mark, but that is probably a very rare edge case.
For the large majority of cases it will greatly simplify things for the user, and if it turns out to cause problems, we can add some code to detect that the table has already been optimized and skip doing it again.