Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-optimize #93

Closed
wants to merge 2 commits into from
Closed

Auto-optimize #93

wants to merge 2 commits into from

Conversation

astigsen
Copy link
Contributor

I have noticed that one of the things that confuses new users most is when and why they should call optimize(). So I have mede a change that calls optimize() automatically after 1000 inserts. That should in most cases be enough for any patterns to be detectable.

There are some cases where this will cause unintended behavior. If you are working with a table where you constantly delete everything and then fill it up again, then you might not want the optimize step every time (but who knows, it might still be useful, depending on what you are putting in). And if you use it as a stack adding and popping from the bottom, you could get pathological behavior if you stayed around the 1000 mark, but that is probably a very rare edge case.

For the large majority of cases it will greatly simplify things for the user, and if it turn out to give problems, we can add some code to detect that it has already been optimized, and skip doing it again.

@@ -1419,6 +1419,16 @@ void Table::insert_done()
{
++m_size;

// We will do an automatic optimize after 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change interferes negatively with replication.

As is revealed by the following FIXME from "replication.hpp", the replication feature is implemented under the assumption that public modifying table methods will not call each other:

// FIXME: Be careful about the possibility of one modification
// functions being called by another where both do transaction
// logging.

(note, this is not limited to table methods only!)

The reason is that each public modifying method adds a corresponding instruction to the transaction log, and on the remote side, each instruction causes a call to the same public modifying method that generated it.

Therefore, if one public modifying method calls another one, the other one will end up being called twice on the remote side.

For example, if optimize() is called from the code below, it will add an 'optimize' instruction to the transaction log. That will be followed by a 'insert_done' instruction. On the remote side, where the transaction log is applied, the 'optimize' operation is first carried out, then the 'insert_done'. The problem is that, since the 'insert_done' instruction causes a call to Table::insert_done(), the optimization process will run twice.

Although, by itself, this is not a big issue, we must strive to adhere to rules like the mentioned one (no blame intended, I know you didn't know). If we don't, code complexity will quickly rise to a level where nobody can maintain enough insight to spot an issue like this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trivial solution in this case is to introduce a new function that actually carries out the optimization, but does not add anything to the transaction log. This new function is then called both from Table::optimize() and from Table::insert_done(). The only hard problem is finding a suitable name...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pathological scenario might not be that unlikely, which makes me a bit uneasy about this way of doing it.

I believe a simple and good behaviour would be to prevent the automatic optimization to occur more than once per table. The good part is that the worst case performance impact is limited in a simple to understandable way.

Unfortunately, I don't think this can be achieved without allocating an extra bit of information in the underlying array structure.

For example, by adding a flag that says whether the optimization operation has been carried out or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alexander, do you have any other ideas of how to achieve 'once per table' behaviour?

@kspangsege
Copy link
Contributor

Done reviewing.

@astigsen
Copy link
Contributor Author

Hmm, I just wondered if this would get triggered for sub-tables as well. That would not be intended behavior (yet).

@timanglade
Copy link

@kspangsege @astigsen any news on the status of this one would be welcome as well.

Have we considered other ways to smooth performance without developer inputs (e.g. automatic index creation) ?

@bmunkholm
Copy link
Contributor

My considerations in this area is that I think it's dangerous or at least
undesirable to do too much "intelligent" on behalf of developers. Suddenly
the system behavior is not deterministic anymore. And in the case of
indexing you would suddenly decide to trade memory usage for speed. I don't
think we can make such a tradeoff for people automatically.

// Brian

On Tue, Feb 4, 2014 at 7:01 AM, Tim Anglade notifications@github.comwrote:

@kspangsege https://github.com/kspangsege @astigsenhttps://github.com/astigsenany news on the status of this one would be welcome as well.

Have we considered other ways to smooth performance without developer
inputs (e.g. automatic index creation) ?

Reply to this email directly or view it on GitHubhttps://github.com//pull/93#issuecomment-34033260
.

@jenkins-tightdb
Copy link

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pr/11/

@jenkins-tightdb
Copy link

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pr/22/

@simonask simonask added the wip label Mar 30, 2015
@simonask
Copy link
Contributor

simonask commented Apr 9, 2015

@astigsen @bmunkholm Should this be closed or parked or…?

@bmunkholm bmunkholm changed the title [WIP] Auto-optimize [PARKED] Auto-optimize Apr 9, 2015
@danielpovlsen danielpovlsen changed the title [PARKED] Auto-optimize Auto-optimize Jul 15, 2015
@bmunkholm bmunkholm added P3 and removed parked labels Aug 12, 2015
@bmunkholm bmunkholm removed the wip label Feb 8, 2016
@jedelbo
Copy link
Contributor

jedelbo commented May 18, 2018

Abandonned

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants