Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid reload block when seeking backward in SegmentTermsEnum. #13253

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

vsop-479
Copy link
Contributor

@vsop-479 vsop-479 commented Mar 30, 2024

In SegmentTermsEnum.seekExact and SegmentTermsEnum.seekCeil, if we seek a smaller target after seeking a greater one, we set nextEnt to -1 in SegmentTermsEnumFrame.rewind. Which result in Force reload a loaded block.

I think we could just set nextEnt to 0 to let seek from start, and temporarily set entCount to last nextEnt to reduce the seeking range, and reset suffixesReader and suffixLengthsReader 's position. But don't need to reload the block again.

@vsop-479 vsop-479 marked this pull request as draft March 30, 2024 07:46
@vsop-479
Copy link
Contributor Author

vsop-479 commented Mar 30, 2024

Edit: Need more consider about multi floor block.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea @vsop-479! It would speed up cases where the caller seeks backwards a bit, remaining in the current frame.

@vsop-479 vsop-479 marked this pull request as ready for review April 12, 2024 09:22
@vsop-479
Copy link
Contributor Author

@mikemccand
Please take a look when you get a chance.

@mikemccand
Copy link
Member

This might be a needle-moving optimization for apps that reuse a single TermsEnum and seek randomly to terms, right? Because all up and down the stack of SegmentTermsEnumFrames we can rewindWithoutReload for many/most of them?

@vsop-479
Copy link
Contributor Author

This might be a needle-moving optimization for apps that reuse a single TermsEnum and seek randomly to terms, right?

Yes.

Because all up and down the stack of SegmentTermsEnumFrames we can rewindWithoutReload for many/most of them?

I just implemented rewindWithoutReload for non-floor block and first floor block, in current version. Since target may exists in more upper floor when it less than last term, but lastFrame is non-first floor.

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label May 10, 2024
@mikemccand
Copy link
Member

Thanks @vsop-479 I will try to re-engage here soon!

@github-actions github-actions bot removed the Stale label May 11, 2024
Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for being patient @vsop-479 -- I keep trying to review/understand this change and keep struggling :)

I left some superficial-ish comments.


// We got lastFrame by comparing target and term, and target less than last seeked term in
// currentFrame. If lastFrame's fp is same with currentFrame's fp, we can reduce entCount to
// nextEnt.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change to we can rewind without reloading? We don't seem to reduce entCount anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to reduce entCount anymore?

We can reduce entCount to nextEnt if we finally seek the same block.
This has been already implemented in:


// We still stay on withOutReload frame, reduce entCount to nextEnt.
        int origEntCount = -1;
        if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
          origEntCount = currentFrame.entCount;
          currentFrame.entCount = origNextEnt;
        }

currentFrame.rewind();
} else {
// Prepare to reduce entCount.
if (currentIsLast && currentFrame.isLeafBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why there are two cases under the rewindWithoutReload case. One case you can roll back entCount temporarily, because you know the target term is before the current term? And in the 2nd case, you don't know that, but you do know the target term is in this current block? But then why are we even rewinding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case you can roll back entCount temporarily, because you know the target term is before the current term?

Yes.

but you do know the target term is in this current block? But then why are we even rewinding?

We get lastFrame from stack[0] firstly, and compare target to last seek'd term to update lastFrame to reuse seek state, and assign it to currentFrame. This currentFrame only got by target and last seek'd term 's common prefix, we still need to continue walking the index to seek target term's block.

I think we need rewinding to set frame's state (nextEnt, entCount, etc.) to prepare for seeking. Unlike rewind, rewindWithoutReload can keep loaded block's data (same FP).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewindWithoutReload is called when currentFrame's is loaded and fp equals fpOrig.
Doing this can avoid reload a loaded block, when we finally need seek it, and it is still in stack.

Copy link
Contributor Author

@vsop-479 vsop-479 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I was not clear. I mean we can rewindWithoutReload when currentFrame.fp == currentFrame.fpOrig, base on this, if we finally seek the same block with last term, roll back entCount temporarily.

@vsop-479 vsop-479 requested a review from mikemccand June 16, 2024 09:25
Copy link

github-actions bot commented Jul 1, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jul 1, 2024
@vsop-479
Copy link
Contributor Author

Conflicts resolved.

@github-actions github-actions bot removed the Stale label Aug 13, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Aug 28, 2024
@vsop-479 vsop-479 changed the title Avoid SegmentTermsEnumFrame reload block. Avoid reload block when seeking backward in SegmentTermsEnum. Sep 20, 2024
@github-actions github-actions bot removed the Stale label Sep 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants