Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid reload block when seeking backward in SegmentTermsEnum. #13253

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,8 @@ private boolean setEOF() {
@Override
public boolean seekExact(BytesRef target) throws IOException {

long withOutReloadFp = -1;
int origNextEnt = -1;
if (fr.index == null) {
throw new IllegalStateException("terms index was not loaded");
}
Expand Down Expand Up @@ -434,8 +436,29 @@ public boolean seekExact(BytesRef target) throws IOException {
// System.out.println(" target is before current (shares prefixLen=" + targetUpto + ");
// rewind frame ord=" + lastFrame.ord);
// }

// We got lastFrame by comparing target and term, and target less than last seeked term in
// currentFrame. If lastFrame's fp is same with currentFrame's fp, we can reduce entCount to
// nextEnt.
boolean currentIsLast = currentFrame.fp == lastFrame.fp;
currentFrame = lastFrame;
currentFrame.rewind();

// Only rewindWithoutReload for non-floor block or first floor block.
// TODO: We need currentFrame's first entry to judge whether we can rewindWithoutReload for
// non-first floor blocks.
vsop-479 marked this conversation as resolved.
Show resolved Hide resolved
if (currentFrame.fp != currentFrame.fpOrig
vsop-479 marked this conversation as resolved.
Show resolved Hide resolved
|| currentFrame.entCount == 0
|| currentFrame.nextEnt == -1) {
currentFrame.rewind();
} else {
// Prepare to reduce entCount.
if (currentIsLast && currentFrame.isLeafBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why there are two cases under the rewindWithoutReload case. One case you can roll back entCount temporarily, because you know the target term is before the current term? And in the 2nd case, you don't know that, but you do know the target term is in this current block? But then why are we even rewinding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case you can roll back entCount temporarily, because you know the target term is before the current term?

Yes.

but you do know the target term is in this current block? But then why are we even rewinding?

We get lastFrame from stack[0] firstly, and compare target to last seek'd term to update lastFrame to reuse seek state, and assign it to currentFrame. This currentFrame only got by target and last seek'd term 's common prefix, we still need to continue walking the index to seek target term's block.

I think we need rewinding to set frame's state (nextEnt, entCount, etc.) to prepare for seeking. Unlike rewind, rewindWithoutReload can keep loaded block's data (same FP).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewindWithoutReload is called when currentFrame's is loaded and fp equals fpOrig.
Doing this can avoid reload a loaded block, when we finally need seek it, and it is still in stack.

Copy link
Contributor Author

@vsop-479 vsop-479 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I was not clear. I mean we can rewindWithoutReload when currentFrame.fp == currentFrame.fpOrig, base on this, if we finally seek the same block with last term, roll back entCount temporarily.

origNextEnt = currentFrame.nextEnt;
withOutReloadFp = currentFrame.fp;
}
currentFrame.rewindWithoutReload();
}

} else {
// Target is exactly the same as current term
assert term.length() == target.length;
Expand Down Expand Up @@ -518,7 +541,20 @@ public boolean seekExact(BytesRef target) throws IOException {

currentFrame.loadBlock();

// We still stay on withOutReload frame, reduce entCount to nextEnt.
int origEntCount = -1;
if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
origEntCount = currentFrame.entCount;
currentFrame.entCount = origNextEnt;
}

final SeekStatus result = currentFrame.scanToTerm(target, true);

// Revert entCount to origEntCount.
vsop-479 marked this conversation as resolved.
Show resolved Hide resolved
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}

if (result == SeekStatus.FOUND) {
// if (DEBUG) {
// System.out.println(" return FOUND term=" + term.utf8ToString() + " " + term);
Expand Down Expand Up @@ -573,7 +609,20 @@ public boolean seekExact(BytesRef target) throws IOException {

currentFrame.loadBlock();

// We still stay on withOutReload frame, reduce entCount to nextEnt.
int origEntCount = -1;
if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
origEntCount = currentFrame.entCount;
currentFrame.entCount = origNextEnt;
}

final SeekStatus result = currentFrame.scanToTerm(target, true);

// Revert entCount to origEntCount.
vsop-479 marked this conversation as resolved.
Show resolved Hide resolved
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}

if (result == SeekStatus.FOUND) {
// if (DEBUG) {
// System.out.println(" return FOUND term=" + term.utf8ToString() + " " + term);
Expand All @@ -591,6 +640,8 @@ public boolean seekExact(BytesRef target) throws IOException {

@Override
public SeekStatus seekCeil(BytesRef target) throws IOException {
long withOutReloadFp = -1;
int origNextEnt = -1;

if (fr.index == null) {
throw new IllegalStateException("terms index was not loaded");
Expand Down Expand Up @@ -710,8 +761,28 @@ public SeekStatus seekCeil(BytesRef target) throws IOException {
// System.out.println(" target is before current (shares prefixLen=" + targetUpto + ");
// rewind frame ord=" + lastFrame.ord);
// }

// We got lastFrame by comparing target and term, and target less than last seeked term in
// currentFrame. If lastFrame's fp is same with currentFrame's fp, we can reduce entCount to
// nextEnt.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change to we can rewind without reloading? We don't seem to reduce entCount anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to reduce entCount anymore?

We can reduce entCount to nextEnt if we finally seek the same block.
This has been already implemented in:


// We still stay on withOutReload frame, reduce entCount to nextEnt.
        int origEntCount = -1;
        if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
          origEntCount = currentFrame.entCount;
          currentFrame.entCount = origNextEnt;
        }

boolean currentIsLast = currentFrame.fp == lastFrame.fp;
currentFrame = lastFrame;
currentFrame.rewind();

// Only rewindWithoutReload for non-floor block or first floor block.
// TODO: We need currentFrame's first entry to judge whether we can rewindWithoutReload for
// non-first floor blocks.
if (currentFrame.fp != currentFrame.fpOrig
|| currentFrame.entCount == 0
|| currentFrame.nextEnt == -1) {
currentFrame.rewind();
} else {
// Prepare to reduce entCount.
if (currentIsLast && currentFrame.isLeafBlock) {
origNextEnt = currentFrame.nextEnt;
withOutReloadFp = currentFrame.fp;
}
currentFrame.rewindWithoutReload();
}
} else {
// Target is exactly the same as current term
assert term.length() == target.length;
Expand Down Expand Up @@ -781,8 +852,21 @@ public SeekStatus seekCeil(BytesRef target) throws IOException {

currentFrame.loadBlock();

// We still stay on withOutReload frame, reduce entCount to nextEnt.
int origEntCount = -1;
if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
origEntCount = currentFrame.entCount;
currentFrame.entCount = origNextEnt;
}

// if (DEBUG) System.out.println(" now scanToTerm");
final SeekStatus result = currentFrame.scanToTerm(target, false);

// Revert entCount to origEntCount.
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}

if (result == SeekStatus.END) {
term.copyBytes(target);
termExists = false;
Expand Down Expand Up @@ -838,8 +922,20 @@ public SeekStatus seekCeil(BytesRef target) throws IOException {

currentFrame.loadBlock();

// We still stay on withOutReload frame, reduce entCount to nextEnt.
int origEntCount = -1;
if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
origEntCount = currentFrame.entCount;
currentFrame.entCount = origNextEnt;
}

final SeekStatus result = currentFrame.scanToTerm(target, false);

// Revert entCount to origEntCount.
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}

if (result == SeekStatus.END) {
term.copyBytes(target);
termExists = false;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,68 @@ void rewind() {
*/
}

// Only rewind, don't force reload block.
// Reset readers' position, don't read, decompress.
// Current term greater than target, reduce endCount.
void rewindWithoutReload() {
// Set nextEnt to 0, to prevent force load.
nextEnt = 0;
suffixesReader.setPosition(0);
suffixLengthsReader.setPosition(0);
statsReader.setPosition(0);
bytesReader.setPosition(0);

// TODO: Since we only rewind without reload for fist floor(currentFrame.fp ==
vsop-479 marked this conversation as resolved.
Show resolved Hide resolved
// currentFrame.fpOrig)
// So no need to set floorDataReader again?
// if (isFloor) {
// floorDataReader.setPosition(rewindPos);
// numFollowFloorBlocks = floorDataReader.readVInt();
// assert numFollowFloorBlocks > 0;
// nextFloorLabel = floorDataReader.readByte() & 0xff;
// }

metaDataUpto = 0;

statsSingletonRunLength = 0;
state.termBlockOrd = 0;
lastSubFP = -1;
// state.termBlockOrd = 0;
/*
//System.out.println("rewind");
// Keeps the block loaded, but rewinds its state:
if (nextEnt > 0 || fp != fpOrig) {
if (DEBUG) {
System.out.println(" rewind frame ord=" + ord + " fpOrig=" + fpOrig + " fp=" + fp + " hasTerms?=" + hasTerms + " isFloor?=" + isFloor + " nextEnt=" + nextEnt + " prefixLen=" + prefix);
}
if (fp != fpOrig) {
fp = fpOrig;
nextEnt = -1;
} else {
nextEnt = 0;
}
hasTerms = hasTermsOrig;
if (isFloor) {
floorDataReader.rewind();
numFollowFloorBlocks = floorDataReader.readVInt();
nextFloorLabel = floorDataReader.readByte() & 0xff;
}
assert suffixBytes != null;
suffixesReader.rewind();
assert statBytes != null;
statsReader.rewind();
metaDataUpto = 0;
state.termBlockOrd = 0;
// TODO: skip this if !hasTerms? Then postings
// impl wouldn't have to write useless 0 byte
postingsReader.resetTermsBlock(fieldInfo, state);
lastSubFP = -1;
} else if (DEBUG) {
System.out.println(" skip rewind fp=" + fp + " fpOrig=" + fpOrig + " nextEnt=" + nextEnt + " ord=" + ord);
}
*/
}

// Decodes next entry; returns true if it's a sub-block
public boolean next() throws IOException {
if (isLeafBlock) {
Expand Down Expand Up @@ -855,4 +917,35 @@ private void fillTerm() {
ste.term.grow(termLength);
System.arraycopy(suffixBytes, startBytePos, ste.term.bytes(), prefix, suffix);
}

// Used for debugging.
@Override
public String toString() {
return "fp: "
+ fp
+ ", fpOrig: "
+ fpOrig
+ ", fpEnd: "
+ fpEnd
+ ", lastSubFP: "
+ lastSubFP
+ ", entCount: "
+ entCount
+ ", nextEnt: "
+ nextEnt
+ ", isLeafBlock: "
+ isLeafBlock
+ ", isFloor: "
+ isFloor
+ ", isLastInFloor: "
+ isLastInFloor
+ ", nextFloorLabel: "
+ nextFloorLabel
+ ", suffixesPos: "
+ suffixesReader.getPosition()
+ ", suffixLengthsPos: "
+ suffixLengthsReader.getPosition()
+ ", floorDataPos: "
+ floorDataReader.getPosition();
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,16 @@
package org.apache.lucene.codecs.lucene99;

import static org.apache.lucene.codecs.lucene99.Lucene99ScoreSkipReader.readImpacts;
import static org.apache.lucene.tests.util.TestUtil.alwaysPostingsFormat;
import static org.apache.lucene.tests.util.TestUtil.getDefaultPostingsFormat;

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.CompetitiveImpactAccumulator;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene90.blocktree.FieldReader;
import org.apache.lucene.codecs.lucene90.blocktree.Stats;
import org.apache.lucene.codecs.lucene99.Lucene99ScoreSkipReader.MutableImpactList;
Expand Down Expand Up @@ -150,4 +153,113 @@ protected void subCheckBinarySearch(TermsEnum termsEnum) throws Exception {
assertEquals(TermsEnum.SeekStatus.NOT_FOUND, termsEnum.seekCeil(new BytesRef("10004a")));
assertEquals(termsEnum.term(), new BytesRef("100051"));
}

public void testFloorBlocks() throws Exception {
Directory dir = newDirectory();
// Set minTermBlockSize to 2, maxTermBlockSize to 3, to generate deep subBlock.
PostingsFormat postingsFormat = getDefaultPostingsFormat(2, 3);

IndexWriter writer =
new IndexWriter(dir, newIndexWriterConfig().setCodec(alwaysPostingsFormat(postingsFormat)));
String[] categories =
new String[] {
"regular", "request1", "request2", "request3", "request4", "rest", "teacher", "team"
};

for (String category : categories) {
Document doc = new Document();
doc.add(newStringField("category", category, Field.Store.YES));
writer.addDocument(doc);
}

IndexReader reader = DirectoryReader.open(writer);

TermsEnum termsEnum = getOnlyLeafReader(reader).terms("category").iterator();

// Test seekExact.
BytesRef target = new BytesRef("request2");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request1");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request4");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request3");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request4");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request1");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request4");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("regular");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("rest");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("regular");
assertTrue(termsEnum.seekExact(target));
assertEquals(termsEnum.term(), target);

// Test seekCeil.
target = new BytesRef("request2");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request1");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request4");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request3");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request4");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request1");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("request4");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("regular");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("rest");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

target = new BytesRef("regular");
assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
assertEquals(termsEnum.term(), target);

writer.close();
reader.close();
dir.close();
}
}