-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte. #592
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This guard exists because getting bytes from a closed mapped ByteBuffer
would cause a segv. So to keep the protection it offers, we should keep calling ensureValid
before reading any byte from the ByteBuffer
while your PR does it afterwards. I would also prefer keeping access to the ByteBuffer
s encapsulated into ByteBufferGuard
.
Separately, this makes me wonder if some of the tasks that get the best speedups shouldn't be using a different encoding than vints.
I think there is another issue with this PR, which is that if a vInt is written across two ByteBuffers, then your PR would only ensure the validity of one of these ByteBuffers, while I think it should check both for safety. |
b06f078
to
b792e94
Compare
Thanks @jpountz ! I did not realize that reading ummapped bytebuffer will make JVM crach, sorry!
Sorry for my poor english, i'm not sure i've got your point. So to confirm, which one is you are meaning?
I'm having this question because i have not understood why todays ensurevalid can protect readings on bytebuffer, it seems there should be a case that |
I checked some other usage, seeing that for operations like |
Thanks @gf2121 the implementation looks correct to me now. Are you still seeing a good speedup with this change? |
The problem here is that we just reducing the number of guard checks and at same time raising the risk to segv: ReadVint uses multiple atomic reads and all of those reads can segv. Sorry to tell this: I would not change anything here, it is too risky. Let's wait for MemorySegmentIndexInput, which has JVM internal checks and a guard is not needed. see #518 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not do this. The benchmark above shows no significant changes and it looks like ran with wrong JVM options (defaults by Mike were bad, fixed recently). It should run with tiered compilation enabled then the guard checks should go away.
Thanks @jpountz @uschindler , this is the benchmark result based on the newest codes
OK! I'll close this then.
Yes i did not run with tiered compilation, but in fact i'm running with the newest luceneutil. It seems we used to add this param but removed again a few days ago? See this commit: mikemccand/luceneutil@f48b538 |
The param must go away or change - to +. Mike's commit fixed the tiered compilation problem and I was not sure if you have used his latest commit. To get better predicatable results that are not so noisy (especially for those low-level changes) you should have a huge dataset (not only wikiMedium) and also run it with more queries/run. What you see here is mostly noise. |
@uschindler Thank you for the benchmark guidance! I made a new change in this commit that ensure valid before each call of
So it seems like the point that brings a speed up is not reducing number of valid check but something else ? Anyway, this approach is no longer raising the risk of segv and benchmark result seems positive so i reopened the PR to see if this will make sense to you now :) |
Hi @gf2121, I see that you carefully also save the position before the try block and restore it on boundary crossing exception. What I don't like: ByteBufferGuard is just a hack and should not contain any program code except guarding. So I am not happy of including readVInt code there. Have you tried in just moving the If that helps we can also try the same on the MemorySegment one from #518. |
I think about this: @Override
public final int readVInt() throws IOException {
try {
// using #position instead of #mark here as #position is a final method
int pos = curBuf.position();
try {
byte b = guard.get(curBuf);
if (b >= 0) return b;
int i = b & 0x7F;
b = guard.get(curBuf);
i |= (b & 0x7F) << 7;
if (b >= 0) return i;
b = guard.get(curBuf);
i |= (b & 0x7F) << 14;
if (b >= 0) return i;
b = guard.get(curBuf);
i |= (b & 0x7F) << 21;
if (b >= 0) return i;
b = guard.get(curBuf);
// Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
i |= (b & 0x0F) << 28;
if ((b & 0xF0) == 0) return i;
throw new IOException("Invalid vInt detected (too many bits)");
} catch (
@SuppressWarnings("unused")
BufferUnderflowException e) {
curBuf.position(pos);
return super.readVInt();
}
} catch (
@SuppressWarnings("unused")
NullPointerException npe) {
throw new AlreadyClosedException("Already closed: " + this);
}
} |
I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no longer affects any of the JVM versions we support to my knowledge, maybe we should consider rolling up these loops again (assuming it helps performance). |
We only loose the last check on last iteration (the "copypaste" statement). |
We should maybe convert all of them back to loops in DataInput base class and remove the specializations and do more tests. I think the change here is then completely obsolete. |
Hi @jpountz, Uwe |
+1 |
@gf2121 Do you have an email I can use to contact you? Please send me an email at jpountz@gmail.com if you don't want to leave it on GitHub. |
Hi @jpountz. This is my email: guofeng.my@qq.com |
I find that in a higher level, we are using Note: This speed up is a duplicate of this PR as they are actually solving the same problem with different approaches.
I'm not very sure if we should move on in this approach to avoid potential confusion in a more complex environment other than benchmark, but it sounds reasonable to remove I raised https://issues.apache.org/jira/browse/LUCENE-10388 (#620) |
Hi @gf2121, |
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Hello @gf2121! Looks like @uschindler wants you to have to honor of merging this (now stale!) PR! |
Does this also affect |
Thanks! |
…ut to avoid the abstraction confusion of #readByte. (#592)
…ut to avoid the abstraction confusion of #readByte. (apache#592)
I'm pushing an annotation, this triggered a speedup in PKLookup: http://people.apache.org/~mikemccand/lucenebench/PKLookup.html. |
https://issues.apache.org/jira/browse/LUCENE-10366
#11402
Today, we do not rewrite
#readVInt
and#readVLong
forByteBufferIndexInput
. By default, the logic will call#readByte
several times, and we need to check whetherByteBuffer
is valid every time. This may not be necessary as we just need a final check.