8341594: Use Unsafe to coalesce reads in java.util.zip.ZipUtils #21377
Conversation
👋 Welcome back redestad! A progress list of the required criteria for merging this PR into

@cl4es This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time this comment was updated there had been 103 new commits pushed to the

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the

Webrevs
    Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER);
    Preconditions.checkIndex(off + 1, b.length, Preconditions.AIOOBE_FORMATTER);
Please use Preconditions.checkFromIndexSize, which should be less overhead:
-   Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER);
-   Preconditions.checkIndex(off + 1, b.length, Preconditions.AIOOBE_FORMATTER);
+   Preconditions.checkFromIndexSize(off, 2, b.length, Preconditions.AIOOBE_FORMATTER);
Similarly for other methods.
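For reference, both bounds-checking styles discussed here are also exposed as public API in java.util.Objects, which delegates to the same internal Preconditions methods. Below is a standalone sketch of the two variants applied to a hypothetical get16 helper (the method names are illustrative, not the actual ZipUtils code):

```java
import java.util.Objects;

public class BoundsCheckDemo {
    // Two per-index checks, matching the style used in the PR.
    // Objects.checkIndex is the public analogue of Preconditions.checkIndex
    // and is intrinsified by C2:
    static int get16TwoChecks(byte[] b, int off) {
        Objects.checkIndex(off, b.length);
        Objects.checkIndex(off + 1, b.length);
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    // One range check, matching the reviewer's suggestion.
    // Objects.checkFromIndexSize is the public analogue of
    // Preconditions.checkFromIndexSize:
    static int get16RangeCheck(byte[] b, int off) {
        Objects.checkFromIndexSize(off, 2, b.length);
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    public static void main(String[] args) {
        byte[] b = {0x34, 0x12};
        System.out.println(get16TwoChecks(b, 0));  // 4660 (0x1234)
        System.out.println(get16RangeCheck(b, 0)); // 4660
    }
}
```

Both variants throw for out-of-range offsets; the performance discussion below is only about how well C2 eliminates the checks.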
It's actually not less overhead in my tests, since checkIndex is intrinsic and mostly disappears, while with checkFromIndexSize performance gets significantly worse (on par with baseline). It's on my to-do list to investigate this in depth, but I think checkFromIndexSize needs the same intrinsification treatment as checkIndex.
Actually if we trust the input index to be nonnegative, we can just check our end index for out of bounds too.
Sure, I think the JIT is pretty good at eliminating the (intrinsic) checkIndex calls when they are redundant, though. Performance with and without these checkIndex calls is the same in my testing, so we can have our cake and eat it too on this one.
FWIW I wouldn't mind giving similar treatment to ByteArray(-LittleEndian) and avoid the VarHandles dependency in those utility classes, but I have no urge to get into the sort of discussions that were spawned in #19616
> Actually if we trust the input index to be nonnegative, we can just check our end index for out of bounds too.

I would not trust that. Perhaps for well-formed ZIP files, but trust me, not all ZIPs are well-formed ;-)
Yep, that requires you to pre-validate that the input argument is a fixed position after already-read content, which is not always the case.
Like @eirbjo suggests, we'd have to put a lot of validation in other places if we went down this route. Regardless, this is an academic discussion since the PR takes the safe route and we don't pay much of a cost for that in microbenchmarks.
-    buf[i+1] == (byte)'K' &&
-    buf[i+2] == (byte)'\005' &&
-    buf[i+3] == (byte)'\006') {
+    if (get32(buf, i) == ENDSIG) {
Maybe a matter of personal preference, but I think GETSIG(buf, i) == ENDSIG reads better than get32(buf, i) == ENDSIG.
The fact that it's 32 bits is kind of a detail and it doesn't reveal as well that we intend to read a signature.
So could we keep GETSIG, but add an index? There are places in ZipInputStream as well which could make use of that for signature checking. (But maybe not for this PR)
Alternatively, ENDSIG(buf, i) == ENDSIG would be consistent with CENSIG(buf, i) uses.
Same applies to the other GETSIG replacements in this file.
I think all the GETSIG(byte[]) methods are quite nasty, and it's all used very inconsistently. I wouldn't mind going the other way and removing all the CENSIG, LOCNAM etc methods and just call get16/32/32S/64(buf, ZipConstants.FOO) as appropriate. Add a comment at each ZipConstants entry about exactly how many bytes are in the field, if it's unsigned or signed. Then at least there's a reference to something that looks like a specification, not a mix of constants and single-purpose offset getters scattered around (which doesn't even reference the constants in ZipConstants).
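A minimal sketch of that consolidation idea, with hypothetical helper and constant names (the offsets follow the central directory header layout in the ZIP specification, which is what ZipConstants encodes):

```java
// Hypothetical sketch: drop per-field getters like CENSIG/CENVEM/CENOFF and
// read via generic get16/get32 helpers plus documented offset constants.
public class CenReader {
    // CEN header field offsets (bytes), per the ZIP spec (APPNOTE.TXT):
    static final int CENSIG_OFF = 0;  // 4 bytes, unsigned: signature 0x02014b50 ("PK\001\002")
    static final int CENVEM_OFF = 4;  // 2 bytes, unsigned: version made by
    static final int CENOFF_OFF = 42; // 4 bytes, unsigned: local header offset

    // Little-endian readers mirroring the ZipUtils helpers:
    static int get16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    static long get32(byte[] b, int off) {
        return (get16(b, off) | ((long) get16(b, off + 2) << 16)) & 0xffffffffL;
    }

    public static void main(String[] args) {
        byte[] cen = new byte[46];
        cen[0] = 0x50; cen[1] = 0x4b; cen[2] = 0x01; cen[3] = 0x02; // "PK\001\002"
        // Field reads reference the named offsets instead of magic numbers:
        System.out.println(get32(cen, CENSIG_OFF) == 0x02014b50L); // true
        System.out.println(get16(cen, CENVEM_OFF));                // 0
    }
}
```

The point of the layout comments is that each constant then documents its own width and signedness, replacing the scattered single-purpose getters.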
    zf.close();
    zf2.close();
}
A short comment stating the purpose of the main method would not hurt.
This applies to many benchmarks, so I wonder where is the best place for such a note.
I think it's fair to add a descriptive comment individually.
    static final long CENOFF(byte[] b, int pos) { return LG(b, pos + 42); }
    static final long CENSIG(byte[] b, int pos) { return get32(b, pos + 0); }
    static final int CENVEM(byte[] b, int pos) { return get16(b, pos + 4); }
    static final int CENVEM_FA(byte[] b, int pos) { return Byte.toUnsignedInt(b[pos + 5]); } // file attribute compatibility
Did you consider introducing get8 for consistency here? As it stands, this looks like the odd one out.
I considered it, but since get8 would basically just delegate to or do exactly what Byte.toUnsignedInt does I opted to cut out the middle man.
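For illustration, such a get8 would indeed be a one-line wrapper around the JDK method (the helper name here is hypothetical):

```java
public class Get8Demo {
    // A hypothetical get8 would just delegate to Byte.toUnsignedInt,
    // which is why the PR calls the JDK method directly instead:
    static int get8(byte[] b, int off) {
        return Byte.toUnsignedInt(b[off]);
    }

    public static void main(String[] args) {
        byte[] b = { (byte) 0xFE };
        System.out.println(get8(b, 0));               // 254, not -2
        System.out.println(Byte.toUnsignedInt(b[0])); // identical result
    }
}
```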
     * Fetches signed 64-bit value from byte array at specified offset.
     * The bytes are assumed to be in Intel (little-endian) byte order.
     */
    public static final long get64(byte[] b, int off) {
This method returns a signed 64-bit value, which I think is not what some of its call sites expect. It should in any case be renamed to get64S to align with get32S. A new method get64 should be introduced and any call site expecting unsigned numbers (most?) should use that instead.
If you don't want to deal with this in this PR, I could file an issue and suggest a PR for this. Let me know.
As it's a pre-existing issue I'd prefer to keep this one focused on the switch-over. How would you model unsigned long values here, though? Sure we could read into a BigInteger or accept negative values, but to really support such overflows we might have to rework a lot of things.
FWIW we already cap some values even lower in practice:
end.centot = (int)centot64; // assume total < 2g
> How would you model unsigned long values here, though?

I don't think we should. 9223372036854775807 should be enough for everyone :-)
It may be worth renaming the method to get64S and adding a get64 variant which either clamps at Long.MAX_VALUE or throws IllegalArgumentException for larger values. Call sites doing custom validation (like checkZip64ExtraFieldValues) could then call get64S and check for a negative long.
But that's food for another PR.
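A sketch of that suggestion, using ByteBuffer instead of the JDK-internal helpers (the get64S/get64 pair shown here is a hypothetical stand-in for the ZipUtils methods):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Get64Demo {
    // Little-endian signed 64-bit read, like the get64S discussed above:
    static long get64S(byte[] b, int off) {
        return ByteBuffer.wrap(b, off, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Hypothetical get64 variant: reject fields whose unsigned value would
    // exceed Long.MAX_VALUE, i.e. fields that read back as negative longs:
    static long get64(byte[] b, int off) {
        long v = get64S(b, off);
        if (v < 0) {
            throw new IllegalArgumentException("ZIP64 field exceeds Long.MAX_VALUE");
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] small = {1, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(get64(small, 0)); // 1

        byte[] huge = new byte[8];
        Arrays.fill(huge, (byte) 0xFF);      // unsigned 2^64 - 1
        System.out.println(get64S(huge, 0)); // -1 when read as a signed long
    }
}
```

Custom-validating call sites would use get64S and treat a negative result as an out-of-range field.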
Renaming to get64S is reasonable to be internally consistent. Updated. Improving validation of data in such 64-bit fields I'll leave for the future. I think a reasonable stance is to throw in the check methods if any such field is negative, at least for some of these fields.
> FWIW we already cap some values even lower in practice:
>
> end.centot = (int)centot64; // assume total < 2g
I submitted #21384 which adds validation of end.centot and also eliminates this narrowing conversion.
LanceAndersen left a comment
Hi Claes,
Looks reasonable to me.
Thank you for your efforts here.
     * Fetches unsigned 16-bit value from byte array at specified offset.
     * The bytes are assumed to be in Intel (little-endian) byte order.
     */
    public static final int get16(byte[] b, int off) {
Can the JIT automatically perform MergeStore here? If the JIT can automatically perform MergeStore without using Unsafe, many scenarios would benefit.
Unfortunately we only have merge store for writing to arrays; no merge load when we are reading from arrays.
Yes. As I wrote in the PR description: "An appealing alternative would be to implement a merge load analogue to the merge store optimizations in C2." So it's the opposite of a merge store. I recall @eme64 mentioning somewhere that such an optimization would be possible, if we could show it to be profitable. I think it's reasonable to do this enhancement now, then revert the Unsafe stuff if/when there's a merge load optimization that beats it.
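To illustrate the difference under discussion, here is a standalone sketch contrasting a byte-at-a-time read (four loads that C2 currently does not merge) with an explicitly coalesced single load. The PR itself uses Unsafe internally to avoid the VarHandle startup cost; the byte-array view VarHandle used here is a public-API stand-in that compiles to an equivalent single load in steady state:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class CoalescedReadDemo {
    // Byte-at-a-time: four separate array loads plus shifts and ors.
    // A hypothetical "merge load" optimization would fuse these into one load:
    static int get32Bytes(byte[] b, int off) {
        return (b[off] & 0xff)
             | ((b[off + 1] & 0xff) << 8)
             | ((b[off + 2] & 0xff) << 16)
             | ((b[off + 3] & 0xff) << 24);
    }

    // Explicitly coalesced: one 32-bit little-endian load via a view VarHandle:
    static final VarHandle INT = MethodHandles.byteArrayViewVarHandle(
            int[].class, ByteOrder.LITTLE_ENDIAN);

    static int get32Coalesced(byte[] b, int off) {
        return (int) INT.get(b, off);
    }

    public static void main(String[] args) {
        byte[] end = {0x50, 0x4b, 0x05, 0x06}; // ENDSIG bytes "PK\005\006"
        System.out.println(Integer.toHexString(get32Bytes(end, 0))); // 6054b50
        System.out.println(get32Bytes(end, 0) == get32Coalesced(end, 0)); // true
    }
}
```

If C2 later learns to merge the byte-at-a-time form into one load, the explicit coalescing (and the Unsafe usage) could be backed out without changing behavior.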
/integrate

Going to push as commit ffb60e5.
Your commit was automatically rebased without conflicts.
#14632 showed that coalescing loads in the `ZipUtils` utility methods could improve performance in zip-related microbenchmarks, but the suggested PR would increase startup overheads by early use of `ByteArrayLittleEndian`, which depends on `VarHandle`s. Progress was stalled as we backed out some related early use of `ByteArray(LittleEndian)` and started exploring merge store optimizations in C2.

In this PR I instead suggest using `Unsafe` directly to coalesce `short`, `int`, and `long` reads from zip data. Even with explicit bounds checking to ensure these utilities are always safe, there are significant improvements both to lookup speed and the speed of opening zip files (most if not all bounds checks are optimized away).

This PR also addresses some code duplication in `ZipUtils`.

An appealing alternative would be to implement a merge load analogue to the merge store optimizations in C2. Such optimizations would be very welcome, since they would improve similar code outside of java.base (jdk.zipfs has some duplicate code that is left untouched) and reduce the need for `Unsafe` trickery. This enhancement and the microbenchmarks could then be used as verification and the `Unsafe` code backed out.

Progress
Issue
Reviewers
Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21377/head:pull/21377
$ git checkout pull/21377

Update a local copy of the PR:
$ git checkout pull/21377
$ git pull https://git.openjdk.org/jdk.git pull/21377/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21377

View PR using the GUI difftool:
$ git pr show -t 21377

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21377.diff
Webrev
Link to Webrev Comment