
Conversation

@cl4es
Member

@cl4es cl4es commented Oct 6, 2024

#14632 showed that coalescing loads in the ZipUtils utility methods could improve performance in zip-related microbenchmarks, but the suggested PR would increase startup overheads by early use of ByteArrayLittleEndian which depends on VarHandles. Progress was stalled as we backed out some related early use of ByteArray(LittleEndian) and started exploring merge store optimizations in C2.

In this PR I instead suggest using Unsafe directly to coalesce short, int, and long reads from zip data. Even with explicit bounds checking to ensure these utilities are always safe, there are significant improvements both to lookup speed and to the speed of opening zip files (most if not all bounds checks are optimized away):
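The coalescing idea can be sketched as follows. This is not the PR's actual code: the PR uses jdk.internal.misc.Unsafe, which is JDK-internal, so a little-endian ByteBuffer view stands in for the single wide read here, and the method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

public class CoalescedReads {
    // Baseline: four separate byte loads, shifted and OR'ed together.
    static long get32Bytewise(byte[] b, int off) {
        return Byte.toUnsignedLong(b[off])
                | Byte.toUnsignedLong(b[off + 1]) << 8
                | Byte.toUnsignedLong(b[off + 2]) << 16
                | Byte.toUnsignedLong(b[off + 3]) << 24;
    }

    // Coalesced: an explicit bounds check, then one 4-byte little-endian read.
    static long get32Coalesced(byte[] b, int off) {
        Objects.checkFromIndexSize(off, 4, b.length);
        return Integer.toUnsignedLong(
                ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(off));
    }

    public static void main(String[] args) {
        byte[] buf = {0x50, 0x4b, 0x05, 0x06}; // "PK\5\6", the END header signature
        long endsig = 0x06054b50L;
        System.out.println(get32Bytewise(buf, 0) == endsig);  // true
        System.out.println(get32Coalesced(buf, 0) == endsig); // true
    }
}
```

Both variants produce the same unsigned 32-bit value; the point of the PR is that the single wide read (with checks the JIT can eliminate) is cheaper than assembling the value byte by byte.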

make test TEST=micro:java.util.zip.ZipFile

Name                          (size) Cnt       Base      Error        Test      Error  Unit  Change
GetEntry.getEntryHit             512  15     37.999 ±    0.841      34.641 ±    0.389 ns/op   1.10x (p = 0.000*)
GetEntry.getEntryHit            1024  15     39.557 ±    0.523      36.959 ±    1.488 ns/op   1.07x (p = 0.000*)
GetEntry.getEntryHitUncached     512  15     69.250 ±    0.931      64.851 ±    0.987 ns/op   1.07x (p = 0.000*)
GetEntry.getEntryHitUncached    1024  15     71.628 ±    0.307      67.927 ±    0.714 ns/op   1.05x (p = 0.000*)
GetEntry.getEntryMiss            512  15     22.961 ±    0.336      22.825 ±    0.188 ns/op   1.01x (p = 0.158 )
GetEntry.getEntryMiss           1024  15     22.940 ±    0.115      23.502 ±    0.273 ns/op   0.98x (p = 0.000*)
GetEntry.getEntryMissUncached    512  15     35.886 ±    0.429      35.598 ±    1.296 ns/op   1.01x (p = 0.395 )
GetEntry.getEntryMissUncached   1024  15     38.168 ±    0.911      36.141 ±    0.356 ns/op   1.06x (p = 0.000*)
Open.openCloseZipFile            512  15  62425.563 ±  997.455   56263.401 ±  896.892 ns/op   1.11x (p = 0.000*)
Open.openCloseZipFile           1024  15 117491.250 ±  962.928  108055.491 ± 1595.577 ns/op   1.09x (p = 0.000*)
Open.openCloseZipFilex2          512  15  62974.575 ±  911.095   57996.388 ±  910.929 ns/op   1.09x (p = 0.000*)
Open.openCloseZipFilex2         1024  15 119164.769 ± 1756.065  108803.468 ±  929.483 ns/op   1.10x (p = 0.000*)
  * = significant

This PR also addresses some code duplication in ZipUtils.

An appealing alternative would be to implement a merge load analogue to the merge store optimizations in C2. Such an optimization would be very welcome since it would improve similar code outside of java.base (jdk.zipfs has some duplicate code that is left untouched) and reduce the need for Unsafe trickery. This enhancement and the microbenchmarks could then be used as verification and the Unsafe code backed out.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8341594: Use Unsafe to coalesce reads in java.util.zip.ZipUtils (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21377/head:pull/21377
$ git checkout pull/21377

Update a local copy of the PR:
$ git checkout pull/21377
$ git pull https://git.openjdk.org/jdk.git pull/21377/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21377

View PR using the GUI difftool:
$ git pr show -t 21377

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21377.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 6, 2024

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 6, 2024

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8341594: Use Unsafe to coalesce reads in java.util.zip.ZipUtils

Reviewed-by: lancea

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 103 new commits pushed to the master branch:

  • fc7244d: 8340713: Open source DnD tests - Set5
  • f7bb647: 8341595: Clean up iteration of CEN headers in ZipFile.Source.initCEN
  • d0c5e4b: 8341373: Open source closed frame tests # 4
  • 3359518: 8341593: Problemlist java/foreign/TestUpcallStress.java in Xcomp mode
  • a2372c6: 8341238: G1: Refactor G1Policy to move collection set selection methods into G1CollectionSet
  • 4ba170c: 8341235: Improve default instruction frame title in PassFailJFrame
  • 520060f: 8340799: Add border inside instruction frame in PassFailJFrame
  • 2897797: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp
  • 747a3fa: 8341562: RISC-V: Generate comments in -XX:+PrintInterpreter to link to source code
  • 81ebbb2: 8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
  • ... and 93 more: https://git.openjdk.org/jdk/compare/ad5ffccffa89359dac6ad44b9e43242e5bf3e398...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 6, 2024
@openjdk

openjdk bot commented Oct 6, 2024

@cl4es The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Oct 6, 2024
@mlbridge

mlbridge bot commented Oct 6, 2024

Webrevs

Comment on lines +174 to +175
Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER);
Preconditions.checkIndex(off + 1, b.length, Preconditions.AIOOBE_FORMATTER);
Member


Please use Preconditions.checkFromIndexSize, which should be less overhead:

Suggested change
Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER);
Preconditions.checkIndex(off + 1, b.length, Preconditions.AIOOBE_FORMATTER);
Preconditions.checkFromIndexSize(off, 2, b.length, Preconditions.AIOOBE_FORMATTER);

Similarly for other methods.

Member Author

@cl4es cl4es Oct 6, 2024


It's actually not less overhead in my tests: checkIndex is intrinsic and mostly disappears, while with checkFromIndexSize performance gets significantly worse (on par with baseline). It's on my todo list to investigate this in depth, but I think checkFromIndexSize needs the same intrinsification treatment as checkIndex.
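The two bounds-checking styles under discussion can be compared through the public java.util.Objects counterparts of the internal Preconditions methods (the method shapes match; the internal versions just take a custom exception formatter). Both reject the same out-of-range inputs; the difference debated above is purely which form C2 intrinsifies well.

```java
import java.util.Objects;

public class BoundsCheckStyles {
    // Style kept in the PR: two checkIndex calls (intrinsified by C2).
    static int get16TwoChecks(byte[] b, int off) {
        Objects.checkIndex(off, b.length);
        Objects.checkIndex(off + 1, b.length);
        return Byte.toUnsignedInt(b[off]) | Byte.toUnsignedInt(b[off + 1]) << 8;
    }

    // Suggested alternative: a single range check covering both bytes.
    static int get16RangeCheck(byte[] b, int off) {
        Objects.checkFromIndexSize(off, 2, b.length);
        return Byte.toUnsignedInt(b[off]) | Byte.toUnsignedInt(b[off + 1]) << 8;
    }

    public static void main(String[] args) {
        byte[] buf = {0x34, 0x12};
        System.out.println(get16TwoChecks(buf, 0));  // 4660 (0x1234)
        System.out.println(get16RangeCheck(buf, 0)); // 4660
    }
}
```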

Member


Actually if we trust the input index to be nonnegative, we can just check our end index for out of bounds too.

Member Author

@cl4es cl4es Oct 6, 2024


Sure, though I think the JIT is pretty good at eliminating the (intrinsic) checkIndex calls when they are redundant. Performance with and without these checkIndex calls is the same in my testing, so we can have our cake and eat it too on this one.

FWIW I wouldn't mind giving similar treatment to ByteArray(-LittleEndian) and avoid the VarHandles dependency in those utility classes, but I have no urge to get into the sort of discussions that were spawned in #19616

Contributor


Actually if we trust the input index to be nonnegative, we can just check our end index for out of bounds too.

I would not trust that. Perhaps for well-formed ZIP files, but trust me, not all ZIPs are well-formed ;-)

Member


Yep, that requires you to pre-validate the input argument as being at a fixed position after already-read content, which is not always the case.

Member Author


Like @eirbjo suggests, we'd have to put a lot of validation in other places if we went down this route. Regardless, this is an academic discussion since the PR suggests the safe route and we don't pay much of a cost for that in the microbenchmarks.

buf[i+1] == (byte)'K' &&
buf[i+2] == (byte)'\005' &&
buf[i+3] == (byte)'\006') {
if (get32(buf, i) == ENDSIG) {
Contributor


Maybe a matter of personal preference, but I think GETSIG(buf, i) == ENDSIG reads better than get32(buf, i) == ENDSIG.

The fact that it's 32 bits is kind of a detail and it doesn't reveal as well that we intend to read a signature.

So could we keep GETSIG, but add an index? There are places in ZipInputStream as well which could make use of that for signature checking. (But maybe not for this PR)

Alternatively, ENDSIG(buf, i) == ENDSIG would be consistent with CENSIG(buf, i) uses.

Same applies to the other GETSIG replacements in this file.

Member Author

@cl4es cl4es Oct 6, 2024


I think all the GETSIG(byte[]) methods are quite nasty, and it's all used very inconsistently. I wouldn't mind going the other way and removing all the CENSIG, LOCNAM etc. methods and just calling get16/32/32S/64(buf, ZipConstants.FOO) as appropriate. Add a comment at each ZipConstants entry about exactly how many bytes are in the field and whether it's unsigned or signed. Then at least there's a reference to something that looks like a specification, not a mix of constants and single-purpose offset getters scattered around (which don't even reference the constants in ZipConstants).
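The style proposed in the comment above could look roughly like this sketch: a generic reader plus documented field offsets, with no per-field getter. The constant names here (CENSIG_OFF, CENVEM_OFF) are hypothetical; the offsets themselves follow the ZIP central directory header layout.

```java
public class CenFieldStyle {
    // Offsets into a CEN header, as they would be documented in ZipConstants:
    static final int CENSIG_OFF = 0; // 4 bytes, unsigned: header signature
    static final int CENVEM_OFF = 4; // 2 bytes, unsigned: version made by

    static int get16(byte[] b, int off) {
        return Byte.toUnsignedInt(b[off]) | Byte.toUnsignedInt(b[off + 1]) << 8;
    }

    static long get32(byte[] b, int off) {
        return Integer.toUnsignedLong(get16(b, off) | get16(b, off + 2) << 16);
    }

    public static void main(String[] args) {
        byte[] cen = {0x50, 0x4b, 0x01, 0x02, 0x14, 0x00}; // start of a CEN record
        // Generic reader + named offset instead of a CENSIG(buf, pos) getter:
        System.out.println(get32(cen, CENSIG_OFF) == 0x02014b50L); // true (CENSIG)
        System.out.println(get16(cen, CENVEM_OFF));                // 20
    }
}
```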

zf.close();
zf2.close();
}

Contributor


A short comment stating the purpose of the main method would not hurt.

Member


#19477 (comment)

This applies to many benchmarks, so I wonder where is the best place for such a note.

Member Author


I think it's fair to add a descriptive comment individually.

static final long CENOFF(byte[] b, int pos) { return LG(b, pos + 42);}
static final long CENSIG(byte[] b, int pos) { return get32(b, pos + 0); }
static final int CENVEM(byte[] b, int pos) { return get16(b, pos + 4); }
static final int CENVEM_FA(byte[] b, int pos) { return Byte.toUnsignedInt(b[pos + 5]); } // file attribute compatibility
Contributor


Did you consider introducing get8 for consistency here? As it stands, this looks like the odd one out.

Member Author


I considered it, but since get8 would basically just delegate to or do exactly what Byte.toUnsignedInt does I opted to cut out the middle man.
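For context, Byte.toUnsignedInt already does exactly what a hypothetical get8 would: widen a (signed) Java byte to its unsigned 0..255 value.

```java
public class Get8Demo {
    public static void main(String[] args) {
        byte b = (byte) 0xFE; // -2 as a signed Java byte
        System.out.println(b);                     // -2
        System.out.println(Byte.toUnsignedInt(b)); // 254
    }
}
```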

* Fetches signed 64-bit value from byte array at specified offset.
* The bytes are assumed to be in Intel (little-endian) byte order.
*/
public static final long get64(byte[] b, int off) {
Contributor


This method returns a signed 64-bit value, which I think is not what some of its call sites expect. It should in any case be renamed to get64S to align with get32S. A new method get64 should be introduced, and any call site expecting unsigned numbers (most?) should use that instead.

If you don't want to deal with this in this PR, I could file an issue and suggest a PR for this. Let me know.

Member Author

@cl4es cl4es Oct 6, 2024


As it's a pre-existing issue I'd prefer to keep this one focused on the switch-over. How would you model unsigned long values here, though? Sure we could read into a BigInteger or accept negative values, but to really support such overflows we might have to rework a lot of things.

FWIW we already cap some values even lower in practice:

                            end.centot = (int)centot64; // assume total < 2g

Contributor

@eirbjo eirbjo Oct 6, 2024


How would you model unsigned long values here, though?

I don't think we should. 9223372036854775807 should be enough for everyone :-)

It may be worth renaming the method to get64S and adding a get64 variant which either clamps at Long.MAX_VALUE or throws IllegalArgumentException for larger values. Call sites doing custom validation (like checkZip64ExtraFieldValues) could then call get64S and check for a negative long.

But that's food for another PR.
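The signedness issue discussed in this thread can be demonstrated directly: Java longs are signed, so an 8-byte little-endian read of a ZIP64 field above Long.MAX_VALUE surfaces as a negative value, which call sites must check for. The get64 variant below is only a sketch of the throwing alternative floated above, not code from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Get64Signed {
    // Signed 8-byte little-endian read, as get64S is after the rename.
    static long get64S(byte[] b, int off) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong(off);
    }

    // Sketch of a possible get64: reject values that overflow a signed long.
    static long get64(byte[] b, int off) {
        long v = get64S(b, off);
        if (v < 0) {
            throw new IllegalArgumentException("64-bit field exceeds Long.MAX_VALUE");
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] max = new byte[8];
        Arrays.fill(max, (byte) 0xFF);      // unsigned 0xFFFF_FFFF_FFFF_FFFF
        System.out.println(get64S(max, 0)); // -1: overflow shows up as a negative long
    }
}
```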

Member Author


Renaming to get64S is reasonable to be internally consistent. Updated. Improving validation of data in such 64-bit fields I'll leave for the future. I think a reasonable stance is to throw in the check methods if any such field is negative, at least for some of these fields.

Contributor


FWIW we already cap some values even lower in practice:

                            end.centot = (int)centot64; // assume total < 2g

I submitted #21384 which adds validation of end.centot and also eliminates this narrowing conversion.

Contributor

@LanceAndersen LanceAndersen left a comment


Hi Claes,

Looks reasonable to me.

thank you for your efforts here

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 7, 2024
* Fetches unsigned 16-bit value from byte array at specified offset.
* The bytes are assumed to be in Intel (little-endian) byte order.
*/
public static final int get16(byte[] b, int off) {
Contributor


Can the JIT automatically perform MergeStore here? If the JIT could do this automatically without Unsafe, many scenarios would benefit.

Member


Unfortunately we only have merge store for writing to arrays; no merge load when we are reading from arrays.

Member Author


Yes. As I wrote in the PR description: "An appealing alternative would be to implement a merge load analogue to the merge store optimizations in C2." So it's the opposite of a merge store. I recall @eme64 mentioning somewhere that such an optimization would be possible, if we could show it to be profitable. I think it's reasonable to do this enhancement now, then revert the Unsafe stuff if/when there's a merge load optimization that beats it.
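The asymmetry described in this exchange can be made concrete. C2's merge-store optimization can collapse the four byte stores in a put32-style method into one 4-byte store; there is currently no analogous merge-load, so the four byte loads in a get32-style method stay separate unless coalesced by hand, which is what this PR does with Unsafe. The method names below are illustrative.

```java
public class MergeStoreVsLoad {
    // C2's merge-store optimization can fuse these four stores into one.
    static void put32(byte[] b, int off, int v) {
        b[off]     = (byte) v;
        b[off + 1] = (byte) (v >> 8);
        b[off + 2] = (byte) (v >> 16);
        b[off + 3] = (byte) (v >> 24);
    }

    // No merge-load exists: these four loads remain separate unless
    // coalesced by hand into a single wide read.
    static int get32(byte[] b, int off) {
        return Byte.toUnsignedInt(b[off])
                | Byte.toUnsignedInt(b[off + 1]) << 8
                | Byte.toUnsignedInt(b[off + 2]) << 16
                | Byte.toUnsignedInt(b[off + 3]) << 24;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        put32(buf, 0, 0x06054b50);                       // ENDSIG
        System.out.println(get32(buf, 0) == 0x06054b50); // true: values round-trip
    }
}
```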

@cl4es
Member Author

cl4es commented Oct 8, 2024

/integrate

@openjdk

openjdk bot commented Oct 8, 2024

Going to push as commit ffb60e5.
Since your change was applied there have been 109 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 8, 2024
@openjdk openjdk bot closed this Oct 8, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 8, 2024
@openjdk

openjdk bot commented Oct 8, 2024

@cl4es Pushed as commit ffb60e5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
