-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8341594: Use Unsafe to coalesce reads in java.util.zip.ZipUtils #21377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c4b327a
9ffaff1
32010f6
d904ce6
23f0c8e
360afea
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -39,6 +39,7 @@ | |||||||
| import jdk.internal.access.JavaNioAccess; | ||||||||
| import jdk.internal.access.SharedSecrets; | ||||||||
| import jdk.internal.misc.Unsafe; | ||||||||
| import jdk.internal.util.Preconditions; | ||||||||
|
|
||||||||
| class ZipUtils { | ||||||||
|
|
||||||||
|
|
@@ -170,23 +171,31 @@ static LocalDateTime javaEpochToLocalDateTime(long time) { | |||||||
| * The bytes are assumed to be in Intel (little-endian) byte order. | ||||||||
| */ | ||||||||
| public static final int get16(byte[] b, int off) { | ||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can JIT automatically perform MergeStore here? If JIT can automatically perform MergeStore without using Unsafe, many scenarios will benefit.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately we only have merge store for writing to arrays; no merge load when we are reading from arrays.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. As I wrote in the PR description: "An appealing alternative would be to implement a merge load analogue to the merge store optimizations in C2." So it's the opposite of a merge store. I recall @eme64 mentioning somewhere that such an optimization would be possible, if we could show it to be profitable. I think it's reasonable to do this enhancement now, then revert the |
||||||||
| return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8); | ||||||||
| Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| Preconditions.checkIndex(off + 1, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
|
Comment on lines
+174
to
+175
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please use
Suggested change
Similarly for other methods.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's actually not less overhead in my tests, since
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually if we trust the input index to be nonnegative, we can just check our end index for out of bounds too.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I think the JIT is pretty good at eliminating the (intrinsic) FWIW I wouldn't mind giving similar treatment to
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I would not trust that. Perhaps for well-formed ZIP files, but trust me, not all ZIPs are well-formed ;-)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, that requires you to pre-validate the input argument as a fixed position after already-read content, which is not always the case.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Like @eirbjo suggests we'd have to put a lot of validation in other places if we went down this route. Regardless this is an academic discussion since the PR suggests the safe route and we don't pay much of a cost for that in microbenchmarks. |
||||||||
| return Short.toUnsignedInt( | ||||||||
| UNSAFE.getShortUnaligned(b, off + Unsafe.ARRAY_BYTE_BASE_OFFSET, false)); | ||||||||
| } | ||||||||
|
|
||||||||
| /** | ||||||||
| * Fetches unsigned 32-bit value from byte array at specified offset. | ||||||||
| * The bytes are assumed to be in Intel (little-endian) byte order. | ||||||||
| */ | ||||||||
| public static final long get32(byte[] b, int off) { | ||||||||
| return (get16(b, off) | ((long)get16(b, off+2) << 16)) & 0xffffffffL; | ||||||||
| Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| Preconditions.checkIndex(off + 3, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| return Integer.toUnsignedLong( | ||||||||
| UNSAFE.getIntUnaligned(b, off + Unsafe.ARRAY_BYTE_BASE_OFFSET, false)); | ||||||||
| } | ||||||||
|
|
||||||||
| /** | ||||||||
| * Fetches signed 64-bit value from byte array at specified offset. | ||||||||
| * The bytes are assumed to be in Intel (little-endian) byte order. | ||||||||
| */ | ||||||||
| public static final long get64(byte[] b, int off) { | ||||||||
| return get32(b, off) | (get32(b, off+4) << 32); | ||||||||
| public static final long get64S(byte[] b, int off) { | ||||||||
| Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| Preconditions.checkIndex(off + 7, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| return UNSAFE.getLongUnaligned(b, off + Unsafe.ARRAY_BYTE_BASE_OFFSET, false); | ||||||||
| } | ||||||||
|
|
||||||||
| /** | ||||||||
|
|
@@ -195,28 +204,9 @@ public static final long get64(byte[] b, int off) { | |||||||
| * | ||||||||
| */ | ||||||||
| public static final int get32S(byte[] b, int off) { | ||||||||
| return (get16(b, off) | (get16(b, off+2) << 16)); | ||||||||
| } | ||||||||
|
|
||||||||
| // fields access methods | ||||||||
| static final int CH(byte[] b, int n) { | ||||||||
| return b[n] & 0xff ; | ||||||||
| } | ||||||||
|
|
||||||||
| static final int SH(byte[] b, int n) { | ||||||||
| return (b[n] & 0xff) | ((b[n + 1] & 0xff) << 8); | ||||||||
| } | ||||||||
|
|
||||||||
| static final long LG(byte[] b, int n) { | ||||||||
| return ((SH(b, n)) | (SH(b, n + 2) << 16)) & 0xffffffffL; | ||||||||
| } | ||||||||
|
|
||||||||
| static final long LL(byte[] b, int n) { | ||||||||
| return (LG(b, n)) | (LG(b, n + 4) << 32); | ||||||||
| } | ||||||||
|
|
||||||||
| static final long GETSIG(byte[] b) { | ||||||||
| return LG(b, 0); | ||||||||
| Preconditions.checkIndex(off, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| Preconditions.checkIndex(off + 3, b.length, Preconditions.AIOOBE_FORMATTER); | ||||||||
| return UNSAFE.getIntUnaligned(b, off + Unsafe.ARRAY_BYTE_BASE_OFFSET, false); | ||||||||
| } | ||||||||
|
|
||||||||
| /* | ||||||||
|
|
@@ -231,56 +221,56 @@ static final long GETSIG(byte[] b) { | |||||||
|
|
||||||||
|
|
||||||||
| // local file (LOC) header fields | ||||||||
| static final long LOCSIG(byte[] b) { return LG(b, 0); } // signature | ||||||||
| static final int LOCVER(byte[] b) { return SH(b, 4); } // version needed to extract | ||||||||
| static final int LOCFLG(byte[] b) { return SH(b, 6); } // general purpose bit flags | ||||||||
| static final int LOCHOW(byte[] b) { return SH(b, 8); } // compression method | ||||||||
| static final long LOCTIM(byte[] b) { return LG(b, 10);} // modification time | ||||||||
| static final long LOCCRC(byte[] b) { return LG(b, 14);} // crc of uncompressed data | ||||||||
| static final long LOCSIZ(byte[] b) { return LG(b, 18);} // compressed data size | ||||||||
| static final long LOCLEN(byte[] b) { return LG(b, 22);} // uncompressed data size | ||||||||
| static final int LOCNAM(byte[] b) { return SH(b, 26);} // filename length | ||||||||
| static final int LOCEXT(byte[] b) { return SH(b, 28);} // extra field length | ||||||||
| static final long LOCSIG(byte[] b) { return get32(b, 0); } // signature | ||||||||
| static final int LOCVER(byte[] b) { return get16(b, 4); } // version needed to extract | ||||||||
| static final int LOCFLG(byte[] b) { return get16(b, 6); } // general purpose bit flags | ||||||||
| static final int LOCHOW(byte[] b) { return get16(b, 8); } // compression method | ||||||||
| static final long LOCTIM(byte[] b) { return get32(b, 10);} // modification time | ||||||||
| static final long LOCCRC(byte[] b) { return get32(b, 14);} // crc of uncompressed data | ||||||||
| static final long LOCSIZ(byte[] b) { return get32(b, 18);} // compressed data size | ||||||||
| static final long LOCLEN(byte[] b) { return get32(b, 22);} // uncompressed data size | ||||||||
| static final int LOCNAM(byte[] b) { return get16(b, 26);} // filename length | ||||||||
| static final int LOCEXT(byte[] b) { return get16(b, 28);} // extra field length | ||||||||
|
|
||||||||
| // extra local (EXT) header fields | ||||||||
| static final long EXTCRC(byte[] b) { return LG(b, 4);} // crc of uncompressed data | ||||||||
| static final long EXTSIZ(byte[] b) { return LG(b, 8);} // compressed size | ||||||||
| static final long EXTLEN(byte[] b) { return LG(b, 12);} // uncompressed size | ||||||||
| static final long EXTCRC(byte[] b) { return get32(b, 4);} // crc of uncompressed data | ||||||||
| static final long EXTSIZ(byte[] b) { return get32(b, 8);} // compressed size | ||||||||
| static final long EXTLEN(byte[] b) { return get32(b, 12);} // uncompressed size | ||||||||
|
|
||||||||
| // end of central directory header (END) fields | ||||||||
| static final int ENDSUB(byte[] b) { return SH(b, 8); } // number of entries on this disk | ||||||||
| static final int ENDTOT(byte[] b) { return SH(b, 10);} // total number of entries | ||||||||
| static final long ENDSIZ(byte[] b) { return LG(b, 12);} // central directory size | ||||||||
| static final long ENDOFF(byte[] b) { return LG(b, 16);} // central directory offset | ||||||||
| static final int ENDCOM(byte[] b) { return SH(b, 20);} // size of ZIP file comment | ||||||||
| static final int ENDCOM(byte[] b, int off) { return SH(b, off + 20);} | ||||||||
|
|
||||||||
| // zip64 end of central directory recoder fields | ||||||||
| static final long ZIP64_ENDTOD(byte[] b) { return LL(b, 24);} // total number of entries on disk | ||||||||
| static final long ZIP64_ENDTOT(byte[] b) { return LL(b, 32);} // total number of entries | ||||||||
| static final long ZIP64_ENDSIZ(byte[] b) { return LL(b, 40);} // central directory size | ||||||||
| static final long ZIP64_ENDOFF(byte[] b) { return LL(b, 48);} // central directory offset | ||||||||
| static final long ZIP64_LOCOFF(byte[] b) { return LL(b, 8);} // zip64 end offset | ||||||||
| static final int ENDSUB(byte[] b) { return get16(b, 8); } // number of entries on this disk | ||||||||
| static final int ENDTOT(byte[] b) { return get16(b, 10);} // total number of entries | ||||||||
| static final long ENDSIZ(byte[] b) { return get32(b, 12);} // central directory size | ||||||||
| static final long ENDOFF(byte[] b) { return get32(b, 16);} // central directory offset | ||||||||
| static final int ENDCOM(byte[] b) { return get16(b, 20);} // size of ZIP file comment | ||||||||
| static final int ENDCOM(byte[] b, int off) { return get16(b, off + 20);} | ||||||||
|
|
||||||||
| // zip64 end of central directory record fields | ||||||||
| static final long ZIP64_ENDTOD(byte[] b) { return get64S(b, 24);} // total number of entries on disk | ||||||||
| static final long ZIP64_ENDTOT(byte[] b) { return get64S(b, 32);} // total number of entries | ||||||||
| static final long ZIP64_ENDSIZ(byte[] b) { return get64S(b, 40);} // central directory size | ||||||||
| static final long ZIP64_ENDOFF(byte[] b) { return get64S(b, 48);} // central directory offset | ||||||||
| static final long ZIP64_LOCOFF(byte[] b) { return get64S(b, 8);} // zip64 end offset | ||||||||
|
|
||||||||
| // central directory header (CEN) fields | ||||||||
| static final long CENSIG(byte[] b, int pos) { return LG(b, pos + 0); } | ||||||||
| static final int CENVEM(byte[] b, int pos) { return SH(b, pos + 4); } | ||||||||
| static final int CENVEM_FA(byte[] b, int pos) { return CH(b, pos + 5); } // file attribute compatibility | ||||||||
| static final int CENVER(byte[] b, int pos) { return SH(b, pos + 6); } | ||||||||
| static final int CENFLG(byte[] b, int pos) { return SH(b, pos + 8); } | ||||||||
| static final int CENHOW(byte[] b, int pos) { return SH(b, pos + 10);} | ||||||||
| static final long CENTIM(byte[] b, int pos) { return LG(b, pos + 12);} | ||||||||
| static final long CENCRC(byte[] b, int pos) { return LG(b, pos + 16);} | ||||||||
| static final long CENSIZ(byte[] b, int pos) { return LG(b, pos + 20);} | ||||||||
| static final long CENLEN(byte[] b, int pos) { return LG(b, pos + 24);} | ||||||||
| static final int CENNAM(byte[] b, int pos) { return SH(b, pos + 28);} | ||||||||
| static final int CENEXT(byte[] b, int pos) { return SH(b, pos + 30);} | ||||||||
| static final int CENCOM(byte[] b, int pos) { return SH(b, pos + 32);} | ||||||||
| static final int CENDSK(byte[] b, int pos) { return SH(b, pos + 34);} | ||||||||
| static final int CENATT(byte[] b, int pos) { return SH(b, pos + 36);} | ||||||||
| static final long CENATX(byte[] b, int pos) { return LG(b, pos + 38);} | ||||||||
| static final int CENATX_PERMS(byte[] b, int pos) { return SH(b, pos + 40);} // posix permission data | ||||||||
| static final long CENOFF(byte[] b, int pos) { return LG(b, pos + 42);} | ||||||||
| static final long CENSIG(byte[] b, int pos) { return get32(b, pos + 0); } | ||||||||
| static final int CENVEM(byte[] b, int pos) { return get16(b, pos + 4); } | ||||||||
| static final int CENVEM_FA(byte[] b, int pos) { return Byte.toUnsignedInt(b[pos + 5]); } // file attribute compatibility | ||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you consider introducing
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I considered it, but since |
||||||||
| static final int CENVER(byte[] b, int pos) { return get16(b, pos + 6); } | ||||||||
| static final int CENFLG(byte[] b, int pos) { return get16(b, pos + 8); } | ||||||||
| static final int CENHOW(byte[] b, int pos) { return get16(b, pos + 10);} | ||||||||
| static final long CENTIM(byte[] b, int pos) { return get32(b, pos + 12);} | ||||||||
| static final long CENCRC(byte[] b, int pos) { return get32(b, pos + 16);} | ||||||||
| static final long CENSIZ(byte[] b, int pos) { return get32(b, pos + 20);} | ||||||||
| static final long CENLEN(byte[] b, int pos) { return get32(b, pos + 24);} | ||||||||
| static final int CENNAM(byte[] b, int pos) { return get16(b, pos + 28);} | ||||||||
| static final int CENEXT(byte[] b, int pos) { return get16(b, pos + 30);} | ||||||||
| static final int CENCOM(byte[] b, int pos) { return get16(b, pos + 32);} | ||||||||
| static final int CENDSK(byte[] b, int pos) { return get16(b, pos + 34);} | ||||||||
| static final int CENATT(byte[] b, int pos) { return get16(b, pos + 36);} | ||||||||
| static final long CENATX(byte[] b, int pos) { return get32(b, pos + 38);} | ||||||||
| static final int CENATX_PERMS(byte[] b, int pos) { return get16(b, pos + 40);} // posix permission data | ||||||||
| static final long CENOFF(byte[] b, int pos) { return get32(b, pos + 42);} | ||||||||
|
|
||||||||
| // The END header is followed by a variable length comment of size < 64k. | ||||||||
| static final long END_MAXLEN = 0xFFFF + ENDHDR; | ||||||||
|
|
@@ -293,16 +283,16 @@ static void loadLibrary() { | |||||||
| jdk.internal.loader.BootLoader.loadLibrary("zip"); | ||||||||
| } | ||||||||
|
|
||||||||
| private static final Unsafe unsafe = Unsafe.getUnsafe(); | ||||||||
| private static final Unsafe UNSAFE = Unsafe.getUnsafe(); | ||||||||
|
|
||||||||
| private static final long byteBufferArrayOffset = unsafe.objectFieldOffset(ByteBuffer.class, "hb"); | ||||||||
| private static final long byteBufferOffsetOffset = unsafe.objectFieldOffset(ByteBuffer.class, "offset"); | ||||||||
| private static final long byteBufferArrayOffset = UNSAFE.objectFieldOffset(ByteBuffer.class, "hb"); | ||||||||
| private static final long byteBufferOffsetOffset = UNSAFE.objectFieldOffset(ByteBuffer.class, "offset"); | ||||||||
|
|
||||||||
| static byte[] getBufferArray(ByteBuffer byteBuffer) { | ||||||||
| return (byte[]) unsafe.getReference(byteBuffer, byteBufferArrayOffset); | ||||||||
| return (byte[]) UNSAFE.getReference(byteBuffer, byteBufferArrayOffset); | ||||||||
| } | ||||||||
|
|
||||||||
| static int getBufferOffset(ByteBuffer byteBuffer) { | ||||||||
| return unsafe.getInt(byteBuffer, byteBufferOffsetOffset); | ||||||||
| return UNSAFE.getInt(byteBuffer, byteBufferOffsetOffset); | ||||||||
| } | ||||||||
| } | ||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a matter of personal preference, but I think
GETSIG(buf, i) == ENDSIGreads better thanget32(buf, i) == ENDSIG.The fact that it's 32 bits is kind of a detail and it doesn't reveal as well that we intend to read a signature.
So could we keep GETSIG, but add an index? There are places in
ZipInputStreamas well which could make use of that for signature checking. (But maybe not for this PR)Alternatively,
ENDSIG(buf, i) == ENDSIGwould be consistent withCENSIG(buf, i)uses.Same applies to the other GETSIG replacements in this file.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the
GETSIG(byte[])methods are quite nasty, and it's all used very inconsistently. I wouldn't mind going the other way and removing all theCENSIG,LOCNAMetc methods and just callget16/32/32S/64(buf, ZipConstants.FOO)as appropriate. Add a comment at eachZipConstantsentry about exactly how many bytes are in the field, if it's unsigned or signed. Then at least there's a reference to something that looks like a specification, not a mix of constants and single-purpose offset getters scattered around (which doesn't even reference the constants inZipConstants).