Fix decoding UTF-8 constant pool entries #150

Merged (1 commit) on Oct 4, 2021

@Ladicek Ladicek commented Oct 4, 2021

Constant pool entries of type `CONSTANT_Utf8_info` do not, despite
the name, use standard UTF-8. They use modified UTF-8, the variant
specified by `java.io.DataInput` and `java.io.DataOutput`.

When decoding these entries, the `Indexer.decodeUtf8Entry` method
interpreted the data as standard UTF-8. This usually went unnoticed,
because the constants are mostly human-readable strings made of
printable ASCII characters, for which the two encodings do not differ.

With machine-generated content, the difference shows up easily, for
example in a string that contains the null character, which modified
UTF-8 encodes as the two bytes 0xC0 0x80 instead of a single zero
byte. One realistic example is the Kotlin standard library JAR, where
the `kotlin/collections/ArraysKt___ArraysKt.class` class contains a
`@KotlinMetadata` annotation whose `d1` member contains such a
"weird" string.
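
To make the difference concrete, here is a minimal sketch that is not
taken from the Jandex sources. The class name `ModifiedUtf8Demo` and its
helpers are made up for illustration. It encodes a string with
`DataOutputStream.writeUTF`, drops the 2-byte length prefix, and
compares the result with standard UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only; not part of Jandex.
class ModifiedUtf8Demo {
    // Encodes `s` the way DataOutputStream.writeUTF does and strips the
    // leading 2-byte length, leaving only the modified UTF-8 payload.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Decodes modified UTF-8 bytes as if they were standard UTF-8,
    // which is the mistake described above.
    static String decodeAsStandardUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "java/lang/Object";
        String withNull = "a\0b";

        // Printable ASCII: both encodings produce the same bytes.
        System.out.println(Arrays.equals(
                modifiedUtf8(ascii), ascii.getBytes(StandardCharsets.UTF_8)));

        // Null character: 4 bytes in modified UTF-8, 3 in standard UTF-8,
        // and the naive decoding no longer round-trips.
        System.out.println(modifiedUtf8(withNull).length);
        System.out.println(withNull.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(
                decodeAsStandardUtf8(modifiedUtf8(withNull)).equals(withNull));
    }
}
```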

The fix is simple: use `DataInputStream` to read a `String` out of
the byte array.
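
As a rough sketch of that idea, and not the actual change in this PR,
the helper below decodes an entry with `DataInputStream.readUTF`. The
class `Utf8EntryDecoder` and the method `decode` are hypothetical names.
The sketch assumes the byte array holds the 2-byte big-endian length
followed by the string bytes, which is the layout of a
`CONSTANT_Utf8_info` entry after its tag byte and also the layout
`readUTF` expects.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not the Jandex implementation.
class Utf8EntryDecoder {
    // Assumption: entry[offset..] starts with the u2 length of the entry,
    // followed by that many bytes of modified UTF-8.
    static String decode(byte[] entry, int offset) {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(entry, offset, entry.length - offset))) {
            // readUTF reads the length prefix, then decodes modified UTF-8.
            return in.readUTF();
        } catch (IOException e) {
            // Includes UTFDataFormatException for malformed input.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Length 4, then 'a', the two-byte form of U+0000, and 'b'.
        byte[] entry = {0x00, 0x04, 0x61, (byte) 0xC0, (byte) 0x80, 0x62};
        System.out.println(decode(entry, 0).equals("a\0b"));
    }
}
```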

Fixes #49

@Ladicek Ladicek added this to the 3.0.0 milestone Oct 4, 2021
@Ladicek Ladicek merged commit aac1389 into smallrye:smallrye Oct 4, 2021
@Ladicek Ladicek deleted the utf8-constant-encoding branch October 4, 2021 11:05
@Ladicek Ladicek linked an issue Oct 4, 2021 that may be closed by this pull request
Linked issue: Exception when writing big indices