Fix decoding UTF-8 constant pool entries #150
Merged
Constant pool entries of type `CONSTANT_Utf8_info` do not, despite the
name, use the UTF-8 encoding. They use a modified variant of UTF-8, as
specified by `java.io.Data{Input,Output}`.

When decoding these entries, the `Indexer.decodeUtf8Entry` method
interpreted the data as UTF-8. This usually didn't cause issues, because
the constants are mostly human-readable strings that fit the ASCII
table, in which case the two encodings do not differ.

With machine-generated content, however, the difference can easily
show up, for example in a string that contains the null character,
which modified UTF-8 encodes as the two bytes 0xC0 0x80 rather than a
single zero byte. One realistic example is the Kotlin standard library
JAR, where the `kotlin/collections/ArraysKt___ArraysKt.class` class
contains a `@KotlinMetadata` annotation whose `d1` member contains such
a "weird" string.
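To make the difference concrete, here is a minimal sketch that encodes a string containing the null character both ways. The class and helper names are hypothetical and are not part of this PR; the only library calls used are `DataOutputStream.writeUTF` and `String.getBytes`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the PR.
class ModifiedUtf8Difference {

    // Encodes a string as DataOutput.writeUTF does, then drops the
    // 2-byte length prefix so only the modified UTF-8 payload remains.
    public static byte[] encodeModifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = bytes.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    public static void main(String[] args) {
        String s = "a\0b";

        // Standard UTF-8 encodes U+0000 as a single zero byte.
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);
        // Modified UTF-8 encodes U+0000 as the two bytes 0xC0 0x80.
        byte[] modified = encodeModifiedUtf8(s);

        System.out.println(standard.length + " vs " + modified.length);

        // Decoding the modified bytes as standard UTF-8 does not
        // round-trip: 0xC0 0x80 is an overlong form that the standard
        // decoder rejects and replaces.
        String wrong = new String(modified, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(s));
    }
}
```

For plain ASCII input the two byte arrays are identical, which is why the problem stayed hidden for typical constants.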

The fix is simple: use `DataInputStream` to read a `String` out of the
byte array.
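The approach can be sketched roughly as follows. This is not the actual `Indexer.decodeUtf8Entry` code; the class name, method name, and parameters are assumptions made for illustration. It relies on the fact that a `CONSTANT_Utf8_info` entry stores a 2-byte length followed by the bytes, which is the same layout `DataInputStream.readUTF` expects.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not the code from this PR.
class ModifiedUtf8Decoder {

    // Assumption: 'data' contains the entry starting at 'offset' with its
    // 2-byte big-endian length field, immediately followed by that many
    // bytes of modified UTF-8 (i.e. the entry without its tag byte).
    public static String decode(byte[] data, int offset) {
        ByteArrayInputStream bytes =
                new ByteArrayInputStream(data, offset, data.length - offset);
        try (DataInputStream in = new DataInputStream(bytes)) {
            // readUTF reads the length prefix and decodes modified UTF-8.
            return in.readUTF();
        } catch (IOException e) {
            // Malformed or truncated input surfaces here.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, the bytes `00 04 61 C0 80 62` decode to the three-character string consisting of `a`, the null character, and `b`.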
Fixes #49