IllegalStateException during indexing #175

attila-vegh · 2022-03-07T09:30:40Z

I got the next exception from Jandex library:

Caused by: java.lang.IllegalStateException: Class not found in class table:scala.Function18$$anonfun$curried$1
at org.jboss.jandex.IndexWriterV2.positionOf (IndexWriterV2.java:507)
at org.jboss.jandex.IndexWriterV2.writeTypeEntry (IndexWriterV2.java:789)
at org.jboss.jandex.IndexWriterV2.writeTypeTable (IndexWriterV2.java:271)
at org.jboss.jandex.IndexWriterV2.write (IndexWriterV2.java:211)
at org.jboss.jandex.IndexWriter.write (IndexWriter.java:110)
at org.jboss.jandex.IndexWriter.write (IndexWriter.java:74)
at org.jboss.jandex.JarIndexer.createJarIndex (JarIndexer.java:193)

Please investigate this issue. thanks

Ladicek · 2022-03-07T09:53:57Z

I'd gladly investigate the issue, if you provided a way to reproduce. Since the stack trace starts at JarIndexer.createJarIndex, I assume you have a JAR file that you're trying to index. Is that JAR file available somewhere, or is it possible to build it from source that is available somewhere?

attila-vegh · 2022-03-07T11:43:48Z

The jar file is our company build.
I`m sending a screenshot where the "same" object is created twice
Dotname@957 vs DotName2717

I think the type is constructed from scala jar, what is a dependency jar for any other library file.
Our jar does not use scala directly.

Ladicek · 2022-03-07T11:54:20Z

The Scala library, at least the one I'm looking at (org.scala:scala-library:2.13.8) doesn't contain scala.Function18$$anonfun$curried$1. I'll need a way to reproduce the issue, I don't see a way around it.

Your screenshot is unfortunately useless. The nameTable in IndexWriterV2 is a HashMap and DotName provides equality and hash code, so the fact that there are 2 different instances of DotName with the same content doesn't mean anything.

attila-vegh · 2022-03-07T12:58:38Z

We are decided to filter out indexing scala jars. Now the indexing is ok

attila-vegh · 2022-03-07T13:00:43Z

The Function is inside the: scala-library-2.9.1.jar

Ladicek · 2022-03-07T13:18:23Z

Thank you! Indeed running java -jar core/target/jandex-3.0.0-SNAPSHOT.jar ~/Downloads/scala-library-2.9.1-1.jar fails with the same exception:

$ java -jar core/target/jandex-3.0.0-SNAPSHOT.jar ~/Downloads/scala-library-2.9.1-1.jar 
java.lang.IllegalStateException: Class not found in class table:scala.collection.TraversableLike$$anonfun$isEmpty$1
        at org.jboss.jandex.IndexWriterV2.positionOf(IndexWriterV2.java:491)
        at org.jboss.jandex.IndexWriterV2.writeTypeEntry(IndexWriterV2.java:801)
        at org.jboss.jandex.IndexWriterV2.writeTypeTable(IndexWriterV2.java:264)
        at org.jboss.jandex.IndexWriterV2.write(IndexWriterV2.java:206)
        at org.jboss.jandex.IndexWriter.write(IndexWriter.java:102)
        at org.jboss.jandex.IndexWriter.write(IndexWriter.java:66)
        at org.jboss.jandex.JarIndexer.createJarIndex(JarIndexer.java:197)
        at org.jboss.jandex.JarIndexer.createJarIndex(JarIndexer.java:88)
        at org.jboss.jandex.Main.getIndex(Main.java:87)
        at org.jboss.jandex.Main.execute(Main.java:67)
        at org.jboss.jandex.Main.main(Main.java:52)

I'll look into it.

In the meantime, I believe what you did seems right, indexing the Scala classes would most likely be useless for you.

Ladicek · 2022-03-08T10:06:56Z

Reopening, I can reproduce the issue by indexing a single file: scala/Function2$$anonfun$curried$1$$anonfun$apply$1.class. That should itself work fine, because Jandex is explicitly designed for incomplete classpaths. Yet I get the exception when writing the index through IndexWriterV2.

It also turns out that the hint at 2 different DotName objects is relevant. When the type table is being serialized, it seems the name table contains 2 entries with a key of scala.Function2$$anonfun$curried$1. Those 2 DotNames are differently componentized, and there may be a bug either in the name destructuring logic, or in equality/hash code.

Looking more into it.

Ladicek · 2022-03-08T10:09:59Z

One more thing: I recall we adjusted DotName parsing to account for $ at the beginning or at the end of a name component. Since the class names in question here contain $$, I'm thinking maybe those adjustments are at fault. Need to check.

Ladicek · 2022-03-08T11:32:07Z

The changes we did to account for $ at first/last position don't seem to be related.

Ladicek · 2022-03-08T14:11:19Z

OK so DotName.hashCode seems fine, hash codes of those 2 different DotNames are equal. DotName.equals is more suspicious, but I guess that one of the DotName's core assumptions is that componentization is canonical (there's exactly one componentization for each valid name). In other words, the problem is probably in the code that creates the componentized DotNames in the first place.

Ladicek · 2022-03-08T14:15:33Z

(Side note: to be completely fair, implementing DotNames equality by delegating to DotName.toString's equality would probably be easiest and I'm not convinced it would be much slower :-) )

Ladicek · 2022-03-08T15:01:41Z

So the 1st DotName is created by parsing a field, whose descriptor is scala/Function2$$anonfun$curried$1. This is componentized by NameTable.convertToName as:

{local = "scala", innerClass = false}
{local = "Function2$", innerClass = false}
{local = "anonfun", innerClass = true}
{local = "curried", innerClass = true}
{local = "1", innerClass = true}

The same DotName would later be created by parsing a method parameter, whose descriptor is identical, but that is correctly found in the intern pool.

The 2nd DotName is created by parsing a generic signature of that same method, which reads (Lscala/Function2<TT1;TT2;TR;>.$anonfun$curried$1;)V. That is a little strange, but valid.

The generic signature parser first finds a type name of scala.Function2. That name is componentized by NameTable.convertToName as:

{local = "scala", innerClass = false}
{local = "Function2", innerClass = false}

Then, the generic signature parser finds and parses the type arguments, and then it looks for nested types. Those are delimited by ., so there's just one: $anonfun$curried$1. NameTable.wrap is used to create a DotName whose prefix is the 2-element name shown above and local name is $anonfun$curried$1. In total, it looks like this:

{local = "scala", innerClass = false}
{local = "Function2", innerClass = false}
{local = "$anonfun$curried$1", innerClass = true}

Each case makes perfect sense when taken in isolation. However, looking at both of them at the same time, I'm starting to believe that DotName componentization most likely should NOT be assumed to be canonical. In other words, DotName.equals is wrong when both names are componentized.

attila-vegh · 2022-03-08T15:56:39Z

It is possible to somehow resolve this?

Ladicek · 2022-03-08T16:00:01Z

It is possible. I'm still working on how to do it best.

Ladicek · 2022-03-08T16:05:02Z

Hi @n1hility, could you please comment on how DotName componentization is supposed to work? At the moment, it seems to me that DotName.equals assumes that there's always exactly 1 componentization of any given name, but I've discovered (see above if you have plenty of time) that that's not really the case.

Ladicek · 2022-03-09T07:45:33Z

I realized yesterday evening that the generic signature parser is probably at fault, because it does not componentize the inner class name. I'll check today.

Ladicek · 2022-03-09T10:11:12Z

I came up with a relatively simple change that makes the generic signature parser componentize the inner class name and end up with the following DotName:

{local = "scala", innerClass = false}
{local = "Function2", innerClass = false}
{local = "$anonfun", innerClass = true}
{local = "curried", innerClass = true}
{local = "1", innerClass = true}

This is almost identical to the DotName created by parsing the descriptor, shown in one of the previous comments. The only difference is the placement of that one extra $.

Fixing that will likely affect the treatment of $ in the first/last position, as I mentioned above. Will continue digging.

Ladicek · 2022-03-09T10:26:23Z

I think I just got enlightened :-) The problem with a descriptor like scala/Function2$$anonfun$curried$1 is that it is literally impossible to know if any given $ is a nesting separator or a true part of the name. Therefore, the only possible canonical componentization of a name like this is to split on every occurence of $, which leads to introduction of empty local names.

We eliminated empty local names previously (as I mentioned several times, by treating $ in the first/last position specially), which leads to ambiguity: should we split greedily (leading to Function2 + $anonfun), or eagerly (leading to Function2$ + anonfun). Any given answer to this question may be wrong, which will manifest in a situation like this, when there's another source of the name which leads to a different componentization (in this case, it's the generic signature).

The only question is, well, now what :-)

Ladicek · 2022-03-09T13:11:37Z

I have a patch locally that allows (not sure if correctly :-) ) indexing scala-library-2.9.1-1.jar. That patch attempts to restore canonical DotName componentization.

However, looking more at DotName code, it only guards against . separators when constructing componentized instances. It seems $ is allowed to be mixed relatively freely. Sometimes, it's part of the local name, and sometimes, it's a nested type separator. Under that condition, componentization cannot be canonical, and hence DotName.equals() is wrong when both instances are componentized.

n1hility · 2022-03-11T07:29:36Z

@Ladicek sorry for the delay I will take a look it this tomorrow (today for you). Skimming it sounds like you are on the right track. The intention is that it's not precise (since not enough info), but it should be deterministic.

Ladicek · 2022-03-11T08:24:18Z

@n1hility thanks, I think I've got it now. I need to fix DotName.equals(), which will be interesting (I found in history that it was specifically refactored to avoid allocations etc.), but doable.

Ladicek · 2022-03-11T15:11:40Z

I have a fixed version of DotName.equals that allows successfully indexing the entire scala-library-2.9.1-1.jar. Unfortunately, reading the index back fails, so the fix isn't exactly a fix :-) Will continue digging.

Ladicek · 2022-03-11T17:09:35Z

Reproducing the additional failure is possible by indexing one file (scala/collection/parallel/ParSeqLike$Elements$$anonfun$psplit$3$$anon$3.class) and then serializing and deserializing that index.

Of course, it's once again caused by a piece of code that assumes that componentization is canonical. This time, it's the code that reads/writes the name table.

Specifically, these 2 names are serialized next to each other:

scala.collection.parallel.ParSeqLike$Elements$$anonfun$psplit
scala.collection.parallel.ParSeqLike$Elements$$anonfun$psplit$3

The first is componentized like this:

{local = "scala", innerClass = false}
{local = "collection", innerClass = false}
{local = "parallel", innerClass = false}
{local = "ParSeqLike", innerClass = false}
{local = "Elements$", innerClass = true}
{local = "anonfun", innerClass = true}
{local = "psplit", innerClass = true}

While the second is componentized like this:

{local = "scala", innerClass = false}
{local = "collection", innerClass = false}
{local = "parallel", innerClass = false}
{local = "ParSeqLike", innerClass = false}
{local = "Elements$$anonfun$psplit$3", innerClass = true}

The code that constructs componentized DotNames during deserialization completely miscalculates the name depths and from there, everything goes wrong.

Will think about a fix next week.

Ladicek · 2022-03-11T17:11:53Z

(Adding here just for the fun: after the index reader miscalculates the name depths, it even creates non-existing names such as scala.collection.parallel.ParSeqLike$Elements$$anonfun$psplit$3$3$ :-) )

n1hility · 2022-03-12T00:45:47Z

@Ladicek ok thanks. it's all yours. One thing to check out is the sorting is correct, since these names are written in a radix tree order where each branch is preceded by its peers and finally the immediate parent (prefix())

Ladicek · 2022-03-14T07:50:21Z

Yea the sorting is correct. It's the assumption that it's possible, when deserializing a name, to get the correct prefix by taking the previously deserialized name and call prefix() on it N times, where N is read from the serialized format of the name. That assumes canonical componentization.

I realized over the weekend that there's a simple fix. Instead of writing a depth of the name, which will later be used to find the prefix, we can just store an index of the prefix in the name table. Since names are written in order, the prefix must have already been written, so we can simply look it up from the table, instead of inspecting the previously deserialized name. In case of huge JARs with 1000s classes, indices can be large numbers and the total index size could increase. That can be mitigated relatively easily: instead of storing an [absolute] index, we can store a [relative] offset. Since offsets would always be negative, we would actually store the offset modified like this: -1 would be stored as 0, -2 would be stored as 1, -3 would be stored as 2, etc. To sum up, if int currentIndex is an index of the currently serialized name and int prefixIndex is an index of the prefix, we would write currentIndex - prefixIndex - 1.

That obviously requires a new persistent format version, so can't be fixed in 2.4.

EDIT: the offset format described above actually can't be used, because a prefix of null requires extra treatment. So we'll use 0 to mean a prefix of null, otherwise we'll store an absolute value of the prefix offset (so currentIndex - prefixIndex).

Ladicek · 2022-03-14T09:52:15Z

Aaaand with the 2 changes described above (fixed DotName.equals when both names are componentized, and fixed serialization/deserialization of name prefix), the entire scala-library-2.9.1-1.jar file can be successfully indexed and the index can be successfully serialized and deserialized again. Yay! :-)

Ladicek · 2022-03-15T16:37:20Z

I could write a test for the DotName.equals bug relatively easily, but I'm struggling to write a test for the name prefix serialization/deserialization issue. This small snippet of Scala code, compiled with Scala 2.11.12, reproduces the issue:

package test

trait Func[-T1, -T2, +R] {
  def apply(v1: T1, v2: T2): R

  def curried: T1 => T2 => R = {
    (x1: T1) => (x2: T2) => apply(x1, x2)
  }
}

But I'm struggling to write an equivalent Java code that would also exhibit the issue. Will continue digging for some time, not sure what to do if I'm not successful.

Ladicek · 2022-03-15T16:51:07Z

I just realized that I could just put the Scala source code into the test-data submodule and be done with it. There's a Maven plugin for Scala that can be used to compile it. The path of least resistance! :-)

Ladicek · 2022-03-15T17:05:17Z

OK this is embarrassing. I spent more than a day trying to find a reproducer for the name prefix serialization/deserialization issue, and it looks like my test for DotName.equals reproduces the problem too, if I roundtrip that index. The reproducer is more subtle, because deserializing the index doesn't fail, it just builds a nonsensical index in memory. But that's good enough for me.

Ladicek · 2022-03-15T17:30:05Z

Fixes are in #176. Due to the need to change persistent format, this can't be backported to 2.x, unfortunately.

Ladicek · 2022-03-23T07:39:06Z

Fixed in #176.

attila-vegh closed this as completed Mar 7, 2022

Ladicek reopened this Mar 8, 2022

Ladicek mentioned this issue Mar 15, 2022

DotName componentization #176

Merged

Ladicek linked a pull request Mar 15, 2022 that will close this issue

DotName componentization #176

Merged

Ladicek self-assigned this Mar 16, 2022

Ladicek added this to the 3.0.0 milestone Mar 16, 2022

Ladicek closed this as completed Mar 23, 2022

Ladicek added the serialization-format-change Changes that affect index serialization format label Apr 8, 2022

Ladicek mentioned this issue Apr 19, 2022

add IndexView.getKnownDirectSubinterfaces and getAllKnownSubinterfaces #184

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IllegalStateException during indexing #175

IllegalStateException during indexing #175

attila-vegh commented Mar 7, 2022

Ladicek commented Mar 7, 2022

attila-vegh commented Mar 7, 2022

Ladicek commented Mar 7, 2022

attila-vegh commented Mar 7, 2022

attila-vegh commented Mar 7, 2022 •

edited

Loading

Ladicek commented Mar 7, 2022

Ladicek commented Mar 8, 2022 •

edited

Loading

Ladicek commented Mar 8, 2022

Ladicek commented Mar 8, 2022

Ladicek commented Mar 8, 2022 •

edited

Loading

Ladicek commented Mar 8, 2022 •

edited

Loading

Ladicek commented Mar 8, 2022

attila-vegh commented Mar 8, 2022

Ladicek commented Mar 8, 2022

Ladicek commented Mar 8, 2022

Ladicek commented Mar 9, 2022

Ladicek commented Mar 9, 2022

Ladicek commented Mar 9, 2022 •

edited

Loading

Ladicek commented Mar 9, 2022

n1hility commented Mar 11, 2022

Ladicek commented Mar 11, 2022

Ladicek commented Mar 11, 2022

Ladicek commented Mar 11, 2022

Ladicek commented Mar 11, 2022

n1hility commented Mar 12, 2022

Ladicek commented Mar 14, 2022 •

edited

Loading

Ladicek commented Mar 14, 2022

Ladicek commented Mar 15, 2022

Ladicek commented Mar 15, 2022 •

edited

Loading

Ladicek commented Mar 15, 2022

Ladicek commented Mar 15, 2022

Ladicek commented Mar 23, 2022

IllegalStateException during indexing #175

IllegalStateException during indexing #175

Comments

attila-vegh commented Mar 7, 2022

Ladicek commented Mar 7, 2022

attila-vegh commented Mar 7, 2022

Ladicek commented Mar 7, 2022

attila-vegh commented Mar 7, 2022

attila-vegh commented Mar 7, 2022 • edited Loading

Ladicek commented Mar 7, 2022

Ladicek commented Mar 8, 2022 • edited Loading

Ladicek commented Mar 8, 2022

Ladicek commented Mar 8, 2022

Ladicek commented Mar 8, 2022 • edited Loading

Ladicek commented Mar 8, 2022 • edited Loading

Ladicek commented Mar 8, 2022

attila-vegh commented Mar 8, 2022

Ladicek commented Mar 8, 2022

Ladicek commented Mar 8, 2022

Ladicek commented Mar 9, 2022

Ladicek commented Mar 9, 2022

Ladicek commented Mar 9, 2022 • edited Loading

Ladicek commented Mar 9, 2022

n1hility commented Mar 11, 2022

Ladicek commented Mar 11, 2022

Ladicek commented Mar 11, 2022

Ladicek commented Mar 11, 2022

Ladicek commented Mar 11, 2022

n1hility commented Mar 12, 2022

Ladicek commented Mar 14, 2022 • edited Loading

Ladicek commented Mar 14, 2022

Ladicek commented Mar 15, 2022

Ladicek commented Mar 15, 2022 • edited Loading

Ladicek commented Mar 15, 2022

Ladicek commented Mar 15, 2022

Ladicek commented Mar 23, 2022

attila-vegh commented Mar 7, 2022 •

edited

Loading

Ladicek commented Mar 8, 2022 •

edited

Loading

Ladicek commented Mar 8, 2022 •

edited

Loading

Ladicek commented Mar 8, 2022 •

edited

Loading

Ladicek commented Mar 9, 2022 •

edited

Loading

Ladicek commented Mar 14, 2022 •

edited

Loading

Ladicek commented Mar 15, 2022 •

edited

Loading