ExtendedCfgNodeMethods: rename type parameter #4936

xavierpinho · 2024-09-18T21:10:34Z

@tajobe has been doing some byte code surgery in order to create a Kotlin interoperability layer and found an interesting edge case when type parameters are shadowed.

Taking a simpler example, consider the following Scala code:

class Example[NodeType <: Object](val node: NodeType) extends AnyVal {
  // The method-level NodeType shadows the class-level NodeType.
  def foo[NodeType](x: List[NodeType]): List[NodeType] = x
}

When decompiling the foo method, we'll find the following:

public static <NodeType extends java.lang.Object, NodeType extends java.lang.Object> scala.collection.immutable.List<NodeType> foo$extension(java.lang.Object, scala.collection.immutable.List<NodeType>);
    descriptor: (Ljava/lang/Object;Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=3, locals=2, args_size=2
         0: getstatic     #18                 // Field Example$.MODULE$:LExample$;
         3: aload_0
         4: aload_1
         5: invokevirtual #25                 // Method Example$.foo$extension:(Ljava/lang/Object;Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
         8: areturn
    Signature: #23                          // <NodeType:Ljava/lang/Object;NodeType:Ljava/lang/Object;>(Ljava/lang/Object;Lscala/collection/immutable/List<TNodeType;>;)Lscala/collection/immutable/List<TNodeType;>;

Notice how NodeType (the type parameter) occurs twice in the definition of foo$extension. Apparently the JVM (assuming Oracle as canonical) is designed to accept this when loading/linking, but guarantees nothing if the byte code is interpreted through reflection/etc, cf. https://docs.oracle.com/javase/specs/jvms/se23/html/jvms-4.html#jvms-4.7.9:

Oracle's Java Virtual Machine implementation does not check the well-formedness of Signature attributes during class loading or linking. Instead, Signature attributes are checked by methods of the Java SE Platform class libraries which expose generic signatures of classes, interfaces, constructors, methods, and fields. Examples include getGenericSuperclass in Class and toGenericString in java.lang.reflect.Executable.
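Since the check is left to whoever reads the Signature attribute, one way to spot affected methods is to ask reflection for each method's type parameter names and look for repeats. The following is only a minimal sketch: `TypeParamCheck`, `duplicates` and `shadowedTypeParams` are hypothetical names, not part of Joern or this PR, and whether reflection returns both names or throws for a given class file is not assumed here.

```scala
import java.lang.reflect.GenericSignatureFormatError

object TypeParamCheck {
  /** Names that occur more than once, sorted alphabetically. */
  def duplicates(names: Seq[String]): Seq[String] =
    names.groupBy(identity).collect { case (name, occ) if occ.size > 1 => name }.toSeq.sorted

  /** Maps method name to its repeated type parameter names, for methods declared on `cls`. */
  def shadowedTypeParams(cls: Class[_]): Map[String, Seq[String]] =
    cls.getDeclaredMethods.toSeq.flatMap { m =>
      // getTypeParameters parses the Signature attribute lazily; a malformed
      // signature surfaces here rather than at class loading time.
      val names =
        try m.getTypeParameters.toSeq.map(_.getName)
        catch { case _: GenericSignatureFormatError => Seq.empty[String] }
      val dups = duplicates(names)
      if (dups.nonEmpty) Some(m.getName -> dups) else None
    }.toMap
}
```

Calling `TypeParamCheck.shadowedTypeParams(classOf[java.util.Collections])` yields an empty map, since javac rejects repeated type parameter names within a single method.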

This byte code poses a problem that is trivially fixed if we prevent shadowing by just renaming one of the type parameters, e.g.

class Example[CfgNodeType <: Object](val node: CfgNodeType) extends AnyVal {
  // No shadowing: the class-level and method-level names now differ.
  def foo[NodeType](x: List[NodeType]): List[NodeType] = x
}

resulting in

public static <NodeType extends java.lang.Object, CfgNodeType extends java.lang.Object> scala.collection.immutable.List<NodeType> foo$extension(java.lang.Object, scala.collection.immutable.List<NodeType>);
    descriptor: (Ljava/lang/Object;Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=3, locals=2, args_size=2
         0: getstatic     #18                 // Field Example$.MODULE$:LExample$;
         3: aload_0
         4: aload_1
         5: invokevirtual #25                 // Method Example$.foo$extension:(Ljava/lang/Object;Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
         8: areturn
    Signature: #23                          // <NodeType:Ljava/lang/Object;CfgNodeType:Ljava/lang/Object;>(Ljava/lang/Object;Lscala/collection/immutable/List<TNodeType;>;)Lscala/collection/immutable/List<TNodeType;>;

Compiling Joern with -Wshadow:type-parameter-shadow only warns about 2 sites: the one in this PR and another one buried in javasrc2cpg. The latter is not public-facing, so I won't even bother proposing to change it at this point.
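For reference, a minimal sketch of turning that warning on in an sbt build. The flag is the one quoted above; applying it through `ThisBuild` in a top-level build.sbt is an assumption about the build layout, not necessarily how Joern wires its compiler options.

```scala
// build.sbt (sketch): warn whenever a type parameter shadows another type in scope.
// Assumes a Scala 3 compiler version that supports the -Wshadow option.
ThisBuild / scalacOptions += "-Wshadow:type-parameter-shadow"
```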

This rename is purely syntactical and seems to easily solve a bigger hurdle. Wdyt?

mpollmeier

I absolutely think we should merge this and also enable the type parameter shadowing warning by default (will create a separate PR shortly), for the sake of readability and to avoid ambiguity.

That being said, I'm not sure I understand which specific problem this PR solves, maybe you can elaborate..?

From my POV, the two type parameters in your Example class are not related at all. They happen to have the same name, which only means that within def foo we cannot reference the NodeType type from the class Example, but apart from that there's no connection at all between them.

Semantically it's similar to something like this on the value level: the two bar variables are not related at all, they just happen to have the same name, and therefore we cannot reference the class-level bar variable from within def baz.

class Foo {
  val bar = 42

  def baz: Unit = {
    val bar = "123"
    ()
  }
}
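The same point can be shown at the type level. This is just an illustrative sketch: `ShadowingDemo` is a made-up wrapper, and a plain class is used instead of a value class because that is enough to show the scoping.

```scala
object ShadowingDemo {
  class Example[NodeType <: Object](val node: NodeType) {
    // This NodeType is a fresh method-level parameter that hides the class-level one.
    def foo[NodeType](x: List[NodeType]): List[NodeType] = x
  }

  val example = new Example[String]("cfg node")

  // Class-level NodeType is String, method-level NodeType is Int: no conflict.
  val ints: List[Int] = example.foo[Int](List(1, 2, 3))
}
```

The two parameters are instantiated independently, which is why the Scala compiler has nothing to complain about beyond the optional shadowing warning.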


xavierpinho · 2024-09-19T10:47:06Z

Indeed, there's no ambiguity at Scala level: it's just another form of good ol' lexical scoping. It's only when it gets down to byte code that we find ourselves in a pickle, as we get 2 same-named type parameters in the same method, causing havoc with (I think) kotlin-reflect.

Briefly, and in the context of #4796, @tajobe has been working on a plugin to automatically generate ergonomic Kotlin bindings for the most relevant APIs. It starts by finding which methods to generate code for (based on their signature), and proceeds to emit a more ergonomic (Kotlin-wise) method definition for each. So far so good... until a wild method signature with 2 same-named type parameters appeared and kotlin-reflect bailed out.

Changing the type parameter name was the trivial workaround. No one was expecting to find a method with 2 same-named type parameters. I couldn't find anything about this edge case in various JVM-related docs online: not sure if it's undefined behaviour or I just missed something. At any rate, if someone knows more about this, I'd love to hear!

mpollmeier · 2024-09-19T13:23:57Z

Thanks for elaborating, yeah that all makes sense.

ExtendedCfgNodeMethods: rename type parameter

812c381

xavierpinho requested review from mpollmeier and DavidBakerEffendi September 18, 2024 21:10

mpollmeier approved these changes Sep 19, 2024

mpollmeier mentioned this pull request Sep 19, 2024

scalac: warn when type parameters are shadowed #4937

Merged

mpollmeier merged commit 236bd8f into master Sep 19, 2024
5 checks passed

xavierpinho deleted the xavierp/alpha-conversion branch September 19, 2024 13:31
