Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider more JDK Types to be Instantiated when Building Call Graphs #230

Open
johannesduesing opened this issue Nov 8, 2024 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@johannesduesing
Copy link
Collaborator

Problem
In #229 we found that calls to java/io/PrintStream.println are not found for CG-Algorithms besides CHA when loading the JDK as interfaces. The reason is that the type iterator will not consider the type PrintStream to be instantiated, as any code instantiating it is not actually loaded.

Solution
There might / could already be some handling for this in place that either does not work anymore or does not cover PrintStream - see discussion in #229. Look into the issue and fix it, so that calls to println are actually found when loading the JDK as interfaces.

@johannesduesing
Copy link
Collaborator Author

I'm currently looking into this issue. For RTA, it's actually an easy fix, as we thought we just need to add some types to the BR references.conf under the key br.analyses.cg.InitialInstantiatedTypesKey.instantiatedTypes. For now, i added ["java/io/PrintStream", "java/io/InputStream", "java/lang/ClassLoader", "java/lang/Thread"], which represent System.out and System.in, as well as all preconfigured entry points. After publishing this locally, RTA finds calls to println for a simple HelloWorld program when loading the RTJar as interfaces.

However, for all PropagationBasedCallGraphKeys it still does not work. I looked into their implementation i am not sure that they support this form of preconfiguration as it currently stands. The tac.fpcf.analyses.cg.xta.InstantiatedTypesAnalysis has a method called assignInitialTypeSets that defines the set of instantiated types for every entrypoint - but it only ever considers types that are passed as parameters to the entrypoint method. Therefore, defining a type as "instantiated no matter the entry point" does not seem to be possible.

I'll try and see what can be done to improve this, just leaving this comment here for documentation purposes.

@errt
Copy link
Collaborator

errt commented Nov 18, 2024

Thanks for your investigation!

For RTA: Question is, do we want this. These types are instantiated by the JDK, but on other system, e.g., on Android, they might not be. We could of course decide that this is the default and everyone analyzing a different system should reconfigure their analysis then. At least we should document it though and perhaps provide an AndroidProject.conf that overrides this suitably.

For propagation-based: You are right and the current state of affairs is probably already incorrect. InitialInstantiatedTypesFinder always produces at least Class and String, because these can be stored in the constant pool, introducing them that way. As propagation-based CGs now don't consider these available in methods that don't have respective arguments, they will miss them when used from a constant. On the other hand, we probably don't want all initially instantiated types be available in any method - LibraryInstantiatedTypesFinder, e.g., produces all types that can be instantiated from external code, but they can't be accessed in methods to which they are not passed.
On a related note: We might have to look into how dynamic constants changed the reasoning that only classes and strings can come from the constant pool. I'm not sure whether this is correctly modelled right now (though with rewriting on, it should probably be fine).

@johannesduesing
Copy link
Collaborator Author

I have to be honest, i don't know much about Android, but we could ofc just leat the current reference.conf be the one designated for Android and provide an updated default for the JDK. Honestly, i don't know how big of a problem this is. I often load the JDK as interfaces only, so to me it makes sense to have some default type instantiations tracked ... but i can also deal with that myself locally, instead of contributing those changes to OPAL. There will also be a number of additional types instantiated by the JDK and accessible via Fields, so this effort will likely never be completed - all that to say that i'd be fine with not changing anything for OPAL, and dealing with the issue via local configuration changes 😄

Regarding the propagation-based instantiated analysis i do have a question: I understood that there is a TypeSetEntity called ExternalWorld (in TypeSetEntitySelector.scala) that represents any fields or methods that are virtual. I further (think) i understand that when i have a read-access from a field that is virtual, this registers a dependency to this ExternalWorld entity and will update whenever the set of instantiated types for the ExternalWorld is updated - i.e. when virtual methods are invoked or when we have a write to a virtual field. Doesn't it make more sense to assume that a read from a virtual field (the ExternalWorld) may in fact contain an instantiated object of any subtype of the field type? Of course this will lower precision, but it would increase soundness. Same goes for dealing with return values of virtual method invocations - don't we have to assume that the return type (or any subtype) has been instantiated in the external world?

@errt
Copy link
Collaborator

errt commented Nov 19, 2024

I think we should deal with this in OPAL, we should just be sure where to properly deal with it. I suppose at least for the types that we already model in the configured native methods, it makes sense to add them to the initially instantiated types. We could also try to get a dynamic analysis result for current JDKs with empty or Hello-World programs to see which types we might have to consider.

I'm not deeply into the propagation-based code right now. If this already registers a dependency between fields and ExternalWorld, then that should already do what you describe, right? Since "any type in the type set that is a subtype of the field type" is the propagation-based rule for field reads. For method returns, are they also already connected to the external world? I'm not sure they should be, but maybe?

@johannesduesing
Copy link
Collaborator Author

Regarding the preconfiguration: I'll look into it, maybe find a suitable analysis to gather such types and post a PR! 🤓

Regarding the propagation-based analysis: I looked into this a little more, and i am fairly sure what happens is the following:

  • We see a virtual field read
  • We register a dependency to the InstantiatedTypes property of the ExternalWorld entity and register the field type as a filter for the entity ExternalWorld (see here)
  • When we receive an update for InstatiatedTypes of ExternalWorld, we look at the newly found types and apply the filters registered for External World (see here)
  • For all new types that match the filters, we produce results

Since there will never be an update that marks PrintStream as instantiated for the ExternalWorld (or for any other entity), the set of new types for ExternalWorld will never contain the type, and thus we never consider it instantiated. That's why i think it may make sense to change this, so that in the event that we read from the external world, we assume any subtype of the field type may be instantiated. Or another option could be to assign the preconfigured instantiated types to ExternalWorld by default.

@errt
Copy link
Collaborator

errt commented Nov 19, 2024

Oh, now I understand your suggestion. I don't think we can just use any subtype of the field's type even if that type is not in the ExternalWorld set, since that would essentially degenerate the algorithm to CHA (at least for the case of virtual fields). In turn, precision would not be guaranteed to be higher than RTA in all cases anymore. However, if we add types as instantiated for RTA, we should probably add them to the ExternalWorld, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants