Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues related to boot module layer synthesizing with closed-world approach #3733

Closed
ivan-ristovic opened this issue Aug 27, 2021 · 8 comments · Fixed by #3821
Closed

Issues related to boot module layer synthesizing with closed-world approach #3733

ivan-ristovic opened this issue Aug 27, 2021 · 8 comments · Fixed by #3821
Assignees
Milestone

Comments

@ivan-ristovic
Copy link
Contributor

ivan-ristovic commented Aug 27, 2021

Describe the issue

NI boot module layer currently only contains modules which contents are reachable after analysis. This behavior follows the closed-world assumption and is naturally translated to module system as well. However, in the case of modules, it can in some cases cause unintended results. This issue shows a simple example, with the goal to open a discussion about potential changes in the current closed-world approach when it comes to modules.

Describe GraalVM and your environment:

Steps to reproduce the issue

A simple example can be a modularized application with two modules - one of which will be a "library" module exporting a package that will be used by the "main" module.

The "library" module can be defined as:

module core.app {
        exports core.util;
}

It contains a class core.util.WorkerUtil with the following definition:

package core.util;

public class WorkerUtil {
        public static void doSomething() {
                System.out.println("WorkerUtil working...");
        }
}

The "main" module can be defined as:

module main.app {
        requires core.app;
}

Our main method, contained in main.app module, can be defined as:

package example.app;

import java.lang.ModuleLayer;
import core.util.WorkerUtil;

public class AppMain {

        // This function can also be in another module
        public static void useLibrary() {
                WorkerUtil.doSomething();
        }
        
        public static void main(String[] args) {
                useLibrary();
                assert ModuleLayer.boot().modules()
                        .stream()
                        .anyMatch(m -> m.getName().equals("core.app"));
        }
}

We can build an image of this application (in this example all jar files that are created by javac are located in pkg directory):

$ USE_NATIVE_IMAGE_JAVA_PLATFORM_MODULE_SYSTEM=true native-image -ea -p pkg -m main.app/example.app.AppMain

We can now verify that the assertion will hold by running the image. If we do not invoke the useLibrary() method, the entire core.app module will not be included in the image, as types from that module are not used. Re-running native-image in the same way as above (however this time not invoking useLibrary() method), and invoking the generated executable will lead to an AssertionError:

Exception in thread "main" java.lang.AssertionError
	at example.app.AppMain.main(AppMain.java:14)

This way modules can unintentionally break dependant modules simply by removing dependencies.

More details

For this example, the size of an executable which only contains only reachable modules (14 in this case): 11481768 B (~11M)

For this example, the size of an executable which contains all modules on the module path (26 in this case): 11485864 B (~11M)

From here, we can see that the difference is 4096 B, which averages to ~341 B per module (this value correlates to the number of packages contained in the module).

@ivan-ristovic ivan-ristovic changed the title Issues related to module layer synthesizing with closed-world approach Issues related to boot module layer synthesizing with closed-world approach Aug 27, 2021
@olpaw
Copy link
Member

olpaw commented Aug 27, 2021

We can now verify that the assertion will hold by running the image. If we do not invoke the useLibrary() method, the entire core.app module will not be included in the image, as types from that module are not used. Re-running native-image in the same way as above (however this time not invoking useLibrary() method), and invoking the generated executable will lead to an AssertionError

IMO, this is a feature and not a bug.

It is equally simple to construct a trivial sample application that uses reflection that passes on JVM but fails as native-image (if no configuration is provided during image build).

The question is what compromise we are more willing to present to our users:

  • Always show all modules in the bootlayer even if no classes are needed from any of those modules. I.e. there could be images with lots of modules in the bootlayer that are all empty. This would be hard to explain to users.
  • Only show modules in the bootlayer of our actually reachable classes. For this behaviour we can provide the user with the same explanation that we already have for e.g. reflection, proxies, resources, ... that this is a result of us building the image under closed world assumption and that we are only putting into the image what can can seen as reachable by the static analysis plus image configuration.

@olpaw
Copy link
Member

olpaw commented Aug 27, 2021

Btw, comparing the output of

public class HelloJava {
    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(Package.getPackages()));
    } 
}

when running on the JVM compared to running as native-image gives very different results.

Here also the output of Package.getPackages() is driven by the analysis results as it is now also implemented for the boot module-layer seen at image runtime. I.e. the current boot module layer synthesizing with closed-world approach is consistent with how we handle things elsewhere.

@christianwimmer
Copy link

We certainly do not want to include completely unused modules in the image heap, neither from the module path nor from the JDK. Why should a Hello World contain module information about, e.g., the java.sql module.

Now the question is: when is a module "used". Clearly, a module is used when at least one class of that module is included. But modules also have dependencies. If I understand the module system correctly, then module dependencies are an acyclic graph. So I think when a module is used, also all of its dependencies should be in the image heap. So in the example above, main.app requires core.app and therefore the module information of core.app would be in the image heap.

But if you have another module core.second that is just added to the module path on the command line, but then no class of that module is reachable, then it would not be included in the image heap.

There are also optional module dependencies (requires static) - so what to do with those? Probably not include automatically.

What to do with the "unnamed" module, which automatically depends on all named modules? Not include all dependencies of course (because that would result in all modules being always included).

@vjovanov
Copy link
Member

vjovanov commented Aug 30, 2021

We certainly do not want to include completely unused modules in the image heap, neither from the module path nor from the JDK.

Why not? The overhead is below one page in the image heap.

Why should a Hello World contain module information about, e.g., the java.sql module.

Because it does that when one runs "Hello, World!" with java.

I am asking because this would mean that we need an additional config file, agent changes, etc, for the code above to work as expected. Given the low overhead of including modules into the heap, I would argue that it is not worth complicating config further.

@christianwimmer
Copy link

As I said, the example above (where there is a direct module dependency) needs to include the "library" module. So that then works "as expected".

And for the case of JDK modules and application modules that are just passed in on the command line and then not used: here you need to define what "expected" means. Clearly, we also do not preserve the list of .jar files passed in on the class path. If you expect that every class, package, or module that you specified on the command line is available at run time, then you cannot use a closed-world assumption.

And no, we do not need an additional configuration file for modules. When you want a module to be available at runtime, register one class of it (like the module-info class) for reflection.

@vjovanov
Copy link
Member

The example above would not work if we are looking for any module from the JDK. But then, for such code, one should state in the module-info.java that the given JDK module is required.

I am OK with the proposal above given that the removal of completely unused modules can be also achieved with jlink.

@olpaw
Copy link
Member

olpaw commented Aug 31, 2021

given that the removal of completely unused modules can be also achieved with jlink

Good argument!

A real-world application that relies on seeing unused modules at runtime is bogus/fragile for exactly that reason.

@ivan-ristovic
Copy link
Contributor Author

Thanks for the feedback! This can be assigned to me, I will enhance the boot layer module set to include required modules as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants