[GR-33602] [RFC] More User-friendly Output for Native Image #3955
Comments
Great idea - never sure why we didn't attempt to show some kind of progress and time estimate given how long it takes to run. |
There needs to be a flag for still having the prefix. This is very useful in CI logs where parallel native image builds can be running. |
Running native image builds in parallel is not something users normally do so this shouldn't be the default. We might keep the current output around for CI or add an option for the prefix. |
I really like the stats part:
Could be useful to print the option name for showing all packages and objects. Similar to Could we get a preview of the colourful mode? (when it's ready) |
Good idea! Will extend that...
Of course, I'll share a first implementation when it's ready so that everyone can give this a spin :) |
I like all the improvements that are listed here. What I would love is for It would save me a lot of time if checks could be moved forward (fail fast(er)) and if on exception a context of the jar/module/class/method that the compiler was currently working on could be printed. |
I agree, there's still room for improvement. While good crash reports are important, I hope you don't mind if we focus on the output of a successful build in this issue and discuss your ideas in a separate one. Could you open another issue for this please? |
I've updated #3955 (comment) with a screencast of a prototype. Please feel free to leave any feedback. |
Great work @fniephaus! This is a huge improvement. Please see below for some comments.
GC usage
I think having a GC usage next to each phase can be valuable as well, e.g.
Or we could just have the time spent in GCs, since the number of GCs is not necessarily that important (especially for young collections), e.g.
To give a usage example of this, with that in place we can easily quantify the impact of
Machine-readable output option
If we want all data in a single file we need a format that can support hierarchical structures, e.g. JSON. Alternatively, we could produce multiple CSV files, e.g. one for the data from the different phases, one for the top packages/objects, etc. I personally like the tidiness of a single file, but I also like the simplicity of CSV files and the ability to import data into spreadsheets without any hassle. :) No strong opinion on my side.
Reflection overhead
In Quarkus we have observed that the more registrations for reflection we do the higher the overhead.
|
Thanks for your feedback, @zakkak! Glad you like it so far. :)
I'm not sure how much info we really want to display there to be honest. Your suggestion is quite GC-focused and since GCs tend to behave in non-deterministic ways, the output can be quite noisy. Maybe we could add some warnings instead that only show up if we detect an excessive number of GCs? Would that work for your use case and if yes, how would you identify excessive GCs?
As always, it depends on the use cases we want to support. A single file can only reasonably be produced at the end of a run and provide a summary while multiple smaller files could potentially be produced during the build process. I'm leaning toward a summary file that can be processed by a script during CI, for example, to detect if the image size exceeds a certain threshold.
That sounds reasonable... will look into this, thanks! |
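A minimal sketch of the CI use case mentioned above (failing a build when the image size exceeds a threshold). The summary format was not defined at this point in the thread, so the flat JSON layout and the `image_size_bytes` field name are purely hypothetical, and a real script would use a proper JSON parser instead of a regular expression.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageSizeCheck {
    // Hypothetical field name: the thread does not specify the summary schema.
    private static final Pattern SIZE_FIELD =
            Pattern.compile("\"image_size_bytes\"\\s*:\\s*(\\d+)");

    /** Extracts the image size from a flat JSON summary, if the field is present. */
    static OptionalLong imageSizeBytes(String summaryJson) {
        Matcher m = SIZE_FIELD.matcher(summaryJson);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Returns true when the reported image size is strictly above the limit. */
    static boolean exceedsLimit(String summaryJson, long limitBytes) {
        OptionalLong size = imageSizeBytes(summaryJson);
        if (!size.isPresent()) {
            throw new IllegalArgumentException("summary has no image_size_bytes field");
        }
        return size.getAsLong() > limitBytes;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the summary file, args[1]: size limit in bytes
        String json = Files.readString(Path.of(args[0]));
        if (exceedsLimit(json, Long.parseLong(args[1]))) {
            System.err.println("Image size exceeds the configured limit");
            System.exit(1);
        }
    }
}
```

Because the summary is only written at the end of the build, a check like this would run as a separate CI step after image generation.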
@fniephaus I really like the demoed output. Reading both your remarks and responding, I would say that any output produced by default on the shell should somehow be actionable for me. It triggers me to think: If I see long GCs that should maybe be a trigger for me to tweak some parameter? If I see the top 5 classes, should I do something about that? Regarding the machine readable output, json is very easy to deal with so that should be fine. Please do provide something like a json schema to describe the contents of the file as well. |
I agree. While I think we should keep the amount of output text to a minimum, one way we can address this is through good documentation that explains the output items and possible actions. We can then add appropriate links from the output to the docs.
Good!
Noted. |
I guess it would, after all no matter the output we will end up using GC logs to better understand what's going on, so a warning would be enough. What comes to mind is observing the past Another alternative would be to estimate the ratio of |
Thanks, @zakkak! Those are all good ideas but I have the feeling that we may need more data points (which the new output can provide) before we can come up with a good way to detect GC issues in native image. On my local machine, for example, I can observe larger numbers of GCs when memory pressure is high, but the time spent in GC isn't anywhere near the time "doing work". What do you see in memory-constrained environments where the OOM killer likes to kick in? Also, I was under the impression that users typically |
Hi @fniephaus
In graalvm#304 (comment) (which is quite extreme I admit, since the live data set happens to be almost identical to the available memory) I observed the following:
This is like 70% of the total time spent in GC. If I were to compare the time spent in GC over the time spent "doing work" per phase, I would expect this percentage to go even higher. For instance, doing such a comparison for the "universe" phase gives me:
This is about 98% of the time being spent in GC, and a 100% rate of Full GCs (indicating that there is not enough space to reclaim in the young generation) reclaiming only a couple hundred megabytes (indicating that there is not enough space in the old generation either).
Correct.
The answer is no to both questions. The reason I mention other GCs is because of your earlier comment, which I might have misunderstood:
My understanding was that by "GC-focused" you were worried that the proposed measurements would only make sense for Parallel GC and not other GCs, but it looks like you didn't mean this. |
I've implemented a test after the end of each stage that checks whether the time spent in GC dominates the overall time to run the stage and is above 15s. If the test fails, the following warning is displayed: (Here, I decreased both thresholds to trigger the warning earlier.) |
Nice :)
What's the threshold to determine whether the time spent in GC dominates the overall time? Is this set to 50%? When you say "overall time" you refer to the overall time spent for the last phase, 66.3s in this case, right? Do you think it would make sense to be able to tune those thresholds (or disable the warning) to avoid warnings in memory-constrained configurations that are known to be stressing the memory but for some reason need to run like this (e.g. lack of resources)? |
Yes, it's currently set to a
Technically, it checks the delta since the last check. But that's very close to the time spent for the last phase.
Yes, we could/should do that. An option for suppressing such warnings sounds reasonable. Noted! |
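For readers following along, the check discussed above could look roughly like the sketch below. It is not the actual implementation: the class and method names are made up, the 15s floor comes from the thread, and the 50% ratio is an assumption based on the question above, since the exact value is not spelled out here. It works on the delta since the previous check, as described.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcWarningSketch {
    // From the thread: only warn if GC time is above 15s.
    static final double MIN_GC_SECONDS = 15.0;
    // Assumption: "dominates" is read as more than half of the elapsed time.
    static final double GC_RATIO_THRESHOLD = 0.5;

    private double lastTotalSeconds;
    private double lastGcSeconds;

    /**
     * Called after each stage with cumulative totals. Compares the deltas
     * since the previous call and reports whether a warning is due.
     */
    boolean checkAfterStage(double totalSeconds, double gcSeconds) {
        double elapsed = totalSeconds - lastTotalSeconds;
        double gc = gcSeconds - lastGcSeconds;
        lastTotalSeconds = totalSeconds;
        lastGcSeconds = gcSeconds;
        return gc > MIN_GC_SECONDS && elapsed > 0 && gc / elapsed > GC_RATIO_THRESHOLD;
    }

    /** Sums the cumulative collection time reported by all collectors of this JVM. */
    static double cumulativeGcSeconds() {
        long millis = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                millis += t;
            }
        }
        return millis / 1000.0;
    }
}
```

Making the two constants configurable, or adding a switch that skips the check entirely, would cover the option for suppressing the warning mentioned above.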
Starting with GraalVM 22.0, `native-image` will produce different output (see oracle/graal#3955). This patch makes the Gradle integration check whether the GraalVM being used is < 22.0 and perform the corresponding assertions.
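The patch itself is not shown in this thread. As a rough illustration only, a version gate of this kind could be sketched as follows; the class and method names are hypothetical, and how the version string is obtained from the GraalVM installation is left open.

```java
final class GraalVersionCheck {
    /** Returns the leading numeric component of a version such as "21.3.0" or "22.0.0-dev". */
    static int majorVersion(String version) {
        String head = version.trim().split("[.\\-]", 2)[0];
        return Integer.parseInt(head);
    }

    /** True for versions before 22.0, which still print the old build output. */
    static boolean usesLegacyOutput(String version) {
        return majorVersion(version) < 22;
    }
}
```

A test could then branch on `usesLegacyOutput(...)` and assert against the old or the new output format accordingly.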
@fniephaus Shouldn't this be closed as it's been implemented with e04a9cd ? |
As mentioned by @jerboaa, we have implemented a new user-friendly build output mode with e04a9cd. The new mode is enabled by default and the old output can be restored with I will keep this issue open for the next few weeks in case there is more feedback from the community. Feel free to report any issues with the new output here or in a new issue. |
Suggestion: add the percentage of time spent in GC, not just the seconds, so
|
Keeping this open for collecting more feedback.
@eregon |
Yeah, that sounds good. |
…rter (#3955). PullRequest: graal/10568
…porter (#3955). PullRequest: graal/10626
Thanks again for all the feedback on this! The new build output has shipped with the GraalVM 22.0 release. If you have any further ideas or experience any issues, please file another issue. |
Update: This feature has landed on master and will ship with the GraalVM 22.0 release.
You can try it out using a GraalVM nightly build.
Description
Currently, the output produced during native image generation is limited to a list of stages, how long each of them took to run, and GC footprint. We want to make this output more user-friendly by providing relevant information for end-users. Overall, the goal is to help users better understand what happens during image generation and how a change, whether it is on their end or ours, influences that process in terms of time to run and memory usage.
We'd like to use this issue to discuss possible features and use cases for this new output mode with the community.
TLDR: Screencast
ni-output-demo.mp4
Full Example
(Unable to show colors and links.)
Feature List
Features are grouped by priority. Please feel free to propose changes and additional features in the comments.
Must-Have
Example:
[3/7] Performing analysis...
Note: Native image already reports
*.build_artifacts.txt
Example:
Image located at: '/path/to/image'
Example:
[7/7] Writing 44.15 MiB to disk...
Example:
[3/7] Performing analysis... ...done in 159.540s.
Example:
18,153 methods and 1,418 classes are reachable.
Example:
Generating HelloWorld with GraalVM Native Image 22.0.0...
Example see full example.
Example see full example.
[my-image:14475]
helps to distinguish parallel builds (uncommon use case, but there should be an option).
Should Have
.
(similar to JUnit) or other chars.
Example:
Performing Analysis... toooto
with t=typeflow, o=object graph.
org.graalvm.nativeimage.hosted.Features
and having the image builder use them during the build process. When the classpath/modulepath that the user provides contains features that are active during the build, they have major potential to affect every step of the image build in various ways.
Example:
Found 5 user-provided Features: MyFeature, ...
Example:
GC Stats: 16 collection(s) in 1.687s (used: 0.57; committed: 3.22; max: 11.38)
Example see "Hyperlinks in console".
Example:
1024 classes loaded.
Example:
12.34MiB in code size. 31.56MiB in heap size.
Question: Can this be supported across all platforms? Probably yes.
Example:
1.11 GB (heap) / 1.87 GB (rss)
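The progress indicator from the list above (one character per analysis round, with t=typeflow and o=object graph) could be sketched as below. This is only an illustration of the idea; the class, enum, and method names are invented and do not reflect how the image builder tracks its rounds.

```java
final class AnalysisProgressSketch {
    /** The two kinds of rounds named in the feature list, with their one-letter symbols. */
    enum Round {
        TYPEFLOW('t'),
        OBJECT_GRAPH('o');

        final char symbol;

        Round(char symbol) {
            this.symbol = symbol;
        }
    }

    private final StringBuilder line = new StringBuilder("Performing Analysis... ");

    /** Appends the symbol for a completed round to the progress line. */
    void completed(Round round) {
        line.append(round.symbol);
    }

    /** Returns the progress line as it would be printed so far. */
    String render() {
        return line.toString();
    }
}
```

Recording the rounds t, o, o, o, t, o in that order yields the line from the example, `Performing Analysis... toooto`.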
Nice to Have
System.console() != null && System.getenv("TERM") != null
). Can be disabled with a flag.
*.build_artifacts.txt
file is a good example where this could be useful. Also for Stage info (links to our documentation for each stage) this would be neat. In the terminal output below e.g. /usr/lib/systemd/system/gdm.service is a hyperlink. More info.
Example: [3/7] Performing analysis...
Question: Is this info really needed per stage or is max heap size more interesting?
Example:
...done in 159.540s (memory usage at 0.96GB).
Question: Is this really useful?
Example:
1 native library included: SDL2
Example:
Process CPU time: 120.02s
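The colour heuristic from this list (`System.console() != null && System.getenv("TERM") != null`) and the hyperlink idea could be combined along the lines of the sketch below. The class and method names are made up for illustration. The hyperlink part assumes a terminal that understands the OSC 8 escape sequence; terminals without that support may show the link text only, or stray characters, which is one reason for a flag to turn it off.

```java
final class ConsoleStyleSketch {
    private static final String ESC = "\u001B";

    /** The heuristic from the feature list, with inputs passed in so it can be tested. */
    static boolean colorsSupported(boolean hasConsole, String term) {
        return hasConsole && term != null;
    }

    /** Applies the heuristic to the current process. */
    static boolean colorsSupported() {
        return colorsSupported(System.console() != null, System.getenv("TERM"));
    }

    /** Wraps text in ANSI bold codes, or returns it unchanged when styling is off. */
    static String bold(String text, boolean enabled) {
        return enabled ? ESC + "[1m" + text + ESC + "[0m" : text;
    }

    /** Wraps text in an OSC 8 hyperlink, or returns it unchanged when styling is off. */
    static String link(String text, String url, boolean enabled) {
        if (!enabled) {
            return text;
        }
        return ESC + "]8;;" + url + ESC + "\\" + text + ESC + "]8;;" + ESC + "\\";
    }
}
```

Passing the `enabled` decision in explicitly keeps log files and CI output free of escape sequences when the heuristic fails or the user disables styling.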
Won't Have (this time)
\r
\r
incompatible with log files and some terminals.
Example:
18,153 methods and 1,418 classes are reachable. Press C to continue or S to search for reachable elements (C/S):
duringAnalysis
. If this is used in the wrong way it can mess up the analysis phase.
Example:
Analysis required 4 rounds of typeflow operations and 9 rounds of object graph checks.
Example:
User-provided arguments: -H:+JNI -H:+AllowFoldMethods -H:FallbackThreshold=0 -H:+ReportExceptionStackTraces ...
Example:
Peak memory consumption to generate the image: 5.42GiB
Question: What would be a good output format for this? Suggestion: JSON + JSON schema
Other Things to Consider