Is it possible to increase available memory? #6

Closed
mraible opened this issue Feb 18, 2022 · 22 comments
Labels
question Further information is requested


@mraible

mraible commented Feb 18, 2022

I ask because I'm getting the following error:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

You can see my actions definition here.

@fniephaus
Member

Hi @mraible, thanks for reaching out. According to actions/runner#1051 (comment), memory seems more restrictive on public Windows runners for some reason.

I don't think there's much we can do about that as GitHub provides the service. However, we are working on making GraalVM Native Image more memory-efficient.

Anyway, I can think of three options for you:

  1. Try Linux or macOS instead of Windows.
  2. Reduce the amount of work happening in one build. The build you linked ran for 1.5 hours. You could split the build into different stages, e.g., one that builds the jars of your app and one that builds a native executable with GraalVM Native Image.
  3. Set up your own Windows-based GitHub runner on a machine with additional memory resources.

Please feel free to share your experience and let us know if you found another way to make things work.

@mraible
Author

mraible commented Feb 18, 2022

@fniephaus The job I linked to is on Linux. It works fine on Mac and the job completes in 28m. Linux gets this error after 1h and 29m. With Windows, I get a different error:

The command line is too long.

@fniephaus
Member

> 4839.7s (94.8% of total time) in 839 GCs | Peak RSS: 6.13GB | CPU load: 1.97

That's a lot of time spent in GC (~81min) and explains why the build ran for so long. Nonetheless, maybe splitting the builds could help to relieve some memory pressure.
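
As a rough cross-check of the figures above, the GC summary line can be converted into minutes. This is a minimal sketch in Python; the regular expression only assumes the line format quoted in this thread, and `summarize_gc` is a hypothetical helper name, not part of any GraalVM tooling.

```python
import re

# Pattern for the GC summary line as quoted in this thread (assumed format).
GC_LINE = re.compile(
    r"(?P<gc_seconds>[\d.]+)s \((?P<gc_percent>[\d.]+)% of total time\)"
    r" in (?P<gc_count>\d+) GCs"
    r"(?: \| Peak RSS: (?P<peak_rss_gb>[\d.]+)GB)?"
)


def summarize_gc(line):
    """Return GC minutes, implied total minutes, GC count, and peak RSS."""
    m = GC_LINE.search(line)
    if m is None:
        raise ValueError("line does not look like a GC summary")
    gc_seconds = float(m["gc_seconds"])
    gc_fraction = float(m["gc_percent"]) / 100
    return {
        "gc_minutes": gc_seconds / 60,
        # Total builder time implied by the reported percentage.
        "total_minutes": gc_seconds / gc_fraction / 60,
        "gc_count": int(m["gc_count"]),
        "peak_rss_gb": float(m["peak_rss_gb"]) if m["peak_rss_gb"] else None,
    }


summary = summarize_gc(
    "4839.7s (94.8% of total time) in 839 GCs | Peak RSS: 6.13GB | CPU load: 1.97"
)
print(f"{summary['gc_minutes']:.1f} min in GC "
      f"of about {summary['total_minutes']:.1f} min total")
```

For the quoted line this works out to roughly 81 minutes of GC within roughly 85 minutes of builder time, which fits the reported 1h 29m job once Maven and the other steps are counted.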

@fniephaus
Member

> @fniephaus The job I linked to is on Linux. It works fine on Mac and the job completes in 28m. Linux gets this error after 1h and 29m. With Windows, I get a different error:
>
> The command line is too long.

Apologies for mixing that up. Let me look into this next week.

@fniephaus
Member

fniephaus commented Feb 21, 2022

So, I looked into the two build failures:

  1. Regarding the Windows issue: it seems you're using an older version of Spring Boot. Could you try bumping that? The change "Use a classpath jar under Windows" (native-build-tools#126) seems to fix the same problem in our native build tools, so if bumping doesn't work, I think it'd make sense to file an issue against Spring Boot.
  2. Regarding the OOM issue: it seems that for Linux and Windows, free GitHub runners only provide 7GB of RAM, while macOS runners provide twice as much. That explains why there's a lot more memory pressure in your Linux build. I see you already tried increasing the swap size. Did that help? (Some Linux builds were canceled because of the Windows problem.) How much RAM does the build process consume when you build your project locally? Peak RSS is printed as part of the Native Image build output.

@fniephaus
Member

Also, GitLab recommends adjusting vm.swappiness (via sudo sysctl vm.swappiness=10) in memory-constrained environments. Maybe that helps as well?

@fniephaus
Member

Adjusting vm.swappiness does not seem to help in your case: https://github.com/fniephaus/auth0-full-stack-java-example/runs/5275349582?check_suite_focus=true#step:5:10.

On macOS, peak RSS is around 9GB, which is not too far away from 7GB but apparently problematic on Linux and Windows. We also have to keep in mind that the Native Image builder is invoked from Maven, which also requires some memory.

I'm afraid I'm running out of ideas at the moment but will keep my eyes open.

@mraible
Author

mraible commented Feb 22, 2022

> Regarding the Windows issue: it seems you're using an older version of Spring Boot. Could you try bumping that?

The GitHub action is pointing to the spring-native branch:

https://github.com/oktadev/auth0-full-stack-java-example/blob/main/.github/workflows/publish.yml#L10

That branch is using Spring Boot 2.6.3.

@mraible
Author

mraible commented Feb 28, 2022

I was able to fix the OOM error on Linux by specifying -J-Xmx7g in the build arguments:

<plugin>
    <groupId>org.graalvm.buildtools</groupId>
    <artifactId>native-maven-plugin</artifactId>
    <version>${native-buildtools.version}</version>
    <extensions>true</extensions>
    <executions>
        <execution>
            <id>test-native</id>
            <phase>test</phase>
            <goals>
                <goal>test</goal>
            </goals>
        </execution>
        <execution>
            <id>build-native</id>
            <phase>package</phase>
            <goals>
                <goal>build</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <imageName>native-executable</imageName>
        <buildArgs>
            <buildArg>--no-fallback --verbose -J-Xmx7g</buildArg>
        </buildArgs>
    </configuration>
</plugin>

I also found that changing it from 7g to 10g drops the build time by about 5 minutes.

I posted my problem with "command is too long" on Windows to Stack Overflow.

@fniephaus
Member

> I was able to fix the OOM error on Linux by specifying -J-Xmx7g in the build arguments

Great, thanks for the update! Now that I think about it, it makes sense: the JVM allocates only a percentage of available RAM by default, so it probably never actually used the additional swap space that we set up.

> 902.9s (41.4% of total time) in 150 GCs

That's still a lot of time spent in GCs. Maybe some additional tuning will help but glad things work for now.

> I posted my problem with "command is too long" on Windows to Stack Overflow.

Could you bump the native build tools to 0.9.10 and try again (see graalvm/native-build-tools#214)?

@mraible
Author

mraible commented Feb 28, 2022

> Could you bump the native build tools to 0.9.10 and try again (see graalvm/native-build-tools#214)?

I tried this here. The Windows build hasn't failed yet, but it has taken over an hour (so far).

Update: it almost worked.

[2/7] Performing analysis...  [*********]                                                             (2562.6s @ 5.97GB)
Warning: Could not register complete reflection metadata for org.springframework.boot.actuate.health.ReactiveHealthEndpointWebExtension. Reason(s): java.lang.NoClassDefFoundError: reactor/core/publisher/Mono
  33,826 (93.95%) of 36,005 classes reachable
  56,486 (79.62%) of 70,946 fields reachable
 170,797 (65.78%) of 259,658 methods reachable
   2,318 classes,   803 fields, and 11,795 methods registered for reflection
      82 classes,    78 fields, and    67 methods registered for JNI access
[3/7] Building universe...                                                                             (299.9s @ 6.08GB)
Error: Image build request failed with exit status 1
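
To see how close the builder got to the runner's memory limit before failing, the per-stage lines in output like the above can be parsed. This is a minimal sketch in Python that assumes the `[n/7] Name... (seconds @ GB)` shape shown in this log; `parse_stages` is a hypothetical helper, and the sample text below is abbreviated from the output above.

```python
import re

# Matches stage lines such as "[3/7] Building universe...  (299.9s @ 6.08GB)".
STAGE_LINE = re.compile(
    r"\[(?P<step>\d+)/(?P<steps>\d+)\]\s+(?P<name>.+?)\.\.\."
    r".*\((?P<seconds>[\d.]+)s @ (?P<gb>[\d.]+)GB\)"
)


def parse_stages(log_text):
    """Return (stage name, seconds, reported GB) for each stage line found."""
    stages = []
    for line in log_text.splitlines():
        m = STAGE_LINE.search(line)
        if m:
            stages.append((m["name"], float(m["seconds"]), float(m["gb"])))
    return stages


# Abbreviated copy of the two stage lines from the log above.
log = """\
[2/7] Performing analysis...  [*********]      (2562.6s @ 5.97GB)
[3/7] Building universe...                     (299.9s @ 6.08GB)
"""

stages = parse_stages(log)
total_minutes = sum(seconds for _, seconds, _ in stages) / 60
peak_gb = max(gb for _, _, gb in stages)
print(f"{len(stages)} stages, {total_minutes:.1f} min, peak {peak_gb} GB")
```

Here the two completed stages account for nearly 48 minutes, with the reported memory figure sitting around 6GB in both.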

@fniephaus
Member

> Update: it almost worked.

Good! I assume this ran with -J-Xmx7g? It's weird that we don't see an error, so maybe this is another OOM crash? Maybe try again with -J-Xmx8g?

@mraible
Author

mraible commented Mar 1, 2022

> I assume this ran with -J-Xmx7g

I'm currently using -J-Xmx10g.

I was able to fix the Windows build by setting the minimum pagefile size to 10GB! 🎉

You can see the successful run for details.

Time spent building native images:

  • macOS: 31m 30s
  • Linux: 33m 50s
  • Windows: 59m 45s

@fniephaus
Member

Interestingly, the successful Windows run used -Xmx6012577376 and not -Xmx10g, which you seem to have dropped from your PR. So now I wonder how stable those builds are going to run.
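
As a back-of-the-envelope check on that value: 6012577376 bytes is about 5.6 GiB, which is close to 80% of the 7GB that free runners are said to provide earlier in this thread. The sketch below only does the unit conversion. Reading "7GB" as 7 GiB is an assumption, and so is any conclusion about how the builder picks its default heap size.

```python
GIB = 1024 ** 3


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


observed_xmx = 6012577376  # value from the successful Windows run
runner_ram_gib = 7         # assumption: "7GB" of runner RAM read as 7 GiB

xmx_gib = bytes_to_gib(observed_xmx)
fraction = xmx_gib / runner_ram_gib
print(f"{xmx_gib:.2f} GiB, {fraction:.1%} of {runner_ram_gib} GiB")
```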

I'm experimenting with using the SerialGC on GitHub Actions, which seems to work a bit better than the ParallelGC. However, doing that is currently a bit awkward because UseParallelGC has to be disabled first: -J-XX:-UseParallelGC -J-XX:+UseSerialGC.

@mraible
Author

mraible commented Mar 3, 2022

@fniephaus I accidentally removed the setting. I restored it in oktadev/auth0-full-stack-java-example@b779335.

The Windows build worked a couple of days ago. Now it's failing with:

[INFO] npm ERR! code 1
[INFO] npm ERR! path D:\a\auth0-full-stack-java-example\auth0-full-stack-java-example\node_modules\puppeteer
[INFO] npm ERR! command failed
[INFO] npm ERR! command C:\Windows\system32\cmd.exe /d /s /c node install.js
[INFO] npm ERR! ERROR: Failed to set up Chromium r869685! Set "PUPPETEER_SKIP_DOWNLOAD" env variable to skip download.
[INFO] npm ERR! [Error: ENOSPC: no space left on device, write] {
[INFO] npm ERR!   errno: -4055,
[INFO] npm ERR!   code: 'ENOSPC',
[INFO] npm ERR!   syscall: 'write'
[INFO] npm ERR! }

This issue seems to indicate it's using more than 14GB of disk space.

I'll try setting PUPPETEER_SKIP_DOWNLOAD, but I'm not sure this will help.

@mraible
Author

mraible commented Mar 3, 2022

@fniephaus Changing the build to use windows-2019 instead of windows-latest solves the problem.

@fniephaus
Member

For anyone reading this, I highly recommend upgrading to GraalVM 22.2+. We have made Native Image significantly more robust in memory-constrained environments, which means you should now be able to build large Java applications with Native Image on GitHub Actions without any problems.

@linghengqian

@fniephaus Hi, do I need to open a new issue? I found in oracle/graalvm-reachability-metadata#122 (comment) that the memory used by setup-graalvm made the GitHub Actions runner crash.

fniephaus reopened this Dec 6, 2022
@fniephaus
Member

No need @linghengqian, I've reopened this issue. How do you know that the build in question failed due to not enough memory?

@linghengqian

linghengqian commented Dec 6, 2022

> No need @linghengqian, I've reopened this issue. How do you know that the build in question failed due to not enough memory?

  • Because this problem is similar to one I encountered locally before (GraalVM CE 22.3.0, JDK 11 and JDK 17). I compile GraalVM Native Image projects in WSL under Windows, and I only give WSL 8GB of memory by default. When I run several tasks at the same time (such as multiple GUI applications through WSL) and execute the nativeTest task of GraalVM Native Build Tools through Gradle, the entire WSL instance becomes unresponsive once memory usage exceeds 8GB. I then have to run wsl --shutdown in PowerShell to restart WSL before I can use it again.

  • That said, I'm not sure how to collect the logs from GitHub Actions.

@fniephaus
Member

The build job you mention in oracle/graalvm-reachability-metadata#122 (comment) does not show any signs of memory issues, so I'm going to close this again. Maybe there's something wrong with the metadata you're contributing.
