CodeQL execution is very slow #2378

Open
dbrezhniev opened this issue Jul 18, 2024 · 5 comments

Comments

@dbrezhniev

Hi! We've recently adopted CodeQL into our system and noticed very slow analysis for one of our codebases, which consists of Java and Kotlin.
For comparison:

  • A regular build takes 20-30 minutes.
  • CodeQL analysis in autobuild mode takes 4 hours on average.

To be frank, our codebase is quite large, but I didn't expect this action to take 8x longer than the build itself. Can it be sped up somehow?
Let me know if you need more info.

Workflow file for reference:

name: "CodeQL"
on:
...
jobs:
...
  analyze-java:
    name: Analyze java-kotlin
    container:
      image: XXXX
      credentials:
        username: XXXX
        password: XXXX
    steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v3
      with:
        languages: java-kotlin
        build-mode: autobuild
    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v3
      with:
        category: "/language:java-kotlin"

@aeisenberg
Contributor

Yes. We have recently rolled out buildless analysis for Java. It is likely that your analysis will run significantly faster with this enabled.

There are caveats, though. For example, if your project does a lot of code generation, then results will be worse (since we can't analyze the generated code). If buildless isn't the right choice for you, I would recommend using a larger runner.

@aibaars
Collaborator

aibaars commented Jul 19, 2024

@dbrezhniev Apart from the normal build, CodeQL "compiles" all source files into a special-purpose database and runs queries against that database. As a result, the code is "compiled" twice during a CodeQL run with autobuild. Running the queries takes time too, so overall a CodeQL run typically takes about 3 times as long as a normal build. A 4-hour analysis time is indeed much higher than I would have expected.

It would be good to know which phase of the CodeQL run is taking so long. Is it the "autobuild" step, the "database import/finalize" step, or the "analyze/query run" step? If the "autobuild" step is taking very long, then @aeisenberg's suggestion would be a good thing to try. If the other steps are slow, then increasing CPU and RAM will most likely help. If you are using GitHub Actions, then switching to a large runner should do the trick. Otherwise, you can also try setting the CODEQL_RAM and CODEQL_THREADS environment variables or the --ram and --threads CLI flags. It may also be that only a few queries in the "analyze" step are slow while the others run fast. In that case, let us know; perhaps we can work with you to improve the performance of those queries.
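
A minimal sketch of the environment-variable route mentioned above. The values are illustrative assumptions for an 8-core machine with roughly 32 GB of RAM, not recommendations from this thread; CODEQL_RAM is taken to be in megabytes, matching the --ram flag.

```shell
# Illustrative values only: tune them to the machine that runs the analysis.
export CODEQL_THREADS=8      # number of threads CodeQL may use
export CODEQL_RAM=24000      # memory budget in MB, leaving headroom for the OS and the build

# Roughly equivalent flags when invoking the CLI directly
# (database path and output file are hypothetical):
# codeql database analyze ./codeql-db --threads=8 --ram=24000 \
#   --format=sarif-latest --output=results.sarif
```

With the CLI flags, --threads=0 asks CodeQL to use one thread per core instead of a fixed number.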

@dbrezhniev
Author

dbrezhniev commented Jul 19, 2024

Thanks for the fast response!
Running CodeQL in build mode none does indeed speed up the execution, but as the majority of our code is in Kotlin, this method does not suit us. The summaries for the none and autobuild modes differ as follows:
None: CodeQL scanned 629 out of 630 Java files in this invocation.
Autobuild: CodeQL scanned 4727 out of 4875 Kotlin files and 629 out of 630 Java files in this invocation.

> If buildless isn't the right choice for you, I would recommend using a larger runner.

This is something we tried already, but it didn't help. There is no time reduction with a larger size. In an attempt to debug it, we noticed that the runner is underutilized: only 1 of 8 cores is used and more than half of the memory is free. What is also interesting is that 3 of 4 hours are spent on a single compileKotlin task during the analysis step:

  [2024-07-19 06:25:02] [build-stdout] [2024-07-19 06:25:02] [autobuild] > Task :aa:aa:testClasses UP-TO-DATE
  [2024-07-19 09:34:14] [build-stdout] [2024-07-19 09:34:14] [autobuild] > Task :bb:bb:compileKotlin
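
The gap between these two log lines can be checked directly from the timestamps. A small sketch, assuming both timestamps fall on the same day (the helper name `elapsed` is hypothetical):

```shell
# Print the time elapsed between two HH:MM:SS timestamps on the same day.
elapsed() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    d = (y[1] * 3600 + y[2] * 60 + y[3]) - (x[1] * 3600 + x[2] * 60 + x[3])
    printf "%dh %dm %ds\n", int(d / 3600), int((d % 3600) / 60), d % 60
  }'
}

elapsed 06:25:02 09:34:14   # prints "3h 9m 12s"
```

Applying the same subtraction to other consecutive log lines is one way to see which task or phase dominates the run.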

Let me know if you need more info!

@aibaars
Collaborator

aibaars commented Jul 19, 2024

You're right that build-mode: none is not a good fit for your case, since it does not support Kotlin at the moment and your codebase is mostly Kotlin.

> What is also interesting is that 3 of 4 hours are spent on a single compileKotlin task during the analysis step:

That is very interesting indeed. I'll ask the Kotlin team to have a look; there may be a performance issue in CodeQL's extractor for Kotlin code. Would you be able to make a performance profile of the compileKotlin task? I don't have much experience with Java profilers, so I can't give you much help with that. Another option would be some simple stack trace dumps using jstack to get an idea of what the compileKotlin task is doing or stuck at.
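
A rough sketch of the jstack approach: take several dumps a few seconds apart while the task is running, then compare them. Stack frames that recur across dumps hint at where the time goes. The helper name, output file names, and defaults are assumptions, and jstack (shipped with the JDK) must run as the same user as the target JVM. The Kotlin compiler may run inside the Gradle daemon or in a separate Kotlin daemon process, so the process-matching pattern in the commented example may need adjusting.

```shell
# Take COUNT thread dumps of the JVM with process ID PID, INTERVAL seconds apart.
# Usage: dump_threads PID [COUNT] [INTERVAL]
dump_threads() {
  pid=$1
  count=${2:-5}
  interval=${3:-10}
  i=1
  while [ "$i" -le "$count" ]; do
    # One file per dump so they can be compared afterwards.
    jstack "$pid" > "jstack-$pid-$i.txt"
    i=$((i + 1))
    if [ "$i" -le "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Example (the pattern is an assumption): dump every process whose
# command line mentions Kotlin.
# for pid in $(pgrep -f kotlin); do dump_threads "$pid"; done
```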

@corneliusroemer

CodeQL failed completely due to OOM until I used this build command with extra JVM memory settings (don't ask me which of the 2 memory settings works; I sprinkled them in out of desperation, but this eventually worked):

export JAVA_OPTS="-Xmx4096m"
./gradlew --no-daemon -Dorg.gradle.jvmargs=-Xmx1g build --info --stacktrace -x ktlintCheck -x test

loculus-project/loculus#2705
