Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analysis of module-ified TypeScript compiler repo takes >6 hours, normally 6 minutes #10937

Closed
jakebailey opened this issue Oct 21, 2022 · 4 comments
Labels

Comments

@jakebailey
Copy link

Description of the issue

The TypeScript repo uses the CodeQL action, with the default javascript-queries set. For TypeScript 5.0, we're going to be changing the codebase from namespaces to modules. However, in my testing of that new compiler, I noticed that the CodeQL workflow would always time out at 6 hours. See: https://github.com/microsoft/TypeScript/actions/runs/3229429756/jobs/5286722214

I ran CodeQL locally, and it looks like it gets stuck on these queries:

TaintedPath.ql                 : iteration 2 of Configuration#e7756e4f::appendStep#5#fffff etc
CommandInjection.ql            : iteration 26 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
CodeInjection.ql               : iteration 26 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
ImproperCodeSanitization.ql    : iteration 25 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
UnsafeDynamicMethodAccess.ql   : iteration 20 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
CleartextLogging.ql            : iteration 24 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
RegExpInjection.ql             : iteration 26 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
UnvalidatedDynamicMethodCall.ql: iteration 12 of Configuration#e7756e4f::appendStep#5#fffff etc
InsecureDownload.ql            : iteration 34 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
PrototypePollutingAssignment.ql: iteration 21 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc
RequestForgery.ql              : iteration 26 of Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 etc

Letting these run to completion overnight took upwards of 9 hours on my beefy machine.

I ignored these queries on my fork of TypeScript, and that brought the analysis time down to about 7 minutes on the builder: https://github.com/jakebailey/TypeScript/actions/runs/3298853660/jobs/5441505083

I'm not quite sure what the problem is here; it could be the circularities in the new codebase (which are now explicit, rather than being hidden in namespaces without explicit imports). Or, it's just that now there are imports, which exposes the dependencies between files properly (for the same reason).

The branch to test is located here: https://github.com/jakebailey/TypeScript/tree/typeformer-2

@jakebailey jakebailey added the question Further information is requested label Oct 21, 2022
@jakebailey
Copy link
Author

FWIW each of these get stuck at Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2; my capture above just was a little early.

Happy to provide any info, logs, etc as needed. I just downloaded https://github.com/github/codeql-action/releases/tag/codeql-bundle-20221010, then did:

codeql database create codeql-database --language=javascript --source-root=./src
codeql database analyze codeql-database --format=csv --output=codeql.csv -j=-2 ~/work/codeql/qlpacks/codeql/javascript-queries/0.4.1/Security/CWE-022/TaintedPath.ql
# or 
codeql database analyze codeql-database --format=csv --output=codeql.csv -j=-2 codeql/javascript-queries

@hmakholm hmakholm added JS and removed question Further information is requested labels Oct 24, 2022
@hmakholm
Copy link
Contributor

Thanks for the report. I've been able to reproduce this blow-up locally (CLI 2.11.2 for extraction, locally built main for evaluation).

Pinging @github/codeql-javascript for further performance analysis. My preliminary findings are that the Configuration#e7756e4f::reachesReturn#4#ffff#reorder_3_0_1_2 delta sizes increase monotonically over the first 30 iterations, reaching about 2.5 million per delta at the point where I killed my test run. (Even though named as a reorder, it's actually the main instance of the reachesReturn#4#ffff predicate). It's the only predicate in the 35-strong SCC that reaches deltas of more than a million tuples.

@erik-krogh
Copy link
Contributor

Thanks for that, you inspired some great performance improvements to our global dataflow analysis.

Both me and @asgerf found some improvements, and the combination of those improvements gets codeql database analyze down to ~6m15s on my machine (M1 Max).

Asgers fix just got merged, so the next CodeQL release will be able to analyze the module-ified code without issues.
But it will take a few weeks for that to land, so in the meantime I suggest you keep your changes to codeql-configuration.yml.

Closing as the issue has been fixed.

@jakebailey
Copy link
Author

Wonderful, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants