infra: base-runner: coverage: set max parallel jobs to be half of CPU count #10277
Conversation
Ugh, don't love that this will slow everyone else down.
Could we do something a bit hacky and say "if the project has more than X fuzzers, then use only a smaller set of CPUs", the point being that many fuzzers have a higher chance of exhausting the memory? We could e.g. set X to ~20 -- this would affect a limited number of projects. Or, alternatively, check the disk size of all the fuzzers, and reduce the number of parallel jobs if they are large?
Yeah, capping it is probably fine!
I capped it at 10 (inclusive)! This affects ~90 projects, judging by the numbers at https://introspector.oss-fuzz.com/projects-overview
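For reference, a minimal sketch of what such a cap can look like in a bash coverage script; the variable names (`NPROC`, `MAX_PARALLEL_COUNT`, `PARALLEL_COUNT`) are illustrative, not necessarily the ones used in the actual script:

```bash
# Cap the number of parallel coverage jobs. Defaulting to the CPU count
# can exhaust memory on projects with many large fuzzers
# (e.g. Tensorflow: ~50 fuzzers at ~3GB each).
NPROC=$(nproc)
MAX_PARALLEL_COUNT=10

if [ "$NPROC" -gt "$MAX_PARALLEL_COUNT" ]; then
  PARALLEL_COUNT=$MAX_PARALLEL_COUNT
else
  PARALLEL_COUNT=$NPROC
fi
```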
@jonathanmetzman could you take a look at this? I'm still interested in seeing this merged, as it's still affecting the Tensorflow coverage runs.
/gcbrun trial_build.py all --fuzzing-engines libfuzzer --sanitizers introspector |
If this works, let me know and I will land it.
Trial builds failed. This is due to a previous issue (#10936). Rebasing.
The current number of parallel fuzzers running is set to the number of available CPUs. This is causing issues in Tensorflow:

```
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75: 4501 Killed llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: decode_compressed_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75: 4873 Killed llvm-cov show -instr-profile=$profdata_file -object=$target -line-coverage-gt=0 $shared_libraries $BRANCH_COV_ARGS $LLVM_COV_COMMON_ARGS > ${TEXTCOV_REPORT_DIR}/$target.covreport
Step #5: /usr/local/bin/coverage: line 75: 4897 Killed llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: saved_model_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75: 4638 Killed llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:05,246 INFO] Finding shared libraries for targets (if any).
...
Step #5: [2023-05-08 11:57:09,911 INFO] Finished finding shared libraries for targets.
Step #5: /usr/local/bin/coverage: line 75: 4276 Killed llvm-cov export -summary-only -instr-profile=$profdata_file -object=$target $shared_libraries $LLVM_COV_COMMON_ARGS > $FUZZER_STATS_DIR/$target.json
Step #5: /usr/local/bin/coverage: line 75: 5450 Killed llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:40,282 INFO] Finding shared libraries for targets (if any).
Step #5: [2023-05-08 11:57:40,323 INFO] Finished finding shared libraries for targets.
Step #5: error: end_to_end_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
```

I assume this is because the fuzzers take up lots of memory: a Tensorflow fuzzer can be ~3GB and there are ~50 fuzzers in Tensorflow.

This caps max processes at 10.

Signed-off-by: David Korczynski <david@adalogics.com>
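A common bash pattern for enforcing such a cap is to block before launching each per-target job until the number of running background jobs drops below the limit. A minimal sketch, assuming a `PARALLEL_COUNT` cap and a hypothetical per-target worker function (the real coverage script's internals may differ):

```bash
# Start one background coverage job per fuzz target, but never run
# more than PARALLEL_COUNT of them at once.
for fuzz_target in $FUZZ_TARGETS; do
  # Block while the number of running background jobs is at the cap.
  while [ "$(jobs -rp | wc -l)" -ge "$PARALLEL_COUNT" ]; do
    sleep 1
  done
  # run_fuzz_target_coverage is a hypothetical helper that runs the
  # target and invokes llvm-profdata merge / llvm-cov on its profile.
  run_fuzz_target_coverage "$fuzz_target" &
done
wait  # let the remaining jobs finish before generating the report
```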
Force-pushed the branch from 2361584 to 6e0fc44.
@jonathanmetzman could you rerun the trial build? The previous one failed due to an old issue (#10936) and I have rebased, so this should be working now.
Friendly ping @jonathanmetzman
/gcbrun trial_build.py all --fuzzing-engines libfuzzer --sanitizers introspector |
@jonathanmetzman the failures on the trial run are all generic build failures and not related to this PR:

- `Step #1: INFO:root:gpsd, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-4ecfd7aa-d300-4919-86f6-716eac4d9735.txt` — a generic build failure; the build script is now working again following the fix in #11051.
- `Step #1: INFO:root:hdf5, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-9a00292a-4809-47a0-97bf-c56f198741e5.txt` — looks unrelated; I think this was also due to a broken build, which was fixed in #11037.
- `Step #1: INFO:root:libzip, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-f3e9fe13-7ef4-4f8b-84fb-dfec4129c926.txt` — a generic build failure.
- `Step #1: INFO:root:ntopng, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-60ddc4df-f05b-4c4a-ac78-069bce1f84ec.txt`
- `Step #1: INFO:root:wget2, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-5ae9972c-4f5c-47ed-8860-71f7ac07cc8d.txt` — a generic build failure which has since been fixed in #11049.
friendly ping @jonathanmetzman |
The current number of parallel fuzzers running is set to the number of available CPUs. This is causing issues in Tensorflow: log (don't open it in a browser, but `wget`/`curl` it, as it's quite a large file and will probably annoy the browser). I assume this is because the fuzzers take up lots of memory. A Tensorflow fuzzer can be ~3GB and there are ~50 fuzzers in Tensorflow, so I think the artifacts read by `llvm-profdata merge` will eat up memory, which consequently starts to crash processes on the system. I could imagine this happens for more projects with many fuzzers of large size?