
infra: base-runner: coverage: set max parallel jobs to be half of CPU count #10277

Merged (2 commits), Nov 23, 2023

Conversation

DavidKorczynski (Collaborator) commented May 8, 2023

The current number of parallel fuzzers running is set to the number of available CPUs. This is causing issues in Tensorflow:

```
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4501 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: decode_compressed_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4873 Killed                  llvm-cov show -instr-profile=$profdata_file -object=$target -line-coverage-gt=0 $shared_libraries $BRANCH_COV_ARGS $LLVM_COV_COMMON_ARGS > ${TEXTCOV_REPORT_DIR}/$target.covreport
Step #5: /usr/local/bin/coverage: line 75:  4897 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: saved_model_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4638 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:05,246 INFO] Finding shared libraries for targets (if any).
...
Step #5: [2023-05-08 11:57:09,911 INFO] Finished finding shared libraries for targets.
Step #5: /usr/local/bin/coverage: line 75:  4276 Killed                  llvm-cov export -summary-only -instr-profile=$profdata_file -object=$target $shared_libraries $LLVM_COV_COMMON_ARGS > $FUZZER_STATS_DIR/$target.json
Step #5: /usr/local/bin/coverage: line 75:  5450 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:40,282 INFO] Finding shared libraries for targets (if any).
Step #5: [2023-05-08 11:57:40,323 INFO] Finished finding shared libraries for targets.
Step #5: error: end_to_end_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
```

Full log here (don't open it in a browser; wget/curl it instead, as it's quite a large file and will probably annoy the browser).
I assume this is because the fuzzers take up a lot of memory. A Tensorflow fuzzer can be ~3GB and there are ~50 fuzzers in Tensorflow, so I think the artifacts read by llvm-profdata merge eat up memory, which in turn causes processes on the system to be killed.

I imagine this happens for other projects with many large fuzzers as well.

DavidKorczynski changed the title from "infra: base-runner: set max parallel jobs to be half of CPU count" to "infra: base-runner: coverage: set max parallel jobs to be half of CPU count" on May 8, 2023
jonathanmetzman (Contributor)

Ugh, don't love that this will slow everyone else down.

DavidKorczynski (Collaborator, Author) commented May 10, 2023

Could we do something a bit hacky and say "if the project has more than X fuzzers, use only a smaller set of CPUs", the point being that many fuzzers have a higher chance of exhausting memory? We could e.g. set X to ~20; this would affect only a limited number of projects.

Or, alternatively, check the disk size of all the fuzzers, and if they are large, reduce the number of parallel jobs?
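The second idea could look roughly like this (a hypothetical sketch only: the `$OUT` directory, the variable names, and the 20GB threshold are assumptions for illustration, not anything from this PR):

```shell
#!/bin/bash
# Hypothetical sketch: pick coverage-job parallelism based on the total
# size of the built fuzzer binaries.
FUZZER_DIR="${OUT:-/out}"
# Total size of the fuzzer directory in KB, defaulting to 0 if missing.
TOTAL_KB=$(du -sk "$FUZZER_DIR" 2>/dev/null | cut -f1)
TOTAL_KB=${TOTAL_KB:-0}
TOTAL_GB=$(( TOTAL_KB / 1024 / 1024 ))
if [ "$TOTAL_GB" -gt 20 ]; then
  # Many large binaries: halve the parallelism to reduce OOM kills.
  MAX_PARALLEL_COUNT=$(( $(nproc) / 2 ))
else
  MAX_PARALLEL_COUNT=$(nproc)
fi
echo "max parallel jobs: $MAX_PARALLEL_COUNT"
```

The attraction of sizing by disk usage is that binary size is a rough proxy for how much memory `llvm-profdata`/`llvm-cov` will need per target.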

jonathanmetzman (Contributor)

> Could we do something a bit hacky and say "if the project has more than X fuzzers, use only a smaller set of CPUs", the point being that many fuzzers have a higher chance of exhausting memory? We could e.g. set X to ~20; this would affect only a limited number of projects.
>
> Or, alternatively, check the disk size of all the fuzzers, and if they are large, reduce the number of parallel jobs?

Yeah capping it is probably fine!

DavidKorczynski (Collaborator, Author)

> Yeah capping it is probably fine!

I capped it at 10 (inclusive)!

This affects ~90 projects, judging by the numbers here: https://introspector.oss-fuzz.com/projects-overview
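The capping logic described above can be sketched like this (a minimal sketch of the idea only; the variable names are illustrative and not necessarily those used in the merged change):

```shell
#!/bin/bash
# Sketch: run at most half the CPUs' worth of parallel coverage jobs,
# capped at 10 (inclusive) and floored at 1.
NPROC=$(nproc)
MAX_PARALLEL_COUNT=$(( NPROC / 2 ))
if [ "$MAX_PARALLEL_COUNT" -gt 10 ]; then
  MAX_PARALLEL_COUNT=10
elif [ "$MAX_PARALLEL_COUNT" -lt 1 ]; then
  MAX_PARALLEL_COUNT=1
fi
echo "max parallel jobs: $MAX_PARALLEL_COUNT"
```

On a 32-core builder this yields 10 jobs instead of 32, so ten ~3GB TensorFlow targets being merged at once need roughly 30GB rather than 90+GB, which is the point of the cap.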

DavidKorczynski (Collaborator, Author)

@jonathanmetzman could you take a look at this? I'm still interested in seeing this merged as it's still affecting the Tensorflow coverage runs.

jonathanmetzman (Contributor)

/gcbrun trial_build.py all --fuzzing-engines libfuzzer --sanitizers introspector

jonathanmetzman (Contributor)

If this works, let me know and I will land it.

DavidKorczynski (Collaborator, Author)

Trial builds failed because of:

```
(Reading database ... 18157 files and directories currently installed.)
Step #1: Step #4: #5 77.28 Preparing to unpack .../nodejs_19.9.0-deb-1nodesource1_amd64.deb ...
Step #1: Step #4: #5 77.29 Unpacking nodejs (19.9.0-deb-1nodesource1) ...
Step #1: Step #4: #5 81.71 Setting up nodejs (19.9.0-deb-1nodesource1) ...
Step #1: Step #4: #5 81.75 + npm install --global npm
Step #1: Step #4: #5 82.49 npm ERR! code EBADENGINE
Step #1: Step #4: #5 82.50 npm ERR! engine Unsupported engine
Step #1: Step #4: #5 82.50 npm ERR! engine Not compatible with your version of node/npm: npm@10.1.0
Step #1: Step #4: #5 82.50 npm ERR! notsup Not compatible with your version of node/npm: npm@10.1.0
Step #1: Step #4: #5 82.50 npm ERR! notsup Required: {"node":"^18.17.0 || >=20.5.0"}
Step #1: Step #4: #5 82.50 npm ERR! notsup Actual:   {"npm":"9.6.3","node":"v19.9.0"}
Step #1: Step #4: #5 82.50 
Step #1: Step #4: #5 82.50 npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2023-09-27T21_45_38_375Z-debug-0.log
Step #1: Step #4: #5 ERROR: executor failed running [/bin/sh -c install_javascript.sh]: exit code: 1
Step #1: Step #4: ------
Step #1: Step #4:  > importing cache manifest from gcr.io/oss-fuzz-base/base-builder-javascript-testing-davidkorczynski-patch-9:
Step #1: Step #4: ------
```

This is due to a previous issue: #10936

Rebasing


caps max processes at 10

Signed-off-by: David Korczynski <david@adalogics.com>
DavidKorczynski (Collaborator, Author)

@jonathanmetzman could you rerun the trial build? The previous one failed due to an old issue (#10936), and I have now rebased, so it should work.

DavidKorczynski (Collaborator, Author)

Friendly ping @jonathanmetzman

jonathanmetzman (Contributor)

/gcbrun trial_build.py all --fuzzing-engines libfuzzer --sanitizers introspector

DavidKorczynski (Collaborator, Author)

@jonathanmetzman the failures on the trial run are all generic build failures and not related to this PR:


Step #1: INFO:root:gpsd, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-4ecfd7aa-d300-4919-86f6-716eac4d9735.txt

This is a generic build failure:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": FuzzPacket.c:48:26: error: too few arguments to function call, expected 2, have 1
Step #22 - "compile-libfuzzer-introspector-x86_64":         lexer_init(&lexer);
Step #22 - "compile-libfuzzer-introspector-x86_64":         ~~~~~~~~~~       ^
```

The build script is now working again following the fix in #11051.


Step #1: INFO:root:hdf5, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-9a00292a-4809-47a0-97bf-c56f198741e5.txt

Looks unrelated; I think this was also due to a broken build, which was fixed in #11037:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": [Log level 1] : 16:46:01 : Ended wrapping all functions
Step #22 - "compile-libfuzzer-introspector-x86_64": [Log level 1] : 16:46:02 : Finished introspector module
Step #22 - "compile-libfuzzer-introspector-x86_64": /usr/bin/ld.gold: warning: LLVM gold plugin: stack frame size (43624) exceeds limit (16384) in function 'H5Z__xform_eval_full'
Step #22 - "compile-libfuzzer-introspector-x86_64": + zip -j /workspace/out/libfuzzer-introspector-x86_64/h5_extended_fuzzer_seed_corpus.zip '/src/hdf5/test/*.h5'
Step #22 - "compile-libfuzzer-introspector-x86_64": 	zip warning: name not matched: /src/hdf5/test/*.h5
Step #22 - "compile-libfuzzer-introspector-x86_64": 
Step #22 - "compile-libfuzzer-introspector-x86_64": zip error: Nothing to do! (/workspace/out/libfuzzer-introspector-x86_64/h5_extended_fuzzer_seed_corpus.zip)
Step #22 - "compile-libfuzzer-introspector-x86_64": ********************************************************************************
```

Step #1: INFO:root:libzip, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-f3e9fe13-7ef4-4f8b-84fb-dfec4129c926.txt

Generic build failure:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": RUSTFLAGS=--cfg fuzzing -Zsanitizer=introspector -Cdebuginfo=1 -Cforce-frame-pointers
Step #22 - "compile-libfuzzer-introspector-x86_64": ---------------------------------------------------------------
Step #22 - "compile-libfuzzer-introspector-x86_64": + /src/libzip/regress/ossfuzz.sh
Step #22 - "compile-libfuzzer-introspector-x86_64": /src/build.sh: line 18: /src/libzip/regress/ossfuzz.sh: No such file or directory
```

Step #1: INFO:root:ntopng, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-60ddc4df-f05b-4c4a-ac78-069bce1f84ec.txt

Generic build failure:

```
command-line-argument -fsanitize=fuzzer-no-link -stdlib=libc++ -g -I/src/ntopng -I/src/ntopng/include -Wno-address-of-packed-member -Wno-unused-function -I/usr/local/include -I/src/ntopng/third-party/http-client-c/src/  -I/usr/include/openssl  -DDATA_DIR='"/usr/local/share"'  -c src/nDPIStats.cpp -o src/nDPIStats.o
Step #22 - "compile-libfuzzer-introspector-x86_64": src/NetworkInterface.cpp:12391:7: error: unknown type name 'flow_interfaces_stats'; did you mean 'sFlowInterfaceStats'?
Step #22 - "compile-libfuzzer-introspector-x86_64":   if (flow_interfaces_stats) {
Step #22 - "compile-libfuzzer-introspector-x86_64":       ^~~~~~~~~~~~~~~~~~~~~
Step #22 - "compile-libfuzzer-introspector-x86_64":       sFlowInterfaceStats
Step #22 - "compile-libfuzzer-introspector-x86_64": /src/ntopng/include/ntop_typedefs.h:731:3: note: 'sFlowInterfaceStats' declared here
Step #22 - "compile-libfuzzer-introspector-x86_64": } sFlowInterfaceStats;
Step #22 - "compile-libfuzzer-introspector-x86_64":   ^
Step #22 - "compile-libfuzzer-introspector-x86_64": src/NetworkInterface.cpp:12391:28: error: expected unqualified-id
Step #22 - "compile-libfuzzer-introspector-x86_64":   if (flow_interfaces_stats) {
Step #22 - "compile-libfuzzer-introspector-x86_64":                            ^
Step #22 - "compile-libfuzzer-introspector-x86_64": src/NetworkInterface.cpp:12393:5: error: use of undeclared identifier 'flow_interfaces_stats'
Step #22 - "compile-libfuzzer-introspector-x86_64":     flow_interfaces_stats->luaDeviceList(vm);
Step #22 - "compile-libfuzzer-introspector-x86_64":     ^
```

Step #1: INFO:root:wget2, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-5ae9972c-4f5c-47ed-8860-71f7ac07cc8d.txt

Generic build failure which has since been fixed in #11049:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": ./bootstrap: autopoint --force
Step #22 - "compile-libfuzzer-introspector-x86_64": autopoint: *** The AM_GNU_GETTEXT_VERSION declaration in your configure.ac
Step #22 - "compile-libfuzzer-introspector-x86_64":                file requires the infrastructure from gettext-0.21 but this version
Step #22 - "compile-libfuzzer-introspector-x86_64":                is older. Please upgrade to gettext-0.21 or newer.
Step #22 - "compile-libfuzzer-introspector-x86_64": autopoint: *** Stop.
Step #22 - "compile-libfuzzer-introspector-x86_64": ./bootstrap: could not generate auxiliary files
Step #22 - "compile-libfuzzer-introspector-x86_64": ********************************************************************************
Step #22 - "compile-libfuzzer-introspector-x86_64": Failed to build.
```

DavidKorczynski (Collaborator, Author)

friendly ping @jonathanmetzman

oliverchang merged commit f716590 into master on Nov 23, 2023
oliverchang deleted the DavidKorczynski-patch-9 branch on November 23, 2023, 21:43