
infra: base-runner: coverage: set max parallel jobs to be half of CPU count #10277

Merged (2 commits), Nov 23, 2023

Conversation

DavidKorczynski (Collaborator) commented May 8, 2023

The current number of parallel fuzzers running is set to the number of available CPUs. This is causing issues in Tensorflow:

```
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4501 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: decode_compressed_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4873 Killed                  llvm-cov show -instr-profile=$profdata_file -object=$target -line-coverage-gt=0 $shared_libraries $BRANCH_COV_ARGS $LLVM_COV_COMMON_ARGS > ${TEXTCOV_REPORT_DIR}/$target.covreport
Step #5: /usr/local/bin/coverage: line 75:  4897 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: saved_model_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4638 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:05,246 INFO] Finding shared libraries for targets (if any).
...
Step #5: [2023-05-08 11:57:09,911 INFO] Finished finding shared libraries for targets.
Step #5: /usr/local/bin/coverage: line 75:  4276 Killed                  llvm-cov export -summary-only -instr-profile=$profdata_file -object=$target $shared_libraries $LLVM_COV_COMMON_ARGS > $FUZZER_STATS_DIR/$target.json
Step #5: /usr/local/bin/coverage: line 75:  5450 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:40,282 INFO] Finding shared libraries for targets (if any).
Step #5: [2023-05-08 11:57:40,323 INFO] Finished finding shared libraries for targets.
Step #5: error: end_to_end_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
```

Full log here (don't open it in a browser; wget/curl it instead, as it's quite a large file and will probably annoy the browser).
I assume this is because the fuzzers take up a lot of memory. A Tensorflow fuzzer can be ~3GB and there are ~50 fuzzers in Tensorflow, so I think the artifacts read by llvm-profdata merge eat up memory, which in turn causes processes on the system to be killed.

I imagine this happens for other projects with many large fuzzers as well.

DavidKorczynski changed the title from "infra: base-runner: set max parallel jobs to be half of CPU count" to "infra: base-runner: coverage: set max parallel jobs to be half of CPU count" on May 8, 2023
jonathanmetzman (Contributor)

Ugh, don't love that this will slow everyone else down.

DavidKorczynski (Collaborator, Author) commented May 10, 2023

Could we do something a bit hacky and say "if the project has more than X fuzzers, use only a smaller set of CPUs", the point being that many fuzzers have a higher chance of exhausting memory? We could e.g. set X to ~20; this would affect only a limited number of projects.

Or, alternatively, check the disk size of all the fuzzers, and if they are large, reduce the number of parallel jobs?
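The second idea could look roughly like this (a hypothetical sketch only: the `$OUT` directory, the variable names, and the 20GB threshold are assumptions for illustration, not anything from this PR):

```shell
#!/bin/bash
# Hypothetical sketch: pick coverage-job parallelism based on the total
# size of the built fuzzer binaries.
FUZZER_DIR="${OUT:-/out}"
# Total size of the fuzzer directory in KB, defaulting to 0 if missing.
TOTAL_KB=$(du -sk "$FUZZER_DIR" 2>/dev/null | cut -f1)
TOTAL_KB=${TOTAL_KB:-0}
TOTAL_GB=$(( TOTAL_KB / 1024 / 1024 ))
if [ "$TOTAL_GB" -gt 20 ]; then
  # Many large binaries: halve the parallelism to reduce OOM kills.
  MAX_PARALLEL_COUNT=$(( $(nproc) / 2 ))
else
  MAX_PARALLEL_COUNT=$(nproc)
fi
echo "max parallel jobs: $MAX_PARALLEL_COUNT"
```

The attraction of sizing by disk usage is that binary size is a rough proxy for how much memory `llvm-profdata`/`llvm-cov` will need per target.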

jonathanmetzman (Contributor)

> Could we do something a bit hacky and say "if the project has more than X fuzzers, use only a smaller set of CPUs", the point being that many fuzzers have a higher chance of exhausting memory? We could e.g. set X to ~20; this would affect only a limited number of projects.
>
> Or, alternatively, check the disk size of all the fuzzers, and if they are large, reduce the number of parallel jobs?

Yeah capping it is probably fine!

DavidKorczynski (Collaborator, Author)

> Yeah capping it is probably fine!

I capped it at 10 (inclusive)!

This affects ~90 projects, judging by the numbers here: https://introspector.oss-fuzz.com/projects-overview
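The capping logic described above can be sketched like this (a minimal sketch of the idea only; the variable names are illustrative and not necessarily those used in the merged change):

```shell
#!/bin/bash
# Sketch: run at most half the CPUs' worth of parallel coverage jobs,
# capped at 10 (inclusive) and floored at 1.
NPROC=$(nproc)
MAX_PARALLEL_COUNT=$(( NPROC / 2 ))
if [ "$MAX_PARALLEL_COUNT" -gt 10 ]; then
  MAX_PARALLEL_COUNT=10
elif [ "$MAX_PARALLEL_COUNT" -lt 1 ]; then
  MAX_PARALLEL_COUNT=1
fi
echo "max parallel jobs: $MAX_PARALLEL_COUNT"
```

On a 32-core builder this yields 10 jobs instead of 32, so ten ~3GB TensorFlow targets being merged at once need roughly 30GB rather than 90+GB, which is the point of the cap.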

DavidKorczynski (Collaborator, Author)

@jonathanmetzman could you take a look at this? I'm still interested in seeing this merged as it's still affecting the Tensorflow coverage runs.

jonathanmetzman (Contributor)

/gcbrun trial_build.py all --fuzzing-engines libfuzzer --sanitizers introspector

jonathanmetzman (Contributor)

If this works, let me know and I will land it.

DavidKorczynski (Collaborator, Author)

Trial builds failed because of:

```
(Reading database ... 18157 files and directories currently installed.)
Step #1: Step #4: #5 77.28 Preparing to unpack .../nodejs_19.9.0-deb-1nodesource1_amd64.deb ...
Step #1: Step #4: #5 77.29 Unpacking nodejs (19.9.0-deb-1nodesource1) ...
Step #1: Step #4: #5 81.71 Setting up nodejs (19.9.0-deb-1nodesource1) ...
Step #1: Step #4: #5 81.75 + npm install --global npm
Step #1: Step #4: #5 82.49 npm ERR! code EBADENGINE
Step #1: Step #4: #5 82.50 npm ERR! engine Unsupported engine
Step #1: Step #4: #5 82.50 npm ERR! engine Not compatible with your version of node/npm: npm@10.1.0
Step #1: Step #4: #5 82.50 npm ERR! notsup Not compatible with your version of node/npm: npm@10.1.0
Step #1: Step #4: #5 82.50 npm ERR! notsup Required: {"node":"^18.17.0 || >=20.5.0"}
Step #1: Step #4: #5 82.50 npm ERR! notsup Actual:   {"npm":"9.6.3","node":"v19.9.0"}
Step #1: Step #4: #5 82.50 
Step #1: Step #4: #5 82.50 npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2023-09-27T21_45_38_375Z-debug-0.log
Step #1: Step #4: #5 ERROR: executor failed running [/bin/sh -c install_javascript.sh]: exit code: 1
Step #1: Step #4: ------
Step #1: Step #4:  > importing cache manifest from gcr.io/oss-fuzz-base/base-builder-javascript-testing-davidkorczynski-patch-9:
Step #1: Step #4: ------
```

This is due to a previous issue: #10936

Rebasing


caps max processes at 10

Signed-off-by: David Korczynski <david@adalogics.com>
DavidKorczynski (Collaborator, Author)

@jonathanmetzman could you rerun the trial build? The previous one failed due to an old issue (#10936), and I have now rebased, so it should work.

DavidKorczynski (Collaborator, Author)

Friendly ping @jonathanmetzman

jonathanmetzman (Contributor)

/gcbrun trial_build.py all --fuzzing-engines libfuzzer --sanitizers introspector

DavidKorczynski (Collaborator, Author)

@jonathanmetzman the failures on the trial run are all generic build failures and not related to this PR:


Step #1: INFO:root:gpsd, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-4ecfd7aa-d300-4919-86f6-716eac4d9735.txt

This is a generic build failure:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": FuzzPacket.c:48:26: error: too few arguments to function call, expected 2, have 1
Step #22 - "compile-libfuzzer-introspector-x86_64":         lexer_init(&lexer);
Step #22 - "compile-libfuzzer-introspector-x86_64":         ~~~~~~~~~~       ^
```

The build script is now working again following the fix in #11051.


Step #1: INFO:root:hdf5, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-9a00292a-4809-47a0-97bf-c56f198741e5.txt

Looks unrelated; I think this was also due to a broken build, which was fixed in #11037:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": [Log level 1] : 16:46:01 : Ended wrapping all functions
Step #22 - "compile-libfuzzer-introspector-x86_64": [Log level 1] : 16:46:02 : Finished introspector module
Step #22 - "compile-libfuzzer-introspector-x86_64": /usr/bin/ld.gold: warning: LLVM gold plugin: stack frame size (43624) exceeds limit (16384) in function 'H5Z__xform_eval_full'
Step #22 - "compile-libfuzzer-introspector-x86_64": + zip -j /workspace/out/libfuzzer-introspector-x86_64/h5_extended_fuzzer_seed_corpus.zip '/src/hdf5/test/*.h5'
Step #22 - "compile-libfuzzer-introspector-x86_64": 	zip warning: name not matched: /src/hdf5/test/*.h5
Step #22 - "compile-libfuzzer-introspector-x86_64": 
Step #22 - "compile-libfuzzer-introspector-x86_64": zip error: Nothing to do! (/workspace/out/libfuzzer-introspector-x86_64/h5_extended_fuzzer_seed_corpus.zip)
Step #22 - "compile-libfuzzer-introspector-x86_64": ********************************************************************************
```

Step #1: INFO:root:libzip, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-f3e9fe13-7ef4-4f8b-84fb-dfec4129c926.txt

Generic build failure:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": RUSTFLAGS=--cfg fuzzing -Zsanitizer=introspector -Cdebuginfo=1 -Cforce-frame-pointers
Step #22 - "compile-libfuzzer-introspector-x86_64": ---------------------------------------------------------------
Step #22 - "compile-libfuzzer-introspector-x86_64": + /src/libzip/regress/ossfuzz.sh
Step #22 - "compile-libfuzzer-introspector-x86_64": /src/build.sh: line 18: /src/libzip/regress/ossfuzz.sh: No such file or directory
```

Step #1: INFO:root:ntopng, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-60ddc4df-f05b-4c4a-ac78-069bce1f84ec.txt

Generic build failure:

```
command-line-argument -fsanitize=fuzzer-no-link -stdlib=libc++ -g -I/src/ntopng -I/src/ntopng/include -Wno-address-of-packed-member -Wno-unused-function -I/usr/local/include -I/src/ntopng/third-party/http-client-c/src/  -I/usr/include/openssl  -DDATA_DIR='"/usr/local/share"'  -c src/nDPIStats.cpp -o src/nDPIStats.o
Step #22 - "compile-libfuzzer-introspector-x86_64": src/NetworkInterface.cpp:12391:7: error: unknown type name 'flow_interfaces_stats'; did you mean 'sFlowInterfaceStats'?
Step #22 - "compile-libfuzzer-introspector-x86_64":   if (flow_interfaces_stats) {
Step #22 - "compile-libfuzzer-introspector-x86_64":       ^~~~~~~~~~~~~~~~~~~~~
Step #22 - "compile-libfuzzer-introspector-x86_64":       sFlowInterfaceStats
Step #22 - "compile-libfuzzer-introspector-x86_64": /src/ntopng/include/ntop_typedefs.h:731:3: note: 'sFlowInterfaceStats' declared here
Step #22 - "compile-libfuzzer-introspector-x86_64": } sFlowInterfaceStats;
Step #22 - "compile-libfuzzer-introspector-x86_64":   ^
Step #22 - "compile-libfuzzer-introspector-x86_64": src/NetworkInterface.cpp:12391:28: error: expected unqualified-id
Step #22 - "compile-libfuzzer-introspector-x86_64":   if (flow_interfaces_stats) {
Step #22 - "compile-libfuzzer-introspector-x86_64":                            ^
Step #22 - "compile-libfuzzer-introspector-x86_64": src/NetworkInterface.cpp:12393:5: error: use of undeclared identifier 'flow_interfaces_stats'
Step #22 - "compile-libfuzzer-introspector-x86_64":     flow_interfaces_stats->luaDeviceList(vm);
Step #22 - "compile-libfuzzer-introspector-x86_64":     ^
```

Step #1: INFO:root:wget2, FAILURE, https://oss-fuzz-gcb-logs.storage.googleapis.com/log-5ae9972c-4f5c-47ed-8860-71f7ac07cc8d.txt

Generic build failure which has since been fixed in #11049:

```
Step #22 - "compile-libfuzzer-introspector-x86_64": ./bootstrap: autopoint --force
Step #22 - "compile-libfuzzer-introspector-x86_64": autopoint: *** The AM_GNU_GETTEXT_VERSION declaration in your configure.ac
Step #22 - "compile-libfuzzer-introspector-x86_64":                file requires the infrastructure from gettext-0.21 but this version
Step #22 - "compile-libfuzzer-introspector-x86_64":                is older. Please upgrade to gettext-0.21 or newer.
Step #22 - "compile-libfuzzer-introspector-x86_64": autopoint: *** Stop.
Step #22 - "compile-libfuzzer-introspector-x86_64": ./bootstrap: could not generate auxiliary files
Step #22 - "compile-libfuzzer-introspector-x86_64": ********************************************************************************
Step #22 - "compile-libfuzzer-introspector-x86_64": Failed to build.
```

DavidKorczynski (Collaborator, Author)

friendly ping @jonathanmetzman

oliverchang merged commit f716590 into master on Nov 23, 2023
oliverchang deleted the DavidKorczynski-patch-9 branch on November 23, 2023, 21:43