Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix exception handling in Worker::prePrefetchSelectionAsync() #44891

Merged
merged 1 commit into from
May 3, 2024

Conversation

makortel
Copy link
Contributor

@makortel makortel commented May 2, 2024

PR description:

This PR fixes a bug introduced in #44767 where a return when the worker is not selected was moved inside a lambda passed to convertException(). This PR fixes the function to return at the same spot as before if the worker is not selected. (this part fixes the ASAN failures in reported in #44888)

Other problem was that if implDoPrePrefetchSelection() throws an exception, the exception status needs to stored in the Worker::status_, and the success task must be destroyed. (this part changes the assertion failures reported in #44887 to exceptions)

Thanks to @Dr15Jones for helping to figure out the cause.

Fixes #44888
Fixes #44887
Resolves cms-sw/framework-team#912

PR validation:

Workflow 4.17 succeeded in CMSSW_14_1_ASAN_X_2024-05-01-2300. Workflow 1001.3 properly reports the exceptions.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44891/40153

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2024

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/Framework (core)

@smuzaffar, @Dr15Jones, @cmsbuild, @makortel can you please review it and eventually sign? Thanks.
@missirol, @wddgit this is something you requested to watch as well.
@rappoccio, @antoniovilela, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

test parameters:

  • enable_tests = threading
  • release = CMSSW_14_1_ASAN_X
  • workflows = 4.17, 1001.3, 1041.0

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

@cmsbuild, please test

@smuzaffar
Copy link
Contributor

@makortel , if you want 4.17, 1001.3, 1041.0 workflows in threaded mode then please update test parameters to also have workflow_threading = 4.17, 1001.3, 1041.0

@smuzaffar
Copy link
Contributor

also it is better to use please test for CMSSW_14_1_ASAN_X instead of using release = CMSSW_14_1_ASAN_X as test parameters. Using explicit please test will allow you to start default and ASAN tests in parallel :-)

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

@cmsbuild, please abort

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

test parameters:

  • enable_tests = threading
  • workflows_threading = 1001.3, 1041.0

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

@cmsbuild, please test

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

@cmsbuild, please test for CMSSW_14_1_ASAN_X

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

Thanks @smuzaffar. For these parallel ASAN tests is there no way to specify additional workflows? (in this case it probably won't matter, the ASAN failures were quite widespread)

@smuzaffar
Copy link
Contributor

Thanks @smuzaffar. For these parallel ASAN tests is there no way to specify additional workflows? (in this case it probably won't matter, the ASAN failures were quite widespread)

@makortel , no not for threaded mode. If you say please test workflows 1001.3, 1041.0 for CMSSW_14_1_ASAN_X then bot will run these extra workflows for only ASAN but not in threaded mode

@Dr15Jones
Copy link
Contributor

looks good

@makortel
Copy link
Contributor Author

makortel commented May 2, 2024

  • workflows_threading = 1001.3,1040.1,1041.0,1042.0

Note that with this PR these workflows are still expected to fail. But now they should report the exception properly (that will be followed up separately) instead of dying in assertion failure.

@makortel makortel changed the title Fix Worker::prePrefetchSelectionAsync() to return early when not selected Fix exception handling in Worker::prePrefetchSelectionAsync() May 2, 2024
@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2024

-1

Failed Tests: RelVals-THREADING
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df182c/39221/summary.html
COMMIT: c90b071
CMSSW: CMSSW_14_1_ASAN_X_2024-05-01-2300/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44891/39221/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-THREADING

  • 1041.01041.0_RunExpressPhy2017F/step3_RunExpressPhy2017F.log
  • 1040.11040.1_RunExpressPhy2017F/step3_RunExpressPhy2017F.log
  • 1042.01042.0_RunExpressPhy2017F/step3_RunExpressPhy2017F.log
Expand to see more relval errors ...

@Dr15Jones
Copy link
Contributor

Note: for a future PR we should modify the exception message to include the module label.

@Dr15Jones
Copy link
Contributor

All the failing workflows are expected to fail as this PR is meant to find these problems (which were previously hidden)

@makortel
Copy link
Contributor Author

makortel commented May 3, 2024

The 4 workflows report now exceptions properly

@makortel
Copy link
Contributor Author

makortel commented May 3, 2024

+core

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@makortel
Copy link
Contributor Author

makortel commented May 3, 2024

test parameters:

@makortel
Copy link
Contributor Author

makortel commented May 3, 2024

@cmsbuild, please test

Maybe one more round to have "clean tests"

@cmsbuild
Copy link
Contributor

cmsbuild commented May 3, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-df182c/39226/summary.html
COMMIT: c90b071
CMSSW: CMSSW_14_1_X_2024-05-02-2300/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44891/39226/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@antoniovilela
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 5004fc6 into cms-sw:master May 3, 2024
19 of 20 checks passed
@makortel makortel deleted the fixPrePrefetchSelectionAsync branch May 3, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment