-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bugfix] Improved Egamma PFID model selection consistency #38356
[bugfix] Improved Egamma PFID model selection consistency #38356
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38356/30544
|
A new Pull Request was created by @valsdav (Davide Valsecchi) for master. It involves the following packages:
@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
mvaOutput.dnn_e_bkgPhoton = values[4]; | ||
} else { | ||
mvaOutput.dnn_e_sigIsolated = values[0]; | ||
if (iModel <= 3) { // models 0,1,2,3 have 5 outpus in this version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor typo in comment
if (iModel <= 3) { // models 0,1,2,3 have 5 outpus in this version | |
if (iModel <= 3) { // models 0,1,2,3 have 5 outputs in this version |
} else { | ||
mvaOutput.dnn_e_sigIsolated = values[0]; | ||
if (iModel <= 3) { // models 0,1,2,3 have 5 outpus in this version | ||
mvaOutput.dnn_e_sigIsolated = values.at(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each at
call will check the size of the container. That isn't the most efficient. Instead I'd suggest adding
assert(values.size() == 5)
at the beginning of the if and then just use []
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the assert
and removed the .at()
. We wanted a way to be sure that the code crashes if there is a model index misconfiguration, and the assert is a good choice. Thanks.
Thanks for making the change! |
The model index used to evaluate the candidate is now saved in the DNNHelper output and used in the producer to select how many DNN outputs should be saved, without performing again the pt/eta binning. Moreover the eta selection is now performed with SuperCluster.eta instead of Electron.eta.
65004d6
to
acc34c7
Compare
@cmsbuild please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38356/30545
|
Pull request #38356 was updated. @jpata, @clacaputo, @slava77 can you please check and sign again. |
-1 Failed Tests: RelVals-INPUT RelVals-INPUTThe relvals timed out after 4 hours. Comparison SummarySummary:
|
urgent |
please test |
Dear @qliphy I think something broke in the tests.. shall we restart them? |
please abort |
please test |
@valsdav do you expect any physics differences in the MVA? Is a larger-scale validation possible? |
You haven't actually changed the model, right? Shouldn't the validation be identical? |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7fa79a/25506/summary.html Comparison SummarySummary:
|
@kdlong @jpata So only at boundaries (barrel-endcap, endcap-extended-endcap) we might expect some minor differences for electrons that are in barrel according to supercluster.eta() but say, in endcap according to eta(). We thus expect no significant physics differences. Given we are at the deadline for 12_4_0, we would like to know if this can be merged without a full-scale validation. We can still parallelly start a full-scale validation, but based on our experience with crab from last time, this could take a week. |
Thanks for the summary. I'm fine with this explanation. There are small differences in the MVA output due to the bugfix, and it should be validated separately, but let's proceed anyway. BTW: this didn't show up in the previous large-scale validation, right? Did any of the jobs crash? |
@jpata No crashes were reported in the final validation. We did see some initial crashes but once crab was fixed, all went fine. So the crashes were crab specific. To make a note of it, I want the stress that only way a heap-buffer-overflow could have occured in the our earlier MVA code would have been when an electron is in |eta|>2.65 according to ele.eta(), but it is in |eta|<2.65 according to ele->supercluster.eta(). This is because the only model that has a different number of nodes is the model in |eta|>2.65. Already the efficiency of electrons is in this region is low, and then the chance the above condition being satisfied is even lower, maybe that is why we never saw this. This PR will fix it though. |
Could you please also open a backport to 12_4? |
@@ -49,8 +49,9 @@ namespace egammaTools { | |||
// which has access to all the variables. | |||
std::pair<uint, std::vector<float>> getScaledInputs(const std::map<std::string, float>& variables) const; | |||
|
|||
std::vector<std::vector<float>> evaluate(const std::vector<std::map<std::string, float>>& candidates, | |||
const std::vector<tensorflow::Session*>& sessions) const; | |||
std::vector<std::pair<uint, std::vector<float>>> evaluate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not for this PR, but I think the same comment applies here as was suggested at the DeepSC PR:
these kind of supernested structures are easy to write down but hard to reason about later. It would be better to define classes that encapsulate the required data.
+reconstruction
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
This PR solves the issue #38175.
The crash happened because the model selection by "eta" requirement was different in the ElectronDNNEstimator and in the GsfElectronProducer (one was using electron.eta, the other superCluster.eta). Now the model index is directly passed from the DNNHelper evaluator to the caller code, ensuring the consistency in the number of outputs. (Following comment #38175 (comment))
Moreover the electron model selection is now performed correctly with SuperCluster.eta instead of Electron.eta.
PR Validation:
The PR has been validated with local tests.
Release notes:
This is urgently needed for the 12_4_0 release.