Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix weights to be identical to original commit #3

Merged
merged 1 commit into from
Mar 15, 2023

Conversation

ssrothman
Copy link
Contributor

When updating the model files in the previous PR I accidentally used weights from a different training. The physics performance between the two trainings is identical, but for consistency this PR reverts the weights to be identical with the original training.

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 2, 2023

A new Pull Request was created by @ssrothman for branch main.

@smuzaffar, @aandvalenzuela, @iarspider, @clacaputo, @cmsbuild, @mandrenguyen can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@kpedro88
Copy link
Contributor

kpedro88 commented Mar 2, 2023

test parameters

  • workflows = 10805.31

@kpedro88
Copy link
Contributor

kpedro88 commented Mar 2, 2023

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 2, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eec557/31014/summary.html
COMMIT: 38d5cd0
CMSSW: CMSSW_13_1_X_2023-03-02-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-data/RecoEgamma-EgammaPhotonProducers/3/31014/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eec557/31014/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eec557/31014/git-merge-result

Comparison Summary

Summary:

  • You potentially added 10 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3632285
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3632261
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 219 log files, 169 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@perrotta
Copy link

perrotta commented Mar 3, 2023

please test

  • wf 10805.31 shows now differences wrt CMSSW_13_1_X_2023-03-02-1100 plus the original PRs that this one is supposed to fix: I would have expected none...
  • Let try to test once again on top of a clean CMSSW_13_1_X_2023-03-02-2300: here I would expect to see reverted the differences observed for #40814

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 3, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eec557/31022/summary.html
COMMIT: 38d5cd0
CMSSW: CMSSW_13_1_X_2023-03-02-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-data/RecoEgamma-EgammaPhotonProducers/3/31022/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eec557/31022/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eec557/31022/git-merge-result

Comparison Summary

Summary:

  • You potentially added 8 lines to the logs
  • Reco comparison results: 14 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3632285
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3632257
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 219 log files, 169 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@perrotta
Copy link

perrotta commented Mar 3, 2023

@ssrothman @kpedro88

@perrotta
Copy link

perrotta commented Mar 7, 2023

@ssrothman @kpedro88 any news on this? None of you was present at the today's ORP meeting, where this issue was addressed...
Is there some better understanding of the issue by now?

@kpedro88
Copy link
Contributor

kpedro88 commented Mar 7, 2023

@perrotta apologies, I'm away at a workshop this week.

We also saw this behavior in our private tests last week and realized that there is some randomness inherent to the network itself. The random behavior has been there all along, but http://github.com/cms-sw/cmssw/pull/40814 may actually have been the first time that comparison tests were run for 10805.31 (since it is not part of the short matrix), so it wasn't noticed before. This PR does correctly restore the original weights, but we need to make some more changes to make the network deterministic (this is a work in progress right now).

@perrotta
Copy link

@ssrothman @kpedro88
As a cross-check, I made bot running the comparisons for the wf 10805.31 in another PR which is completely uncorrelated from the update of these weights. And the outcome is exactly as what you say here, that is there is some randomness in the results which is inherent to the network itself. The "randomness" seems quite significant, i.e. not just a numerical fluctuation in the last digi somewhere.

Therefore:

  • Let merge this PR with the correct weights, as you verified independently
  • Open a github issue to take record of the progresses in the solution of this "randomness" problem

@perrotta
Copy link

+1

@perrotta
Copy link

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants