Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrained L1 Tau Trigger Integration #41492

Merged
merged 10 commits into from
Jun 22, 2023
Merged

Conversation

Duchstf
Copy link
Contributor

@Duchstf Duchstf commented May 2, 2023

PR description:

Hello, this PR updates the Tau NN training in the L1 trigger. The main changes include:

  • Adding HW option for the tau as well as the new e/m seeding for tau jets.
  • Updated Tau NN training with the lastest training samples.
  • GT formatting for Tau Jets.

PR validation:

The improvements in performance is documented in the following talks:

The new cmssw performance has been cross-checked by the L1 team and the results are included in the 2023 L1 Annual Review.

The original PR with related discussions/checks is here: cms-l1t-offline#1087

@cmsbuild
Copy link
Contributor

cmsbuild commented May 2, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41492/35361

ERROR: Build errors found during clang-tidy run.

L1Trigger/Phase2L1ParticleFlow/plugins/L1NNTauProducer.cc:81:17: error: use of undeclared identifier 'TauNNTFCache' [clang-diagnostic-error]
std::unique_ptr<TauNNTFCache> L1NNTauProducer::initializeGlobalCache(const edm::ParameterSet& cfg) {
                ^
L1Trigger/Phase2L1ParticleFlow/plugins/L1NNTauProducer.cc:84:10: error: no viable conversion from returned value of type 'typename _MakeUniq<SessionCache>::__single_object' (aka 'unique_ptr<tensorflow::SessionCache>') to function return type 'int' [clang-diagnostic-error]
  return std::make_unique<tensorflow::SessionCache>(graphPath);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L1Trigger/Phase2L1ParticleFlow/plugins/L1NNTauProducer.cc:115:7: warning: the 'empty' method should be used to check for emptiness instead of 'size' [readability-container-size-empty]
--
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02783/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/lib/gcc/x86_64-redhat-linux-gnu/11.2.1/../../../../include/c++/11.2.1/bits/unique_ptr.h:962:34: error: no matching constructor for initialization of 'TauNNId' [clang-diagnostic-error]
    { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
                                 ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~
L1Trigger/Phase2L1ParticleFlow/plugins/L1NNTauProducer.cc:75:22: note: in instantiation of function template specialization 'std::make_unique<TauNNId, const char *, const tensorflow::SessionCache *&, std::basic_string<char> &, int &>' requested here
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@aloeliger
Copy link
Contributor

@Duchstf I have been searching both the L1 fork and CMSSW and cannot find any instance of a TauNNTFCache does this code exist in the integration branch somewhere that I have missed? Is this something that you have introduced?

@qvyz
Copy link
Contributor

qvyz commented May 23, 2023

@Duchstf I have been searching both the L1 fork and CMSSW and cannot find any instance of a TauNNTFCache does this code exist in the integration branch somewhere that I have missed? Is this something that you have introduced?

Hi, I was running in this issue when trying to test the GT emulator:
The definition of TauNNTFCache is present in the snapshot branch 3 in the l1t offline repo.
(https://github.com/cms-l1t-offline/cmssw/blob/9fc02faa5eff9a7ad19bb9cd43e6c0cfb9f78867/L1Trigger/Phase2L1ParticleFlow/interface/TauNNId.h#L8-L11) maybe this file was forgotten

@aloeliger
Copy link
Contributor

@Duchstf I have been searching both the L1 fork and CMSSW and cannot find any instance of a TauNNTFCache does this code exist in the integration branch somewhere that I have missed? Is this something that you have introduced?

Hi, I was running in this issue when trying to test the GT emulator: The definition of TauNNTFCache is present in the snapshot branch 3 in the l1t offline repo. (https://github.com/cms-l1t-offline/cmssw/blob/9fc02faa5eff9a7ad19bb9cd43e6c0cfb9f78867/L1Trigger/Phase2L1ParticleFlow/interface/TauNNId.h#L8-L11) maybe this file was forgotten

The blame shows that particular structure was committed 3 years ago. How did this never make it into CMSSW? Let me do some investigation but this might have to wait until we have a fixing commit present in CMSSW.

@aloeliger
Copy link
Contributor

aloeliger commented May 26, 2023

Okay, this particular structure was in CMSSW and actually removed, via agreement about 5 months ago in #40333. It seems like this change/removal was never backported, and this branch was never properly rebased onto a more recent branch with these changes, and adapted for them.

@Duchstf @EmyrClement You need to rebase these changes on to a more recent branch, and you will very likely need to fix some merge conflicts, retest the compilation of the emulator, and adapt it to the changes that have happened in the main fork.

something like:

cmsrel CMSSW_13_2_0_pre1
cd CMSSW_13_2_0_pre1/src/
cmsenv && git cms-init
git cms-rebase -u Duchstf:Tau_L1_2023_cmssw

at which point you will likely need to introduce some fixes, and make sure it all compiles again. Please make sure this work is backed up in a backup branch in case anything goes wrong!. Once it is properly adjusted, try pushing the changes back here.

If this was rebased already, it was not tested after rebasing, and commits need to be added to account for the current state of the repository.

Just one of the perils of trying to maintain a parallel git fork/branch I guess.

@Duchstf
Copy link
Contributor Author

Duchstf commented May 26, 2023

@aloeliger Thank you so much for tracking down the error!!

@aloeliger
Copy link
Contributor

CMSSW_13_2_X_2023-06-20-2300 is broken, we should wait until the IB comes.

I see CMSSW_13_2_X_2023-06-21-1100 and CMSSW_13_2_X_2023-06-21-1600 in the tags. Out of curiosity, when does buildbot pick up a new tag for IB's?

@smuzaffar
Copy link
Contributor

when these builds are visible on IB dashboard.

  • CMSSW_13_2_X_2023-06-21-1100 was tagged but I aborted the build as Revert "Powheg+Vincia matching" #42020 was not merged
  • CMSSW_13_2_X_2023-06-21-1600 started an hour ago and should be available for test in 3 hours

@srimanob
Copy link
Contributor

@cmsbuild please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-703c17/33310/summary.html
COMMIT: 70ee234
CMSSW: CMSSW_13_2_X_2023-06-21-1600/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41492/33310/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
25034.114 step 2
25034.21 step 2
25034.9921 step 2
25034.999 step 2
25034.99 step 2
25061.97 step 1
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals.The error does NOT come from this pull request

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3200270
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3200245
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@AdrianoDee
Copy link
Contributor

+upgrade

@aloeliger
Copy link
Contributor

+l1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Copy link
Contributor

Can someone explain why hundreds of magic numbers are added to a dataformats class? This looks extremely suboptimal. At the very least it should be configurable, but in the best case scenario this should come from the CondDB.

@srimanob
Copy link
Contributor

I would propose to open git issue to follow up on performance and code cleaning. I agree that some part can migrate to db or cms data. However, this PR does not introduce new data file, but updates the one which are in the release already.

@aloeliger
Copy link
Contributor

aloeliger commented Jun 22, 2023

@rappoccio What you are probably referring to are the neural network weights for the phase 2 taus? This phase 2 project was started before L1 settled on a definite HLS4ML emulation strategy developed for CICADA and so I think just tried to insert the firmware model into CMSSW as-is, so updating a training relies on changing these weights this way.

@EmyrClement can comment more than i can, but I believe that it is on the to-do list to migrate emulation strategies eventually.

EDIT: was on my phone and typo'd

@Duchstf
Copy link
Contributor Author

Duchstf commented Jun 22, 2023

Can someone explain why hundreds of magic numbers are added to a dataformats class? This looks extremely suboptimal. At the very least it should be configurable, but in the best case scenario this should come from the CondDB

Hello, I'm just replacing the weights of the neural net in this case (with no addition of new files). In terms of migrating to the external hls4ml library it is on the to-do list and the current NN emulation would be moved to hls4ml eventually.

@rappoccio
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.