-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create new ECAL DQM GpuTask to monitor and compare CPU and GPU generated ECAL RECO objects #36742
Conversation
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36742/27844
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
A new Pull Request was created by @alejands (Alejandro Sanchez) for master. It involves the following packages:
@emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
I may have confused cms-bot by submitting the formatting patch before it could tell me there were formatting errors |
Maybe squash the two commits and push again. I guess that would trigger the code checks again. |
from Configuration.ProcessModifiers.gpu_cff import gpu | ||
from DQM.EcalMonitorTasks.GpuTask_cfi import ecalGpuTask | ||
|
||
gpu.toModify(ecalGpuTask.params, runGpuTask = cms.untracked.bool(True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that by setting runGpuTask
to True
here by default this will run the GPU to CPU comparison by default in the offline DQM if the gpu
modifier is given. I do not think this is what we want since it implies to run the CPU and the GPU reconstruction.
As discussed in #35879 we should use a different modifier that is only given when the GPU vs. CPU comparison should be done. @jfernan2 does such a modifier exist already from DQM? In the issue mentioned it was proposed to call it gpu-validation
.
…d Ecal reco objects
42f52e6
to
c26b2fc
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36742/27868
|
Pull request #36742 was updated. @emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please check and sign again. |
I've gone ahead and squashed the formatting commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not start variable names with an underscore _
.
Such names are reserved by the C++ standard (in the global namespace) and should not be used according to the CMSSW coding rules:
2.14
Do not use “_” as first character, except for user-defined suffixes (used in user-defined literals). Only use it as the last character for class data member names, not local variable names.
class GpuTask : public DQWorkerTask { | ||
public: | ||
GpuTask(); | ||
~GpuTask() override {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~GpuTask() override {} | |
~GpuTask() override = default; |
// Static cast to EB/EEDigiCollection when using | ||
// Defined as void pointers to make compiler happy | ||
void const* EBCpuDigis_; | ||
void const* EECpuDigis_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with
// Static cast to EB/EEDigiCollection when using | |
// Defined as void pointers to make compiler happy | |
void const* EBCpuDigis_; | |
void const* EECpuDigis_; | |
EBDigiCollection const* EBCpuDigis_; | |
EEDigiCollection const* EECpuDigis_; |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that initially and got a compiler complaint at these lines:
cmssw/DQM/EcalMonitorTasks/src/GpuTask.cc
Lines 120 to 125 in c26b2fc
// Static cast to EB/EEDigiCollection during use | |
// Stored as void pointers to make compiler happy | |
if (iSubdet == EcalBarrel) | |
EBCpuDigis_ = &_cpuDigis; | |
else | |
EECpuDigis_ = &_cpuDigis; |
EBDigiCollection
and EEDigiCollection
cannot be static_cast
into each other. The compiler tries to protect from the situation where the function argument _cpuDigis
is one digi flavor and cast into the other. Doing the following will also result in a compiler error saying you can't cast EB/EE to EE/EB:
if (iSubdet == EcalBarrel)
EBCpuDigis_ = static_cast<EBDigiCollection const *>(&_cpuDigis);
else
EECpuDigis_ = static_cast<EEDigiCollection const *>(&_cpuDigis);
The compiler appears to learn that the template type DigiCollection
can be either flavor from the calls function calls here:
cmssw/DQM/EcalMonitorTasks/interface/GpuTask.h
Lines 53 to 62 in c26b2fc
case kEBCpuDigi: | |
if (_p && runGpuTask_) | |
runOnCpuDigis(*static_cast<EBDigiCollection const*>(_p), _collection); | |
return runGpuTask_; | |
break; | |
case kEECpuDigi: | |
if (_p && runGpuTask_) | |
runOnCpuDigis(*static_cast<EEDigiCollection const*>(_p), _collection); | |
return runGpuTask_; | |
break; |
and tries to ensure that every line can execute for either type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you do EECpuDigis_ = static_cast<EBDigiCollection const *>(&_cpuDigis);
and not EECpuDigis_ = static_cast<EEDigiCollection const *>(&_cpuDigis);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler does not "learn" anything, it simply tries to compile the code as it is written.
When you call GpuTask::runOnCpuDigis()
passing an EBDigiCollection
as the first argument, the whole function is compiled substituting EBDigiCollection
in place of the template type DigiCollection
, and becomes something like
void GpuTask::runOnCpuDigis(EBDigiCollection const& _cpuDigis, Collections _collection) {
MESet& meDigiCpu(MEs_.at("DigiCpu"));
MESet& meDigiCpuAmplitude(MEs_.at("DigiCpuAmplitude"));
int iSubdet(_collection == kEBCpuDigi ? EcalBarrel : EcalEndcap);
// Save CpuDigis for comparison with GpuDigis
// Static cast to EB/EEDigiCollection during use
// Stored as void pointers to make compiler happy
if (iSubdet == EcalBarrel)
EBCpuDigis_ = &_cpuDigis;
else
EECpuDigis_ = &_cpuDigis;
...
so both assignment EBCpuDigis_ = &_cpuDigis
and EECpuDigis_ = &_cpuDigis
must be valid.
If one is of type EBDigiCollection *
and the other is of type EEDigiCollection *
, it cannot compile - it doesn't matter that only one of the branches would be executed.
One way to fix the code could be to conditionally compile one of the two assignments, depending on the type of the digi collection:
if constexpr (std::is_same_v<DigiCollection, EBDigiCollection>) {
assert(iSubdet == EcalBarrel);
EBCpuDigis_ = &_cpuDigis;
} else {
assert(iSubdet == EcalEndcap);
EECpuDigis_ = &_cpuDigis;
}
The use of if constexpr
instead of a plain if
tells the compiler to evaluate the condition at compile time instead of runtime, and compile only the corresponding branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you do
EECpuDigis_ = static_cast<EBDigiCollection const *>(&_cpuDigis);
and notEECpuDigis_ = static_cast<EEDigiCollection const *>(&_cpuDigis);
?
@thomreis Sorry that was a typo in the comment. I did try it with the correct cast.
@fwyzard Thank you for the clarification and suggestion! I'll implement this along with the other suggestions.
DQM/EcalMonitorTasks/src/GpuTask.cc
Outdated
for (typename DigiCollection::const_iterator cpuItr(_cpuDigis.begin()); cpuItr != _cpuDigis.end(); ++cpuItr) { | ||
// EcalDataFrame is not a derived class of edm::DataFrame, but can take edm::DataFrame (digis) in the constructor | ||
EcalDataFrame cpuDataFrame(*cpuItr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a range for loop:
for (typename DigiCollection::const_iterator cpuItr(_cpuDigis.begin()); cpuItr != _cpuDigis.end(); ++cpuItr) { | |
// EcalDataFrame is not a derived class of edm::DataFrame, but can take edm::DataFrame (digis) in the constructor | |
EcalDataFrame cpuDataFrame(*cpuItr); | |
for (auto const& digis: _cpuDigis) { | |
// EcalDataFrame is not a derived class of edm::DataFrame, but can take edm::DataFrame (digis) in the constructor | |
EcalDataFrame cpuDataFrame(digis); |
DQM/EcalMonitorTasks/src/GpuTask.cc
Outdated
// EcalDataFrame is not a derived class of edm::DataFrame, but can take edm::DataFrame (digis) in the constructor | ||
EcalDataFrame cpuDataFrame(*cpuItr); | ||
|
||
for (int iSample = 0; iSample < 10; iSample++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to always loop from 0 to 10 ?
If not, please derive the maximum from the data themselves.
If yes, please replace 10
with a named constant, possibly defined in the data format itself, that makes it clear it is always safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-efc172/22491/summary.html GPU Comparison SummarySummary:
Comparison Summary@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
|
Too many workflows, too little numbers... OK, I guess it's time to make some order (again) :-/
Yes, indeed. |
So a WF dedicated to ECAL GPU validation. I guess that would be good. I would do it in a separate PR though to get this one in so that it can be run manually already. |
So, I understand then that this PR is good to go and we will have a new one for the dedicated WF, right? |
Yes, that would be the idea. |
+1 |
In order for the HCAL GPU validation PR #36998 to use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor fix: no need to delay this PR for it, but please take it into account, for example when making the PR with the updates anticipated bu @thomreis in #36742 (comment)
from Configuration.ProcessModifiers.gpuValidationEcal_cff import gpuValidationEcal | ||
from DQM.EcalMonitorTasks.ecalGpuTask_cfi import ecalGpuTask | ||
|
||
gpuValidationEcal.toModify(ecalGpuTask.params, runGpuTask = cms.untracked.bool(True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpuValidationEcal.toModify(ecalGpuTask.params, runGpuTask = cms.untracked.bool(True)) | |
gpuValidationEcal.toModify(ecalGpuTask.params, runGpuTask = True) |
+operations |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
PPD has requested DQM subsystems to monitor several GPU-enabled collections being introduced in CMSSW_12_1_X. We have introduced a new EcalMonitorTask called GpuTask designed to take in CPU and GPU generated objects and produce plots comparing several object quantities for each run. Objects include ECAL Digis, Uncalibrared RecHits, and RecHits.
This task will not run by default on the regular Online DQM workflow.
PR validation:
Module was run with the runTheMatrix workflow 10842.512 and Digi and Uncalib RecHit plots look as expected. GPU RecHit plots showed up empty, which was expected due to GPU RecHits currently being disabled in CMSSW. To test functionality with large amounts of data, the module was run on some 2018 data (only CPU objects available) and plots look as expected.