Implement AlpakaBackendProducer and AlpakaBackendFilter #44387

Merged

Conversation

fwyzard
Contributor

@fwyzard fwyzard commented Mar 13, 2024

PR description:

Implement AlpakaBackendProducer, an empty alpaka-based EDProducer whose only purpose is to record in the event which alpaka backend was used.

Implement AlpakaBackendFilter, an EDFilter that selects events based on the alpaka backend used to run a previous producer.

Implement a unit test for both modules.

The aim is to replace this SwitchProducer in the HLT menu:

process.statusOnGPU = SwitchProducerCUDA(
    cpu = cms.EDProducer( "BooleanProducer",
        value = cms.bool( False )
    ),
    cuda = cms.EDProducer( "BooleanProducer",
        value = cms.bool( True )
    ),
)

process.statusOnGPUFilter = cms.EDFilter( "BooleanFilter",
    src = cms.InputTag( "statusOnGPU" )
)

process.Status_OnCPU = cms.Path( process.statusOnGPU + ~process.statusOnGPUFilter )
process.Status_OnGPU = cms.Path( process.statusOnGPU + process.statusOnGPUFilter )

with this alpaka-based solution:

process.hltBackend = cms.EDProducer('AlpakaBackendProducer@alpaka')

process.hltStatusOnGPUFilter = cms.EDFilter('AlpakaBackendFilter',
  producer = cms.InputTag('hltBackend', 'backend'),
  backends = cms.vstring('CudaAsync', 'ROCmAsync')
)

process.Status_OnCPU = cms.Path( process.hltBackend + ~process.hltStatusOnGPUFilter )
process.Status_OnGPU = cms.Path( process.hltBackend + process.hltStatusOnGPUFilter )
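
As a rough illustration of the selection logic described above, the following plain-Python sketch models how the two paths could be decided from the backend name stored in the event. It is not the actual C++ implementation of AlpakaBackendFilter; the function names `backend_filter` and `path_results` are hypothetical, and `SerialSync` is assumed here to be the name of the CPU backend.

```python
# Hypothetical model of the filter decision; not the CMSSW implementation.

# Backends accepted by the filter, as configured in the "backends" parameter.
GPU_BACKENDS = ["CudaAsync", "ROCmAsync"]


def backend_filter(event_backend, accepted_backends):
    """Return True if the backend recorded in the event is an accepted one."""
    return event_backend in set(accepted_backends)


def path_results(event_backend):
    """Model the two paths: Status_OnGPU uses the filter, Status_OnCPU its negation."""
    on_gpu = backend_filter(event_backend, GPU_BACKENDS)
    return {"Status_OnCPU": not on_gpu, "Status_OnGPU": on_gpu}


if __name__ == "__main__":
    # "SerialSync" is assumed to be the CPU backend name.
    for backend in ("SerialSync", "CudaAsync", "ROCmAsync"):
        print(backend, path_results(backend))
```

Exactly one of the two paths accepts any given event, which mirrors the behaviour of the original BooleanProducer and BooleanFilter pair.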

PR validation:

The new unit test passes.

Backport status

To be backported to 14.0.x for data taking.

@cmsbuild
Contributor

cmsbuild commented Mar 13, 2024

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Mar 13, 2024

@makortel could you have a look and let me know if you have any suggestions?

@fwyzard
Contributor Author

fwyzard commented Mar 13, 2024

@mmusich @Martin-Grunewald @missirol FYI

@fwyzard
Contributor Author

fwyzard commented Mar 13, 2024

enable gpu

@fwyzard
Contributor Author

fwyzard commented Mar 13, 2024

please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44387/39449

  • This PR adds an extra 16KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaCore (heterogeneous)

@makortel, @fwyzard can you please review it and eventually sign? Thanks.
@missirol, @makortel, @rovere this is something you requested to watch as well.
@rappoccio, @antoniovilela, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4d4957/38097/summary.html
COMMIT: 1f3f9fe
CMSSW: CMSSW_14_1_X_2024-03-12-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/44387/38097/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 11 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 503
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 39237
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Mar 15, 2024

enable gpu

@fwyzard
Contributor Author

fwyzard commented Mar 15, 2024

please test

@cmsbuild
Contributor

Pull request #44387 was updated. can you please check and sign again.

@antoniovilela
Contributor

+1

@smuzaffar
Contributor

> @smuzaffar I've updated the test to use the SCRAM_ALPAKA_BACKEND environment variable, if set.

@fwyzard, it looks like the build rule change worked. I see SCRAM_ALPAKA_BACKEND was properly set for testAlpakaBackendFilterCudaAsync:

Testing the alpaka backend CudaAsync using the process accelerator gpu-nvidia
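
To make the mechanism discussed here concrete, this is a minimal sketch of how a test configuration might pick its backend from the SCRAM_ALPAKA_BACKEND environment variable when it is set. It is not the actual test from this PR: the helper names, the `SerialSync` default, and the `SerialSync` and `ROCmAsync` entries of the mapping are assumptions; only the pairing of `CudaAsync` with `gpu-nvidia` appears in the log line above.

```python
import os

# Assumed mapping from alpaka backend to process accelerator.
# Only the CudaAsync -> gpu-nvidia pair is taken from the log above.
ACCELERATOR_FOR_BACKEND = {
    "SerialSync": "cpu",
    "CudaAsync": "gpu-nvidia",
    "ROCmAsync": "gpu-amd",
}


def select_backend(environ=None, default="SerialSync"):
    """Pick the backend from SCRAM_ALPAKA_BACKEND if set, else use the default."""
    environ = os.environ if environ is None else environ
    backend = environ.get("SCRAM_ALPAKA_BACKEND") or default
    if backend not in ACCELERATOR_FOR_BACKEND:
        raise ValueError(f"Unknown alpaka backend: {backend!r}")
    return backend, ACCELERATOR_FOR_BACKEND[backend]


def describe(backend, accelerator):
    """Format the message in the same style as the test log."""
    return (
        f"Testing the alpaka backend {backend} "
        f"using the process accelerator {accelerator}"
    )


if __name__ == "__main__":
    print(describe(*select_backend()))
```

Falling back to a default when the variable is unset keeps the test runnable outside the build system, which matches the "if set" wording of the quoted comment.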

@antoniovilela
Contributor

-orp

  • Signed by mistake before the tests had run. Will re-sign later.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4d4957/38177/summary.html
COMMIT: d51e58e
CMSSW: CMSSW_14_1_X_2024-03-15-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/44387/38177/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 108 lines to the logs
  • Reco comparison results: 46 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3297383
  • DQMHistoTests: Total failures: 9
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3297354
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 202 log files, 165 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 19 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 670
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 39070
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Mar 15, 2024

> @fwyzard, it looks like the build rule change worked.

Yep, I also tested it locally with a recent IB, and it worked correctly.

@antoniovilela
Contributor

+1

@cmsbuild cmsbuild merged commit ec85b46 into cms-sw:master Mar 16, 2024
14 checks passed
<flags EDM_PLUGIN="1"/>
</library>

<library file="alpaka/*.cc" name="HeterogeneousCoreAlpakaTestPluginsPortable">
Contributor


The problem lies here: the plugin library name is a duplicate of

<library file="alpaka/*.cc" name="HeterogeneousCoreAlpakaTestPluginsPortable">
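
A clash like this can be spotted mechanically. The sketch below is not part of CMSSW or SCRAM; it is a hypothetical helper that scans the text of several BuildFile.xml files for `<library ... name="...">` declarations and reports any name declared more than once. The file paths in the usage example are invented for illustration.

```python
import re
from collections import defaultdict

# Matches the name attribute of a <library ...> opening tag.
LIBRARY_NAME = re.compile(r'<library\b[^>]*\bname="([^"]+)"')


def find_duplicate_libraries(buildfiles):
    """Return {library name: [paths]} for names declared in more than one place.

    `buildfiles` maps a file path to the text of that BuildFile.xml.
    """
    seen = defaultdict(list)
    for path, text in buildfiles.items():
        for name in LIBRARY_NAME.findall(text):
            seen[name].append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical paths; the library line is the one quoted in this comment.
    line = '<library file="alpaka/*.cc" name="HeterogeneousCoreAlpakaTestPluginsPortable">'
    files = {
        "PackageA/test/BuildFile.xml": line + "\n</library>\n",
        "PackageB/test/BuildFile.xml": line + "\n</library>\n",
    }
    print(find_duplicate_libraries(files))
```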

@makortel
Contributor

#44445 should fix

@antoniovilela
Contributor

@fwyzard @makortel
Do you understand why the unit test is still failing?

@smuzaffar
Contributor

smuzaffar commented Mar 19, 2024

@antoniovilela, the unit tests only fail in patch IBs. This is because we still have the old duplicates in the full IB. The errors should go away once we have a full IB (hopefully tonight, as I want to merge cms-sw/cmsdist#9080 for tonight's IB).

@antoniovilela
Contributor

> @antoniovilela, the unit tests only fail in patch IBs. This is because we still have the old duplicates in the full IB. The errors should go away once we have a full IB (hopefully tonight, as I want to merge cms-sw/cmsdist#9080 for tonight's IB).

Thanks.

@fwyzard fwyzard deleted the implement_AlpakaBackendFilter_14_1_x branch March 19, 2024 17:51