Replace ALPAKA_STATIC_ACC_MEM_GLOBAL with HOST_DEVICE_CONSTANT #47108

Merged
cmsbuild merged 2 commits into cms-sw:master from the replace_ALPAKA_STATIC_ACC_MEM_GLOBAL branch
Jan 16, 2025

Conversation

fwyzard
Contributor

@fwyzard fwyzard commented Jan 15, 2025

PR description:

Replace ALPAKA_STATIC_ACC_MEM_GLOBAL with HOST_DEVICE_CONSTANT.

ALPAKA_STATIC_ACC_MEM_GLOBAL gets a different, more complex syntax in alpaka 1.2.0 (in order to support Intel oneAPI).
It also has a slightly different meaning, providing global symbols that can be device-memcpy'ed over from the host, while HOST_DEVICE_CONSTANT supports only plain constants.
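As a rough illustration of the "plain constants" case, here is a minimal sketch of the usage that HOST_DEVICE_CONSTANT is meant to cover. The macro definition below is only a host-side stand-in so that the snippet builds with a plain C++17 compiler; it is an assumption and not the CMSSW definition, which may add device-specific qualifiers. The namespace, constant names and values are hypothetical and do not come from the packages touched by this PR.

```cpp
#include <cassert>
#include <cstdint>

// Host-only stand-in (assumption): the real CMSSW macro may expand differently
// when compiling device code. Here it is just a compile-time constant.
#ifndef HOST_DEVICE_CONSTANT
#define HOST_DEVICE_CONSTANT constexpr
#endif

namespace example {  // hypothetical namespace, for illustration only

  // Plain constants: their values are fixed at compile time and are never
  // updated by a memcpy from the host at run time.
  HOST_DEVICE_CONSTANT std::uint32_t kMaxLayers = 4;
  HOST_DEVICE_CONSTANT float kThresholds[kMaxLayers] = {1.0f, 2.5f, 4.0f, 8.0f};

  // Code that only reads the constants; out-of-range layers are clamped to the last entry.
  constexpr float thresholdFor(std::uint32_t layer) {
    return layer < kMaxLayers ? kThresholds[layer] : kThresholds[kMaxLayers - 1];
  }

  // Because the values are compile-time constants, they can be checked at build time.
  static_assert(thresholdFor(1) == 2.5f);

  // Small run-time check of the same lookup.
  inline void checkConstants() {
    assert(thresholdFor(0) == 1.0f);
    assert(thresholdFor(kMaxLayers) == 8.0f);
  }

}  // namespace example
```

A symbol that has to be filled from the host at run time does not fit this pattern, which is the case the description attributes to ALPAKA_STATIC_ACC_MEM_GLOBAL.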

PR validation:

None.

No changes expected.

@fwyzard
Contributor Author

fwyzard commented Jan 15, 2025

enable gpu

@fwyzard
Contributor Author

fwyzard commented Jan 15, 2025

please test

@cmsbuild
Contributor

cmsbuild commented Jan 15, 2025

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Jan 15, 2025

This is a prerequisite before updating from alpaka 1.1.0 to 1.2.0 in CMSSW 15.0.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47108/43297

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • RecoLocalTracker/SiPixelClusterizer (reconstruction)
  • RecoTracker/LSTCore (reconstruction)

@jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @dkotlins, @felicepantaleo, @ferencek, @gpetruc, @missirol, @mmusich, @mroguljic, @mtosi, @rovere, @threus, @tsusa, @tvami this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

-1

Failed Tests: Build HeaderConsistency ClangBuild
Size: This PR adds an extra 252KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f185e/43769/summary.html
COMMIT: a3c73c3
CMSSW: CMSSW_15_0_X_2025-01-14-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47108/43769/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f185e/43769/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f185e/43769/git-merge-result

Build

I found compilation error when building:

>> Package Geometry/HcalTowerAlgo built
>> Subsystem Geometry built
Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1884: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a] Error 1
Copying tmp/el8_amd64_gcc12/src/RecoLocalTracker/SiPixelClusterizer/plugins/RecoLocalTrackerSiPixelClusterizerPluginsPortableROCmAsync/libRecoLocalTrackerSiPixelClusterizerPluginsPortableROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a to productstore area:
>> Entering Package RecoLocalTracker/SiPixelClusterizer
>> Leaving Package RecoLocalTracker/SiPixelClusterizer
>> Package RecoLocalTracker/SiPixelClusterizer built


Clang Build

I found compilation error while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' /usr/bin/time -v scram build -k -j 32 COMPILER='llvm compile'

>> Creating project symlinks
>> Entering Package Geometry/HcalTowerAlgo
>> Entering Package RecoLocalTracker/SiPixelClusterizer
>> Entering Package RecoTracker/LSTCore
>> Compile sequence completed for CMSSW CMSSW_15_0_X_2025-01-14-2300
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1
Command exited with non-zero status 1
	Command being timed: "scram build -k -j 32 COMPILER=llvm compile BUILD_LOG=yes"
	User time (seconds): 346.84
	System time (seconds): 46.46
	Percent of CPU this job got: 597%


@fwyzard fwyzard force-pushed the replace_ALPAKA_STATIC_ACC_MEM_GLOBAL branch from a3c73c3 to f3b9a73 Compare January 15, 2025 14:30
@fwyzard
Contributor Author

fwyzard commented Jan 15, 2025

please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47108/43300

@cmsbuild
Contributor

Pull request #47108 was updated. @jfernan2, @mandrenguyen can you please check and sign again.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47108/43305

@fwyzard fwyzard force-pushed the replace_ALPAKA_STATIC_ACC_MEM_GLOBAL branch from 2790723 to 5be4575 Compare January 15, 2025 15:45
@fwyzard
Contributor Author

fwyzard commented Jan 15, 2025

please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47108/43307

@cmsbuild
Contributor

Pull request #47108 was updated. @jfernan2, @mandrenguyen can you please check and sign again.

@cmsbuild
Contributor

+1

Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f185e/43776/summary.html
COMMIT: 5be4575
CMSSW: CMSSW_15_0_X_2025-01-15-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47108/43776/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f185e/43776/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4f185e/43776/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3932183
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3932160
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53071
  • DQMHistoTests: Total failures: 872
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52199
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

@jfernan2
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1

@cmsbuild cmsbuild merged commit 547258c into cms-sw:master Jan 16, 2025
14 checks passed
@mandrenguyen
Contributor

@fwyzard There's a unit test flagged as failed in today's 11AM IB. Is it a consequence of this PR?
https://cmssdt.cern.ch/SDT/cgi-bin/showBuildLogs.py/el8_amd64_gcc12/www/thu/15.0-thu-11/CMSSW_15_0_X_2025-01-16-1100?utests

@fwyzard
Contributor Author

fwyzard commented Jan 16, 2025

This failure

===== Test "testHeterogeneousCoreAlpakaTestModulesCPU" ====
Failed to parse CPUID
cmsRun testAlpakaModules_cfg.py --accelerators=cpu --expectBackend=serial_sync
----- Begin Fatal Exception 16-Jan-2025 12:28:03 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_15_0_X_2025-01-16-1100/src/HeterogeneousCore/AlpakaTest/test/testAlpakaModules_cfg.py
Exception Message:
 unknown python problem occurred.
ImportError: cannot import name 'TestAlpakaVerifyObjectOnDevice_alpaka' from 'HeterogeneousCore.AlpakaTest.modules' (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02872/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-15-1100/cfipython/el8_amd64_gcc12/HeterogeneousCore/AlpakaTest/modules.py)

At:
  /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_15_0_X_2025-01-16-1100/src/HeterogeneousCore/AlpakaTest/test/testAlpakaModules_cfg.py(148): <module>

----- End Fatal Exception -------------------------------------------------
Failed cmsRun testAlpakaModules_cfg.py --accelerators=cpu --expectBackend=serial_sync: status 90

---> test testHeterogeneousCoreAlpakaTestModulesCPU had ERRORS
TestTime:1
^^^^ End Test testHeterogeneousCoreAlpakaTestModulesCPU ^^^^

seems to be related to finding or parsing the python configuration file, so it would be strange for it to be a consequence of this PR.

@fwyzard fwyzard deleted the replace_ALPAKA_STATIC_ACC_MEM_GLOBAL branch January 16, 2025 15:50