
Testing TF 2.15 without GPU #9393

Closed: @smuzaffar wanted to merge 4 commits into IB/CMSSW_14_2_X/master from 142x-tf215-wogpu


@smuzaffar (Contributor)

No description provided.

@cmsbuild (Contributor) commented Sep 4, 2024

A new Pull Request was created by @smuzaffar for branch IB/CMSSW_14_2_X/master.

@aandvalenzuela, @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release managers for this.
cms-bot commands are listed here

@cmsbuild (Contributor) commented Sep 4, 2024

cms-bot internal usage

@smuzaffar (Contributor, Author)

enable gpu

@smuzaffar (Contributor, Author)

please test

@cmsbuild (Contributor) commented Sep 5, 2024

-1

Failed Tests: GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0ca98b/41303/summary.html
COMMIT: 7fd2ef7
CMSSW: CMSSW_14_2_X_2024-09-04-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9393/41303/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0ca98b/41303/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0ca98b/41303/git-merge-result

GPU Unit Tests

I found 2 errors in the following unit tests:

---> test testBrokenLineFitGPU_t had ERRORS
---> test testFitsGPU_t had ERRORS

Comparison Summary

Summary:

  • You potentially added 12 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 2167 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3328462
  • DQMHistoTests: Total failures: 30340
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3298102
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 9.166 KiB (43 files compared)
  • DQMHistoSizes: changed (11634.0, ...): 0.709 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed (13234.0, ...): 0.467 KiB Physics/NanoAODDQM
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 6
  • DQMHistoTests: Total histograms compared: 37044
  • DQMHistoTests: Total failures: 18
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 37026
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (5 files compared)
  • Checked 20 log files, 25 edm output root files, 6 DQM output files
  • TriggerResults: no differences found
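In both summaries above, the DQMHistoTests counters appear to satisfy successes + failures + skipped + nulls = histograms compared (3298102 + 30340 + 20 + 0 = 3328462 for the regular comparison, 37026 + 18 = 37044 for the GPU one). That relation is inferred from these numbers, not documented by cms-bot in this thread. A minimal bash sketch, with a hypothetical helper name, that checks it:

```shell
#!/bin/bash
# Hypothetical helper (not part of cms-bot): checks the assumed relation
#   successes + failures + skipped + nulls == histograms compared
# Usage: check_dqm_totals COMPARED SUCCESSES FAILURES SKIPPED NULLS
check_dqm_totals() {
  local compared=$1 successes=$2 failures=$3 skipped=$4 nulls=$5
  local sum=$(( successes + failures + skipped + nulls ))
  if [ "$sum" -eq "$compared" ]; then
    echo "consistent"
  else
    echo "mismatch: $sum != $compared"
  fi
}

check_dqm_totals 3328462 3298102 30340 20 0   # regular comparison, prints "consistent"
check_dqm_totals 37044 37026 18 0 0           # GPU comparison, prints "consistent"
```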

@smuzaffar (Contributor, Author)

@fwyzard, can you please check why two unit tests [a] fail for CMSSW_14_2_TF_X IBs when run on a GPU-enabled node? I think our patch to Eigen needs more changes.

You can log in to lxplus-gpu to reproduce this failure:

ssh lxplus-gpu
cmssw-el8 --nv
scram p CMSSW_14_2_TF_X_2024-09-06-2300
cd CMSSW_14_2_TF_X_2024-09-06-2300
cmsenv
git cms-addpkg RecoTracker/PixelTrackFitting
scram b -j 10
scram b runtests_testFitsGPU_t
# Fail    3s ... RecoTracker/PixelTrackFitting/testFitsGPU_t

[a]

---> test testBrokenLineFitGPU_t had ERRORS
---> test testFitsGPU_t had ERRORS
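To re-run both tests listed in [a] in one go, the same `scram b runtests_<test>` pattern used above can be looped over. This is a sketch under the assumption that it is run from inside the CMSSW_14_2_TF_X_2024-09-06-2300 area after `cmsenv`; the function name and the `--dry-run` flag are hypothetical conveniences, not scram options.

```shell
#!/bin/bash
# Hypothetical helper: runs the scram test target for each GPU unit test
# reported with ERRORS. With --dry-run it only prints the commands.
run_failing_gpu_tests() {
  local t
  for t in testFitsGPU_t testBrokenLineFitGPU_t; do
    if [ "${1:-}" = "--dry-run" ]; then
      echo "scram b runtests_${t}"
    else
      scram b "runtests_${t}"
    fi
  done
}
```

Calling `run_failing_gpu_tests --dry-run` shows which targets would be built; `run_failing_gpu_tests` with no argument actually runs them in turn.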

@smuzaffar closed this Sep 16, 2024
@smuzaffar deleted the 142x-tf215-wogpu branch September 18, 2024 21:19