Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reenable and patch up HIP tests #711

Merged
merged 12 commits into from
Apr 12, 2023

Conversation

sethrj
Copy link
Member

@sethrj sethrj commented Apr 3, 2023

These should have been active for a long time.

@sethrj sethrj added documentation Documentation, examples, tests, and CI minor Minor internal changes or fixes labels Apr 3, 2023
@sethrj sethrj requested a review from amandalund April 3, 2023 14:01
Copy link
Contributor

@amandalund amandalund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@sethrj
Copy link
Member Author

sethrj commented Apr 3, 2023

Well this sucks, the HIP and CUDA codes are giving us different results for what should be identical device answers in the geo heuristic:

/var/jenkins/workspace/Celeritas_PR-711/test/celeritas/geo/HeuristicGeoTestBase.cc:93: Failure
Values in: avg_path
 Expected: this->reference_avg_path()
7 of 7 elements differ
by 0.001 relative error or 1.0000000000000001e-05 absolute error
i            EXPECTED           avg_path         Difference
0               58.02     56.656110952718 -0.0235072224626329
1               408.3    397.566004787006 -0.0262894812956022
2               274.9    264.307336236228 -0.0385327892461687
3               540.9    524.555623669406 -0.0302170019053313
4               496.3    481.338119354422 -0.0301468479661046
5                1134    1115.68753036875 -0.0161485622850543
6                1694    1647.72532660886 -0.0273168083772974

They're all a bit low, which is odd...

@sethrj
Copy link
Member Author

sethrj commented Apr 7, 2023

@amandalund Possibly related to #715: I cherry-picked the GeoParams HIP onto v0.1 and v0.2; it passes on the former but fails on the latter. Remember also that the ORANGE "unit vector" assertion failures happened fairly consistently on v0.2 but never on v0.1.

At the very least, we might be able to bisect to find out when the geometry answers changed...

@amandalund
Copy link
Contributor

Ah, that's a good data point... I agree, it would be useful to know when the HIP results changed, especially if it might be related to that other bug.

@amandalund
Copy link
Contributor

Not sure why yet, but it looks like with HIP OrangeTrackView::cross_boundary() sometimes fails to find the post-crossing volume, and then because it's a release build sets the volume to be outside... which explains why the average paths are always a little smaller.

@sethrj
Copy link
Member Author

sethrj commented Apr 11, 2023

OK, I git bisected and the first bad commit is #530...

$ git bisect start fd287ec35 ccc088df --
$ git bisect run ../git-bisect-run.sh

with the shell script based on the git bisect examples:

#!/bin/sh -x

BUILD=build-rocm-ndebug

if git cherry-pick --no-commit 1362c427 && ninja -C $BUILD
then
  GTEST_COLOR=1 \ctest --test-dir $BUILD -R celeritas/geo/Geometry -V
  STATUS=$?
else
  # Failed to build: return 'untestable'
  STATUS=125
fi

git reset --hard
exit $STATUS

@sethrj
Copy link
Member Author

sethrj commented Apr 11, 2023

OK, I'm digging into this now and will have an answer later this evening. Reverting the testem3 to the non-background version doesn't fix it, and removing the "background intersect" doesn't fix it...

@sethrj
Copy link
Member Author

sethrj commented Apr 11, 2023

These should be effectively equivalent but aren't... I'm thinking compiler bug?
83465ec

@amandalund
Copy link
Contributor

A well-placed printf fixes it too.. there's definitely some undefined behavior.

@sethrj sethrj changed the title Reenable unintentionally disabled tests Reenable and patch up HIP tests Apr 12, 2023
@sethrj sethrj merged commit da0010e into celeritas-project:develop Apr 12, 2023
@sethrj sethrj deleted the reenable-tests branch April 12, 2023 12:21
sethrj added a commit that referenced this pull request Apr 12, 2023
* Re-enable geometry test (see #517)
* Enable track init test on HIP
* Remove unused include
* Add ROCm build scripts for gilgamesh and omnia
sethrj added a commit that referenced this pull request Apr 12, 2023
* Re-enable test (see #517)
* Enable track init test on HIP
* Remove unused include
* Add ROCm build scripts for gilgamesh and omnia
* This inexplicably fixes HIP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation, examples, tests, and CI minor Minor internal changes or fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants