Bugs/1627 bug test vmap fails on multi node runs on hardware accelerators #1738
Thank you for the PR!
It seems to be even stranger: the GitHub runner also only achieves tol=1e-4 on CPU for the respective test.
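For context on what tol=1e-4 means in this test: torch.allclose documents its check as |a - b| <= atol + rtol * |b|, with defaults rtol=1e-5 and atol=1e-8. The snippet below is a minimal pure-Python sketch of that rule for scalars. The helper name is_close is made up for illustration and is not part of torch or Heat.

```python
def is_close(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Scalar version of the closeness rule documented for torch.allclose.

    Hypothetical helper for illustration only; the defaults mirror torch's.
    """
    return abs(a - b) <= atol + rtol * abs(b)


reference = 1.0
computed = 1.00005  # off by about 5e-5

print(is_close(computed, reference))             # default tolerances
print(is_close(computed, reference, atol=1e-4))  # relaxed absolute tolerance
```

Note that passing only atol, as the test does, leaves rtol at its default, so the effective bound is atol plus a small relative term.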
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1738   +/-   ##
=======================================
  Coverage   92.26%   92.26%
=======================================
  Files          84       84
  Lines       12447    12447
=======================================
  Hits        11484    11484
  Misses        963      963
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
I did some tests with extra configurations on CPU and GPU. I could not find out why the error is so large when running on GPU. The tests pass with a tighter tolerance when using float64.
This is how I tested the multiple configurations; I would recommend rewriting the test like this:
def test_vmap_with_chunks(self):
    x1_splits = [None, 1]
    chunk_sizes = list(range(1, 5))
    dtypes = [ht.float32, ht.float64]
    for x1_split in x1_splits:
        for cs in chunk_sizes:
            for dtype in dtypes:
                with self.subTest(x1_split=x1_split, chunk_size=cs, dtype=dtype):
                    # same as before but now with prescribed chunk sizes for the vmap
                    x0 = ht.random.randn(5 * ht.MPI_WORLD.size, 10, 10, split=0, dtype=dtype)
                    x1 = ht.random.randn(10, 5 * ht.MPI_WORLD.size, split=x1_split, dtype=dtype)
                    out_dims = (0, 0)
                    def func(x0, x1, k=2, scale=1e-2):
                        return torch.topk(torch.linalg.svdvals(x0), k)[0] ** 2, scale * x0 @ x1
                    vfunc = ht.vmap(func, out_dims, chunk_size=cs)
                    y0, y1 = vfunc(x0, x1, k=2, scale=-2.2)
                    # compare with torch
                    x0_torch = x0.resplit(None).larray
                    x1_torch = x1.resplit(None).larray
                    vfunc_torch = torch.vmap(func, (0, x1_split), out_dims)
                    y0_torch, y1_torch = vfunc_torch(x0_torch, x1_torch, k=2, scale=-2.2)
                    self.assertTrue(torch.allclose(y0.resplit(None).larray, y0_torch))
                    tol = 1e-12 if dtype == ht.float64 else 1e-4
                    self.assertTrue(torch.allclose(y1.resplit(None).larray, y1_torch, atol=tol))
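The dtype-dependent tolerance in the last lines can be motivated by the different rounding scales of float32 and float64. The following is a rough sketch, not a diagnosis of the GPU behaviour discussed above: it emulates float32 rounding in pure Python via struct, computes a scaled dot product of length 10 (the inner dimension of x0 @ x1 in the test) in two summation orders, and reports the largest gap between them. All helper names are hypothetical, and the emulation only approximates real float32 hardware arithmetic.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def scaled_dot(a, b, scale, single, reverse=False):
    """scale * sum(a[i] * b[i]), optionally rounding every step to float32."""
    rnd = to_f32 if single else (lambda v: v)
    pairs = list(zip(a, b))
    if reverse:
        pairs.reverse()  # same terms, different summation order
    acc = 0.0
    for x, y in pairs:
        acc = rnd(acc + rnd(x * y))
    return rnd(rnd(scale) * acc)


def worst_order_gap(single, trials=200, n=10, scale=-2.2, seed=0):
    """Largest |forward - reversed| result over random length-n vectors."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = [to_f32(rng.gauss(0.0, 1.0)) for _ in range(n)]
        b = [to_f32(rng.gauss(0.0, 1.0)) for _ in range(n)]
        forward = scaled_dot(a, b, scale, single)
        backward = scaled_dot(a, b, scale, single, reverse=True)
        worst = max(worst, abs(forward - backward))
    return worst


gap32 = worst_order_gap(single=True)
gap64 = worst_order_gap(single=False)
print(f"float32-emulated gap: {gap32:.3e}")
print(f"float64 gap:          {gap64:.3e}")
```

In this sketch the float64 gap stays far below 1e-12, while the emulated float32 gap is many orders of magnitude larger, so an absolute tolerance of 1e-12 is only realistic for float64. It does not explain why errors on GPU approach 1e-4.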
Due Diligence
benchmarks: created for new functionality
benchmarks: performance improved or maintained
documentation updated where needed
Description
Issue/s resolved: #1627
Changes proposed:
the default tolerance of allclose is too strict for one test on GPUs and some CPUs
Type of change
decrease the required accuracy (to a tolerance of 1e-4, which is still small enough to rule out the test passing by chance)
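The claim that 1e-4 still rules out accidental passes can be sanity-checked with a small sketch. It assumes standard-normal inputs, an inner dimension of 10 and scale=-2.2 as in the test, and uses pure-Python stand-ins (all_close and scaled_matvec are hypothetical helpers, not torch or Heat functions) to see whether two plausible wrong results would slip through an absolute tolerance of 1e-4.

```python
import random


def all_close(xs, ys, rtol=1e-5, atol=1e-4):
    """Elementwise |x - y| <= atol + rtol * |y|, the rule torch.allclose documents."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys))


def scaled_matvec(mat, vec, scale):
    """scale * (mat @ vec) for a list-of-lists matrix and a list vector."""
    return [scale * sum(m * v for m, v in zip(row, vec)) for row in mat]


rng = random.Random(0)
mat = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(10)]
vec = [rng.gauss(0.0, 1.0) for _ in range(10)]

expected = scaled_matvec(mat, vec, scale=-2.2)
# Hypothetical bugs: the keyword argument is dropped, or the result is slightly off.
default_scale = scaled_matvec(mat, vec, scale=1e-2)
slightly_off = scaled_matvec(mat, vec, scale=-2.2 * (1 + 1e-3))

print(all_close(expected, expected))       # identical results
print(all_close(default_scale, expected))  # wrong scale
print(all_close(slightly_off, expected))   # 0.1 % relative error
```

Because the expected entries are typically of order 1 to 10, even a 0.1 % relative error exceeds the 1e-4 bound by a wide margin, which supports keeping the relaxed tolerance.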