Test error: Distributed call failed in min-dep-os #6696

Open
mingxin-zheng opened this issue Jul 4, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@mingxin-zheng
Contributor

Describe the bug

/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader PILReader is not installed, or the version doesn't match requirement.
Traceback (most recent call last):
  warnings.warn(
  File "/Users/runner/work/MONAI/MONAI/tests/utils.py", line 541, in _wrapper
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader ITKReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader NrrdReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/io/array.py:213: UserWarning: required package for reader PydicomReader is not installed, or the version doesn't match requirement.
  warnings.warn(
/Users/runner/work/MONAI/MONAI/monai/transforms/utils.py:561: UserWarning: Num foregrounds 27, Num backgrounds 0, unable to generate class balanced samples, setting `pos_ratio` to 1.
  warnings.warn(
    assert results.get(), "Distributed call failed."
AssertionError: Distributed call failed.

To Reproduce

https://github.com/Project-MONAI/MONAI/actions/runs/5455742504/jobs/9927617836?pr=6623

Expected behavior

The test should pass.

@wyli
Contributor

wyli commented Jul 4, 2023

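The root cause seems to be the GitHub CI runner: Gloo cannot find an address for the runner's hostname when `dist.init_process_group` is called.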

test_even (tests.test_sampler_dist.DistributedSamplerTest) ... ok
Process SpawnProcess-80:
Traceback (most recent call last):
  File "/Users/runner/work/MONAI/MONAI/tests/utils.py", line 505, in run_process
    raise e
  File "/Users/runner/work/MONAI/MONAI/tests/utils.py", line 489, in run_process
    dist.init_process_group(
  File "/Users/runner/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "/Users/runner/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1009, in _new_process_group_helper
    backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout)
RuntimeError: [enforce fail at /Users/runner/work/pytorch/pytorch/pytorch/third_party/gloo/gloo/transport/uv/device.cc:153] rp != nullptr. Unable to find address for: Mac-1688480011779.local
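
The error message suggests that the runner's hostname could not be resolved to an address. A minimal sketch for checking this on a runner, assuming a plain `getaddrinfo` lookup approximates what Gloo does internally (the helper name `hostname_resolves` is hypothetical and is not part of MONAI or PyTorch):

```python
import socket


def hostname_resolves(hostname=None):
    """Return True if `hostname` (default: this machine's hostname) resolves."""
    name = hostname if hostname is not None else socket.gethostname()
    try:
        # Ask the system resolver for any address belonging to this name.
        socket.getaddrinfo(name, None)
    except socket.gaierror:
        # Raised when the resolver cannot find an address for the name.
        return False
    return True


if __name__ == "__main__":
    print(socket.gethostname(), hostname_resolves())
```

If this prints `False` on a macOS runner, the failure most likely lies in the runner's name resolution rather than in the test itself.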

wyli added the bug label Jul 4, 2023
@mingxin-zheng
Contributor Author

Are there any next steps we should take?

@wyli
Contributor

wyli commented Jul 5, 2023

Let's keep this open. Currently, manually rerunning the pipelines clears the error in most cases. If it becomes frequent, we can remove the multiprocess tests on macOS.
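
If it comes to that, one way to disable such tests on macOS only is a platform-conditional skip. This is a rough sketch using the standard `unittest.skipIf`; the names `skip_if_macos` and `ExampleDistributedTest` are hypothetical and do not reflect MONAI's actual test helpers:

```python
import sys
import unittest

# sys.platform is "darwin" on macOS.
IS_MACOS = sys.platform == "darwin"


def skip_if_macos(reason="multiprocess tests are flaky on macOS CI runners"):
    """Return a decorator that skips the test when running on macOS."""
    return unittest.skipIf(IS_MACOS, reason)


class ExampleDistributedTest(unittest.TestCase):
    @skip_if_macos()
    def test_even(self):
        # Placeholder body; a real test would spawn the worker processes here.
        self.assertEqual(2 + 2, 4)


if __name__ == "__main__":
    unittest.main()
```

The test still runs on Linux and Windows runners, so coverage of the distributed code path is kept there.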
