Test failures with explicit comms #549

Closed
jakirkham opened this issue Mar 9, 2021 · 3 comments · Fixed by #555 · May be fixed by dask/distributed#4575

Comments

@jakirkham
Member

In PR (#546), we noticed some test failures that started cropping up recently. Not sure of the exact cause, but they may be related to PR (dask/distributed#4531). Copying the relevant details from the CI log below.

Seeing this in the log

10:58:40   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/ucp/core.py", line 628, in recv
10:58:40     ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
10:58:40 ucp.exceptions.UCXMsgTruncated: <[Recv #112] ep: 0x7f0e25cde0d8, tag: 0xfa85496d273cdec2, nbytes: 260, type: <class 'numpy.ndarray'>>: length mismatch: 16 (got) != 260 (expected)

Also seeing this

11:05:40   File "/var/lib/jenkins/workspace/rapidsai/gpuci/dask-cuda/prb/dask-cuda-gpu-test/CUDA/10.1/GPU_LABEL/gpu-t4||gpu/OS/ubuntu16.04/PYTHON/3.7/dask_cuda/explicit_comms/dataframe/shuffle.py", line 196, in local_shuffle
11:05:40     out_parts[i] = None
11:05:40 TypeError: 'tuple' object does not support item assignment
11:05:40 FAILED

cc @rjzamora @madsbk

@jakirkham
Member Author

I think this comes down to the fact that PR (dask/distributed#4531) always encodes lists and tuples as tuples, which is an older issue with MsgPack (dask/distributed#3716). That would explain the `TypeError`: code that expects a list and assigns into it now receives a tuple. For explicit comms, we could start coercing tuples to lists where needed. We could also look at fixing this in Distributed by encoding lists specially in MsgPack.
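To illustrate the coercion idea, here is a minimal sketch. It assumes the receiver gets a tuple where it previously got a list; `local_shuffle_sketch` and the string "parts" are hypothetical stand-ins, not the actual dask-cuda code.

```python
def local_shuffle_sketch(out_parts):
    """Hypothetical stand-in for the failing loop; not the dask-cuda code."""
    # After a round trip through the comms layer, a list may arrive as a tuple.
    if isinstance(out_parts, tuple):
        out_parts = list(out_parts)  # coerce so item assignment works
    consumed = []
    for i in range(len(out_parts)):
        consumed.append(out_parts[i])
        out_parts[i] = None  # same in-place assignment as in the traceback
    return consumed


received = ("part-0", "part-1")  # what the receiver sees in place of a list

# Assigning into the tuple directly reproduces the error from the log.
try:
    received[0] = None
except TypeError as exc:
    error_message = str(exc)

# With the coercion in place, the same input is handled without error.
result = local_shuffle_sketch(received)
```

The coercion costs one shallow copy per received container, and leaves the sender's data untouched.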

@jakirkham
Member Author

Addressing with PR (dask/distributed#4575).

madsbk added a commit to madsbk/dask-cuda that referenced this issue Mar 23, 2021
@rapids-bot rapids-bot bot closed this as completed in #555 Mar 24, 2021
rapids-bot bot pushed a commit that referenced this issue Mar 24, 2021
Fixes #549 by converting received tuples to lists.

Depends on dask/distributed#4621, which fixes an unrelated bug also triggered by our explicit-comms tests.

Authors:
  - Mads R. B. Kristensen (@madsbk)

Approvers:
  - Peter Andreas Entschev (@pentschev)

URL: #555
@mrocklin
Contributor

An alternative here would be to not mutate the input list, and instead make a copy or build a new list. If this does not impact performance, it might be a good habit to get into regardless.
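A minimal sketch of that non-mutating style follows. `process_parts` is a hypothetical helper for illustration only; it is not taken from dask-cuda or Distributed.

```python
def process_parts(parts, func):
    """Hypothetical helper: apply func to each part and return a new list.

    The input is only read, never assigned into, so it works the same
    whether `parts` arrives as a list or as a tuple.
    """
    return [func(part) for part in parts]


as_tuple = ("a", "b")
as_list = ["a", "b"]

from_tuple = process_parts(as_tuple, str.upper)
from_list = process_parts(as_list, str.upper)
```

One possible trade-off, stated as an assumption: if the original `out_parts[i] = None` assignment was there to release references early, a non-mutating version keeps the input alive until the caller drops it, so peak memory may differ.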
