[BUG] cudf_udf failed in all spark release intermittently #2521
Comments
Passed in today's run. We will keep monitoring this. |
The error has started failing other tests: rapids_integration-dev spark-301 and 302, ID 186, vanilla Spark standalone executor logs,
|
Still failing randomly in different environments. |
Signed-off-by: Peixin Li <pxli@nyu.edu>
Reopening. Accidentally closed by #2539. |
It is an environment issue: the cudf Python package appears to be broken. Verified locally: importing the cudf Python library in a Python shell hits the same error.
firestarman@firestarman-ubuntu18:~/work/projects/on_github/spark-rapids$ docker run --runtime=nvidia -it --name debug-cudf-test -v ~/.m2:/root/.m2 -v /usr/local/spark:/usr/local/spark ${docker-repo}/plugin:it-ubuntu18.04-cuda11.0-blossom-dev
root@27d721fdaf1b:/# conda list cudf
# packages in environment at /opt/conda:
#
# Name Version Build Channel
cudf 21.06.00a210530 cuda_11.0_py38_g0eeb0c9239_404 rapidsai-nightly
libcudf 21.06.00a210525 cuda11.0_g6dbf2d58d1_379 rapidsai-nightly
root@27d721fdaf1b:/# python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/cudf/__init__.py", line 11, in <module>
    from cudf import core, datasets, testing
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/__init__.py", line 3, in <module>
    from cudf.core import _internals, buffer, column, column_accessor, common
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/_internals/__init__.py", line 3, in <module>
    from cudf.core._internals.where import where
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/_internals/where.py", line 11, in <module>
    from cudf.core.column import ColumnBase
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/column/__init__.py", line 3, in <module>
    from cudf.core.column.categorical import CategoricalColumn
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/column/categorical.py", line 25, in <module>
    from cudf import _lib as libcudf
  File "/opt/conda/lib/python3.8/site-packages/cudf/_lib/__init__.py", line 4, in <module>
    from . import (
ImportError: /opt/conda/lib/python3.8/site-packages/cudf/_lib/groupby.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN4cudf7groupby7groupby5shiftERKNS_10table_viewENS_9host_spanIKiLm18446744073709551615EEERKSt6vectorISt17reference_wrapperIKNS_6scalarEESaISC_EEPN3rmm2mr22device_memory_resourceE
>>>
|
Filed rapidsai/cudf#8404 to track the conda install version mismatch issue. |
Since it's not a code issue in spark-rapids, moving it to the 21.08 target. |
So, what's the next action here? |
Yes, I think so. Let's wait for the fix. |
It looks like the 21.06 nightly version mismatch has not happened again in the last 5 days. Going to re-enable the cudf_udf tests. |
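The mismatch is visible in the `conda list cudf` output above: cudf is at 21.06.00a210530 while libcudf is at 21.06.00a210525. A minimal sketch of how such a check could be automated follows. The helper names `parse_conda_list` and `versions_match` are hypothetical, and treating "identical version strings" as the compatibility rule is an assumption for illustration, not a documented cudf policy.

```python
def parse_conda_list(text):
    """Map package name -> version from the plain-text output of `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def versions_match(versions, first="cudf", second="libcudf"):
    """True only when both packages are listed with identical version strings.

    Assumption: exact equality is the rule we want to enforce.
    """
    return first in versions and versions.get(first) == versions.get(second)


# Sample copied from the session in this issue.
sample = """\
# packages in environment at /opt/conda:
#
# Name Version Build Channel
cudf 21.06.00a210530 cuda_11.0_py38_g0eeb0c9239_404 rapidsai-nightly
libcudf 21.06.00a210525 cuda11.0_g6dbf2d58d1_379 rapidsai-nightly
"""

found = parse_conda_list(sample)
print(found["cudf"], found["libcudf"], versions_match(found))
```

With the sample from this issue the two versions differ, so the check reports a mismatch.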
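Since the failure shows up as an `ImportError` at `import cudf`, one possible safeguard when re-enabling the tests is a preflight import check that fails fast with the loader's message. This is only a sketch: `try_import` is a hypothetical helper, not part of the spark-rapids test scripts, and it is demonstrated here with a standard-library module so it runs without cudf installed.

```python
import importlib


def try_import(module_name):
    """Return (ok, message); message holds the ImportError text on failure."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both a missing module and a broken native extension
        # (for example an "undefined symbol" error from a .so file).
        return False, str(exc)
    return True, ""


# Demonstrated with a stdlib module; in the nightly job the argument
# would be "cudf" (assumption about how the check would be wired in).
ok, message = try_import("json")
print(ok, message)
```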
This reverts commit 19bb201. Signed-off-by: Peixin Li <pxli@nyu.edu>
Verified the integration tests with the new cudf-py on multiple Databricks and standalone environments; they worked as expected. Closing the issue for now. Will reopen if it happens again. |
Signed-off-by: Peixin Li <pxli@nyu.edu>
* Revert "disable cudf_udf tests for NVIDIA#2521"
  This reverts commit 19bb201.
  Signed-off-by: Peixin Li <pxli@nyu.edu>
* add minAllocFraction for nightly cudf_udf test
Describe the bug
rapids_databricks_nightly-dev, ID 19, 20
rapids_integration-dev spark-301 302, ID 186, 187
rapids_it-3.1.x-SNAPSHOT-dev spark-312-SNAPSHOT, ID 149, 150
This is not 100% reproducible, and it can fail in any of the environments.
The cudf_udf integration tests failed.