[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently #4060
Comments
@andygrove can you help take a look? thanks~
@pxLi How can I confirm that the cudf jar in this run included rapidsai/cudf#9537, which should be the fix for this issue?
I just tested with Databricks 7.3 and I cannot reproduce the issue.
The commit info of the cudf jar can be fetched easily. We also saw failures today, but it passed in rapids_databricks_nightly-dev-github today, so the test result looks non-deterministic and we may not always be able to reproduce the error here. I checked the cudf jar in these failed builds; it is based on 3280be2, which should include #9537.
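In answer to the question above, a minimal sketch of one way to pull the commit info out of the jar, assuming the cudf jar embeds a `*-version-info.properties` entry recording the build commit (the entry name and jar path below are illustrative, not confirmed):

```python
import zipfile

# Path to the cudf jar from the CI build under investigation (illustrative).
jar_path = "cudf-21.12-SNAPSHOT-cuda11.jar"

with zipfile.ZipFile(jar_path) as jar:
    # Search for any embedded version-info properties entry; the exact
    # entry name is an assumption and may differ between builds.
    for name in jar.namelist():
        if name.endswith("version-info.properties"):
            print(f"--- {name} ---")
            print(jar.read(name).decode("utf-8"))
```

The printed properties can then be compared against the commit that merged rapidsai/cudf#9537, e.g. by checking whether the recorded revision (3280be2 in the failed builds) contains that change.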
The issue appears to be that the last bucket in the t-digest data is intermittently corrupted. Comparing the mean and weight values for the last 10 buckets of the t-digest between a passing run and a failing run, the last entry for both mean and weight is corrupted in the failing case. (The tables of mean/weight values from the original comment are omitted here.)
If I'm not mistaken, the last few entries in the
Aren't these sorted in increasing value of mean?
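For illustration, a minimal sketch of the invariant being asked about: the t-digest stores centroids as (mean, weight) pairs that are expected to be sorted by increasing mean, so a corrupted trailing bucket can be spotted with a simple check. The helper and the values below are hypothetical, not taken from the failing run:

```python
def check_centroids(centroids):
    """Basic sanity checks on t-digest centroids given as (mean, weight) pairs."""
    means = [m for m, _ in centroids]
    # Centroids are expected to be sorted by increasing mean.
    if means != sorted(means):
        return "centroids are not sorted by mean"
    # Weights should be positive; a corrupted last bucket typically shows
    # up here as a nonsensical mean or a non-positive weight.
    if any(w <= 0 for _, w in centroids):
        return "non-positive weight found"
    return "ok"

# Hypothetical last few buckets of a t-digest (placeholders, not real output).
print(check_centroids([(91.0, 3.0), (95.5, 2.0), (99.0, 1.0)]))       # ok
print(check_centroids([(91.0, 3.0), (95.5, 2.0), (-1.0e308, 0.0)]))   # corrupted tail
```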
Relates to #4060. Skip some of the tests that intermittently fail in 21.12 to make sure they don't affect CI and the release. Signed-off-by: Thomas Graves <tgraves@nvidia.com>
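A minimal sketch of what skipping such a test looks like in pytest; the decorator and reason text are illustrative and not necessarily how the linked change implemented the skip:

```python
import pytest

# Skip the intermittently failing test while the root cause is tracked.
@pytest.mark.skip(reason="intermittent failure, see issue #4060")
def test_hash_groupby_approx_percentile_long_repeated_keys():
    ...
```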
`detail::segmented_gather()` inadvertently uses `cuda_default_stream` in some parts of its implementation, while using the user-specified stream in others. This applies to the calls to `copy_range_in_place()`, `allocate_like()`, and `make_lists_column()`. ~This might produce race conditions, which might explain NVIDIA/spark-rapids/issues/4060. It's a rare failure that's quite hard to reproduce.~ This might lead to over-synchronization, though bad output is unlikely. The commit here should sort this out, by switching to the `detail` APIs corresponding to the calls above. Authors: - MithunR (https://github.com/mythrocks) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) URL: #9679
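The fix above is in the C++ code, but the hazard it addresses, some work being issued on the default stream while the caller expects everything on its own stream, can be sketched in Python with CuPy (CuPy is not part of this project; the snippet is only an analogy):

```python
import cupy as cp

user_stream = cp.cuda.Stream(non_blocking=True)
x = cp.arange(1 << 20, dtype=cp.float32)

with user_stream:
    y = cp.square(x)   # enqueued on the user-specified stream

z = y + 1.0            # issued outside the context, so it runs on the default
                       # (null) stream; this is the kind of stream mixing the
                       # commit above removes on the C++ side

user_stream.synchronize()  # only waits for work that was put on user_stream
```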
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Describe the bug
Related to #3770.
Observed in rapids_databricks_nightly-dev-github build ID 212.