Intermittent failures in fuzz_cases::join_fuzz::test_anti_join_1k_filtered
#11555
Comments
Could potentially be related to #11535
Note the test passes on re-run for PR #11527. Failure: https://github.com/apache/datafusion/actions/runs/10001535587/job/27645407724?pr=11527
I think the cause may be similar to https://github.com/apache/datafusion/pull/11041/files#r1648318160. I'll work on a fix. Thanks @alamb for reporting it.
🤔 It seems to have happened again on main right after #11604 was merged: https://github.com/apache/datafusion/actions/runs/10046115241/job/27764888311
I have disabled the test for now. I'll spend more time investigating why this happens.
I'm still on it. It involves a pretty tricky condition for buffered rows that span batches. UPD: I built a repro and am working on a solution.
I found that the problem happens when one streamed row has multiple matching buffered rows, but those buffered rows are in separate batches. In this case the DataFusion SMJ reacts to the first batch without knowing that the next one is coming. I'm still experimenting to find a solution, even a hacky one.
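To make the failure mode above concrete, here is a minimal toy sketch. It is not DataFusion code: the `Row` type, the `filter_passes` join filter, and both function names are hypothetical and exist only for illustration. It models a batch as a plain `Vec` and compares a per-batch decision with a decision made after all buffered batches for the key have been seen.

```rust
/// Toy model, not DataFusion code: a row is a (key, value) pair
/// and a "batch" is just a Vec of rows.
type Row = (i32, i32);

/// Hypothetical join filter: keys are equal and the buffered value is larger.
fn filter_passes(streamed: Row, buffered: Row) -> bool {
    streamed.0 == buffered.0 && buffered.1 > streamed.1
}

/// Flawed anti join: decides batch by batch, so the streamed row is emitted
/// as soon as one batch contains no match, even if a later batch has one.
fn anti_join_per_batch(streamed: Row, batches: &[Vec<Row>]) -> Vec<Row> {
    let mut out = Vec::new();
    for batch in batches {
        if !batch.iter().any(|&b| filter_passes(streamed, b)) {
            out.push(streamed);
        }
    }
    out
}

/// Anti join that decides only after every buffered batch has been checked.
fn anti_join_all_batches(streamed: Row, batches: &[Vec<Row>]) -> Vec<Row> {
    let matched = batches
        .iter()
        .flatten()
        .any(|&b| filter_passes(streamed, b));
    if matched { Vec::new() } else { vec![streamed] }
}

fn main() {
    let streamed = (0, 30);
    // Rows with the same key are split across two batches;
    // only the second batch contains a row that passes the filter.
    let batches = vec![vec![(0, 10), (0, 20)], vec![(0, 40)]];
    println!("per batch:   {:?}", anti_join_per_batch(streamed, &batches));
    println!("all batches: {:?}", anti_join_all_batches(streamed, &batches));
}
```

In this toy model the per-batch version emits the streamed row after the first batch, although the second batch contains a match, while the second version emits nothing.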
The repro test case
Attached a more accurate test case
Well, the problem is that AntiJoin needs to wait until the very last right batch has been read for the respective left row. I tried a couple of options for identifying the very last right batch, but each of them has its own false positives or false negatives. Perhaps we need a separate function or index to determine the very last batch. @korowa, do you have any other ideas on that? You have contributed a lot to SMJ, so I'd appreciate your help.
From what I remember -- doesn't SMJ already fetch the buffered side until it meets the first key that is not equal to the current streamed-side value? Or maybe I've misunderstood the problem?
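As a rough illustration of the behaviour asked about here (reading the buffered side until the first non-equal key), the sketch below collects all buffered rows for one key across batch boundaries. It is a toy model under stated assumptions, not the actual SMJ implementation: the function name `buffered_group` is hypothetical, rows are `(key, value)` pairs, and the buffered side is assumed to be sorted by key.

```rust
/// Toy model, not DataFusion code. Collects every buffered row whose key
/// equals `key`, crossing batch boundaries, and stops at the first row with
/// a different key. Assumes the buffered side is sorted by key.
fn buffered_group(batches: &[Vec<(i32, i32)>], key: i32) -> Vec<(i32, i32)> {
    batches
        .iter()
        .flatten()
        // Skip rows that sort before the key of the current streamed row.
        .skip_while(|row| row.0 < key)
        // Keep reading, possibly into later batches, while the key still matches.
        .take_while(|row| row.0 == key)
        .copied()
        .collect()
}

fn main() {
    // Key 1 starts in the first batch and continues through the second.
    let batches = vec![
        vec![(0, 1), (1, 5)],
        vec![(1, 6), (1, 7)],
        vec![(2, 9)],
    ];
    println!("{:?}", buffered_group(&batches, 1));
}
```

With the whole group in hand, an anti join could evaluate the filter once over all rows for the key instead of once per batch.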
Thanks @korowa, that is something I'm also trying. I hope to make a PR soon and add you as a reviewer.
I think this local test may cover lots of cases
My initial thought was to do something like:
But this approach has a flaw, namely So for (0, 30) there are 2 batches, 3 matched rows each.
This can be closed.
Describe the bug
I have seen this test fail twice now on two unrelated PRs:
#11540: https://github.com/apache/datafusion/actions/runs/10011021548/job/27673684873?pr=11540
#11527: https://github.com/apache/datafusion/actions/runs/10001535587/job/27645407724?pr=11527
And
To Reproduce
Not sure -- it is happening on CI intermittently
Expected behavior
No response
Additional context
No response