bug: weird behavior with collect continuity #648
Comments
I agree it's weird. I'm somewhat suspicious that it may be caused by the multiple merges. Some thoughts / questions:
This issue arose from an attempt to hack a fix for trailing windows: shift the input forward, merge that with the original, feed the result to the collect, then filter on the original non-null input. The above plan shows the query with the hack:
The weird behavior is this: the collect in operation 2 correctly clears its buffers and produces an empty, non-null list. That output is sent to a select, then to the final merge (operation 4), which correctly latches the values it receives. However, the select after the collect filters on is_valid(input). Because the input was null for several rows, the empty-list rows are never sent to the final merge, so the merge does not latch the correct new state (an empty list).

This seems strange because, at first glance, adding a filter after any aggregation would appear to stop it from obeying the interpolation rules further downstream. However, the reason this is okay and expected is that Thus, given the (pseudocode) example:
This behavior is likely correct, but it could have a subtle and meaningful impact on complex queries. We should discuss ways to alleviate this risk.
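To make the latching effect concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not the engine's actual implementation: `latched_merge`, the `ABSENT` sentinel, and the sample data are all hypothetical, and the merge is reduced to "remember the most recent row that arrived".

```python
# Sentinel meaning "no row arrived at this time step" (distinct from a null value).
ABSENT = object()


def latched_merge(rows):
    """Toy latched merge: emit the most recently received row at every step.

    A step with no incoming row (ABSENT) leaves the latch unchanged.
    """
    latched = None
    out = []
    for row in rows:
        if row is not ABSENT:
            latched = row
        out.append(latched)
    return out


# Hypothetical input column; None stands for a null input row.
inputs = [1, 2, None, None]

# Hypothetical collect output: the window has emptied by the time the input is null,
# so collect produces an empty (but non-null) list on those rows.
collected = [[1], [1, 2], [], []]

# The select after the collect keeps only rows where is_valid(input) holds,
# so the empty-list rows never reach the merge.
filtered = [c if i is not None else ABSENT for i, c in zip(inputs, collected)]

unfiltered_result = latched_merge(collected)
filtered_result = latched_merge(filtered)

print(unfiltered_result)  # the merge sees the empty lists
print(filtered_result)    # the merge keeps the stale [1, 2]
```

In this model the unfiltered path ends on an empty list, while the filtered path keeps reporting `[1, 2]`, because the rows that would have reset the latch were dropped before the merge.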
Description
Test that splits up each index because we can’t print structs in the csv results:
But the problem is that, now that collect is as-of/continuous, we’re storing the last non-null value for each individual index.
It may be “technically correct” based on our continuity rules. See the compute plan — each b field ref is going to a separate select and merge because of the is_valid, so each merge is keeping the latched state.
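The per-field latching described above can be sketched with a small Python model. It is illustrative only: `latch_per_field`, the field names, and the rows are hypothetical, and each field's select-plus-merge pair is approximated as "keep the last non-null value seen for that field".

```python
def latch_per_field(rows, fields):
    """Toy model: every field has its own is_valid filter and its own latch."""
    latched = {f: None for f in fields}
    out = []
    for row in rows:
        for f in fields:
            value = None if row is None else row.get(f)
            # is_valid filter: a null never reaches this field's merge,
            # so the previously latched value survives.
            if value is not None:
                latched[f] = value
        out.append(dict(latched))
    return out


# Hypothetical struct rows; the last row is entirely null.
rows = [{"x": 1, "y": "a"}, {"x": 2, "y": None}, None]
result = latch_per_field(rows, ["x", "y"])

for r in result:
    print(r)
```

In this model the final row still reports `x = 2` and `y = "a"`, whereas a reader would probably expect nulls there. That matches the symptom reported in this issue: each field behaves as if it had been passed through its own last-non-null aggregation.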
Problem:
Specifically the two lines:
should have nulls for the last 3 columns, I'd expect. The problem is that the last non-null values for each are being saved, as if it were pushed to a last (which, it almost technically is, since it's going in latched spread).