Incorrect query result in vector runtime aggregation on multi-step generated CSUP #5679
Comments
Curiously, the problem doesn't show up if I start from the original hits Parquet (instead of the CSV) and convert that directly to CSUP and query that in vector runtime.
Comparing the types in the two Parquet files, we do see some differences, but only in the time-related fields, not in any of the fields referenced in the query.
In attempting to verify the fix merged from linked PR #5690, I see we've changed the result when this query is executed in vector runtime against CSUP to show a row of
This commit fixes an issue with filters in the vector runtime where boolean vectors with true values that were also null were not getting filtered. Closes #5679
This commit fixes an issue with filters in the vector runtime where null values in boolean vectors that were also set to true were not getting filtered out as they should. Closes #5679
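To make the failure mode in these commit messages concrete, here is a minimal Python sketch of a filter over a boolean predicate vector that carries a separate null mask. It is not the actual vector runtime code: `BoolVector`, `buggy_filter_indexes`, and `null_aware_filter_indexes` are hypothetical names invented for illustration, and the two-list layout is an assumption about how such a vector might be represented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoolVector:
    """Hypothetical boolean vector: raw bits plus a separate null mask."""
    values: List[bool]  # raw bits; a slot may hold True even when it is null
    nulls: List[bool]   # True where the slot is null


def buggy_filter_indexes(pred: BoolVector) -> List[int]:
    # Looks only at the raw bits, so a slot that is both True and null
    # is kept by mistake.
    return [i for i, v in enumerate(pred.values) if v]


def null_aware_filter_indexes(pred: BoolVector) -> List[int]:
    # Keeps a row only when the predicate is True and the slot is not null.
    return [
        i
        for i, (v, is_null) in enumerate(zip(pred.values, pred.nulls))
        if v and not is_null
    ]


# Slot 2 is null but its raw bit happens to be True.
pred = BoolVector(
    values=[True, False, True, True],
    nulls=[False, False, True, False],
)
print(buggy_filter_indexes(pred))       # slot 2 leaks through
print(null_aware_filter_indexes(pred))  # slot 2 is dropped
```

The point of the sketch is only that the null mask has to take part in the selection; where that check lives in the real runtime is not something this issue spells out.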
tl;dr
Given the WHERE clause, the top line of output in the following aggregation result is incorrect.
Details
Repro is with super commit 2f1a964. This is the ClickBench q12 query.
For reasons unrelated to this particular issue (but that ended up surfacing this issue anyway) I happened to be generating my CSUP test data from the original hits CSV in multiple steps:
ddl_duckdb.sql
Here I'll show those steps while issuing a query along the way to see the presumed correct result when querying the DuckDB table.
At this point if we issue the same query we did in DuckDB in sequential runtime against the CSUP file, our result matches the one from DuckDB.
However, once we repeat it in vector runtime, we get a first row that includes a count for SearchPhrase:"", which should have been excluded per the WHERE clause.
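As a reference for the expected behavior, the following Python sketch models the semantics described above on a tiny in-memory dataset. The query text itself is not reproduced here, so the shape is an assumption: drop rows whose SearchPhrase is empty, count the rest per phrase, and return the most frequent phrases first. The function name `count_non_empty_phrases` and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_non_empty_phrases(
    rows: List[Dict[str, str]], limit: int = 10
) -> List[Tuple[str, int]]:
    # Assumed WHERE step: rows with an empty SearchPhrase never reach the
    # aggregation.
    kept = (r["SearchPhrase"] for r in rows if r["SearchPhrase"] != "")
    # Assumed GROUP BY / count / ORDER BY count DESC / LIMIT steps.
    return Counter(kept).most_common(limit)


rows = [
    {"SearchPhrase": ""},
    {"SearchPhrase": "cats"},
    {"SearchPhrase": ""},
    {"SearchPhrase": "dogs"},
    {"SearchPhrase": "cats"},
]
for phrase, count in count_non_empty_phrases(rows):
    print(phrase, count)
```

Whatever the runtime, no row for the empty phrase should appear in the output, which is the property the vector runtime result violates.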