Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Add over clause in read_sql percentile reads #3094

Merged
merged 6 commits into from
Oct 23, 2024
Merged

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Oct 21, 2024

Addresses: #3075

SQL server requires an OVER clause to be specified in percentile queries (because it's a window function). Read sql uses percentiles to determine partition bounds.

Adds AzureSqlEdge as a test database. Might as well since a lot of ppl use us to read sqlserver, and have had bugs with sql server. Kind of a pain to get it set up since it requires odbc and drivers etc. but it works. It's also not much of a hit on CI times, installing drivers takes around ~15s and the extra tests take around 5s.

Additionally made some modifications to some tests and pushdowns, left comments on the rationale.

@github-actions github-actions bot added the bug Something isn't working label Oct 21, 2024
Copy link

codspeed-hq bot commented Oct 21, 2024

CodSpeed Performance Report

Merging #3094 will not alter performance

Comparing colin/sql-over (86e1f62) with main (26d639b)

Summary

✅ 17 untouched benchmarks

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 78.64%. Comparing base (5795adc) to head (86e1f62).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
daft/sql/sql_scan.py 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3094      +/-   ##
==========================================
+ Coverage   78.35%   78.64%   +0.29%     
==========================================
  Files         611      616       +5     
  Lines       72517    73140     +623     
==========================================
+ Hits        56819    57522     +703     
+ Misses      15698    15618      -80     
Files with missing lines Coverage Δ
src/daft-dsl/src/expr/mod.rs 77.12% <ø> (+1.26%) ⬆️
daft/sql/sql_scan.py 25.56% <0.00%> (-0.20%) ⬇️

... and 43 files with indirect coverage changes

@@ -990,21 +990,9 @@ impl Expr {
to_sql_inner(inner, buffer)?;
write!(buffer, ") IS NOT NULL")
}
Expr::IfElse {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL server only accepts values in case when then clauses, so it was failing on the if_else pushdowns test, because we test with conditional expressions inside the if_else branches.

I think it's better to just remove it, the benefit of having if_else pushdowns in read_sql is probably not much at all, and i'd instead rather just avoid having this be a bug in the future.

Comment on lines +144 to +146
# Skip invalid comparisons for bool_col
if column == "bool_col" and operator not in ("=", "!="):
pytest.skip(f"Operator {operator} not valid for bool_col")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL server only accepts equality based comparisons on bools, e.g. bool_col = True or bool_col != True.

In fact, we shouldn't even need to test comparisons on bools other than = and != anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for the test file, not necessarily for this PR. But not-equal is typically <> in sql, so maybe we should test that too.

Copy link
Contributor

@desmondcheongzx desmondcheongzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, LGTM!

container_name: azuresqledge
environment:
ACCEPT_EULA: "Y"
MSSQL_SA_PASSWORD: "StrongPassword!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol

Comment on lines +144 to +146
# Skip invalid comparisons for bool_col
if column == "bool_col" and operator not in ("=", "!="):
pytest.skip(f"Operator {operator} not valid for bool_col")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for the test file, not necessarily for this PR. But not-equal is typically <> in sql, so maybe we should test that too.

daft/sql/sql_scan.py Show resolved Hide resolved
@colin-ho colin-ho enabled auto-merge (squash) October 23, 2024 00:01
@colin-ho colin-ho merged commit 459ba82 into main Oct 23, 2024
40 checks passed
@colin-ho colin-ho deleted the colin/sql-over branch October 23, 2024 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants