Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL tests for when sorting exceeded available memory and had to spill to disk #1573

Closed
Tracked by #1568
alamb opened this issue Jan 15, 2022 · 5 comments · Fixed by #1706
Closed
Tracked by #1568

SQL tests for when sorting exceeded available memory and had to spill to disk #1573

alamb opened this issue Jan 15, 2022 · 5 comments · Fixed by #1706
Labels
datafusion Changes in the datafusion crate

Comments

@alamb
Copy link
Contributor

alamb commented Jan 15, 2022

No description provided.

@alamb alamb changed the title Tests for the spilling (SQL level) SQL level tests for when sorting exceeded available memory and had to spill to disk Jan 16, 2022
@yjshen
Copy link
Member

yjshen commented Jan 20, 2022

Currently, #1596 has one SQL test on spill called test_sort_spill that spills during the test by controlling the total memory of the pool.

https://github.com/apache/arrow-datafusion/blob/6a458c67b5d11cf9dfa09281289c4d2da9484a2d/datafusion/src/physical_plan/sorts/sort.rs#L739-L765

However, since spill metrics are not exposed by sort yet, the current test cannot assert spill count > 0. Do you think it's appropriate if I add spill count as well as spill bytes into BaselineMetrics? I think these might be common metrics as we proceed with more memory consumers.

@alamb
Copy link
Contributor Author

alamb commented Jan 21, 2022

However, since spill metrics are not exposed by sort yet, the current test cannot assert spill count > 0. Do you think it's appropriate if I add spill count as well as spill bytes into BaselineMetrics? I think these might be common metrics as we proceed with more memory consumers.

I think adding those metrics to BaselineMetrics makes sense to me @yjshen

@alamb alamb changed the title SQL level tests for when sorting exceeded available memory and had to spill to disk SQL tests for when sorting exceeded available memory and had to spill to disk Jan 21, 2022
@yjshen
Copy link
Member

yjshen commented Jan 23, 2022

@alamb, I think this one can be closed. Solved in #1641.

@alamb
Copy link
Contributor Author

alamb commented Jan 24, 2022

@yjshen I was thinking this ticket could track actually writing sql tests that spilled to disk (and verified both that they spilled as well as the correct answers).

Something like:

  1. Make a table with 10,000 rows but set memory limits to only allow 1K rows, and then validate the answer comes out
  2. A test harness that runs the same basic test, but checks various combinations of batch_size, and memory_limits to ensure the correct answer is produced

@yjshen
Copy link
Member

yjshen commented Jan 25, 2022

@alamb Ah, that makes sense. I will do it.

@alamb alamb added the datafusion Changes in the datafusion crate label Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants