Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllogictest] Read subdirectories in test_files #5033

Merged
merged 6 commits into from
Jan 25, 2023

Conversation

melgenek
Copy link
Contributor

Which issue does this PR close?

Closes #4709.

Rationale for this change

Tests could be categorized into folders.

What changes are included in this PR?

Sqllogictest reads test_files directory recursively.

Are these changes tested?

Github actions.

Are there any user-facing changes?

No.

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 23, 2023
)
}

fn read_dir_recursive<P: AsRef<Path>>(path: P) -> Box<dyn Iterator<Item = PathBuf>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WalkDir crate seemed like an overkill because there is no need for link resolution and similar features, but only recursive traversing.
Of course, WalkDir can be thrown in here instead of the custom code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree walkdir is not needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be able to avoid the Box using something like

Suggested change
fn read_dir_recursive<P: AsRef<Path>>(path: P) -> Box<dyn Iterator<Item = PathBuf>> {
fn read_dir_recursive<P: AsRef<Path>>(path: P) -> impl Iterator<Item = PathBuf> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that the resolved type for the whole implementation inside this function is FlatMap. But inside flat_map there is also std::iter::once for the case when there is only one file. And then rust cannot resolve a concrete implementation based on these options.

It is probably possible to apply your suggestion, but my rust skills were not good enough.

Copy link
Contributor

@alamb alamb Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think https://docs.rs/itertools/latest/itertools/enum.Either.html# is one way to handle this. However, I don't think it is important to change

.map(|path| {
(
path.clone(),
path.to_string_lossy()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done to output not only names but relative paths. For example, a query from the pg_compat dir would be printed this way.

[pg_compat/pg_compat_union.slt] Running query: "DROP TABLE aggregate_test_100_by_sql"

)
})
.filter(|(_, file_name)| options.check_test_file(file_name))
.filter(|(path, _)| options.check_pg_compat_file(path.as_path())),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres file filter doesn't account for folder names, but it can. For example, instead of expecting a pg_compat prefix on the file name, the runner could check this prefix on a relative file path.

@melgenek melgenek marked this pull request as ready for review January 23, 2023 21:54
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me -- thank you @melgenek

fn schema_name(relative_path: &Path) -> &str {
relative_path
.file_name()
.unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general I think it is nice to avoid the unwrap() which panic's but in this case it would happen if the filenames were non utf8 -- so I think it is ok here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the logic to only keep Ascii characters 93f4ffd. I had issues with having numbers in postgres schema, so Ascii chars should be the safest option.

)
}

fn read_dir_recursive<P: AsRef<Path>>(path: P) -> Box<dyn Iterator<Item = PathBuf>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree walkdir is not needed

)
}

fn read_dir_recursive<P: AsRef<Path>>(path: P) -> Box<dyn Iterator<Item = PathBuf>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be able to avoid the Box using something like

Suggested change
fn read_dir_recursive<P: AsRef<Path>>(path: P) -> Box<dyn Iterator<Item = PathBuf>> {
fn read_dir_recursive<P: AsRef<Path>>(path: P) -> impl Iterator<Item = PathBuf> {

@alamb
Copy link
Contributor

alamb commented Jan 24, 2023

FYI any "no space left on device" CI errors should be fixed by merging up from master to get #5047

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks again @melgenek

@alamb alamb merged commit 2aa1490 into apache:master Jan 25, 2023
@ursabot
Copy link

ursabot commented Jan 25, 2023

Benchmark runs are scheduled for baseline = b5d90b6 and contender = 2aa1490. 2aa1490 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@melgenek melgenek deleted the 4709-recursive-directory-tests branch January 25, 2023 11:33
@Tracyca209
Copy link

Which issue does this PR close?

Closes #4709.

Rationale for this change

Tests could be categorized into folders.

What changes are included in this PR?

Sqllogictest reads test_files directory recursively.

Are these changes tested?

Github actions.

Are there any user-facing changes?

No.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support sub directories in sqllogictest runner
4 participants