Skip to content

Improve regeneration of sqlite expected test suite #14906

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

Thanks to some great work from @Omega359 as part of each commit to main DataFusion runs many thousand queries from the sqlite test suite ❤

This is documented here:

## Running Tests: `sqlite`
Test files in `data/sqlite` directory of the datafusion-testing crate were
sourced from the [sqlite test suite](https://www.sqlite.org/sqllogictest/dir?ci=tip) and have been cleansed and updated to
run within DataFusion's sqllogictest runner.
To run the sqlite tests you need to increase the rust stack size and add
`INCLUDE_SQLITE=true` to run the sqlite tests:
```shell
export RUST_MIN_STACK=30485760;
INCLUDE_SQLITE=true cargo test --test sqllogictests
```
Note that there are well over 5 million queries in these tests and running the
sqlite tests will take a long time. You may wish to run them in release-nonlto mode:
```shell
INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
```
The sqlite tests can also be run with the postgres runner to verify compatibility:
```shell
export RUST_MIN_STACK=30485760;
PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres --test sqllogictests
```
To update the sqllite expected answers use the `datafusion/sqllogictest/regenerate_sqlite_files.sh` script.
Note this must be run with an empty postgres instance. For example
```shell
PG_URI=postgresql://postgres@localhost:5432/postgres bash datafusion/sqllogictest/regenerate_sqlite_files.sh
```

When expected output changes (for example, error messages) we currently use a script https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/regenerate_sqlite_files.sh that:

  1. It relies on a fork of sqlogictest
  2. It relies on a modified driver program

This is problematic because

  1. As the code in datafusion is updated (for example to new sqlogictest versions) the modified driver program may not work with the new sqlogictest version
  2. The modified driver program may not work with new versions of sqllogictest

This happened with #14824 which made it hard to update the expected output

Describe the solution you'd like

I would like to make sure that regenerate_sqlite_files.sh will always work and will not bitrot over time

@Omega359 says:

That is exactly what I was thinking and hopefully will fix tonight. I think a decent short-term fix is to 'lock' the sqllogictest-rs dependency version and add a comment that any update to it will require a full run of the regenerate script before committing.

Long term ideally would be to improve my changes to my fork of sqllogictest-rs such that they would be suitable to submit a PR to that project. That is not an insignificant amount of work to be honest and I'm a bit thin on time for the next month or two.

Describe alternatives you've considered

No response

Additional context

See last time we had to update the scripts based on changes:

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions