[sqllogictest] Remove postgres container orchestration #5015

Merged
merged 12 commits into from
Jan 23, 2023

Conversation

alamb
Contributor

@alamb alamb commented Jan 21, 2023

Which issue does this PR close?

Closes #5009

Rationale for this change

#4834 added a way to run sqllogictests using postgres 🦾 (kudos to @melgenek)

However, it currently orchestrates the postgres containers from Rust test code, which means that running the (full) DataFusion test suite locally requires Docker.

What changes are included in this PR?

  1. Remove container orchestration logic
  2. Implement special support for COPY so it can be run from the postgres tests
  3. Postgres connection configuration is specified by the PG_DSN environment variable, e.g. PG_DSN="postgresql://postgres@127.0.0.1/postgres"
  4. Add documentation on how to run with an existing postgres container:

    PG_COMPAT=true PG_DSN="postgresql://postgres@127.0.0.1/postgres" cargo test -p datafusion --test sqllogictests
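
The environment-driven configuration described in items 3 and 4 could be sketched roughly as below. This is only an illustration using the standard library, not the code from this PR: the `Runner` enum, `choose_runner`, `runner_from_env`, and the fallback to a default DSN when `PG_DSN` is unset are all assumptions made for the example.

```rust
use std::env;

/// Which engine a sqllogictest run targets (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Runner {
    DataFusion,
    Postgres { dsn: String },
}

/// Assumed fallback DSN; the PR only shows this value as an example setting.
const DEFAULT_DSN: &str = "postgresql://postgres@127.0.0.1/postgres";

/// Pure helper: decides the runner from the raw variable values, so the
/// decision can be checked without touching the process environment.
fn choose_runner(pg_compat: Option<&str>, pg_dsn: Option<&str>) -> Runner {
    match pg_compat {
        // Only an explicit "true" (any casing) switches to the Postgres runner.
        Some(v) if v.eq_ignore_ascii_case("true") => Runner::Postgres {
            dsn: pg_dsn.unwrap_or(DEFAULT_DSN).to_string(),
        },
        _ => Runner::DataFusion,
    }
}

/// Reads PG_COMPAT and PG_DSN from the environment and picks a runner.
fn runner_from_env() -> Runner {
    let compat = env::var("PG_COMPAT").ok();
    let dsn = env::var("PG_DSN").ok();
    choose_runner(compat.as_deref(), dsn.as_deref())
}
```

With this shape, a plain `cargo test` never needs a database, and the Postgres path is only taken when a developer opts in and has a server already running.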

Are these changes tested?

Yes, in CI

Are there any user-facing changes?

Not really -- only to other devs

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 21, 2023
@@ -232,9 +232,21 @@ jobs:
name: "Run sqllogictest with Postgres runner"
needs: [linux-build-lib]
runs-on: ubuntu-latest
services:
postgres:
Contributor Author

this starts the container using the code from integration-tests (above in this file) rather than running docker from within the harness

@@ -1,21 +0,0 @@
CREATE TABLE aggregate_test_100_by_sql
Contributor Author

these statements are now run in the individual setup files

/// ```
///
/// And read the file locally.
async fn run_copy_command(&mut self, sql: &str) -> Result<DBOutput> {
Contributor Author

I was quite pleased with this :bowtie: -- it allows the tests to run COPY FROM 'file'
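
One way to picture what such a helper has to do: recognise a `COPY <table> FROM '<file>'` statement, read the file on the client side, and send an equivalent `COPY ... FROM STDIN` to the server. The sketch below covers only the parsing and rewriting steps with the standard library. It is an assumption-laden illustration, not the implementation in this PR: `CopyFromFile`, `parse_copy_from`, `to_stdin_statement`, and `load_copy_payload` are hypothetical names, and quoting edge cases (escaped quotes, quoted table names) are ignored.

```rust
use std::fs;
use std::io;

/// Parsed pieces of `COPY <table> FROM '<path>' [options]` (hypothetical type).
#[derive(Debug, PartialEq)]
struct CopyFromFile {
    table: String,
    path: String,
    options: String,
}

/// Returns `None` unless the statement is a COPY from a quoted file path.
fn parse_copy_from(sql: &str) -> Option<CopyFromFile> {
    let sql = sql.trim();
    let mut tokens = sql.split_whitespace();
    if !tokens.next()?.eq_ignore_ascii_case("COPY") {
        return None;
    }
    let table = tokens.next()?;
    if !tokens.next()?.eq_ignore_ascii_case("FROM") {
        return None;
    }
    // The path is the text between the first pair of single quotes.
    let start = sql.find('\'')? + 1;
    let len = sql[start..].find('\'')?;
    // Anything after the closing quote is kept as options, minus a trailing `;`.
    let tail = sql[start + len + 1..].trim().trim_end_matches(';').trim_end();
    Some(CopyFromFile {
        table: table.to_string(),
        path: sql[start..start + len].to_string(),
        options: tail.to_string(),
    })
}

/// Rewrites the statement so the server reads rows from the client connection.
fn to_stdin_statement(cmd: &CopyFromFile) -> String {
    let stmt = format!("COPY {} FROM STDIN {}", cmd.table, cmd.options);
    stmt.trim_end().to_string()
}

/// Reads the file locally; these bytes would then be streamed to the server.
fn load_copy_payload(cmd: &CopyFromFile) -> io::Result<Vec<u8>> {
    fs::read(&cmd.path)
}
```

The point of the rewrite is that the file path is resolved on the machine running the tests, so the Postgres server (for example inside a container) never needs access to the test data directory.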

@@ -15,6 +15,28 @@
# specific language governing permissions and limitations
# under the License.


statement ok
Contributor Author

drive by clean up -- this can be run directly in the .slt file

## Setup test for postgres
###

onlyif postgres
Contributor Author

I used the onlyif statement to run different setups for DataFusion and postgres
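
For readers unfamiliar with sqllogictest conditions, the gating works roughly like this: a record preceded by `onlyif <engine>` runs only on that engine, and one preceded by `skipif <engine>` runs everywhere else. The following is a minimal standard-library sketch of that rule, not the sqllogictest crate's actual code; `Condition`, `parse_condition`, `should_run`, and the engine label `datafusion` are hypothetical.

```rust
/// A condition line that may precede a record in an .slt file.
#[derive(Debug, PartialEq)]
enum Condition {
    OnlyIf(String),
    SkipIf(String),
}

/// Parses `onlyif <engine>` or `skipif <engine>`; any other line yields `None`.
fn parse_condition(line: &str) -> Option<Condition> {
    let mut parts = line.split_whitespace();
    match (parts.next()?, parts.next()?) {
        ("onlyif", engine) => Some(Condition::OnlyIf(engine.to_string())),
        ("skipif", engine) => Some(Condition::SkipIf(engine.to_string())),
        _ => None,
    }
}

/// A record runs only if every attached condition allows the current engine.
fn should_run(conditions: &[Condition], engine: &str) -> bool {
    conditions.iter().all(|c| match c {
        Condition::OnlyIf(e) => e.as_str() == engine,
        Condition::SkipIf(e) => e.as_str() != engine,
    })
}
```

Under this rule, a setup statement marked `onlyif postgres` is executed by the Postgres runner and silently skipped by any other runner, which is what lets one .slt file carry both setups.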

@alamb alamb changed the title [sqllogictest] Remove postgres container [sqllogictest] Remove postgres container orchestration Jan 21, 2023
@alamb alamb marked this pull request as ready for review January 21, 2023 18:58
@alamb alamb requested a review from xudong963 January 21, 2023 18:58
@alamb
Contributor Author

alamb commented Jan 21, 2023

cc @melgenek

Contributor

@melgenek melgenek left a comment

I like the approach of having a setup per file. It is clearer where the data comes from this way.

Removing the container-per-file approach gives up isolation between tests. I would suggest having a schema per test file.

debug!("Using postgres dsn: {dsn}");

let config = tokio_postgres::Config::from_str(&dsn)?;

let mut retry = 0;
Contributor

You can probably get rid of the retry. It was sort of a hack because queries were executed immediately after a docker startup. And sometimes Postgres wasn't fully ready. Assuming that Postgres gets started in advance, you can assume that the connection succeeds.

Contributor Author

I think some of the options from the container setup (that I copied from @jimexist) may help.

        ports:
          - 5432/tcp
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

Contributor Author

removed retry in c41a0a8

datafusion/core/tests/sqllogictests/README.md Outdated Show resolved Hide resolved
`ORDER BY` will not match DataFusion and the tests will diff.

```shell
docker run \
Contributor

Suggested change:
- docker run \
+ docker run --rm \

Contributor Author

I think --rm removes the container. I normally prefer to leave them around for debugging


onlyif postgres
statement ok
DROP TABLE IF EXISTS aggregate_test_100_by_sql;
Contributor

Dropping a table seems like a workaround for the fact that the Postgres database is reused across multiple test files.
It would be nice to introduce isolation by, for example, having a schema per file. The schema name could be either random or based on the test file name.

I drafted a very basic implementation of the approach. melgenek@3394c87
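
To make the schema-per-file idea concrete, here is a small sketch of how a runner might derive a schema name from the test file name and prepare the statements to run before the file's records. It is not the draft linked above: `schema_name_for`, `setup_statements`, and the `slt_` prefix are hypothetical choices for illustration, and only the standard library is used.

```rust
use std::path::Path;

/// Derives a Postgres-safe schema name from a test file path.
/// Characters outside [a-z0-9] become `_`; the `slt_` prefix (an assumption)
/// guarantees the identifier never starts with a digit.
fn schema_name_for(test_file: &Path) -> String {
    let stem = test_file
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("test");
    let cleaned: String = stem
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '_'
            }
        })
        .collect();
    format!("slt_{cleaned}")
}

/// Statements to run on the connection before executing one test file.
fn setup_statements(schema: &str) -> Vec<String> {
    vec![
        // Start from a clean slate even if a previous run was interrupted.
        format!("DROP SCHEMA IF EXISTS {schema} CASCADE"),
        format!("CREATE SCHEMA {schema}"),
        // Unqualified table names in the file now resolve inside this schema.
        format!("SET search_path TO {schema}"),
    ]
}
```

Because each file works inside its own schema, explicit `DROP TABLE IF EXISTS` statements in the .slt files become unnecessary, and two files that create a table with the same name no longer collide.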

Contributor Author

Thanks -- this is a good point @melgenek

At the very least the tests should clean up after themselves (as in I should put a drop_table at the end). I like the idea of a separate schema, however. I will try and add that shortly

Contributor

I was thinking more about the relevance of schemas and isolation.

If DataFusion's test suite ever becomes big enough, you would be able to parallelize it, and schemas would help run multiple files in parallel without interference.

Member

@xudong963 xudong963 left a comment

Thanks andrew

@xudong963 xudong963 merged commit 930c8de into apache:master Jan 23, 2023
@ursabot

ursabot commented Jan 23, 2023

Benchmark runs are scheduled for baseline = 25ab1f9 and contender = 930c8de. 930c8de is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb alamb deleted the alamb/external_postgres branch July 26, 2024 10:46
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[sqllogictest] Don't orchestrate the postgres containers with rust / docker
4 participants