Run `datastore` tests against multiple schemas #1277

tgeoghegan · 2023-04-19T17:36:37Z

We want to prove in tests that some Janus version can run safely on multiple schema versions to make database schema migrations safer. In this commit:

- aggregator_core::test_util::EphemeralDatastore can now be constructed with a max_schema_version argument to control which migration scripts are applied during tests
- We adopt rstest to inject an EphemeralDatastore instance into tests in aggregator_core::datastore::Datastore::tests. Using rstest_reuse, we can automatically stamp out versions of existing tests that run using multiple schema versions.

We only use this dependency injection technique in the datastore module, because that should be the only module that's tightly coupled to the database schema. We could run all tests that use a datastore this way, at the cost of increasing overall test runtime.
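To make the `max_schema_version` idea concrete, here is a minimal sketch of how a test fixture might choose which migration scripts to apply. The `Migration` type, the `migrations_up_to` helper, and the placeholder SQL strings are all hypothetical and are not taken from Janus; only the two version numbers come from this PR.

```rust
/// Hypothetical stand-in for a migration script: a version number plus its SQL.
#[derive(Debug, Clone, PartialEq)]
struct Migration {
    version: i64,
    up_sql: &'static str,
}

/// Returns the migrations to apply, in ascending version order, keeping only
/// those whose version is at most `max_schema_version`.
fn migrations_up_to(all: &[Migration], max_schema_version: i64) -> Vec<Migration> {
    let mut selected: Vec<Migration> = all
        .iter()
        .filter(|m| m.version <= max_schema_version)
        .cloned()
        .collect();
    selected.sort_by_key(|m| m.version);
    selected
}

fn main() {
    // Placeholder SQL; the real scripts live in the repository's migration files.
    let all = [
        Migration { version: 20230417204528, up_sql: "-- placeholder: newer migration" },
        Migration { version: 20230405185602, up_sql: "-- placeholder: older migration" },
    ];

    // Stop at the older schema version, so the newer script is skipped.
    for m in migrations_up_to(&all, 20230405185602) {
        println!("applying {}: {}", m.version, m.up_sql);
    }
}
```

A test that asks for the older version then runs against a database that lacks anything the newer script adds, which is exactly the situation this PR wants to exercise.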

tgeoghegan · 2023-04-19T17:40:11Z

Some additional notes here:

Initially I applied this to only three tests in datastore.rs, because I wanted to illustrate how this works so we could discuss it without all the noise of adding a #[future] #[case] ephemeral_datastore: EphemeralDatastore argument to every test in the module. Update: the most recent commit sets this up for all the tests in datastore.rs.
We (should) observe that the test datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 fails, because indeed the current Janus code doesn't work on a schema that doesn't define column report_aggregations.last_prep_step. I think that shows the tests working as expected: we've learned that we need to drop schema version 20230405185602 from the supported versions list.

tgeoghegan · 2023-04-19T18:03:42Z

aggregator_core/src/datastore.rs

+    #[rstest]
+    // The version numbers in these cases must match SUPPORTED_SCHEMA_VERSIONS
+    #[case::version_20230405185602(ephemeral_datastore_max_schema_version(20230405185602))]
+    #[case::version_20230417204528(ephemeral_datastore_max_schema_version(20230417204528))]


Ideally this list of cases would be automatically stamped out from SUPPORTED_SCHEMA_VERSIONS, but I am not sure how to do that gracefully.

As declarative macros cannot produce attributes, we'd have to write our own procedural macro in order to do that. We can either do so as a progressive enhancement later, or just make coupled changes to the template and SUPPORTED_SCHEMA_VERSIONS by hand.

I think the main value of rstest is that we run each datastore test on each supported schema version. However, I don't like that we have to remember to manually add a case each time we add a new schema version -- we should have one canonical list of supported schemas, and that is SUPPORTED_SCHEMA_VERSIONS. Currently, if I add a version to SUPPORTED_SCHEMA_VERSIONS but forget to add a case here, the tests would pass even if the new schema version were broken; I (or a reviewer) would have to manually notice that I'm missing a case. IMO, that's a blocker.

A few potential solutions come to mind:

I assume something like #[values(SUPPORTED_SCHEMA_VERSIONS)] doesn't work, but if it does, that's the easiest fix (docs).

We could write a macro to generate the necessary #[case] attributes -- based on this StackOverflow question it looks like generating attributes from a macro is possible.

If the above aren't feasible/easy, another approach would be to avoid using rstest at all. We could write a "wrapper function" like run_with_ephemeral_datastore which would take a Fn(EphemeralDatastore), then call that function in a loop with an ephemeral datastore at the proper version. Every place that currently uses the template would be changed to call this function instead, wrapping the current test body. This would work, but wouldn't separate different versions of the same test into separate cases or provide parallelization of tests across versions, and less importantly would induce an additional level of indentation in most test bodies.

I would say writing a quick macro is probably the best approach, assuming there's no syntax that would allow SUPPORTED_SCHEMA_VERSIONS to be used directly.
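For comparison, the wrapper-function alternative mentioned above could look roughly like the sketch below. The `EphemeralDatastore` struct here is a hypothetical stand-in with a single field, and the test body is synchronous for brevity, whereas the real datastore tests are async.

```rust
use std::cell::RefCell;

const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[20230405185602, 20230417204528];

/// Hypothetical stand-in for the real test fixture.
struct EphemeralDatastore {
    schema_version: i64,
}

/// Calls `test_body` once per supported schema version, in order.
fn run_with_ephemeral_datastore<F>(test_body: F)
where
    F: Fn(EphemeralDatastore),
{
    for &schema_version in SUPPORTED_SCHEMA_VERSIONS {
        // A panic in `test_body` ends the loop here, so later versions never
        // run and the harness reports a single test rather than one per version.
        test_body(EphemeralDatastore { schema_version });
    }
}

fn main() {
    // Record which versions the body was invoked with.
    let seen = RefCell::new(Vec::new());
    run_with_ephemeral_datastore(|ds| seen.borrow_mut().push(ds.schema_version));
    assert_eq!(*seen.borrow(), SUPPORTED_SCHEMA_VERSIONS);
}
```

The comment inside the loop is the trade-off raised in the review: a single list of versions, but no per-version test cases and no parallelism across versions.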

That inspires one way we could do this with a declarative macro: we could define and then use a macro that takes multiple schema versions as arguments, and then emits two top-level items: the const declaration of SUPPORTED_SCHEMA_VERSIONS, and the entire test template function, decorated with one #[case] attribute macro per macro argument.

That's a great idea. I'll give the macro a try this afternoon.

The macro was pretty easy to write (I did so in the latest pushed commit). The downside is that because you can't construct identifiers with macro variables, we can't use rstest's case::label syntax to interpolate the schema version into test names.
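A standalone sketch of the shape of such a macro follows. It is not the macro from this PR: to stay runnable without rstest, it attaches one `#[doc]` attribute per argument where the PR, per the discussion above, attaches one `#[case]` attribute, and the emitted function `schema_versions_under_test` is a hypothetical placeholder for the test template.

```rust
/// Takes a list of schema versions and emits two top-level items: the
/// constant, and a function decorated with one attribute per version.
macro_rules! supported_schema_versions {
    ($($version:literal),+ $(,)?) => {
        /// Schema versions this build claims to support.
        pub const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[$($version),+];

        // A declarative macro cannot expand to a bare attribute, but it can
        // emit a whole item that carries a repeated attribute. Here that
        // attribute is `#[doc]`; an rstest template would carry `#[case]`.
        $(#[doc = concat!("Covers schema version ", stringify!($version), ".")])+
        pub fn schema_versions_under_test() -> Vec<i64> {
            vec![$($version),+]
        }
    };
}

supported_schema_versions!(20230405185602, 20230417204528);

fn main() {
    // Both items come from the single macro invocation, so they cannot drift apart.
    assert_eq!(
        SUPPORTED_SCHEMA_VERSIONS,
        schema_versions_under_test().as_slice()
    );
}
```

Because the constant and the decorated item are generated from the same argument list, adding a version in one place updates both, which addresses the blocker about forgetting a case.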

tgeoghegan · 2023-04-19T18:34:06Z

Sample test output:

running 48 tests
test datastore::tests::aggregation_job_not_found ... ok
test datastore::tests::aggregation_job_acquire_release ... ok
test datastore::tests::check_report_aggregation_exists ... ok
test datastore::tests::collection_job_acquire_job_max ... ok
test datastore::tests::collection_job_acquire_no_aggregation_job_with_agg_param ... ok
test datastore::tests::collection_job_acquire_no_aggregation_job_with_task_id ... ok
test datastore::tests::collection_job_acquire_release_aggregation_job_in_progress ... ok
test datastore::tests::collection_job_acquire_release_job_finished ... ok
test datastore::tests::collection_job_acquire_report_shares_outside_interval ... ok
test datastore::tests::count_client_reports_for_batch_id ... ok
test datastore::tests::count_client_reports_for_interval ... ok
test datastore::tests::crypter ... ok
test datastore::tests::collection_job_acquire_state_filtering ... ok
test datastore::tests::delete_expired_client_reports ... ok
test datastore::tests::delete_expired_aggregation_artifacts ... ok
test datastore::tests::delete_expired_collection_artifacts ... ok
test datastore::tests::get_aggregation_jobs_for_task ... ok
test datastore::tests::get_collection_job ... ok
test datastore::tests::get_report_aggregations_for_aggregation_job ... ok
test datastore::tests::get_task_ids ... ok
test datastore::tests::get_task_metrics ... ok
test datastore::tests::get_unaggregated_client_report_ids_for_task ... ok
test datastore::tests::fixed_size_collection_job_acquire_release_happy_path ... ok
test datastore::tests::get_unaggregated_client_report_ids_with_agg_param_for_task ... ok
test datastore::tests::report_aggregation_not_found ... ok
test datastore::tests::report_not_found ... ok
test datastore::tests::roundtrip_aggregate_share_job ... ok
test datastore::tests::roundtrip_aggregation_job ... ok
test datastore::tests::roundtrip_batch_aggregation_fixed_size ... ok
test datastore::tests::roundtrip_batch_aggregation_time_interval ... ok
test datastore::tests::roundtrip_interval_sql ... ok
test datastore::tests::roundtrip_outstanding_batch ... ok
test datastore::tests::roundtrip_report ... ok
test datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 ... FAILED
test datastore::tests::roundtrip_report_aggregation::case_2_version_20230417204528 ... ok
test datastore::tests::roundtrip_report_share ... ok
test datastore::tests::roundtrip_task::case_1_version_20230405185602 ... ok
test datastore::tests::roundtrip_task::case_2_version_20230417204528 ... ok
test datastore::tests::update_collection_jobs ... ok
test query_type::tests::validate_collect_identifier ... ok
test task::tests::aggregator_endpoints_end_in_slash ... ok
test task::tests::aggregator_request_paths ... ok
test task::tests::collector_auth_tokens ... ok
test task::tests::deserialize_docs_sample_tasks ... ok
test task::tests::reject_invalid_auth_tokens ... ok
test task::tests::task_serde ... ok
test task::tests::task_serialization ... ok
test datastore::tests::time_interval_collection_job_acquire_release_happy_path ... ok

error: test failed, to rerun pass `-p janus_aggregator_core --lib`
failures:

---- datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 stdout ----
  2023-04-19T18:25:59.505694Z ERROR janus_aggregator_core::datastore: error: DB error: db error: ERROR: column "last_prep_step" of relation "report_aggregations" does not exist
    at aggregator_core/src/datastore.rs:2036 on ThreadId(108)

thread 'datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602' panicked at 'called `Result::unwrap()` on an `Err` value: Db(Error { kind: Db, cause: Some(DbError { severity: "ERROR", parsed_severity: Some(Error), code: SqlState(E42703), message: "column \"last_prep_step\" of relation \"report_aggregations\" does not exist", detail: None, hint: None, position: Some(Original(169)), where_: None, schema: None, table: None, column: None, datatype: None, constraint: None, file: Some("parse_target.c"), line: Some(1061), routine: Some("checkInsertTargets") }) })', aggregator_core/src/datastore.rs:7366:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602

test result: FAILED. 47 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 16.42s

Note we get distinct test cases per schema version:

test datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 ... FAILED
test datastore::tests::roundtrip_report_aggregation::case_2_version_20230417204528 ... ok
<...>
test datastore::tests::roundtrip_task::case_1_version_20230405185602 ... ok
test datastore::tests::roundtrip_task::case_2_version_20230417204528 ... ok

I like this because I believe it means cargo can run these in parallel and also a test failure with one schema version doesn't stop tests on other schema versions from running.

divergentdave · 2023-04-24T17:54:59Z

aggregator_core/src/lib.rs

+// We must import `rstest_reuse` at the top of the crate
+// https://docs.rs/rstest_reuse/0.5.0/rstest_reuse/#use-rstest_reuse-at-the-top-of-your-crate
+#[cfg(test)]
+use rstest_reuse;


This is unusual, but we can live with it. They must be using crate::rstest_reuse in the macro expansion, expecting to hit this private imported name.

divergentdave · 2023-04-24T17:55:14Z

aggregator_core/src/datastore.rs

@@ -5439,6 +5441,8 @@ mod tests {
        vdaf::prio3::{Prio3, Prio3Count},
    };
    use rand::{distributions::Standard, random, thread_rng, Rng};
+    use rstest::rstest;
+    //use rstest_reuse::{self, template};


This can be deleted.

divergentdave · 2023-04-24T18:08:04Z

aggregator_core/src/datastore.rs

+    #[rstest]
+    // The version numbers in these cases must match SUPPORTED_SCHEMA_VERSIONS
+    #[case::version_20230405185602(ephemeral_datastore_max_schema_version(20230405185602))]
+    #[case::version_20230417204528(ephemeral_datastore_max_schema_version(20230417204528))]


As declarative macros cannot produce attributes, we'd have to write our own procedural macro in order to do that. We can either do so as a progressive enhancement later, or just make coupled changes to the template and SUPPORTED_SCHEMA_VERSIONS by hand.

branlwyd · 2023-04-24T17:51:40Z

aggregator_core/src/datastore.rs

@@ -5439,6 +5441,8 @@ mod tests {
        vdaf::prio3::{Prio3, Prio3Count},
    };
    use rand::{distributions::Standard, random, thread_rng, Rng};
+    use rstest::rstest;
+    //use rstest_reuse::{self, template};


nit: commented code

branlwyd · 2023-04-24T18:18:53Z

aggregator_core/src/datastore.rs

+    #[rstest]
+    // The version numbers in these cases must match SUPPORTED_SCHEMA_VERSIONS
+    #[case::version_20230405185602(ephemeral_datastore_max_schema_version(20230405185602))]
+    #[case::version_20230417204528(ephemeral_datastore_max_schema_version(20230417204528))]


I think the main value of rstest is that we run each datastore test on each supported schema version. However, I don't like that we have to remember to manually add a case each time we add a new schema version -- we should have one canonical list of supported schemas, and that is SUPPORTED_SCHEMA_VERSIONS. Currently, if I add a version to SUPPORTED_SCHEMA_VERSIONS but forget to add a case here, the tests would pass even if the new schema version were broken; I (or a reviewer) would have to manually notice that I'm missing a case. IMO, that's a blocker.

A few potential solutions come to mind:

I assume something like #[values(SUPPORTED_SCHEMA_VERSIONS)] doesn't work, but if it does, that's the easiest fix (docs).

We could write a macro to generate the necessary #[case] attributes -- based on this StackOverflow question it looks like generating attributes from a macro is possible.

If the above aren't feasible/easy, another approach would be to avoid using rstest at all. We could write a "wrapper function" like run_with_ephemeral_datastore which would take a Fn(EphemeralDatastore), then call that function in a loop with an ephemeral datastore at the proper version. Every place that currently uses the template would be changed to call this function instead, wrapping the current test body. This would work, but wouldn't separate different versions of the same test into separate cases or provide parallelization of tests across versions, and less importantly would induce an additional level of indentation in most test bodies.

I would say writing a quick macro is probably the best approach, assuming there's no syntax that would allow SUPPORTED_SCHEMA_VERSIONS to be used directly.

branlwyd · 2023-04-24T21:39:26Z

docs/DEPLOYING.md

@@ -114,6 +114,12 @@ This will generate two new migration scripts. Fill the `*.up.sql` file with the
 migration you want to run and the `*.down.sql` file with a script that reverses
 the first script.

+After adding a migration, you must add its version number to
+`datastore::SUPPORTED_SCHEMA_VERSIONS` as Janus will refuse to run if it does
+not recognize the database schema version. You also must add a `#[case]`


nit: the docs suggesting that a #[case] attribute needs to be created can now be dropped.
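For context on why the DEPLOYING.md text insists on updating `SUPPORTED_SCHEMA_VERSIONS`, here is a minimal sketch of the kind of startup check it describes. The function name `check_schema_version` and the error type are hypothetical, and the list contents simply mirror the two versions in the diff hunks above.

```rust
const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[20230405185602, 20230417204528];

/// Hypothetical error for a database whose schema version is not recognized.
#[derive(Debug, PartialEq)]
struct UnsupportedSchemaVersion(i64);

/// Accepts the schema version found in the database only if it is listed
/// in `SUPPORTED_SCHEMA_VERSIONS`.
fn check_schema_version(found: i64) -> Result<(), UnsupportedSchemaVersion> {
    if SUPPORTED_SCHEMA_VERSIONS.contains(&found) {
        Ok(())
    } else {
        Err(UnsupportedSchemaVersion(found))
    }
}

fn main() {
    // A listed version passes the check.
    assert_eq!(check_schema_version(20230417204528), Ok(()));

    // A migration that was applied but never added to the list is rejected.
    assert_eq!(
        check_schema_version(20990101000000),
        Err(UnsupportedSchemaVersion(20990101000000))
    );
}
```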

We want to prove in tests that some Janus version can run safely on multiple schema versions to make database schema migrations safer. In this commit:

- `aggregator_core::test_util::EphemeralDatastore` can now be constructed with a `max_schema_version` argument to control which migration scripts are applied during tests
- `datastore::SUPPORTED_SCHEMA_VERSIONS` is now emitted by the `supported_schema_versions` macro, which also constructs a template for stamping out tests.
- We adopt [`rstest`][1] to inject an `EphemeralDatastore` instance into tests in `aggregator_core::datastore::Datastore::tests`. Using [`rstest_reuse`][2], we can automatically stamp out versions of existing tests that run using multiple schema versions.

We only use this dependency injection technique in the `datastore` module, because that should be the only module that's tightly coupled to the database schema. We could run all tests that use a datastore this way, at the cost of increasing overall test runtime.

[1]: https://docs.rs/rstest
[2]: https://docs.rs/rstest_reuse

tgeoghegan requested a review from a team as a code owner April 19, 2023 17:36

tgeoghegan mentioned this pull request Apr 19, 2023

Check DB schema version at startup #1272

Merged

tgeoghegan commented Apr 19, 2023

tgeoghegan mentioned this pull request Apr 20, 2023

Test SQL migration down scripts #1279

Closed

divergentdave approved these changes Apr 24, 2023

branlwyd requested changes Apr 24, 2023

tgeoghegan force-pushed the timg/startup-check-schema-version branch from 47264b0 to 9c1457a Compare April 24, 2023 18:45

Base automatically changed from timg/startup-check-schema-version to main April 24, 2023 19:44

tgeoghegan force-pushed the timg/rstest branch from cd259ca to ef4d30c Compare April 24, 2023 20:37

tgeoghegan requested a review from branlwyd April 24, 2023 20:38

branlwyd approved these changes Apr 24, 2023

tgeoghegan force-pushed the timg/rstest branch from ef4d30c to ba91769 Compare April 24, 2023 22:34

tgeoghegan merged commit 3aea5f2 into main Apr 25, 2023

tgeoghegan deleted the timg/rstest branch April 25, 2023 00:19

tgeoghegan mentioned this pull request Apr 25, 2023

Janus should refuse to start up if it sees an unsupported schema version #1241

Closed
