Run datastore tests against multiple schemas #1277

Merged 1 commit on Apr 25, 2023

tgeoghegan (Contributor) commented Apr 19, 2023

We want to prove in tests that some Janus version can run safely on multiple schema versions to make database schema migrations safer. In this commit:

  • aggregator_core::test_util::EphemeralDatastore can now be constructed with a max_schema_version argument to control which migration scripts are applied during tests
  • We adopt rstest to inject an EphemeralDatastore instance into tests in aggregator_core::datastore::Datastore::tests. Using rstest_reuse, we can automatically stamp out versions of existing tests that run using multiple schema versions.

We only use this dependency injection technique in the datastore module, because that should be the only module that's tightly coupled to the database schema. We could run all tests that use a datastore this way, at the cost of increasing overall test runtime.
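As a rough illustration of what a max_schema_version argument could do, the sketch below selects which migration scripts a test database receives. It is not the Janus implementation: `Migration`, `migrations_to_apply`, and `example_migrations` are hypothetical names, and the SQL strings are placeholders. Only the two version numbers come from this PR.

```rust
/// Hypothetical stand-in for one migration script; not a real Janus type.
#[derive(Debug, Clone, PartialEq)]
pub struct Migration {
    /// Timestamp-style version, as used in the migration file names.
    pub version: i64,
    /// Placeholder for the contents of the `*.up.sql` script.
    pub up_sql: &'static str,
}

/// Returns the migrations a test database should receive, in ascending
/// version order, skipping any script newer than `max_schema_version`.
/// `None` means "apply every migration".
pub fn migrations_to_apply(
    all: &[Migration],
    max_schema_version: Option<i64>,
) -> Vec<Migration> {
    let mut selected: Vec<Migration> = all
        .iter()
        .filter(|m| max_schema_version.map_or(true, |max| m.version <= max))
        .cloned()
        .collect();
    selected.sort_by_key(|m| m.version);
    selected
}

/// Example input. The versions are the two named in this PR; the SQL is a
/// placeholder, not the real migration contents.
pub fn example_migrations() -> Vec<Migration> {
    vec![
        Migration { version: 20230417204528, up_sql: "-- placeholder" },
        Migration { version: 20230405185602, up_sql: "-- placeholder" },
    ]
}
```

Calling `migrations_to_apply(&example_migrations(), Some(20230405185602))` yields only the older script, which is the situation a test pinned to the older schema would run in.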

@tgeoghegan tgeoghegan requested a review from a team as a code owner April 19, 2023 17:36
tgeoghegan (Contributor Author) commented Apr 19, 2023

Some additional notes here:

  • For now I've only applied this to three tests in datastore.rs, because I want to illustrate how this works so we can discuss it without all the noise of adding a #[future] #[case] ephemeral_datastore: EphemeralDatastore argument to every test in the module. I'll do that if we agree to move forward with this strategy, though. Update: the most recent commit sets this up for all the tests in datastore.rs.
  • We (should) observe that the test datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 fails, because the current Janus code doesn't work on a schema that doesn't define column report_aggregations.last_prep_step. I think that shows the tests working as expected: we've learned that we need to drop schema version 20230405185602 from the supported versions list.

#[rstest]
// The version numbers in these cases must match SUPPORTED_SCHEMA_VERSIONS
#[case::version_20230405185602(ephemeral_datastore_max_schema_version(20230405185602))]
#[case::version_20230417204528(ephemeral_datastore_max_schema_version(20230417204528))]
Contributor Author

Ideally this list of cases would be automatically stamped out from SUPPORTED_SCHEMA_VERSIONS, but I am not sure how to do that gracefully.

Contributor

As declarative macros cannot produce attributes, we'd have to write our own procedural macro in order to do that. We can either do so as a progressive enhancement later, or just make coupled changes to the template and SUPPORTED_SCHEMA_VERSIONS by hand.

Member

I think the main value of rstest is that we run each datastore test on each supported schema version. However, I don't like that we have to remember to manually add a case each time we add a new schema version -- we should have one canonical list of supported schemas, and that is SUPPORTED_SCHEMA_VERSIONS. Currently, if I add a version to SUPPORTED_SCHEMA_VERSIONS but forget to add a case here, the tests would pass even if the new schema version was broken; I (or a reviewer) have to manually notice that I'm missing a case. IMO, that's a blocker.

A few potential solutions come to mind:

  • I assume something like #[values(SUPPORTED_SCHEMA_VERSIONS)] doesn't work, but if it does, that's easy enough (docs).
  • We could write a macro to generate the necessary #[case] attributes -- based on this StackOverflow question it looks like generating attributes from a macro is possible.
  • If the above aren't feasible/easy, another approach would be to avoid using rstest at all. We could write a "wrapper function" like run_with_ephemeral_datastore which would take a Fn(EphemeralDatastore), then call that function in a loop with an ephemeral datastore at the proper version. Every place that currently uses the template would be changed to call this function instead, wrapping the current test body. This would work, but wouldn't separate different versions of the same test into separate cases or provide parallelization of tests across versions, and less importantly would induce an additional level of indentation in most test bodies.

I would say writing a quick macro is probably the best approach, assuming there's no syntax that would allow SUPPORTED_SCHEMA_VERSIONS to be used directly.
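For comparison, the wrapper-function alternative from the last bullet could look roughly like the sketch below. `EphemeralDatastore` here is a hypothetical stand-in struct rather than the real `aggregator_core::test_util::EphemeralDatastore`, and the constant is a local copy holding the two versions named in this PR.

```rust
/// Hypothetical stand-in for `aggregator_core::test_util::EphemeralDatastore`.
#[derive(Debug, Clone, PartialEq)]
pub struct EphemeralDatastore {
    pub max_schema_version: i64,
}

/// Local stand-in for the canonical list; values are the versions named in this PR.
pub const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[20230405185602, 20230417204528];

/// Runs `test` once per supported schema version, in list order.
/// A panic inside `test` ends the loop, so later versions are never exercised,
/// and the test harness sees a single test rather than one case per version.
pub fn run_with_ephemeral_datastore(test: impl Fn(EphemeralDatastore)) {
    for &max_schema_version in SUPPORTED_SCHEMA_VERSIONS {
        test(EphemeralDatastore { max_schema_version });
    }
}
```

Each existing test body would move into the closure passed to `run_with_ephemeral_datastore`, which is where the extra indentation level comes from. The loop makes the trade-off concrete: the version list stays canonical, but a failure on the first version hides the results for every later one.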

Contributor

That inspires one way we could do this with a declarative macro: we could define and then use a macro that takes multiple schema versions as arguments, and then emits two top-level items: the const declaration of SUPPORTED_SCHEMA_VERSIONS, and the entire test template function, decorated with one #[case] attribute macro per macro argument.
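The shape of that macro can be sketched with the standard library alone. The version below is an assumption-laden stand-in, not the macro from the PR: where the real one would emit an rstest_reuse template carrying one #[case] attribute per version, this sketch emits a plain function carrying one `#[doc]` attribute per version, just to show that a declarative macro can emit a whole item decorated with a repeated attribute. The function name `for_each_supported_schema_version` is hypothetical.

```rust
macro_rules! supported_schema_versions {
    ( $( $version:literal ),* $(,)? ) => {
        /// Canonical list of schema versions, emitted from the macro arguments.
        pub const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[ $( $version ),* ];

        // Second top-level item. One attribute is emitted per macro argument;
        // in the PR these would be `#[case(...)]` attributes on a template.
        $( #[doc = concat!("Covers schema version ", stringify!($version), ".")] )*
        pub fn for_each_supported_schema_version(mut test: impl FnMut(i64)) {
            $( test($version); )*
        }
    };
}

// The single place where versions are listed.
supported_schema_versions!(20230405185602, 20230417204528);
```

Because both items expand from the same argument list, adding a version to the macro invocation updates the constant and the per-version attributes together, which addresses the concern about forgetting a case.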

Contributor Author

That's a great idea. I'll give the macro a try this afternoon.

Contributor Author

The macro was pretty easy to write (I did it in the latest pushed commit). The downside is that because you can't construct identifiers with macro variables, we can't use rstest's case::label syntax to interpolate the schema version into test names.
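To make the limitation concrete: a declarative macro on stable Rust cannot turn the literal `20230405185602` into the identifier `version_20230405185602`. One possible workaround, which is not what this PR does, is to have the caller pass the label as a separate identifier argument. The macro name `schema_version_cases` below is hypothetical, and the generated functions merely stand in for labelled test cases.

```rust
macro_rules! schema_version_cases {
    ( $( $label:ident => $version:literal ),* $(,)? ) => {
        /// Canonical list, emitted from the same arguments as the labels.
        pub const SUPPORTED_SCHEMA_VERSIONS: &[i64] = &[ $( $version ),* ];

        // One function per version, named by the caller-supplied identifier.
        // The macro cannot derive `$label` from `$version` on its own.
        $( pub fn $label() -> i64 { $version } )*
    };
}

schema_version_cases!(
    version_20230405185602 => 20230405185602,
    version_20230417204528 => 20230417204528,
);
```

The cost is that each version is typed twice in the invocation, so a label can drift out of sync with its number. That is arguably a smaller risk than a missing case, since both halves sit on the same line.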

tgeoghegan (Contributor Author) commented

Sample test output:

running 48 tests
test datastore::tests::aggregation_job_not_found ... ok
test datastore::tests::aggregation_job_acquire_release ... ok
test datastore::tests::check_report_aggregation_exists ... ok
test datastore::tests::collection_job_acquire_job_max ... ok
test datastore::tests::collection_job_acquire_no_aggregation_job_with_agg_param ... ok
test datastore::tests::collection_job_acquire_no_aggregation_job_with_task_id ... ok
test datastore::tests::collection_job_acquire_release_aggregation_job_in_progress ... ok
test datastore::tests::collection_job_acquire_release_job_finished ... ok
test datastore::tests::collection_job_acquire_report_shares_outside_interval ... ok
test datastore::tests::count_client_reports_for_batch_id ... ok
test datastore::tests::count_client_reports_for_interval ... ok
test datastore::tests::crypter ... ok
test datastore::tests::collection_job_acquire_state_filtering ... ok
test datastore::tests::delete_expired_client_reports ... ok
test datastore::tests::delete_expired_aggregation_artifacts ... ok
test datastore::tests::delete_expired_collection_artifacts ... ok
test datastore::tests::get_aggregation_jobs_for_task ... ok
test datastore::tests::get_collection_job ... ok
test datastore::tests::get_report_aggregations_for_aggregation_job ... ok
test datastore::tests::get_task_ids ... ok
test datastore::tests::get_task_metrics ... ok
test datastore::tests::get_unaggregated_client_report_ids_for_task ... ok
test datastore::tests::fixed_size_collection_job_acquire_release_happy_path ... ok
test datastore::tests::get_unaggregated_client_report_ids_with_agg_param_for_task ... ok
test datastore::tests::report_aggregation_not_found ... ok
test datastore::tests::report_not_found ... ok
test datastore::tests::roundtrip_aggregate_share_job ... ok
test datastore::tests::roundtrip_aggregation_job ... ok
test datastore::tests::roundtrip_batch_aggregation_fixed_size ... ok
test datastore::tests::roundtrip_batch_aggregation_time_interval ... ok
test datastore::tests::roundtrip_interval_sql ... ok
test datastore::tests::roundtrip_outstanding_batch ... ok
test datastore::tests::roundtrip_report ... ok
test datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 ... FAILED
test datastore::tests::roundtrip_report_aggregation::case_2_version_20230417204528 ... ok
test datastore::tests::roundtrip_report_share ... ok
test datastore::tests::roundtrip_task::case_1_version_20230405185602 ... ok
test datastore::tests::roundtrip_task::case_2_version_20230417204528 ... ok
test datastore::tests::update_collection_jobs ... ok
test query_type::tests::validate_collect_identifier ... ok
test task::tests::aggregator_endpoints_end_in_slash ... ok
test task::tests::aggregator_request_paths ... ok
test task::tests::collector_auth_tokens ... ok
test task::tests::deserialize_docs_sample_tasks ... ok
test task::tests::reject_invalid_auth_tokens ... ok
test task::tests::task_serde ... ok
test task::tests::task_serialization ... ok
test datastore::tests::time_interval_collection_job_acquire_release_happy_path ... ok

error: test failed, to rerun pass `-p janus_aggregator_core --lib`
failures:

---- datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 stdout ----
  2023-04-19T18:25:59.505694Z ERROR janus_aggregator_core::datastore: error: DB error: db error: ERROR: column "last_prep_step" of relation "report_aggregations" does not exist
    at aggregator_core/src/datastore.rs:2036 on ThreadId(108)

thread 'datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602' panicked at 'called `Result::unwrap()` on an `Err` value: Db(Error { kind: Db, cause: Some(DbError { severity: "ERROR", parsed_severity: Some(Error), code: SqlState(E42703), message: "column \"last_prep_step\" of relation \"report_aggregations\" does not exist", detail: None, hint: None, position: Some(Original(169)), where_: None, schema: None, table: None, column: None, datatype: None, constraint: None, file: Some("parse_target.c"), line: Some(1061), routine: Some("checkInsertTargets") }) })', aggregator_core/src/datastore.rs:7366:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602

test result: FAILED. 47 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 16.42s

Note we get distinct test cases per schema version:

test datastore::tests::roundtrip_report_aggregation::case_1_version_20230405185602 ... FAILED
test datastore::tests::roundtrip_report_aggregation::case_2_version_20230417204528 ... ok
<...>
test datastore::tests::roundtrip_task::case_1_version_20230405185602 ... ok
test datastore::tests::roundtrip_task::case_2_version_20230417204528 ... ok

I like this because I believe it means cargo can run these cases in parallel, and a test failure on one schema version doesn't stop tests on other schema versions from running.

Comment on lines +8 to +11
// We must import `rstest_reuse` at the top of the crate
// https://docs.rs/rstest_reuse/0.5.0/rstest_reuse/#use-rstest_reuse-at-the-top-of-your-crate
#[cfg(test)]
use rstest_reuse;
Contributor

This is unusual, but we can live with it. They must be using ::crate::rstest_reuse in the macro expansion, expecting to hit this private imported name.

@@ -5439,6 +5441,8 @@ mod tests {
vdaf::prio3::{Prio3, Prio3Count},
};
use rand::{distributions::Standard, random, thread_rng, Rng};
use rstest::rstest;
//use rstest_reuse::{self, template};
Contributor

This can be deleted.

@@ -5439,6 +5441,8 @@ mod tests {
vdaf::prio3::{Prio3, Prio3Count},
};
use rand::{distributions::Standard, random, thread_rng, Rng};
use rstest::rstest;
//use rstest_reuse::{self, template};
Member

nit: commented code

tgeoghegan force-pushed the timg/startup-check-schema-version branch from 47264b0 to 9c1457a on April 24, 2023 18:45
Base automatically changed from timg/startup-check-schema-version to main April 24, 2023 19:44
@@ -114,6 +114,12 @@ This will generate two new migration scripts. Fill the `*.up.sql` file with the
migration you want to run and the `*.down.sql` file with a script that reverses
the first script.

After adding a migration, you must add its version number to
`datastore::SUPPORTED_SCHEMA_VERSIONS` as Janus will refuse to run if it does
not recognize the database schema version. You also must add a `#[case]`
Member

nit: the docs suggesting that a #[case] attribute needs to be created can now be dropped.
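The documentation hunk above says Janus refuses to run when it does not recognize the database schema version. A minimal sketch of such a check follows. The names `check_schema_version` and `UnsupportedSchemaVersion` are hypothetical, and the real startup check may be structured quite differently.

```rust
/// Hypothetical error describing a schema version outside the supported list.
#[derive(Debug, PartialEq)]
pub struct UnsupportedSchemaVersion {
    /// Version reported by the database.
    pub found: i64,
    /// Versions this build is willing to run against.
    pub supported: Vec<i64>,
}

/// Returns `Ok(())` only if `found` appears in `supported`.
pub fn check_schema_version(
    found: i64,
    supported: &[i64],
) -> Result<(), UnsupportedSchemaVersion> {
    if supported.contains(&found) {
        Ok(())
    } else {
        Err(UnsupportedSchemaVersion {
            found,
            supported: supported.to_vec(),
        })
    }
}
```

With a check like this in place, a newly added migration whose version is missing from the supported list causes an immediate, explicit refusal instead of a later SQL error such as the missing `last_prep_step` column seen in the test output.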

We want to prove in tests that some Janus version can run safely on
multiple schema versions to make database schema migrations safer. In
this commit:

 - `aggregator_core::test_util::EphemeralDatastore` can now be
   constructed with a `max_schema_version` argument to control which
   migration scripts are applied during tests
 - `datastore::SUPPORTED_SCHEMA_VERSIONS` is now emitted by the
   `supported_schema_versions` macro, which also constructs a template
   for stamping out tests.
 - We adopt [`rstest`][1] to inject an `EphemeralDatastore` instance
   into tests in `aggregator_core::datastore::Datastore::tests`. Using
   [`rstest_reuse`][2], we can automatically stamp out versions of
   existing tests that run using multiple schema versions.

We only use this dependency injection technique in the `datastore`
module, because that should be the only module that's tightly coupled to
the database schema. We could run all tests that use a datastore this
way, at the cost of increasing overall test runtime.

[1]: https://docs.rs/rstest
[2]: https://docs.rs/rstest_reuse