Backend-agnostic testing #1205
Conversation
this pattern is probably more in line with server-based dbs
i.e. not including unmarked tests that are independent of dialect - should be useful in e.g. CI
so that `-m default` automatically selects all tests in these cases
"duckdb",
"duckdb_only",
I take it that `duckdb` and `duckdb_only` are identical?
Ignore me, you have a note about this in your main description.
@property
def cll(self):
    return cll_duckdb

@property
def cl(self):
    return cl_duckdb
`ctl` too?
added
addopts = ["-m default"]
markers = [
    # only tests where backend is irrelevant:
    "core",
`core` isn't currently running any tests for me
Hmmm, is this running without a folder specified? I think in that case it is picking up `conftest.py` from `benchmarking/`, which doesn't include the same logic.
As in, are you doing just e.g. `pytest -m core` rather than `pytest -m core tests/`? If so, would you expect that to run both the tests and benchmarking? I can either copy the functionality to `benchmarking/`, or perhaps even lift `conftest.py` to root level so that it covers both.
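Pieced together from the quoted fragments in this thread, the pytest configuration under discussion might look roughly like the sketch below. The `[tool.pytest.ini_options]` table name and the marker entries after `"default"` are assumptions based on the marks named elsewhere in this PR, not a verbatim copy of the repository's config:

```toml
[tool.pytest.ini_options]
addopts = ["-m default"]
markers = [
    # only tests where backend is irrelevant:
    "core",
    # see tests/decorator.py::dialect_groups for group details:
    "default",
    "all",
    # per-backend marks, as described in the PR description (assumed list):
    "duckdb",
    "duckdb_only",
    "spark",
    "spark_only",
    "sqlite",
    "sqlite_only",
]
```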
# only tests where backend is irrelevant:
"core",
# see tests/decorator.py::dialect_groups for group details:
"default",
Do we want `default` to run all of the benchmarks, or do we want this to be its own test mark?
I've taken benchmarking to be a separate workstream and consider the two as running separately, as they perform different functions. As far as this PR goes, I have just included the changes needed so that benchmarking runs the same as previously.
It might be a good upgrade to do something similar for benchmarking, but that might want some additional things (like slow/not-slow type markers) and maybe a think about how we are doing the benchmark stuff.
Thanks Andy, I'm happy for this to be merged once Ross is content.
Excellent work all round @ADBond! 🎉
The code all works well; the only comments I have left are on the docs, to try and make them as readable as they can be. This isn't necessarily a trivial structure, so I really want to make sure it is as easy as possible to understand. Most of them are just suggestions, so I'm happy to discuss any of them to try and get to the best solution!
Splink tests can be broadly categorised into three sets:

* 'Core' tests - these are tests which test some specific bit of functionality which does not depend on any specific SQL dialect. They are usually unit tests - examples are testing [`InputColumn`](https://github.com/moj-analytical-services/splink/blob/master/tests/test_input_column.py) and testing the [latitude-longitude distance calculation](https://github.com/moj-analytical-services/splink/blob/master/tests/test_lat_long_distance.py).
Put "Core tests", "Tests for specific backends" and "Backend-agnostic tests" in bold?
Splink tests can be broadly categorised into three sets:

* 'Core' tests - these are tests which test some specific bit of functionality which does not depend on any specific SQL dialect. They are usually unit tests - examples are testing [`InputColumn`](https://github.com/moj-analytical-services/splink/blob/master/tests/test_input_column.py) and testing the [latitude-longitude distance calculation](https://github.com/moj-analytical-services/splink/blob/master/tests/test_lat_long_distance.py).
* Tests for specific backends - these are tests which run against a specific SQL backend, and test some feature peculiar to this backend. There are not many of these, as Splink is designed to run very similarly independent of the backend used.
peculiar -> particular ?
* 'Core' tests - these are tests which test some specific bit of functionality which does not depend on any specific SQL dialect. They are usually unit tests - examples are testing [`InputColumn`](https://github.com/moj-analytical-services/splink/blob/master/tests/test_input_column.py) and testing the [latitude-longitude distance calculation](https://github.com/moj-analytical-services/splink/blob/master/tests/test_lat_long_distance.py).
* Tests for specific backends - these are tests which run against a specific SQL backend, and test some feature peculiar to this backend. There are not many of these, as Splink is designed to run very similarly independent of the backend used.
* Backend-agnostic tests - these are tests which run against some SQL backend, but which are written in such a way that they can run against many backends by making use of the [backend-agnostic testing framework](#backend-agnostic-testing). The majority of tests are of this type.
Swap bullets 2 and 3 around to reflect the order of the following sections. Also, the specific-backend ones should definitely be last, given how small a number of tests fall into this category.
### Core tests

Core tests do not need to be handled any differently than ordinary `pytest` tests. Any test is marked as `core` by default, and will only be excluded from being a core test if it is decorated using either `@mark_with_dialects_excluding` or `@mark_with_dialects_including` from the [test decorator file](https://github.com/moj-analytical-services/splink/blob/master/tests/decorator.py).
Remove double negative in first line, e.g. "Core tests are treated the same way as ordinary `pytest` tests."
Note that unlike the exclusive `mark_with_dialects_excluding`, this decorator will _not_ parameterise the test with the `dialect` argument. This is because usage of the _inclusive_ form is largely designed for single-dialect tests. If you wish to override this behaviour and parameterise the test you can use the argument `pass_dialect`, for example `@mark_with_dialects_including("spark", "sqlite", pass_dialect=True)`, in which case you would need to write the test in a [backend-independent manner](#backend-agnostic-testing).
## Running tests
Do we think it would be better to have the "Running tests" section above "Writing tests"? Just thinking it through as a user: I am probably going to run the tests that already exist at some point before writing any myself. Plus, having those commands further up the page helps for ease of use if you are going to use this page as a reference for the commands.
An alternative could be having some sort of TL;DR section at the top with commands to copy/paste to run the testing suite.
Yeah, I think that makes sense - running is definitely going to be the more common case, and having a look back over it, there isn't really any context from the writing section needed to understand how to run them.
Core tests do not need to be handled any differently than ordinary `pytest` tests. Any test is marked as `core` by default, and will only be excluded from being a core test if it is decorated using either `@mark_with_dialects_excluding` or `@mark_with_dialects_including` from the [test decorator file](https://github.com/moj-analytical-services/splink/blob/master/tests/decorator.py).

### Backend-agnostic testing
This section is a fantastic walk-through! 🎉
### Core tests

Core tests do not need to be handled any differently than ordinary `pytest` tests. Any test is marked as `core` by default, and will only be excluded from being a core test if it is decorated using either `@mark_with_dialects_excluding` or `@mark_with_dialects_including` from the [test decorator file](https://github.com/moj-analytical-services/splink/blob/master/tests/decorator.py).
Also, would it be better to have the decorators laid out in bullets with a description of what section they would be included in? E.g.

Core tests do not need to be handled any differently than ordinary `pytest` tests. Any test is marked as `core` by default, and will only be excluded from being a core test if it is decorated using either:

* `@mark_with_dialects_excluding` for [Backend-agnostic tests](reference to doc section)
* `@mark_with_dialects_including` for [Backend Specific tests](reference to doc section)

from the test decorator file.
### Backend-agnostic testing

The majority of tests should be written using the backend-agnostic testing framework. This just provides some small tools which allow tests to be written in a backend-independent way. This means the tests can then be run against _all_ available SQL backends (or a subset, if some lack _necessary_ features for the test).
I found it slightly unintuitive to go from talking about testing being "backend-agnostic" to immediately excluding one - especially when we mark backend-agnostic tests with `@mark_with_dialects_excluding`. I think it would be helpful to start with some reference/example of writing a test that is applicable across all backends (especially to show how to mark those tests) - then move on to excluding `sqlite`.
* 'Core' tests - these are tests which test some specific bit of functionality which does not depend on any specific SQL dialect. They are usually unit tests - examples are testing [`InputColumn`](https://github.com/moj-analytical-services/splink/blob/master/tests/test_input_column.py) and testing the [latitude-longitude distance calculation](https://github.com/moj-analytical-services/splink/blob/master/tests/test_lat_long_distance.py).
* Tests for specific backends - these are tests which run against a specific SQL backend, and test some feature peculiar to this backend. There are not many of these, as Splink is designed to run very similarly independent of the backend used.
* Backend-agnostic tests - these are tests which run against some SQL backend, but which are written in such a way that they can run against many backends by making use of the [backend-agnostic testing framework](#backend-agnostic-testing). The majority of tests are of this type.
Do we need a caveat here/somewhere else to explain why Athena isn't included as a backend to test? I.e. because of needing a connection to an AWS account.
If you want to test Splink against a specific version of python, the easiest method is to utilise docker 🐳.

Docker allows you to more quickly and easily install a specific version of python and run the existing test library against it.

This is particularly useful if you're using python > 3.9.10 (which is currently in use in our tests github action) and need to run a secondary set of tests.

A pre-built Dockerfile for running tests against python version 3.9.10 can be located within [scripts/run_tests.Dockerfile](https://github.com/moj-analytical-services/splink/blob/master/scripts/run_tests.Dockerfile).

To run, simply use the following docker command from within a terminal and the root folder of a splink clone:
```sh
docker build -t run_tests:testing -f scripts/run_tests.Dockerfile . && docker run --rm --name splink-test run_tests:testing
```

This will both build and run the tests library.

Feel free to replace `run_tests:testing` with an image name and tag you're happy with.

Reusing the same image and tag will overwrite your existing image.

You can also overwrite the default `CMD` if you want a different set of `pytest` command-line options, for example
```sh
docker run --rm --name splink-test run_tests:testing pytest -W ignore -m spark tests/test_u_train.py
```
Similar to other sections - I don't think this will apply to the majority of people, so I would be tempted to wrap it up in a collapsible block.
…first, then the sqlite-less peculiarities
@RossKen I have made a bunch of changes to the docs page. I think I have addressed all of your suggestions, but let me know if I've missed anything. Also let me know if I've done anything that wasn't what you had in mind, or if you have any further feedback.

Thanks @ADBond, on my phone now but will look at it properly in the morning.

One naming thing: would it make sense to change "Tests for specific backends" -> "Backend-Specific Tests" to better contrast with "Backend-Agnostic Tests"?
As mentioned before - this is fantastic!
Thanks for making those changes to the docs - now even an idiot like me should be able to understand what is going on.
Good to merge from my end 🎉
This PR aims to provide a framework to easily allow backend-agnostic testing, so that it is straightforward to write tests that can run against all SQL backends. This currently covers `duckdb`, `spark`, and `sqlite` only.

The core of this provides a decorator `mark_with_dialects_excluding` which can be used to decorate test functions, and which marks tests with all dialects except those specified, plus any groups those dialects belong to (see below). There is also an inclusive version `mark_with_dialects_including` which should be used only for specific-dialect tests.

When tests that are marked in this way run, they receive a fixture `test_helpers` and a parameter `dialect`, which allows the use of an object `helper = test_helpers[dialect]` which has methods and properties to cover any dialect-specific part of testing, such as `helper.linker`, `helper.cl` (comparison library), and `helper.load_frame_from_csv()`.

Dialects can belong to zero or more groups - currently the only group is the privileged group `default`, containing `spark` and `duckdb`, which runs if no marks are passed on the command line when running tests. These groups are just shortcuts to run sets of backends together.

There are a bunch of different options to run tests depending on which tests are to run. In the following, `core` tests are precisely those which are not decorated with one of the new decorators (and should ultimately end up as more unit-testy stuff that does not care about backend, such as `test_input_column` and `test_lat_long_distance`):

* `pytest -m core` - runs the `core` tests only
* `pytest -m spark` - runs the `core` tests and all `spark` tests (& sim. for other backends)
* `pytest -m spark_only` - runs only `spark` tests (& sim. for other backends)
* `pytest` - equivalent to `pytest -m default` - runs `core` tests, and those in the `default` group (`spark` and `duckdb`)
* `pytest -m all` - runs everything (`core` and all covered backends)

I have updated only a few test scripts with this approach, just to show how this works.

A couple of notes (which hopefully help sell the benefit of this approach for coverage):

* I have converted `tests/test_charts.py`, but one of the tests fails for `spark` due to a bug in handling of infinity in Bayes factors, so will leave that to a separate PR
* `test_estimate_prob_two_rr_match.py::test_prob_rr_valid_range` and `tests/test_u_train.py::test_u_train_link_only_sample` both fail for `sqlite`, seemingly due to separate bugs. This backend won't run for now in CI (which I haven't touched), so figure this is okay to go as-is. Will open separate issues for these.
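The selection rules in the bullet list above can be modelled by one consistent assignment of marks per test type. This is a sketch under assumptions: the mark sets below were chosen to reproduce the listed `pytest -m` behaviour, and are not the actual contents of `tests/decorator.py` or `conftest.py`:

```python
# Assumed mark assignment reproducing the `pytest -m` behaviour listed above.
BACKENDS = ["duckdb", "spark", "sqlite"]
GROUPS = {"default": ["duckdb", "spark"]}


def marks_for_core_test():
    # core tests run under every selection except the "<backend>_only" ones,
    # so give them the core mark plus every backend and group mark
    return {"core", "all", *GROUPS, *BACKENDS}


def marks_for_backend_test(backend):
    # a backend test carries its own mark, its "_only" mark, "all",
    # and a mark for each group the backend belongs to
    groups = {g for g, members in GROUPS.items() if backend in members}
    return {backend, f"{backend}_only", "all", *groups}


def selected(mark, test_marks):
    # models `pytest -m <mark>` for single-mark expressions only
    return mark in test_marks


# `pytest -m spark` runs core tests and spark tests, but not sqlite tests:
print(selected("spark", marks_for_core_test()))             # True
print(selected("spark", marks_for_backend_test("spark")))   # True
print(selected("spark", marks_for_backend_test("sqlite")))  # False
# `pytest -m spark_only` skips the core tests:
print(selected("spark_only", marks_for_core_test()))        # False
```

Under this model, `pytest -m default` picks up core tests and all `duckdb`/`spark` tests while leaving `sqlite` tests out, matching the behaviour described above.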