This guide details our testing philosophy, as well as how to run the test suites.
There are broadly three test suites:
- The Rust unit/integration test suite, which is run by `cargo test`. These tests live alongside the code they test in a `mod tests { ... }` block, or in the `tests/` directory within a crate.
- The data-driven system test suite in the top-level test/ directory. These consist of text files in various DSLs (sqllogictest, testdrive, pgtest) that essentially specify SQL commands to run and their expected output.
- The long-running performance and stability test suite. This test suite has yet to be automated. At the moment it consists of engineers manually running the demo in demo/chbench/.
The unit/integration and system test suites are run on every PR and can easily be run locally. The goal is for these test suites to be quite fast, ideally under 10m, while still catching 95% of bugs.
The long-running tests will simulate actual production runs of Materialize on gigabytes to terabytes of data over several hours to days, and will therefore report their results asynchronously (perhaps nightly or weekly).
Details about each of the test suites follow.
The unit/integration test suite uses the standard test framework that ships with Cargo. You can run the full test suite like so:
$ cargo test
Some of the packages have tests that depend on ZooKeeper, Kafka, and the Confluent Schema Registry running locally on the default ports. See the Developer guide for full details on setting this up, but the following will do the trick if you have the Confluent Platform 5.3+ CLI installed and configured:
$ confluent local services schema-registry start
`cargo test` supports many options for precisely controlling which tests get run, like `--package CRATE`, `--lib`, and `--doc`. See its documentation for the full details.
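For example, to run only the unit tests in a single crate's library:

$ cargo test --package CRATE --lib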
Two arguments bear special mention.
The first is the `--nocapture` argument, which tells Cargo not to hide the output of successful test runs:
$ cargo test -- --nocapture
Notice how the `--nocapture` argument appears after a `--` argument. This is not a typo. This is how you pass an argument to the test binary itself, rather than to Cargo. This syntax might be a bit confusing at first, but it's used pervasively throughout Cargo, and eventually you get used to it. (It's also the POSIX-blessed syntax for indicating that you want option parsing to stop, so you might be familiar with it from other command-line tools.)

Without `--nocapture`, `println!()` and `dbg!()` output from tests can go missing, making debugging a very frustrating experience.
The second argument worth special mention is the filter argument, which only runs the tests that match the specified pattern. For example, to only run tests with `avro` in their name:
$ cargo test -- avro
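The two arguments can be combined; for example, to run only the Avro tests while showing their output:

$ cargo test -- --nocapture avro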
As mentioned above, the Rust unit/integration tests follow the standard convention: unit tests live in a `mod tests { ... }` block alongside the code they test, and integration tests live in a `tests/` subdirectory of the crate they test.
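For reference, here is a minimal sketch of that convention (the function and test are hypothetical):

pub fn add(a: i64, b: i64) -> i64 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::*;

    // Compiled only when testing; runs as part of `cargo test`.
    #[test]
    fn adds_small_numbers() {
        assert_eq!(add(2, 2), 4);
    }
}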
Datadriven is a tool for writing table-driven tests in Rust, with rewrite support. datadriven tests plug into `cargo test` as unit or integration tests, but store test data in separate files from the Rust code. Datadriven is particularly useful when the output to be tested is too large to be maintained by hand efficiently. The expected output of tests written with datadriven can be updated by running them with the `REWRITE` environment variable set:
$ REWRITE=1 cargo test
When using `REWRITE`, it's important to inspect the diff carefully to ensure that nothing changed unexpectedly.
For an example of what datadriven tests look like, check out src/transform/tests.
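As a rough sketch, each case in a datadriven test file consists of a directive line, input, a `----` separator, and the expected output; the directive and payload below are made up for illustration:

parse
select 1 + 1
----
Add(Literal(1), Literal(1))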
There are presently three system test frameworks. These are tests against Materialize as a whole, or nearly whole, and they often exercise its interaction with other systems, like Kafka.
The first system test framework is sqllogictest, which was developed by the authors of SQLite to cross-check queries across various database vendors. Here's a representative snippet from a sqllogictest file:
query I rowsort
SELECT - col2 + col1 + col2 FROM tab2
----
17
31
59
This snippet will verify that the query `SELECT - col2 + col1 + col2 FROM tab2` outputs a table with one integer column (that's the `query I` bit) with the specified results, without verifying order (that's the `rowsort` bit).
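Beyond bare queries, sqllogictest files can also use `statement` directives, which run SQL (e.g., DDL or inserts) and assert only that it succeeds. A small sketch of a test that combines the two:

statement ok
CREATE TABLE t (a int)

statement ok
INSERT INTO t VALUES (1), (2)

query I rowsort
SELECT a + 1 FROM t
----
2
3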
For more information on how sqllogictest works, check out the official SQLite sqllogictest docs. Note that our version of sqllogictest has a number of modifications from CockroachDB and from ourselves. The SQLite docs describe what is known as "mode standard". Look in the Materialize sqllogictest extended guide for additional documentation, with an emphasis on the extensions beyond SQLite's base sqllogictest.
sqllogictest ships with a huge corpus of test data, something like 6+ million queries. This takes hours to run in full, so in CI we run only a small subset on each PR, but we can schedule the full run on demand. For many features, you may not need to write many tests yourself, if you can find an existing sqllogictest file that covers that feature! (Grep is your friend.)
We pass every SQLite test, with only a few modifications to the upstream test data.
To run a sqllogictest against Materialize, you'll need to have PostgreSQL running on the default port, 5432, and have created a database named after your user. On macOS:
$ brew install postgresql
$ brew services start postgresql
$ createdb $(whoami) # Yes, this is a shell command, not a SQL command.
On Debian, the current user might not exist in PostgreSQL or might not have sufficient permissions to create a database. If the `createdb` command fails, try creating a local postgres user matching the current user, with the `createdb` permission:
$ sudo -u postgres createuser "$(whoami)" --createdb
You might reasonably wonder why PostgreSQL is necessary for running sqllogictests against Materialize. The answer is that we offload the hard work of mutating queries, like `INSERT INTO ...` and `UPDATE`, to PostgreSQL. We slurp up the changelog, much like we would if we were running against a Kafka–PostgreSQL CDC solution in production, and then run the `SELECT` queries against Materialize and verify they match the results specified by the sqllogictest file.
Once PostgreSQL is running, you can run a sqllogictest file like so:
$ cargo run --bin sqllogictest --release -- test/sqllogictest/TESTFILE.slt
For larger test files, it is imperative that you compile in release mode, i.e., by passing the `--release` flag as above. The extra compile time will quickly be made up for by much faster execution.

To add logging for tests, append `-vv`, e.g.:
$ cargo run --bin sqllogictest --release -- test/TESTFILE.slt -vv
There are currently three classes of sqllogictest files:
- The official SQLite test files are in test/sqllogictest/sqlite. Note that the directory is a git submodule, so the folder will start off as empty if you did not clone this repository with `--recurse-submodules`. To populate it, run:

  $ git submodule update --init

- Additional test files from CockroachDB are in test/sqllogictest/cockroach. Note that we certainly don't pass every Cockroach sqllogictest test at the moment.
- Additional Materialize-specific sqllogictest files live in test/sqllogictest.
Feel free to add more Materialize-specific sqllogictests! Because it is more lightweight, sqllogictest is the preferred system test framework when testing:
- the correctness of queries,
- what query plans are being generated.
In general, do not add or modify SQLite and/or CockroachDB test files because that inhibits syncing with upstream.
Testdrive is a Materialize invention that feels a lot like sqllogictest, except it supports pluggable commands for interacting with external systems, like Kafka. It has its own documentation page.
Note that testdrive actually interacts with Materialize over the network, via the PostgreSQL wire protocol, so it tests more of Materialize than our sqllogictest driver does. (It would be possible to run sqllogictest over the network too, but this is just not how things are configured at the moment.) Therefore it's often useful to write some additional testdrive tests to make sure things get serialized on the wire properly. For example, when adding a new data type, like, say, `DATE`, the upstream CockroachDB sqllogictests will provide most of the coverage, but it's worth adding some (very) basic testdrive tests (e.g., `> SELECT DATE '1999-01-01'`) to ensure that our pgwire implementation is properly serializing dates.
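Such a check might look roughly like the following in testdrive, where a `>` line is a query and the lines after it are the expected rows (a sketch; see the testdrive documentation for the authoritative syntax):

> SELECT DATE '1999-01-01'
1999-01-01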
Pgtest is a DSL for specifying raw pgwire messages to send and their expected responses. It can be used to test message sequences that are difficult or impossible to exercise with PostgreSQL drivers. Its expected output is generated against PostgreSQL and then tested against Materialize. Usage is documented in the pgtest crate.
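As a rough sketch of the shape of a pgtest file, a `send` block lists pgwire messages to send and an `until` block waits for the listed message types, recording what was received; the directive names and message encodings here are approximate, so consult the pgtest crate documentation for the precise syntax:

send
Query {"query": "SELECT 1"}
----

until
ReadyForQuery
----
RowDescription {"fields":[{"name":"?column?"}]}
DataRow {"fields":["1"]}
CommandComplete {"tag":"SELECT 1"}
ReadyForQuery {"status":"I"}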
The long-running tests are still a work in progress. Work on orchestrating them has begun, though; see the Docker Compose demo in demo/chbench if you're curious.
tl;dr add additional system tests, like sqllogictests and testdrive tests.
Unit/integration tests are great for complicated pure functions. A good example is Avro decoding. Bytes go in, and `avro::Value`s come out. This is a very easy unit test to write, and one that provides a lot of value. The Avro specification is stable, so the behaviour of the function should never change, modulo bugs.
But unit/integration tests can be detrimental when testing stateful functions, especially those in the middle of the stack. Trying to unit test some of the functions in the dataflow package would be an exercise in frustration. You'd need to mock out a dozen different objects and introduce several traits, and the logic under test is changing frequently, so the test will hamper development as often as it catches regressions.
Nikhil's philosophy is that writing a battery of system tests is a much better use of time. Testdrive and sqllogictest have discovered numerous bugs in the fiddly bits of timestamp assignment in the coord package, even though that's not what they're designed for. It's true that it's much harder to ascertain what exactly went wrong (some of these failures presented as hangs in CI), but I wager that you still come out ahead by not writing overly complicated and brittle unit tests.
As a module stabilizes and its abstraction boundaries harden, unit testing becomes more attractive. But be wary of introducing abstraction just to make unit tests easier to get right.
And, as a general rule of thumb, you don't want to add a new long-running test. Experience suggests that these are quite finicky (e.g., because a VM fails to boot), never mind that they're quite slow, so they're best limited to a small set of tests that exercise core business metrics. A long-running test should, in general, be scheduled as a separate work item, so that it can receive the nurturing required to be stable enough to be useful.