Runtime: Setup test fixtures for resolvers and migrate existing metrics tests #5218
The idea is that each file could define a set of project files and OLAP connectors to target, and then a list of tests to run and expected results. We might then have many of these files.
I'm sure this can be made much cleaner, but here's a draft YAML spec for a test file:

```yaml
files:
  model.sql: SELECT ...
  metrics_view.yaml:
    type: metrics_view
    dimensions:
      - ...
    measures:
      - ...
connectors:
  - duckdb
  - clickhouse
tests:
  test_name:
    resolver: metrics
    properties:
      query:
        metrics_view: mv
        ...
    result:
      rows:
        - ...
      csv: >
        xxx,xxx,xxx
  other_test_name:
    variables:
      rill.metrics.approximate_comparisons: true
    claims:
      attributes:
        - ...
      rules:
        - ...
    resolver: metrics_sql
    properties:
      sql: SELECT ...
    result:
      rows:
```
Aside from the structure, it's not that easy compared with something simpler than what's proposed in the example: writing SQL with YAML indentation will be awkward.

It should also be easy to run under the debugger (a frequent operation).

I propose anything that can be parsed line by line; possible permissions can be added as an additional line. Other parameters should be excluded, and any resolver/project separation should be done by folders and file names, or ignored and handled in specialized native Go tests.
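For illustration only, here is a minimal Go sketch of what parsing such a line-oriented fixture could look like, assuming a hypothetical format where the first line is the SQL, an optional `claims:` line carries permissions, and the remaining lines are expected rows (the format and helper name are assumptions, not an existing convention):

```go
package resolvers_test

import (
	"bufio"
	"os"
	"strings"
)

// parseLineFixture reads a hypothetical line-based fixture: line 1 is the SQL
// statement, an optional "claims: ..." line adds permissions, and all remaining
// lines are the expected result rows.
func parseLineFixture(path string) (sql, claims string, rows []string, err error) {
	f, err := os.Open(path)
	if err != nil {
		return "", "", nil, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for i := 0; s.Scan(); i++ {
		line := s.Text()
		switch {
		case i == 0:
			sql = line
		case strings.HasPrefix(line, "claims:"):
			claims = strings.TrimSpace(strings.TrimPrefix(line, "claims:"))
		default:
			rows = append(rows, line)
		}
	}
	return sql, claims, rows, s.Err()
}
```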
We are not looking to test SQL here; we are looking to test our resolvers, which (a) rely on project files being parsed, (b) need to be tested against multiple OLAPs (DuckDB, ClickHouse, Druid), and (c) have nested query parameters. I think the complexity here may make a flat text file tricky, but if you think it's possible, then please share a concrete proposal. If you think Go code is better, please suggest a syntax that incorporates all the considerations mentioned in the messages above.
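To make those three considerations concrete, here is a rough sketch of how a parsed suite file could be represented in Go; the type and field names are illustrative assumptions, not an existing API:

```go
// Suite is a hypothetical in-memory representation of one test file.
type Suite struct {
	Files      map[string]string   `yaml:"files"`      // (a) inline project files, parsed by the real parser
	Connectors []string            `yaml:"connectors"` // (b) OLAPs to run the suite against
	Tests      map[string]TestCase `yaml:"tests"`
}

// TestCase describes one resolver invocation and its expected result.
type TestCase struct {
	Resolver   string            `yaml:"resolver"`
	Properties map[string]any    `yaml:"properties"` // (c) nested query parameters
	Variables  map[string]string `yaml:"variables"`
	Claims     map[string]any    `yaml:"claims"`
	Result     Result            `yaml:"result"`
}

// Result holds the expected output, either as structured rows or as CSV text.
type Result struct {
	Rows []map[string]any `yaml:"rows"`
	CSV  string           `yaml:"csv"`
}
```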
So we have a directory structure right now like this:
The problem here is that each OLAP has its own directory and configuration, which means that if we create a resolver YAML inside one of them, we need to create a copy of it for the other OLAP, like:

To reuse our current framework, we would need to create, for each OLAP, identical test projects that differ only in the OLAP configuration. So, in pseudo-language, there would be a resolvers_test.go with something like:

where the fixture for a SQL resolver would have a name like sql_<name>.test and the content:

or, in YAML syntax:

Considering we need to generate

If we need to debug a particular fixture, we can write an ad-hoc testing function like:
Please take a deeper look at the first two comments on this issue. Notably, the goal is to:
So it will look like we merge
Considering this:

```yaml
files:
  model.sql: SELECT ...
  metrics_view.yaml:
    type: metrics_view
    dimensions:
      - ...
    measures:
      - ...
connectors:
  - duckdb
  - clickhouse
tests:
  test_name:
    resolver: metrics
    properties:
      query:
        metrics_view: mv
        ...
    result:
      rows:
        - ...
      csv: >
        xxx,xxx,xxx
  other_test_name:
    variables:
      rill.metrics.approximate_comparisons: true
    claims:
      attributes:
        - ...
      rules:
        - ...
    resolver: metrics_sql
    properties:
      sql: SELECT ...
    result:
      rows:
```
For each connector we need to create an instance and give it the project content from the `files` field (instead of the current setup, which looks like this):

```go
inst := &drivers.Instance{
	Environment:      "test",
	OLAPConnector:    "duckdb",
	RepoConnector:    "repo",
	CatalogConnector: "catalog",
	Connectors: []*runtimev1.Connector{
		{
			// The repo connector currently serves project files from a directory on disk.
			Type:   "file",
			Name:   "repo",
			Config: map[string]string{"dsn": projectPath},
		},
		{
			// The OLAP connector the resolvers execute against.
			Type:   "duckdb",
			Name:   "duckdb",
			Config: map[string]string{"dsn": ":memory:"},
		},
		{
			Type: "sqlite",
			Name: "catalog",
			// Setting a test-specific name ensures a unique connection when "cache=shared" is enabled.
			// "cache=shared" is needed to prevent threading problems.
			Config: map[string]string{"dsn": fmt.Sprintf("file:%s?mode=memory&cache=shared", t.Name())},
		},
	},
	EmbedCatalog: true,
}
```
```go
for _, suite := range files("testdata/*.suite") {
	for _, c := range readConnectors(suite) {
		// creating a 'test-yaml' connector that provides the project content from the `files` field
		i := createInstance(createRepoConnector(suite), c)
		for _, test := range readTests(suite) {
			t.Run(test.name, func(t *testing.T) {
				runTest(i, test)
			})
		}
	}
}
```
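As a rough, hypothetical rendering of the pseudo-code above in Go — helper names such as loadSuite, createInstance, and runTest are assumptions, not existing runtime APIs:

```go
import (
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestResolvers(t *testing.T) {
	suites, err := filepath.Glob("testdata/*.suite")
	require.NoError(t, err)

	for _, path := range suites {
		suite := loadSuite(t, path) // parse the YAML suite file (files, connectors, tests)
		for _, connector := range suite.Connectors {
			// One instance per (suite, connector): the repo connector serves the
			// inline `files` field, only the OLAP connector changes between runs.
			inst := createInstance(t, suite, connector)
			for name, tc := range suite.Tests {
				t.Run(filepath.Base(path)+"/"+connector+"/"+name, func(t *testing.T) {
					runTest(t, inst, tc)
				})
			}
		}
	}
}
```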
Frankly, some points still trouble me.
Considering this, I wonder why we designed the project configuration with separate files in the first place; maybe we should redesign the project configuration as a single file? Right now we are planning to create a different project configuration format for tests than for production, which can lead to additional complexity.

This relates to the previous point. But I would add that the problem of suddenly broken tests is solved by creating another project in
Yes, but why was the separation of projects from tests and placing them in
Not necessarily – you can just use the approach in rill/runtime/compilers/rillv1/parser_test.go (line 1903 at 96bef35).
That would be nice and has been requested previously. A single file would be especially nice for small projects (like test projects!). But it's out of scope for this work.
By taking the same approach as the parser tests, where files are declared inline but written out to a temp directory, this will not be a problem (because the actual parser will still be parsing multiple files on disk).
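A minimal sketch of that approach, assuming a hypothetical makeRepo helper: it writes the inline-declared files to a temp directory so the real parser still reads multiple files from disk.

```go
import (
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/require"
)

// makeRepo writes inline-declared project files to a temp directory and returns
// its path, which the "repo" (file) connector's DSN can then point at.
func makeRepo(t *testing.T, files map[string]string) string {
	t.Helper()
	dir := t.TempDir()
	for name, data := range files {
		path := filepath.Join(dir, name)
		require.NoError(t, os.MkdirAll(filepath.Dir(path), 0o755))
		require.NoError(t, os.WriteFile(path, []byte(data), 0o644))
	}
	return dir
}
```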
So the goal here is exactly to be able to just copy-paste one file and change it. It means you can add a new test by adding a single file. Copy/pasting a folder in a separate location from the test means future maintainers need to open and diff many files to understand the difference. That's what would be nice to avoid.
Yeah, the goal here is to correct that mistake by moving test data and tests closer to each other.
FYI, the datasource will be a separate file anyway.
An example of the project in a single YAML:

```yaml
project:
  sources:
    ad_bids_mini_source.yaml:
      connector: local_file
      path: data/AdBids.csv.gz
  models:
    ad_bids_mini.sql: |
      select
        id,
        timestamp,
        publisher,
        domain,
        volume,
        impressions,
        clicks
      from ad_bids_mini_source
  dashboards:
    ad_bids_mini_metrics_with_policy.yaml:
      model: ad_bids_mini
      display_name: Ad bids
      description:
      timeseries: timestamp
      smallest_time_grain: ""
      dimensions:
        - label: Publisher
          name: publisher
          expression: upper(publisher)
          description: ""
        - label: Domain
          property: domain
          description: ""
      measures:
        - label: "Number of bids"
          name: bid's number
          expression: count(*)
        - label: "Total volume"
          name: total volume
          expression: sum(volume)
        - label: "Total impressions"
          name: total impressions
          expression: sum(impressions)
        - label: "Total clicks"
          name: total click"s
          expression: sum(clicks)
      security:
        access: true
        row_filter: "domain = '{{ .user.domain }}'"
        exclude:
          - if: "'{{ .user.domain }}' != 'msn.com'"
            names:
              - total volume
  apis:
    mv_sql_policy_api.yaml:
      kind: api
      metrics_sql: |
        select
          publisher,
          domain,
          "total impressions",
          "total volume"
        FROM
          ad_bids_mini_metrics_with_policy
connectors:
  - duckdb
  - clickhouse
tests:
  test_name:
    resolver: metrics
    properties:
      query:
        metrics_view: mv
        ...
    result:
      rows:
        - ...
      csv: >
        xxx,xxx,xxx
  other_test_name:
    variables:
      rill.metrics.approximate_comparisons: true
    claims:
      attributes:
        - ...
      rules:
        - ...
    resolver: metrics_sql
    properties:
      sql: SELECT ...
    result:
      rows:
```
Looks pretty good to me.
Could this be flattened to something like this?
The idea here is to set up a test file abstraction for runtime/resolvers, where test queries for any resolver, security claims, project files, and backend can be written in a nice syntax.

The idea is to draw inspiration from sqllogictest (also see this example from DuckDB's test files). For example, this could be YAML test files in runtime/resolvers/testdata, which are loaded, parsed, and executed.

It would also be nice to support a way to automatically update the expected test output, e.g. using an --update flag as described in this blog post.

Lastly, we should start migrating our existing metrics tests to the new files.
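For reference, here is a minimal sketch of the --update pattern from the linked blog post, using a golden file for simplicity (in the YAML design discussed above, the flag would instead rewrite the result section of the suite file); the flag name and checkResult helper are assumptions:

```go
import (
	"flag"
	"os"
	"testing"

	"github.com/stretchr/testify/require"
)

var update = flag.Bool("update", false, "rewrite expected test output instead of asserting against it")

// checkResult compares got against the stored expectation, or rewrites the
// expectation when the test binary is run with -update.
func checkResult(t *testing.T, goldenPath string, got []byte) {
	t.Helper()
	if *update {
		require.NoError(t, os.WriteFile(goldenPath, got, 0o644))
		return
	}
	want, err := os.ReadFile(goldenPath)
	require.NoError(t, err)
	require.Equal(t, string(want), string(got))
}
```

With something like this, a run such as `go test ./runtime/resolvers -update` could refresh all expectations in one pass.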