Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend-agnostic testing #1205

Merged
merged 53 commits into from
May 25, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
bbee536
generic TestHelper class
ADBond Apr 26, 2023
7cc469a
duckdb test helper
ADBond Apr 26, 2023
efdba92
duckdb helper as a fixture
ADBond Apr 26, 2023
95d5d94
test decorator
ADBond Apr 26, 2023
ab82a41
test helpers for spark and sqlite
ADBond Apr 26, 2023
36ff118
pytest fixtures for sqlite and spark
ADBond Apr 26, 2023
d410a72
pytest ini marker options in pyproject.toml
ADBond Apr 26, 2023
6205ecf
test helper dict fixture for actual test use
ADBond Apr 26, 2023
82aad68
intercept tests and mark as appropriate so we run the correct sets
ADBond Apr 26, 2023
5a969df
add spark and sqlite to dialects available to decorator
ADBond Apr 26, 2023
46743d5
test_array_columns to dialect-agnostic form
ADBond Apr 26, 2023
9d1b616
train v predict using decorator
ADBond Apr 26, 2023
0084235
sqlite helper - convert frame running sql + name, drop unneeded argument
ADBond Apr 26, 2023
06d8689
allow inclusive dialect marking
ADBond Apr 26, 2023
fcac00e
estimate lambda test -> agnostic form
ADBond Apr 26, 2023
31d9a6e
duckdb-specific tests marked inclusively
ADBond Apr 26, 2023
c88fd2a
u training test -> decorated
ADBond Apr 26, 2023
2a6f83f
linting
ADBond Apr 26, 2023
a5ad43a
expose sqlite levenshtein levels
ADBond Apr 26, 2023
a5a8c6d
varargs in decorators
ADBond Apr 26, 2023
d519d79
mark for running _only_ dialect-specific tests
ADBond Apr 26, 2023
429c2b5
option for core tests only - non-backend-specific
ADBond Apr 26, 2023
b60ab8b
tidying
ADBond Apr 26, 2023
86bcd7f
clearer decorator names
ADBond Apr 26, 2023
c54d592
simple conftest for benchmarking to make compatible with pytest ini
ADBond Apr 27, 2023
9421825
use benchmarking basic conftest for running splink_demos notebook tests
ADBond Apr 27, 2023
4689889
rename docs testing file for genericity
ADBond May 9, 2023
479abf0
expand docs to include info on writing using new framework
ADBond May 10, 2023
47dee8d
merge master to get code tabs update
ADBond May 10, 2023
712a53c
use code tabs for inclusive decorator example
ADBond May 10, 2023
185d1d8
docs on running tests
ADBond May 10, 2023
ceba5d7
expand docker-test docs slightly + dockerignore file to help build
ADBond May 10, 2023
b75cef4
adding links to testing docs
ADBond May 10, 2023
22f546a
some tweaks to testing docs
ADBond May 10, 2023
d33debf
helper.linker -> Linker to make it clearer it is a class rather than …
ADBond May 10, 2023
e13c766
testing docs - useful flags for docker run
ADBond May 11, 2023
5ad108b
merge changes from master, resolve conflicts
ADBond May 22, 2023
9fd04a2
simplify helper fixture
ADBond May 22, 2023
5285bb2
test helpers - load_frame_from_parquet method
ADBond May 22, 2023
2bdc7f6
+ ctl property for test helpers
ADBond May 23, 2023
db07e3b
testing docs - adding clarity
ADBond May 24, 2023
e5cd592
testing dev guide - put 'running tests' section before 'writing tests'
ADBond May 24, 2023
c2cf894
extra pytest options to collapsible tips block
ADBond May 24, 2023
e51320c
docker testing -> collapsable block
ADBond May 24, 2023
93cb31d
docs note on lack of athena in testing framework
ADBond May 24, 2023
8f644db
add force push
ThomasHepworth May 24, 2023
918f109
+bullets
ADBond May 24, 2023
ee6376a
backend-agnostic testing docs - add an example covering all backends …
ADBond May 24, 2023
6e7e536
Further breakdown of test running options
ADBond May 24, 2023
713a372
gitignore vscode settings
ADBond May 24, 2023
9188be3
Merge branch 'master' into backend-agnostic-testing
ADBond May 24, 2023
9bb8d2d
Merge branch 'master' into backend-agnostic-testing
ADBond May 25, 2023
92e2234
Tests for specific backends -> Backend-specific tests
ADBond May 25, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
*

!tests/
!splink/
!pyproject.toml
!poetry.lock
!README.md

**/*.pyc
1 change: 1 addition & 0 deletions .github/workflows/run_demos_examples.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ jobs:
- name: Install environment and check notebooks
run: |
cd splink_demos
cp ../benchmarking/conftest.py conftest.py
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/run_demos_tutorials.yml
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ jobs:
- name: Install environment and check notebooks
run: |
cd splink_demos
cp ../benchmarking/conftest.py conftest.py
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
Expand Down
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -172,4 +172,7 @@ cython_debug/
*.parquet
*.csv

.DS_Store
.DS_Store

# vscode local settings
.vscode
5 changes: 5 additions & 0 deletions benchmarking/conftest.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# add default marker to all tests - this flag is on by default
# set in pyproject.toml to aid testing tests/
def pytest_collection_modifyitems(items, config):
for item in items:
item.add_marker("default")
32 changes: 0 additions & 32 deletions docs/dev_guides/changing_splink/running_tests_locally.md

This file was deleted.

366 changes: 366 additions & 0 deletions docs/dev_guides/changing_splink/testing.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ nav:
- Building a Virtual Environment: "dev_guides/changing_splink/building_env_locally.md"
- Linting: "dev_guides/changing_splink/lint.md"
- Building Docs: "dev_guides/changing_splink/build_docs_locally.md"
- Running Tests: "dev_guides/changing_splink/running_tests_locally.md"
- Testing: "dev_guides/changing_splink/testing.md"
- Releasing a Package Version: "dev_guides/changing_splink/releases.md"
- Caching and pipelining: "dev_guides/caching.md"
- Understanding and debugging Splink: "dev_guides/debug_modes.md"
Expand Down
19 changes: 18 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -61,4 +61,21 @@ select = [
ignore = [
"B905", # `zip()` without an explicit `strict=` parameter
"B006", # Do not use mutable data structures for argument defaults"
]
]

[tool.pytest.ini_options]
addopts = ["-m default"]
markers = [
# only tests where backend is irrelevant:
"core",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core isn't currently running any tests for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, is this running without a folder specified? think in that case it is picking up conftest.py from benchmarking/ which doesn't include the same logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in are you doing just e.g. pytest -m core rather than pytest -m core tests/?

If so would you expect that to run the tests and benchmarking? If so I can either copy the functionality to benchmarking, or even perhaps lift conftest to root level so that it covers both

# see tests/decorator.py::dialect_groups for group details:
"default",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want default to run all of the benchmarks, or do we want this to be its own test mark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken benchmarking to be a separate workstream and consider them running separately, as they are performing different functions. As far as this PR goes I have just included the changes so that benchmarking runs the same as previously.
Might be a good upgrade to do something similar for benchmarking, but that might want some additional things (like slow/not slow type markers) and maybe a think about how we are doing the benchmark stuff

"all",
# backend-specific sets
"duckdb",
"duckdb_only",
Comment on lines +75 to +76
Copy link
Contributor

@ThomasHepworth ThomasHepworth May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it that duckdb and duckdb_only are identical?

Ignore me, you have a note about this in your main description.

"spark",
"spark_only",
"sqlite",
"sqlite_only",
]
8 changes: 8 additions & 0 deletions splink/sqlite/sqlite_comparison_level_library.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
DistanceFunctionLevelBase,
ElseLevelBase,
ExactMatchLevelBase,
LevenshteinLevelBase,
NullLevelBase,
PercentageDifferenceLevelBase,
)
Expand All @@ -30,6 +31,9 @@ def _distance_function_level(self):
return distance_function_level

@property
def _levenshtein_level(self):
return levenshtein_level

def _columns_reversed_level(self):
return columns_reversed_level

Expand All @@ -46,6 +50,10 @@ class else_level(SqliteBase, ElseLevelBase):
pass


class levenshtein_level(SqliteBase, LevenshteinLevelBase):
pass


class columns_reversed_level(SqliteBase, ColumnsReversedLevelBase):
pass

Expand Down
9 changes: 9 additions & 0 deletions splink/sqlite/sqlite_comparison_library.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from ..comparison_library import (
DistanceFunctionAtThresholdsComparisonBase,
ExactMatchBase,
LevenshteinAtThresholdsComparisonBase,
)
from .sqlite_comparison_level_library import SqliteComparisonProperties

Expand All @@ -13,3 +14,11 @@ class distance_function_at_thresholds(
SqliteComparisonProperties, DistanceFunctionAtThresholdsComparisonBase
):
pass


class levenshtein_at_thresholds(
SqliteComparisonProperties, LevenshteinAtThresholdsComparisonBase
):
@property
def _distance_level(self):
return self._levenshtein_level
26 changes: 26 additions & 0 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,25 @@
import pytest

from splink.spark.jar_location import similarity_jar_location
from tests.decorator import dialect_groups
from tests.helpers import DuckDBTestHelper, SparkTestHelper, SQLiteTestHelper

logger = logging.getLogger(__name__)


def pytest_collection_modifyitems(items, config):
# any tests without backend-group markers will always run
marks = {gp for groups in dialect_groups.values() for gp in groups}
# any mark we've added, but excluding e.g. parametrize
our_marks = {*marks, *dialect_groups.keys()}

for item in items:
if not any(marker.name in our_marks for marker in item.iter_markers()):
item.add_marker("core")
for mark in our_marks:
item.add_marker(mark)


@pytest.fixture(scope="module")
def spark():
from pyspark import SparkConf, SparkContext
Expand Down Expand Up @@ -35,3 +50,14 @@ def df_spark(spark):
df = spark.read.csv("./tests/datasets/fake_1000_from_splink_demos.csv", header=True)
df.persist()
yield df


# workaround as you can't pass fixtures as param arguments in base pytest
# see e.g. https://stackoverflow.com/a/42400786/11811947
@pytest.fixture
def test_helpers(spark):
return {
"duckdb": DuckDBTestHelper(),
"spark": SparkTestHelper(spark),
"sqlite": SQLiteTestHelper(),
}
48 changes: 48 additions & 0 deletions tests/decorator.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
import pytest

dialect_groups = {
"duckdb": ["default"],
"spark": ["default"],
"sqlite": [],
}
for groups in dialect_groups.values():
groups.append("all")


def invert(sql_dialects_missing):
return (
sql_d for sql_d in dialect_groups.keys() if sql_d not in sql_dialects_missing
)


def mark_with_dialects_excluding(*sql_dialects_missing):
sql_dialects = invert(sql_dialects_missing)
return mark_with_dialects_including(*sql_dialects, pass_dialect=True)


def mark_with_dialects_including(*sql_dialects, pass_dialect=False):
def mark_decorator(test_fn):
params = []
all_marks = []
for sql_d in sql_dialects:
# marks for whatever groups the dialect is in
marks = [
getattr(pytest.mark, dialect_group)
for dialect_group in dialect_groups[sql_d]
]
# plus the basic dialect mark
dialect_mark = getattr(pytest.mark, sql_d)
dialect_only_mark = getattr(pytest.mark, f"{sql_d}_only")
marks += [dialect_mark, dialect_only_mark]
params.append(pytest.param(sql_d, marks=marks))
# will end up with duplicates, but think that's okay. for now at least.
all_marks += marks

if pass_dialect:
test_fn = pytest.mark.parametrize("dialect", params)(test_fn)
else:
for mark in all_marks:
test_fn = mark(test_fn)
return test_fn

return mark_decorator
Loading