
TEST: Run all tests through pytest #2090

Merged: 26 commits, Oct 14, 2024

Conversation

david-cortes-intel (Contributor)

Description

Currently, unit tests execute through a mixture of the built-in unittest framework and pytest.

From the commit history, it looks like the initial tests were added using Python's built-in unittest framework, and subsequent tests were then introduced using pytest.

Compared to unittest, pytest has a much nicer output format, offers many options for how to execute tests and view their outputs, and has plugins that allow it to do things the built-in framework cannot (e.g. JSON outputs, test coverage reports, etc.).

This PR switches the automated scripts to execute all of the tests using pytest. This doesn't change which tests are executed - it just uses pytest as the runner.

One challenge here is how each runner determines the path from which modules are imported. Here I'm trying to control this with the PYTHONPATH variable when pytest is called within shell scripts, but I don't know whether this will work out for all the variants under which the tests are executed.


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with the update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with the measured data, if a performance change is expected.
  • I have provided justification for why performance has changed or why changes are not expected.
  • I have provided justification for why quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@david-cortes-intel david-cortes-intel added the testing Tests for sklearnex/daal4py/onedal4py & patching sklearn label Oct 7, 2024
@david-cortes-intel david-cortes-intel marked this pull request as draft October 7, 2024 15:28
@david-cortes-intel (Contributor Author)

@Alexsandruss Would you perhaps have any pointers on why the CI job here:
https://github.com/intel/scikit-learn-intelex/actions/runs/11218859163/job/31183555329?pr=2090

... might be failing to import daal4py._daal4py in the new pytest call I added, while it succeeds in the other calls?

@icfaust (Contributor) commented Oct 7, 2024

I've had to fight against pytest's package discovery recently and had to use --rootdir to get things to work properly (https://github.com/intel/scikit-learn-intelex/blob/main/.ci/scripts/run_sklearn_tests.py#L47). Maybe that could help you? I believe your daal4py._daal4py error is coming from $PYTHONPATH making Python find daal4py in the repo's daal4py folder instead of the installed site-packages copy, and then failing because the compiled _daal4py.so file isn't there.

@david-cortes-intel (Contributor Author)

Thanks for the tip. But that was actually for a different reason: some test files import other test files - for example, test_daal4py_spmd_examples.py imports test_daal4py_examples right here, as an absolute import rather than a relative one, so the folder with the test files needs to be on sys.path for it to work:
https://github.com/intel/scikit-learn-intelex/blob/2f73b9de4a79205b67522f502ae140e6470871f3/tests/test_daal4py_spmd_examples.py#L22

But since these tests should now only be executed through pytest, I guess a better solution would be to use relative imports. I'm trying that approach now - let's see how the CI ends up working with it.
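
For reference, a minimal sketch of the relative-import alternative being described (illustrative only - the exact import statement in the linked file may differ):

# tests/test_daal4py_spmd_examples.py
# Absolute import, which requires the tests folder to be on sys.path:
#     import test_daal4py_examples
# Relative import instead, which requires the tests folder to be an importable package:
from . import test_daal4py_examples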

@david-cortes-intel (Contributor Author)

Looks like it was just a matter of deleting the __init__.py file in the tests folder. All CI jobs are executing the tests correctly now.

@david-cortes-intel david-cortes-intel marked this pull request as ready for review October 9, 2024 06:34
@david-cortes-intel david-cortes-intel changed the title from "[Draft] TEST: Run all tests through pytest" to "TEST: Run all tests through pytest" Oct 9, 2024
@icfaust (Contributor) left a comment:

Thank you for doing this. Here are my initial thoughts.

Comment on lines 26 to 38
    err_code = subprocess.call(
        [sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "svc"]
    )
    assert not err_code

    def unpatch_from_cmd():
        err_code = subprocess.call(
            [sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"]
        )
        assert not err_code

    request.addfinalizer(unpatch_from_cmd)
    return
Contributor:

Suggested change:
-    err_code = subprocess.call(
-        [sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "svc"]
-    )
-    assert not err_code
-    def unpatch_from_cmd():
-        err_code = subprocess.call(
-            [sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"]
-        )
-        assert not err_code
-    request.addfinalizer(unpatch_from_cmd)
-    return
+    err_code = subprocess.call(
+        [sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "svc"]
+    )
+    assert not err_code
+    yield
+    err_code = subprocess.call(
+        [sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"]
+    )
+    assert not err_code

Thoughts on using yield instead?

Contributor Author:

What would be the advantage of using yield here?

Contributor:

It's not a must, but it's the pytest-recommended approach over adding a finalizer: https://docs.pytest.org/en/stable/how-to/fixtures.html#yield-fixtures-recommended

Contributor Author:

Changed to yield.

Contributor Author:

Looks like the Windows job doesn't like the usage of 'yield':

INTERNALERROR>     ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "C:\Miniconda\envs\CB\Lib\site-packages\_pytest\logging.py", line 790, in pytest_collection
INTERNALERROR>     return (yield)
INTERNALERROR>             ^^^^^

...

joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Will try to see where that multi-processing invocation is coming from.

def patch_from_function(request):
    from sklearnex import patch_sklearn, unpatch_sklearn

    patch_sklearn(name=["svc"], global_patch=True)
Contributor:

Again, possibly yield?

.ci/scripts/test_global_patch.py (outdated, resolved)
)


def test_unpatching_from_command_line(patch_from_command_line):
Contributor:

Why is the fixture used here again? Is this trying to expand functionality or to match the previous behavior (because this is slightly different from what was there before)? What was there previously assumed that the global setting was returned to normal after the unpatch, so by reapplying "sklearnex.glob", "patch_sklearn", "-a", "svc" the state of this test is different from what occurred in the file before.

@david-cortes-intel (Contributor Author), Oct 10, 2024:

The fixture has a finalizer. So on each test, it gets initialized (global patching is applied), and after the test finishes, gets finalized (global unpatching is applied). Then the same repeats for the next test that uses the fixture, ensuring that state doesn't propagate across tests. This is of course assuming that patching and unpatching are working correctly, which is what the tests are looking at in the first place, but I thought it better than the idea of merging patching and unpatching in a single test.
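
For illustration, a minimal sketch of the fixture-with-finalizer pattern described above (hypothetical fixture and test names, not the exact code from this PR):

import subprocess
import sys

import pytest


@pytest.fixture
def patch_svc_globally(request):
    # Setup: apply global patching before the test body runs.
    err_code = subprocess.call(
        [sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "svc"]
    )
    assert not err_code

    def unpatch():
        # Teardown: undo the global patching after the test finishes,
        # so the patched state does not leak into the next test.
        err_code = subprocess.call(
            [sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"]
        )
        assert not err_code

    request.addfinalizer(unpatch)


def test_svc_is_patched(patch_svc_globally):
    from sklearn.svm import SVC

    assert SVC.__module__.startswith(("daal4py", "sklearnex"))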

.ci/scripts/test_global_patch.py (outdated, resolved)
assert not err_code
from sklearnex import unpatch_sklearn

unpatch_sklearn()
Contributor:

https://github.com/intel/scikit-learn-intelex/blob/main/sklearnex/tests/test_monkeypatch.py#L65 Just be careful with these functions, since they impact global state and aren't local to the test (anything with a patch_sklearn call should be wrapped in a try/finally with an unpatch_sklearn).
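
A minimal sketch of the try/finally pattern being referred to (illustrative only; it assumes sklearnex is installed and mirrors the patch_sklearn/unpatch_sklearn calls already used in these tests):

from sklearnex import patch_sklearn, unpatch_sklearn

patch_sklearn(name=["svc"])
try:
    # Anything imported from sklearn after this point sees the patched estimator.
    from sklearn.svm import SVC

    assert SVC.__module__.startswith(("daal4py", "sklearnex"))
finally:
    # Always restore the original sklearn classes, even if the assertion fails,
    # so the patched state does not leak into other tests.
    unpatch_sklearn()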

Contributor Author:

Yes, that's the idea behind the fixtures with finalizers.

)


def test_unpatching_from_function(patch_from_function):
Contributor:

Now that I look at what the code had previously been, I am glad you have done this. Much better isolation of tests.

- mpirun -n 4 python -m unittest discover -v -s tests -p test*spmd*.py # [not win]
- mpiexec -localonly -n 4 python -m unittest discover -v -s tests -p test*spmd*.py # [win]
- python -m unittest discover -v -s tests -p test*.py
- mpirun -n 4 pytest --pyargs --verbose -s tests/test*spmd*.py # [not win]
Contributor:

Is --pyargs necessary, since you are using a path and not a package name? (Same goes for the others in run_test.bat / run_test.sh.)

Contributor Author:

I don't know. But all the other pytest calls have it.

Contributor:

I'd say cut them out and see what happens. If they fail, you can re-add them (and ping me).

Contributor Author:

Looks like they weren't necessary.

tests/test_examples_sklearnex.py (outdated, resolved)
Comment on lines -37 to -39
err_code = subprocess.call([sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"])
assert not err_code
unpatch_sklearn()
@icfaust (Contributor), Oct 10, 2024:

The state of the unpatch_sklearn() test was previously set up after a subprocess.call([sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"]) call, not after a subprocess.call([sys.executable, "-m", "sklearnex.glob", "patch_sklearn", "-a", "svc"]) call, meaning the state of the test is different in the new implementation. This is okay if this is what you desired.

#2090 (comment)

@david-cortes-intel (Contributor Author), Oct 10, 2024:

Thanks, I had missed that detail entirely.

So I tried adding a new test for globally patching all estimators, separate from the SVC-only one, but the test actually ends up failing:

@pytest.fixture
def patch_all_from_command_line():
    err_code = subprocess.call(
        [sys.executable, "-m", "sklearnex.glob", "patch_sklearn"]
    )
    assert err_code == os.EX_OK

    yield

    err_code = subprocess.call(
        [sys.executable, "-m", "sklearnex.glob", "unpatch_sklearn"]
    )
    assert err_code == os.EX_OK

def test_patching_all_from_command_line(patch_all_from_command_line):
    from sklearn.svm import SVC, SVR

    assert SVC.__module__.startswith("daal4py")
    assert SVC.__module__.startswith("sklearnex")
    assert SVR.__module__.startswith("daal4py")
    assert SVR.__module__.startswith("sklearnex")

Should that be expected to succeed? The same happens if checking them with "or", and it happens for both SVC and SVR.

Contributor:

I assume that it's failing on assert SVC.__module__.startswith("daal4py"). You can remove assert SVC.__module__.startswith("daal4py") and assert SVR.__module__.startswith("daal4py").

Contributor Author:

It also fails for sklearnex.

Contributor Author:

I added this new test as an extra commit here; let's see if it also fails in the CI.

Contributor:

Could be a legit failure then, let's see.

@david-cortes-intel (Contributor Author), Oct 10, 2024:

Looks like it also fails in the CI jobs:

    def test_patching_all_from_command_line(patch_all_from_command_line):
        from sklearn.svm import SVC, SVR
    
>       assert SVC.__module__.startswith("daal4py") or SVC.__module__.startswith("sklearnex")
E       AssertionError: assert (False or False)
E        +  where False = <built-in method startswith of str object at 0x7f8679a04300>('daal4py')
E        +    where <built-in method startswith of str object at 0x7f8679a04300> = 'sklearn.svm._classes'.startswith
E        +      where 'sklearn.svm._classes' = <class 'sklearn.svm._classes.SVC'>.__module__
E        +  and   False = <built-in method startswith of str object at 0x7f8679a04300>('sklearnex')
E        +    where <built-in method startswith of str object at 0x7f8679a04300> = 'sklearn.svm._classes'.startswith
E        +      where 'sklearn.svm._classes' = <class 'sklearn.svm._classes.SVC'>.__module__

scripts/test_global_patch.py:83: AssertionError

Contributor:

Yeah, this is bad; I'll admit I don't know much about the glob usage. I wonder if sklearn has been pre-imported via sklearnex imports, with the change in glob then not affecting it. There is definitely something here.

Contributor Author:

Yes, looks like that does indeed make a difference. If it's left as the only test, it actually succeeds. Will try to find a workaround.

Contributor Author:

I didn't manage to find a solution, so in the end I just removed the test and left a comment in the file.

@david-cortes-intel (Contributor Author)

/intelci: run

@david-cortes-intel (Contributor Author)

/intelci: run

@icfaust (Contributor) left a comment:

Assuming the small English correction, this is ready to merge.

.ci/scripts/test_global_patch.py (outdated, resolved)
Co-authored-by: Ian Faust <icfaust@gmail.com>
@david-cortes-intel david-cortes-intel merged commit 614e088 into uxlfoundation:main Oct 14, 2024
7 of 8 checks passed
Labels: testing (Tests for sklearnex/daal4py/onedal4py & patching sklearn)
2 participants