
tests: Add xfail_windows decorator for failing tests on Windows #4362

Merged (46 commits) Sep 22, 2024

Conversation

@echoix (Member) commented Sep 22, 2024

This PR is a big step towards being able to safely refactor tests: almost all tests (see footnotes) are now run and checked on Windows, and any failure makes the run fail. The success rate, measured in number of files, is now 96.8%.

The approach used here is to mark all current Windows test failures as expected failures, which allows re-enabling all previously excluded test files. For files where only one or a few tests were failing, the file as a whole no longer fails, so all the other working tests in it are now required to pass.
If a test marked as an expected failure ever ends up passing, it counts as a failure, since it is an unexpected success. This is important: it is what tells us when a change has affected a test. Skipping a complete file isn't an appropriate solution, because we would lose the results of all the other good tests in that file.

Plain unittest supports conditions in the skip and skipIf decorators, but no conditional variant exists for expected failures. So a new decorator was created in the utils file that wraps this logic and emits a warning with an explanation when the expected failure is actually applied.
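Conceptually, such a conditional decorator can be sketched like this (a minimal illustration of the idea, using `sys.platform` detection; the real implementation lives in `python/grass/gunittest/utils.py` and may differ in its details):

```python
import functools
import sys
import unittest
import warnings


def xfail_windows(test_item):
    """Mark a test as an expected failure, but only on Windows.

    Minimal sketch: on other platforms the test is returned unchanged,
    so it still has to pass there as usual.
    """
    if not sys.platform.startswith("win"):
        return test_item  # no-op outside Windows

    @functools.wraps(test_item)
    def wrapper(*args, **kwargs):
        # Warn so the expected failure is visible in the test output.
        warnings.warn(
            "Known failure on Windows; running as an expected failure. "
            "Remove the xfail_windows decorator once the test passes."
        )
        return test_item(*args, **kwargs)

    return unittest.expectedFailure(wrapper)
```

Because this builds on `unittest.expectedFailure`, a decorated test that starts passing on Windows is reported as an unexpected success and fails the run, which is exactly the property the workflow relies on.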

The approach to fixing the tests is to find a usage of the xfail_windows decorator, fix the test, and remove the decorator. If a behavior change happens to make a Windows test pass (which shows up as a failure, since it is an unexpected success), we also remove the xfail_windows decorator, and from then on that test is no longer allowed to fail.

We now have much more control over what affects our tests on Windows.

Side note: I have made good progress on running pytest on the Windows CI. It now runs, still with 15 failures, but that is far better than when errors like cannot load grass_gis.8.5 library were happening.


Footnotes:

  • Some of the failures are not failures but errors: they happen in the setUpClass method and error out before any test can run or have a decorator applied to it. So the file still fails.
  • I don’t know how to apply an expected failure to a doctest, so doctests either still fail or are excluded in a new Windows-only .gunittest.cfg file.
  • Tests in shell scripts don’t work like unittest tests and can’t be individually marked, so either the entire file is excluded and never runs, or it stays failing.
  • Since the success percentage (in number of files) is reported as an integer and we got 96.8%, I have to set 96% as the minimum (the condition is >=). This still allows one file to regress without us noticing.
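The integer-percentage footnote can be illustrated with hypothetical file counts (121 of 125 files is one combination giving 96.8%; the actual counts are not stated in this thread):

```python
# Hypothetical counts: 121 of 125 test files passing is 96.8%.
passing_files, total_files = 121, 125
percent = int(passing_files / total_files * 100)  # truncates 96.8 down to 96

MIN_SUCCESS_PERCENT = 96  # the threshold from the PR; the condition is >=
assert percent >= MIN_SUCCESS_PERCENT

# One more failing file would give 120/125 = 96.0%, which still
# satisfies the >= 96 check, so a single file can regress unnoticed.
assert int(120 / 125 * 100) >= MIN_SUCCESS_PERCENT
```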

echoix and others added 30 commits September 18, 2024 19:45
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@echoix added this to the 8.5.0 milestone (Sep 22, 2024)
@github-actions bot added labels (Sep 22, 2024): GUI, CI, vector, raster, temporal, Python, libraries, module, general, imagery, tests, notebook
@echoix (Member, Author) commented Sep 22, 2024

I’m kind of proud of this one. Now we can really start working on something, like individually integrating changes from #2616, knowing for sure which changes impact which tests (because we would remove the decorators one by one).

@echoix (Member, Author) commented Sep 22, 2024

Another important advantage: Windows is the only place where the v.in.pdal tests run at all (they are excluded on macOS and Linux), and it is the only place where they all pass. After this PR, passing will be enforced.

@wenzeslaus (Member) commented:

v.in.pdal tests have @unittest.skipIf(shutil.which("v.in.pdal") is None, "Cannot find v.in.pdal"). I expect that there is no v.in.pdal on Windows (--without-pdal).
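The quoted guard means the whole test case is silently skipped when the binary is absent, which is why no failure is reported on a --without-pdal build. A small self-contained sketch of that pattern (with a hypothetical test class standing in for the real v.in.pdal tests):

```python
import shutil
import unittest


# Same pattern as in the v.in.pdal tests: skip the whole class when the
# tool is not on PATH (e.g. a build configured with --without-pdal).
@unittest.skipIf(shutil.which("v.in.pdal") is None, "Cannot find v.in.pdal")
class TestVInPdal(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)
```

Skipped tests count towards a successful run, so the file shows up as passing even when nothing actually ran.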

@wenzeslaus (Member) left a comment

This is great progress!

(Review thread on python/grass/gunittest/utils.py, resolved)
@wenzeslaus (Member) commented:

  • I don’t know how to apply an expected failure to a doctest, so doctests either still fail or are excluded in a new Windows-only .gunittest.cfg file.

They already have to be handled in a special way, and it is not clear to me what exactly the unit to disable would be (a documentation block?).

Originally, the idea was to use doctest for testing; now it is perhaps more for verifying that the examples in the documentation are up to date, and for that, running on one platform would be enough. I don't think we have discussed this much lately.

  • Tests in shell scripts don’t work like unittest tests and can’t be individually marked, so either the entire file is excluded and never runs, or it stays failing.

While at some point there was an idea that one shell script could contain more than one test, I think these are the "simple" tests, so one script, one test is how it should be; there is no need to disable at a finer level.

It is not clear to me how to approach the scripts from a cross-platform perspective. My idea for the functionality was to accommodate existing test scripts in Bash or sh, and people who want to write tests as shell scripts.

  • Since the success percentage (in number of files) is an integer, and we got 96.8%, I have to set 96% as the minimum.

Why don't you exclude the failing files? Then you don't have to set a success percentage anymore. The success-percentage setting was just compensating for the missing file-exclusion feature. Now that's in place (plus the even better per-function expected failures on Windows), so I don't see a reason to use it anymore.

@echoix (Member, Author) commented Sep 22, 2024

v.in.pdal tests have @unittest.skipIf(shutil.which("v.in.pdal") is None, "Cannot find v.in.pdal"). I expect that there is no v.in.pdal on Windows (--without-pdal).

Oh, now I see. I was so surprised it didn't complain at all.

@echoix (Member, Author) commented Sep 22, 2024

  • I don’t know how to apply an expected failure to a doctest, so doctests either still fail or are excluded in a new Windows-only .gunittest.cfg file.

They already have to be handled in a special way, and it is not clear to me what exactly the unit to disable would be (a documentation block?).

Originally, the idea was to use doctest for testing; now it is perhaps more for verifying that the examples in the documentation are up to date, and for that, running on one platform would be enough. I don't think we have discussed this much lately.

I didn't know either; that's why I kept it mostly out of scope here. At the same time, in the recent past we have been able to catch failures that appeared almost uniquely here, since the packages in OSGeo4W closely track the latest releases very early on. So failures caused by the latest Python, the latest NumPy changing its output, or the GEOS format changing: technically, we see them all here first, if it runs.

  • Tests in shell scripts don’t work like unittest tests and can’t be individually marked, so either the entire file is excluded and never runs, or it stays failing.

While at some point there was an idea that one shell script could contain more than one test, I think these are the "simple" tests, so one script, one test is how it should be; there is no need to disable at a finer level.

It is not clear to me how to approach the scripts from a cross-platform perspective. My idea for the functionality was to accommodate existing test scripts in Bash or sh, and people who want to write tests as shell scripts.

I also understand that the goal was to accommodate existing tests. On Windows it's harder, as we need MSYS just for that. I did a lot of iterations (not exactly on this) using the already-packaged OSGeo4W grass-dev, running the tests from the repo against the installed package. Apart from saving me almost 15 minutes per iteration, it also separated the possible causes of errors. In those iterations, I struggled a bit to find out why the shell tests weren't picking up everything that was supposed to be set. It turns out the second OSGeo4W build shell script, only used in our current CI, had diverged a bit from OSGeo4W: it did a little bit of magic and added MSYS2 paths everywhere, so it was tightly coupled but ended up working. I found a much more limited way of placing only the needed PATH entries in one place, and it works fine. All that to say: shell-script tests are a bit of a pain cross-platform, are much slower, and should be avoided at some point.

  • Since the success percentage (in number of files) is an integer, and we got 96.8%, I have to set 96% as the minimum.

Why don't you exclude the failing files? Then you don't have to set a success percentage anymore. The success-percentage setting was just compensating for the missing file-exclusion feature. Now that's in place (plus the even better per-function expected failures on Windows), so I don't see a reason to use it anymore.

Because not having them run at all isn't better, and we have seen that it's far too easy to let them stay excluded forever.

Once merged, I'll see if it's reasonable to use a floating-point variable instead; that would just make this easier. Once merged, it is also easier and safer to do refactorings on the test side. Running the unittest tests with pytest is getting closer: last time I checked, the main missing part was copying the test data files and making them available to the test cases; most of the rest worked fine. So fiddling with the gunittest reporting won't be needed at that point, and it's not worth spending too much time on it.

@echoix (Member, Author) commented Sep 22, 2024

After responding to all the comments from @wenzeslaus, I think it is safe to assume that no changes will be required in this PR, so I'll go forward with it.

@echoix merged commit 7b4a240 into OSGeo:main (Sep 22, 2024)
28 checks passed
@neteler modified the milestone: 8.5.0 (Sep 25, 2024)
3 participants