integration / regression tests that compare images #1788

Closed
gerritholl opened this issue Aug 12, 2021 · 2 comments

@gerritholl
Member

Feature Request

Is your feature request related to a problem? Please describe.

If a change in Satpy or one of its dependencies leads to an unintentional change in the produced image, we currently have no automated way of detecting this. If the change is small, we might not notice it at all. If the change is large, someone might notice it sooner or later, possibly too late to trace it back to a specific cause. For a tool that is primarily used to produce images, it would be desirable to have systematic acceptance / integration / regression tests.

Describe the solution you'd like

I would like Satpy to run systematic acceptance / integration / regression tests that do a pixel-by-pixel comparison of produced images against reference images.
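
A minimal sketch of what such a comparison could look like, assuming the produced and reference images can be read with Pillow (a real setup would more likely compare GeoTIFF bands directly, e.g. with rasterio); the function name, paths, and tolerance are hypothetical:

```python
# Hypothetical pixel-by-pixel regression check; names and tolerances are illustrative.
import numpy as np
from PIL import Image


def matches_reference(produced_path, reference_path, atol=1):
    """Return True if the produced image matches the reference within `atol` counts."""
    produced = np.asarray(Image.open(produced_path))
    reference = np.asarray(Image.open(reference_path))
    if produced.shape != reference.shape:
        # A shape change is always a regression worth reporting.
        return False
    # Exact equality is usually too strict; allow small per-pixel differences.
    return np.allclose(produced, reference, atol=atol)
```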

Describe any changes to existing user workflow

It would change the build process and development workflow. Such tests are probably too heavy to run after each commit on GitHub CI, but they could be combined with the performance tests that are already running on the European Weather Cloud (EWC), perhaps nightly for the main branch and for non-draft PRs that have had new commits since the last run. If differences are reported, we should then identify whether those differences are expected; if they are, the new image would become the new reference.

I don't foresee any changes to the user workflow, except that users might get a more stable product and better documentation when image changes are expected.

Additional context

This idea was triggered by a similar system in NinJo (which reported differences between NinJoTIFF and GeoTIFF images I provided; those turned out to be my fault) and a comment by @djhoese on Slack.

@djhoese
Member

djhoese commented Aug 12, 2021

One thing we'd have to decide is whether creating a "basic" set of user-facing examples that produce images, running those, and doing the comparisons would be enough. Otherwise, we'd have to come up with a full set of comparison tests. In P2G I have behavior tests (using the behave package) that generate output and then run a compare script in polar2grid, which extracts the arrays from the images and uses np.isclose with some tolerances to allow a few pixels to change. It reports things like whether the shape of the output changed, what percentage of the pixels are different, whether the expected output file was actually generated for this execution, etc.
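
As a rough illustration of that kind of compare logic (not the actual polar2grid script; names and tolerances are made up):

```python
# Illustrative comparison report in the spirit of the polar2grid compare script.
import numpy as np


def report_differences(expected, actual, rtol=0.0, atol=1.0, max_bad_fraction=0.001):
    """Report shape changes and the fraction of pixels that differ beyond tolerance."""
    if expected.shape != actual.shape:
        return f"FAIL: shape changed from {expected.shape} to {actual.shape}"
    close = np.isclose(actual, expected, rtol=rtol, atol=atol, equal_nan=True)
    bad_fraction = 1.0 - close.mean()
    if bad_fraction > max_bad_fraction:
        return f"FAIL: {bad_fraction:.3%} of pixels differ beyond tolerance"
    return f"OK: {bad_fraction:.3%} of pixels differ (allowed: {max_bad_fraction:.3%})"
```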

Another thing we could do to limit how often things are run: if we run them for PRs, run the regular unit tests first. This assumes running the integration tests takes a long time; if the unit tests fail, there is no point in running the integration tests.

@gerritholl
Member Author

Closed by #2999 and #1788
