integration / regression tests that compare images #1788

Closed
gerritholl opened this issue Aug 12, 2021 · 2 comments

@gerritholl
Member

Feature Request

Is your feature request related to a problem? Please describe.

If a change in Satpy or one of its dependencies leads to an unintentional change in the produced image, we currently have no automated way of detecting this. If the change is small, we might not notice it at all. If the change is large, someone might notice it sooner or later, possibly too late to trace it back to a specific cause. For a tool that is primarily used to produce images, it would be desirable to have systematic acceptance / integration / regression tests.

Describe the solution you'd like

I would like Satpy to run systematic acceptance / integration / regression tests that do a pixel-by-pixel comparison of produced images against reference images.
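
A minimal sketch of what such a comparison could look like, assuming the produced and reference images can be read with Pillow (a real setup would more likely compare GeoTIFF bands directly, e.g. with rasterio); the function name, paths, and tolerance are hypothetical:

```python
# Hypothetical pixel-by-pixel regression check; names and tolerances are illustrative.
import numpy as np
from PIL import Image


def matches_reference(produced_path, reference_path, atol=1):
    """Return True if the produced image matches the reference within `atol` counts."""
    produced = np.asarray(Image.open(produced_path))
    reference = np.asarray(Image.open(reference_path))
    if produced.shape != reference.shape:
        # A shape change is always a regression worth reporting.
        return False
    # Exact equality is usually too strict; allow small per-pixel differences.
    return np.allclose(produced, reference, atol=atol)
```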

Describe any changes to existing user workflow

It would change the build process and development workflow. Such tests are probably too heavy to run after each commit on GitHub CI, but they could be combined with the performance tests that are already running on the European Weather Cloud (EWC), perhaps nightly for the main branch and for non-draft PRs that have had new commits since the last run. If differences are reported, we should then identify whether those differences are expected; if they are, the new image would become the new reference.

I don't foresee any changes to the user workflow, except that users might get a more stable product and better documentation when image changes are expected.

Additional context

This idea was triggered by a similar system in NinJo (which reported differences between NinJoTIFF and GeoTIFF images I provided; those turned out to be my fault) and a comment by @djhoese on Slack.

@djhoese
Member

djhoese commented Aug 12, 2021

One thing we'd have to decide is whether creating a "basic" set of user-facing examples that produce images, running those, and doing the comparisons would be enough. Otherwise, we'd have to come up with a full set of comparison tests. In P2G I have behavior tests (using the behave package) that generate output and then run a compare script in polar2grid, which extracts the arrays from the images and uses np.isclose with some tolerances to allow a few pixels to change. It reports things like whether the shape of the output changed, what percentage of the pixels are different, whether the expected output file was actually generated for this execution, etc.
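
As a rough illustration of that kind of compare logic (not the actual polar2grid script; names and tolerances are made up):

```python
# Illustrative comparison report in the spirit of the polar2grid compare script.
import numpy as np


def report_differences(expected, actual, rtol=0.0, atol=1.0, max_bad_fraction=0.001):
    """Report shape changes and the fraction of pixels that differ beyond tolerance."""
    if expected.shape != actual.shape:
        return f"FAIL: shape changed from {expected.shape} to {actual.shape}"
    close = np.isclose(actual, expected, rtol=rtol, atol=atol, equal_nan=True)
    bad_fraction = 1.0 - close.mean()
    if bad_fraction > max_bad_fraction:
        return f"FAIL: {bad_fraction:.3%} of pixels differ beyond tolerance"
    return f"OK: {bad_fraction:.3%} of pixels differ (allowed: {max_bad_fraction:.3%})"
```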

Another thing we could do to limit how often things are run: if we run them for PRs, run the regular unit tests first. This assumes running the integration tests takes a long time; if the unit tests fail, there is no point in running the integration tests.

@gerritholl
Member Author

Closed by #2999 and #1788
