Use Cram for tests #787
Conversation
I think cram is a really useful tool for testing in general, but I am not really a big fan of using it here, mainly for the following reasons:
To sum it up, I think this type of testing is a step in the wrong direction, and I'd suggest we stick with the semantic testing that we are doing currently.
What are your stances here, @vesalvojdani @stilscher @jerhard?
Anything that is not an intentional change to the output should not change the output at all, and Cram tests catch any such change, which we currently have no way of testing for. For example, changing the solver shouldn't change the set of warnings we produce. It should only change when some solver unsoundness is fixed (so it's intentional) or some solver unsoundness is introduced (so it's accidental).
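For illustration, a minimal dune Cram test pinning such output could look roughly like this (the test program, flag, and warning text here are made up, just to show the shape):

```
Solver changes must not change the produced warnings:

  $ goblint --enable ana.int.interval 01-assert.c
  [Success][Assert] Assertion "x == 1" will succeed (01-assert.c:5:3-5:20)
```

Any unintentional output change then shows up as a diff when running `dune runtest`.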
Given that catastrophic output changes should not happen, not necessarily. Cram tests scale to much bigger projects than Goblint, including dune itself. If the outputs are too massive to understand manually for our tiny regression test programs, then that clearly highlights a usability problem of our analyzer. This would actually encourage us to consider the actual outputs, which should concisely list all the warnings and no other garbage (e.g. spurious debug output).
And that's precisely the point: there shouldn't be any new debug output at all if the particular test case doesn't explicitly enable it.
There is no reason to remove any of the existing testing infrastructure though, so all the existing semantic checks would still be done (and are being done on this branch). Moreover, we have an unknown number of "regression" tests that actually check nothing at all and whose output is supposed to be inspected manually; the latest such test was added in #785. Nobody ever manually re-runs those, and for many of them it's not even clear what the expected output should be, making it impossible to draw any conclusions from a manual run either. This actually is a good point I initially didn't consider myself: instead of porting all existing tests to Cram, we leave them as-is and introduce Cram tests for the things we otherwise cannot test automatically.
In GitHub Actions, vid-s (CIL variable IDs) differ.
Yet another thing Cram tests would be really useful for is witness generation, both YAML and GraphML, which is currently completely untested. It would just need an option for slightly filtered output that doesn't include timestamps and version numbers, so that the output is stable.
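A sketch of how that could look as a Cram test, filtering out the unstable metadata fields (the flag, file name, and field names below are assumptions, not checked against actual Goblint options):

```
  $ goblint --enable witness.yaml.enabled test.c
  $ grep -v -e 'creation_time:' -e 'producer:' witness.yml
  - entry_type: loop_invariant
    location:
      file_name: test.c
      line: 5
```

A built-in option for pre-filtered output would of course be nicer than `grep`-ing in every test.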
I think in the scope where they have now been introduced, it makes sense to have cram tests!
Here's a wild idea I toyed with, which we can either make complete or just close this PR.
Basically, dune's Cram tests are designed for testing executables and could have some nice benefits for us:
- No more `// PARAM:` magic: those options go directly on the Cram test command line.
- `dune promote` automatically takes the current output and adds it as the expected one. Of course, this should only be done if the output change has been manually checked to be correct.
- No need for multiple files just to run the same program with different `// PARAM:`s. Just add multiple goblint commands into the same test! (See the sketch below.)
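As a sketch of the last two points, a single `.t` file can run the analyzer several times and record each output, and updating expectations is one command (the file contents below are illustrative, not from this PR):

```
Same program, two configurations, one test file:

  $ goblint 01-simple.c
  [Warning][Deadcode] ...
  $ goblint --enable ana.int.interval 01-simple.c
  [Warning][Deadcode] ...
```

After an intentional output change, `dune runtest` prints the diff and `dune promote` records the new output as expected. (Cram tests also need to be enabled once in `dune-project` with `(cram enable)`.)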