-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Uniform backend testing #4115
fix: Uniform backend testing #4115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the main files, but it looks great ! How much longer does it take now to run the CI?
Comparing to another recent CI run (#4172), it looks like all I've done is shuffle how the tests are assigned to shards, and the shard runtimes before and after are still in the same 20-27 minute range. As an aside, it is likely far more tractable to execute a |
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | ||
17 | ||
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22] | ||
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is Go so different starting from here?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference is due to how Go prints strings when --unicode-char
is false
:
% diff Test/comp/Arrays.dfy.expect Test/comp/Arrays.dfy.go.expect
88c88
< DDDDDDDDDDagggggaggg
---
> [D, D, D, D, D, D, D, D, D, D, a, g, g, g, g, g, a, g, g, g]
That does mean the NONUNIFORM tag on Arrays.dfy
itself is pointing to the wrong issue though. I am second-guessing the decision to have the conversion script point to #4108 by default, I think it would be better to force a manual review of any inconsistent output.
5 List.Nil | ||
List.Cons(0, List.Cons(1, List.Cons(2, List.Cons(3, List.Cons(4, List.Cons(5, List.Cons(6, List.Nil))))))) | ||
0 + 1 + 2 + 3 + 4 + 5 + 6 == 21 (once more, that's 21) | ||
{Berry.Smultron, Berry.Jordgubb, Berry.Hallon} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point here :-)
I think we can also leave the printing of sets unspecified (as in theory, there is no order in set), but have a function setToSeq that takes an order on the elements and transforms it into a sequence. No needed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it's definitely a solvable problem in stdlib code (and we should provide good solutions). For testing we may want to transition to using expect
more and diffing print output less instead.
@@ -0,0 +1,3 @@ | |||
// https://github.com/dafny-lang/dafny/issues/4162 | |||
// CHECK-L: assert "FIFO" == __name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is CHECK-L ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/stp/OutputCheck#check-l-string - like CHECK but looks for a "literal", exact match (aside from leading or trailing whitespace) rather than a regular expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you confirm that all the tests files were generated by your script except for the ".check" one that I checked manually?
Otherwise, everything looks good so far.
@@ -0,0 +1 @@ | |||
// CHECK: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// CHECK: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others. | |
// CHECK-L: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually wouldn't work since the actual output is:
Test/comp/NativeNumbers.dfy(14,30): Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
Test/comp/NativeNumbers.dfy(15,30): Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
...
CHECK-L requires the whole trimmed line matches, not just part. We could add the source location prefixes, but that would just make this test more fragile, so I opted not to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I did not know. In that, case you might just want to escape the dots since it's a regex :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great way to make our compilers more robust! Thanks for doing that.
Converts most existing integration tests that touch backends to run for every supported backend, as a relatively easy way to discover backend-specific bugs and increase testing coverage. Implements a conversion process to rewrite existing testing scripts like:
To the equivalent:
Overview of changes:
DAFNY_INTEGRATION_TESTS_MODE
environmment variable to support these uniformity requirements:uniformity-check
- Verifies that every test that touched backends either uses%testDafnyForEachCompiler
or includes a// RUN: NONUNIFORM: <reason>
command to flag it as intentionally non-uniform. This check is run as asingletons
test in CI.uniformity-convert
- Convert any existing non-uniform tests to use%testDafnyForEachCompiler
, by splitting up the existing<test file>.expect
file into the chunks output by each backend and extracting any common extra options from all test commands. SeeLitTests.ConvertToMultiBackendTestIfNecessary
.LitTests.InvokeMainMethodsDirectly
to an environment variable as well (for consistency and to avoid having to edit source code to use this feature locally)%testDafnyForEachCompiler
to support known inconsistencies/bugs in backends (see updates toTestDafny/README.md
ILitCommand
hierarchy to publicly readable attributes, to support the analysis/conversion logicOutputCheckCommand
implementation to be usable fromTestDafny
To review, I recommend looking first at the README updates, then at the source changes, then spot-checking the test conversions. The last commit is the result of applying the conversion logic to the whole test suite (and I'll try to keep it that way even after feedback edits), so it probably best to look at the diff up until but not including that commit first.
FYI, see testing-method: uniform-backend-testingIssues found by ensuring uniform testing across backends
for the list of 22 bugs discovered by this change (some of which were generally known without explicit issues, and even if explicitly known I still added the label to record the fact that it was pointed out by this technique).
Resolves #4103 (since it was blocking so many tests against Java). I otherwise tried not to be distracted by fixing bugs even if they appeared easy to fix. :)
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.