Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Uniform backend testing #4115

Merged
merged 75 commits into from
Jun 13, 2023

Conversation

robin-aws
Copy link
Member

@robin-aws robin-aws commented Jun 2, 2023

Converts most existing integration tests that touch backends to run for every supported backend, as a relatively easy way to discover backend-specific bugs and increase testing coverage. Implements a conversion process to rewrite existing testing scripts like:

// RUN: %dafny /compile:0 "%s" > "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:cs "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:js "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:go "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:java "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:py "%s" >> "%t"
// RUN: %diff "%s.expect" "%t"

To the equivalent:

// RUN: %testDafnyForEachCompiler "%s" -- --relax-definite-assignment --spill-translation

Overview of changes:

  • Added DAFNY_INTEGRATION_TESTS_MODE environmment variable to support these uniformity requirements:
    • uniformity-check - Verifies that every test that touched backends either uses %testDafnyForEachCompiler or includes a // RUN: NONUNIFORM: <reason> command to flag it as intentionally non-uniform. This check is run as a singletons test in CI.
    • uniformity-convert - Convert any existing non-uniform tests to use %testDafnyForEachCompiler, by splitting up the existing <test file>.expect file into the chunks output by each backend and extracting any common extra options from all test commands. See LitTests.ConvertToMultiBackendTestIfNecessary.
    • (no value or blank) - executes tests as usual
  • Converted LitTests.InvokeMainMethodsDirectly to an environment variable as well (for consistency and to avoid having to edit source code to use this feature locally)
  • Added functionality to %testDafnyForEachCompiler to support known inconsistencies/bugs in backends (see updates to TestDafny/README.md
  • Converted lots of private state in the ILitCommand hierarchy to publicly readable attributes, to support the analysis/conversion logic
    • Refactored OutputCheckCommand implementation to be usable from TestDafny

To review, I recommend looking first at the README updates, then at the source changes, then spot-checking the test conversions. The last commit is the result of applying the conversion logic to the whole test suite (and I'll try to keep it that way even after feedback edits), so it probably best to look at the diff up until but not including that commit first.

FYI, see testing-method: uniform-backend-testing Issues found by ensuring uniform testing across backends for the list of 22 bugs discovered by this change (some of which were generally known without explicit issues, and even if explicitly known I still added the label to record the fact that it was pointed out by this technique).

Resolves #4103 (since it was blocking so many tests against Java). I otherwise tried not to be distracted by fixing bugs even if they appeared easy to fix. :)

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

This reverts commit d4fcd16.
@robin-aws robin-aws marked this pull request as ready for review June 12, 2023 22:22
@MikaelMayer MikaelMayer self-requested a review June 13, 2023 14:08
Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the main files, but it looks great ! How much longer does it take now to run the CI?

Test/README.md Show resolved Hide resolved
Source/TestDafny/MultiBackendTest.cs Show resolved Hide resolved
Source/TestDafny/README.md Outdated Show resolved Hide resolved
Source/XUnitExtensions/Lit/OutputCheckCommand.cs Outdated Show resolved Hide resolved
Test/VerifyThis2015/Problem3.dfy Show resolved Hide resolved
@robin-aws robin-aws changed the title Uniform backend testing fix: Uniform backend testing Jun 13, 2023
@robin-aws
Copy link
Member Author

robin-aws commented Jun 13, 2023

How much longer does it take now to run the CI?

Comparing to another recent CI run (#4172), it looks like all I've done is shuffle how the tests are assigned to shards, and the shard runtimes before and after are still in the same 20-27 minute range.

As an aside, it is likely far more tractable to execute a MultiBackendTest fully in-process, running the frontend on a program once and then each backend on the resulting resolved and verified program, than it is to switch the more general DAFNY_INTEGRATION_TESTS_IN_PROCESS mode on everywhere. It may be worth doing that to save some CI time after this too!

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
17
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is Go so different starting from here?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is due to how Go prints strings when --unicode-char is false:

% diff Test/comp/Arrays.dfy.expect Test/comp/Arrays.dfy.go.expect 
88c88
< DDDDDDDDDDagggggaggg
---
> [D, D, D, D, D, D, D, D, D, D, a, g, g, g, g, g, a, g, g, g]

That does mean the NONUNIFORM tag on Arrays.dfy itself is pointing to the wrong issue though. I am second-guessing the decision to have the conversion script point to #4108 by default, I think it would be better to force a manual review of any inconsistent output.

5 List.Nil
List.Cons(0, List.Cons(1, List.Cons(2, List.Cons(3, List.Cons(4, List.Cons(5, List.Cons(6, List.Nil)))))))
0 + 1 + 2 + 3 + 4 + 5 + 6 == 21 (once more, that's 21)
{Berry.Smultron, Berry.Jordgubb, Berry.Hallon}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point here :-)
I think we can also leave the printing of sets unspecified (as in theory, there is no order in set), but have a function setToSeq that takes an order on the elements and transforms it into a sequence. No needed now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's definitely a solvable problem in stdlib code (and we should provide good solutions). For testing we may want to transition to using expect more and diffing print output less instead.

@@ -0,0 +1,3 @@
// https://github.com/dafny-lang/dafny/issues/4162
// CHECK-L: assert "FIFO" == __name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is CHECK-L ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/stp/OutputCheck#check-l-string - like CHECK but looks for a "literal", exact match (aside from leading or trailing whitespace) rather than a regular expression.

Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you confirm that all the tests files were generated by your script except for the ".check" one that I checked manually?

Otherwise, everything looks good so far.

@@ -0,0 +1 @@
// CHECK: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CHECK: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
// CHECK-L: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That actually wouldn't work since the actual output is:

Test/comp/NativeNumbers.dfy(14,30): Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
Test/comp/NativeNumbers.dfy(15,30): Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
...

CHECK-L requires the whole trimmed line matches, not just part. We could add the source location prefixes, but that would just make this test more fragile, so I opted not to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I did not know. In that, case you might just want to escape the dots since it's a regex :-)

Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great way to make our compilers more robust! Thanks for doing that.

@robin-aws robin-aws enabled auto-merge (squash) June 13, 2023 21:27
@robin-aws robin-aws merged commit b87a851 into dafny-lang:master Jun 13, 2023
@robin-aws robin-aws deleted the uniform-backend-testing branch June 13, 2023 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash when specifying --split-translation on Java
2 participants