fix: Uniform backend testing #4115

robin-aws · 2023-06-02T04:29:50Z

Converts most existing integration tests that touch backends to run for every supported backend, as a relatively easy way to discover backend-specific bugs and increase testing coverage. Implements a conversion process to rewrite existing testing scripts like:

// RUN: %dafny /compile:0 "%s" > "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:cs "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:js "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:go "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:java "%s" >> "%t"
// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:py "%s" >> "%t"
// RUN: %diff "%s.expect" "%t"

To the equivalent:

// RUN: %testDafnyForEachCompiler "%s" -- --relax-definite-assignment --spill-translation
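As a rough illustration of the kind of analysis such a conversion needs, the sketch below collects the compile targets and the options shared by every legacy command. It is not the PR's implementation (that lives in LitTests.ConvertToMultiBackendTestIfNecessary); the function names are hypothetical, and translating legacy options into new-CLI options is deliberately left out.

```python
# Hypothetical sketch: find the targets and the options common to all
# legacy RUN lines. Names here are illustrative, not from the PR.
LEGACY_RUN_LINES = [
    '// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:cs "%s" >> "%t"',
    '// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:js "%s" >> "%t"',
    '// RUN: %dafny /noVerify /compile:4 /spillTargetCode:2 /compileTarget:go "%s" >> "%t"',
]

TARGET_PREFIX = "/compileTarget:"


def parse_run_line(line):
    """Split one legacy RUN line into (target, set of other '/' options)."""
    tokens = line.split()
    # Keep only '/option' tokens; the leading '//' comment marker is skipped.
    options = [t for t in tokens if t.startswith("/") and t != "//"]
    targets = [o[len(TARGET_PREFIX):] for o in options if o.startswith(TARGET_PREFIX)]
    others = {o for o in options if not o.startswith(TARGET_PREFIX)}
    return (targets[0] if targets else None), others


def common_options(lines):
    """Return (targets in order of appearance, sorted options shared by all lines)."""
    parsed = [parse_run_line(line) for line in lines]
    targets = [t for t, _ in parsed if t is not None]
    shared = set.intersection(*(opts for _, opts in parsed)) if parsed else set()
    return targets, sorted(shared)


targets, shared = common_options(LEGACY_RUN_LINES)
print(targets, shared)
```

Only options present on every command are kept, so a test whose commands disagree on an option would need manual attention.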

Overview of changes:

- Added a DAFNY_INTEGRATION_TESTS_MODE environment variable to support these uniformity requirements:
  - uniformity-check - Verifies that every test that touches backends either uses %testDafnyForEachCompiler or includes a // RUN: NONUNIFORM: <reason> command to flag it as intentionally non-uniform. This check is run as a singletons test in CI.
  - uniformity-convert - Converts any existing non-uniform tests to use %testDafnyForEachCompiler, by splitting up the existing <test file>.expect file into the chunks output by each backend and extracting any common extra options from all test commands. See LitTests.ConvertToMultiBackendTestIfNecessary.
  - (no value or blank) - Executes tests as usual.
- Converted LitTests.InvokeMainMethodsDirectly to an environment variable as well (for consistency and to avoid having to edit source code to use this feature locally).
- Added functionality to %testDafnyForEachCompiler to support known inconsistencies/bugs in backends (see updates to TestDafny/README.md).
- Converted lots of private state in the ILitCommand hierarchy to publicly readable attributes, to support the analysis/conversion logic.
  - Refactored OutputCheckCommand implementation to be usable from TestDafny.
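
The uniformity-check rule described above can be sketched roughly as follows. This is a simplified illustration and not the code in LitTests: the helper names are hypothetical, and "touches a backend" is approximated here by the presence of a /compileTarget: option, which is an assumption.

```python
import os

MODE_VAR = "DAFNY_INTEGRATION_TESTS_MODE"


def run_commands(source):
    """Extract the text after '// RUN:' from each RUN line of a test file."""
    return [
        line.split("RUN:", 1)[1].strip()
        for line in source.splitlines()
        if line.lstrip().startswith("// RUN:")
    ]


def is_uniform_or_flagged(source):
    """True if the test is uniform, flagged NONUNIFORM, or touches no backend."""
    commands = run_commands(source)
    if any(c.startswith("%testDafnyForEachCompiler") for c in commands):
        return True
    if any(c.startswith("NONUNIFORM:") for c in commands):
        return True
    # Assumption: a command touches a backend if it selects a compile target.
    return not any("/compileTarget:" in c for c in commands)


def check(source, env=os.environ):
    """Apply the uniformity check only when the mode variable asks for it."""
    if env.get(MODE_VAR, "") == "uniformity-check":
        return is_uniform_or_flagged(source)
    return True  # no value or blank: tests run as usual
```

Passing the environment as a parameter keeps the sketch easy to exercise without mutating the real process environment.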

To review, I recommend looking first at the README updates, then at the source changes, then spot-checking the test conversions. The last commit is the result of applying the conversion logic to the whole test suite (and I'll try to keep it that way even after feedback edits), so it is probably best to look at the diff up to but not including that commit first.

FYI, see the issues labeled "testing-method: uniform-backend-testing" (Issues found by ensuring uniform testing across backends) for the list of 22 bugs discovered by this change. Some of these were generally known without explicit issues, and even where an issue already existed I still added the label to record the fact that it was pointed out by this technique.

Resolves #4103 (since it was blocking so many tests against Java). I otherwise tried not to be distracted by fixing bugs even if they appeared easy to fix. :)

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.


MikaelMayer

I reviewed only the main files, but it looks great! How much longer does it take now to run the CI?

Test/README.md

Source/DafnyCore/Compilers/Cplusplus/Compiler-cpp.cs

Source/DafnyCore/Compilers/Java/JavaBackend.cs

Source/TestDafny/MultiBackendTest.cs

Source/TestDafny/README.md

Source/XUnitExtensions/Lit/OutputCheckCommand.cs

Test/VerifyThis2015/Problem3.dfy


robin-aws · 2023-06-13T18:24:18Z

How much longer does it take now to run the CI?

Comparing to another recent CI run (#4172), it looks like all I've done is shuffle how the tests are assigned to shards, and the shard runtimes before and after are still in the same 20-27 minute range.

As an aside, it is likely far more tractable to execute a MultiBackendTest fully in-process, running the frontend on a program once and then each backend on the resulting resolved and verified program, than it is to switch the more general DAFNY_INTEGRATION_TESTS_IN_PROCESS mode on everywhere. It may be worth doing that to save some CI time after this too!
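
A rough sketch of that idea, with every name hypothetical and the frontend and backends replaced by stand-in functions: the expensive frontend runs once, and each backend consumes the same resolved result.

```python
def run_multi_backend(program, frontend, backends):
    """Run the frontend once, then every backend on its result."""
    resolved = frontend(program)
    return {name: backend(resolved) for name, backend in backends.items()}


# Stand-ins used only to show how often each stage is invoked.
frontend_calls = []


def fake_frontend(program):
    frontend_calls.append(program)
    return f"resolved({program})"


fake_backends = {
    "cs": lambda resolved: f"cs:{resolved}",
    "go": lambda resolved: f"go:{resolved}",
}

results = run_multi_backend("Arrays.dfy", fake_frontend, fake_backends)
print(len(frontend_calls), results)
```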

MikaelMayer · 2023-06-13T18:40:12Z

Test/comp/Arrays.dfy.go.expect

+0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 
+17
+[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
+[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]


Why is Go so different starting from here?!

The only difference is due to how Go prints strings when --unicode-char is false:

% diff Test/comp/Arrays.dfy.expect Test/comp/Arrays.dfy.go.expect
88c88
< DDDDDDDDDDagggggaggg
---
> [D, D, D, D, D, D, D, D, D, D, a, g, g, g, g, g, a, g, g, g]

That does mean the NONUNIFORM tag on Arrays.dfy itself is pointing to the wrong issue, though. I am second-guessing the decision to have the conversion script point to #4108 by default; I think it would be better to force a manual review of any inconsistent output.
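
To make the difference concrete, here is a small illustration of the two renderings of the same character sequence. It only mimics the two output shapes in the diff above; it is not how any Dafny backend actually implements printing.

```python
def print_as_string(chars):
    """Render a sequence of characters as a plain string."""
    return "".join(chars)


def print_as_sequence(chars):
    """Render the same sequence element by element, as in the Go output."""
    return "[" + ", ".join(chars) + "]"


chars = list("D" * 10 + "a" + "g" * 5 + "a" + "g" * 3)
print(print_as_string(chars))
print(print_as_sequence(chars))
```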

Test/comp/ByMethodCompilation.dfy.expect

MikaelMayer · 2023-06-13T19:11:36Z

Test/comp/firstSteps/7_Dt_Algebraic.dfy.go.expect

+5 List.Nil
+List.Cons(0, List.Cons(1, List.Cons(2, List.Cons(3, List.Cons(4, List.Cons(5, List.Cons(6, List.Nil)))))))
+0 + 1 + 2 + 3 + 4 + 5 + 6 == 21 (once more, that's 21)
+{Berry.Smultron, Berry.Jordgubb, Berry.Hallon}


I see your point here :-)
I think we can also leave the printing of sets unspecified (since, in theory, a set has no order), but have a function setToSeq that takes an order on the elements and transforms the set into a sequence. Not needed now.

Yup, it's definitely a solvable problem in stdlib code (and we should provide good solutions). For testing we may want to transition to using expect more and diffing print output less instead.
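
The setToSeq idea can be illustrated outside Dafny with a minimal sketch: given an ordering on the elements, a set is turned into a sequence whose printed form no longer depends on iteration order. The function name and the `key` parameter below are illustrative only, not an existing Dafny library API.

```python
def set_to_seq(elements, key=None):
    """Return the elements of a set as a list, ordered by the given key."""
    return sorted(elements, key=key)


berries = {"Smultron", "Jordgubb", "Hallon"}

# Iterating over the set directly may yield any order; the sorted
# sequence is the same on every run.
print(set_to_seq(berries))
print(set_to_seq(berries, key=len))
```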

MikaelMayer · 2023-06-13T19:15:57Z

Test/exports/FIFO.dfy.py.check

@@ -0,0 +1,3 @@
+// https://github.com/dafny-lang/dafny/issues/4162
+// CHECK-L: assert "FIFO" == __name__


What is CHECK-L?

https://github.com/stp/OutputCheck#check-l-string - like CHECK but looks for a "literal", exact match (aside from leading or trailing whitespace) rather than a regular expression.

MikaelMayer

Can you confirm that all the test files were generated by your script, except for the ".check" ones that I checked manually?

Otherwise, everything looks good so far.

MikaelMayer · 2023-06-13T19:20:21Z

Test/unicodechars/comp/NativeNumbers.dfy.js.check

@@ -0,0 +1 @@
+// CHECK: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.


Suggested change

// CHECK: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.

// CHECK-L: Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.

That actually wouldn't work since the actual output is:

Test/comp/NativeNumbers.dfy(14,30): Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
Test/comp/NativeNumbers.dfy(15,30): Error: None of the types given in :nativeType arguments is supported by the current compilation target. Try supplying others.
...

CHECK-L requires that the whole trimmed line match, not just part of it. We could add the source location prefixes, but that would just make this test more fragile, so I opted not to.

Ah, I did not know. In that case, you might just want to escape the dots, since it's a regex :-)
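
The distinction discussed in this thread can be sketched as follows, assuming the semantics stated above: CHECK searches each line with a regular expression, while CHECK-L compares a literal against the whole trimmed line. The helper names are hypothetical, and Python's `re` stands in for whatever regex engine the test runner actually uses.

```python
import re

MESSAGE = (
    "Error: None of the types given in :nativeType arguments is supported "
    "by the current compilation target. Try supplying others."
)
LINE = "Test/comp/NativeNumbers.dfy(14,30): " + MESSAGE


def check_regex(pattern, line):
    """CHECK-style match: the pattern may match anywhere in the line."""
    return re.search(pattern, line) is not None


def check_literal(literal, line):
    """CHECK-L-style match: the whole trimmed line must equal the literal."""
    return line.strip() == literal.strip()


# The regex form finds the message despite the source location prefix.
print(check_regex(MESSAGE, LINE))
# The literal form fails because the prefix is part of the line.
print(check_literal(MESSAGE, LINE))

# An unescaped '.' matches any character, so a slightly different
# message still passes; escaping makes the dots literal.
ALTERED = LINE.replace("target. Try", "target! Try")
print(check_regex(MESSAGE, ALTERED))
print(check_regex(re.escape(MESSAGE), ALTERED))
```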


MikaelMayer

Great way to make our compilers more robust! Thanks for doing that.

robin-aws added 30 commits May 26, 2023 10:07

Start on analysis script

431d723

Progress on conversion logic

558ca9c

Merge branch 'master' of https://github.com/dafny-lang/dafny into uni…

e4b77db

…form-backend-testing

More script

5d0c398

Improve conversion script, fix a Java bug

597891c

Handling more options

1b735fa

Marking a couple of intentionally non-uniform tests

20f704c

Add support for backend-specific exceptions to MultiBackendTest

786bd6a

More exceptions

987250c

Use environment variables to configure, generate exceptions automatic…

ced635e

…ally

Better error message

a68fc85

Handle warnings

7949f08

Fix for first chunk

98fd743

Converting all tests

6f4d1c1

Revert "Converting all tests"

aa45d59

This reverts commit 6f4d1c1.

Merge branch 'master' of https://github.com/dafny-lang/dafny into uni…

31f2efc

…form-backend-testing

Converting all tests

4417fcf

Revert "Converting all tests"

cfa3c68

This reverts commit 4417fcf.

Manually marking nonuniform tests, converting some special cases

7e8163f

Restore .gitmodules

422f950

Convert all tests

d4fcd16

Build warning and whitespace

543d99e

Exceptions for firstSteps tests

e21c2d9

Tweaks

35964c2

Revert "Convert all tests"

f59314e

This reverts commit d4fcd16.

Fix verifier chunk extraction

2c043c2

A few more options translations and nonuniform tests

eda93bb

Converting all tests

35d2efd

Revert "Converting all tests"

e59a21d

This reverts commit 35d2efd.

Manually fix line number for NONUNIFORM test

3fc649a

robin-aws added 4 commits June 12, 2023 15:17

Revert "Convert all tests"

275d651

This reverts commit 3ab861b.

Unbreak test

485d856

Convert all tests

bc53898

Merge branch 'master' into uniform-backend-testing

5fc6f3e

robin-aws marked this pull request as ready for review June 12, 2023 22:22

MikaelMayer self-requested a review June 13, 2023 14:08

MikaelMayer reviewed Jun 13, 2023


robin-aws changed the title ~~Uniform backend testing~~ fix: Uniform backend testing Jun 13, 2023

robin-aws added 5 commits June 13, 2023 10:37

Revert "Convert all tests"

33e80b1

This reverts commit bc53898.

PR feedback

70c4be2

Merge branch 'master' of https://github.com/dafny-lang/dafny into uni…

7bbbcfb

…form-backend-testing

Merge branch 'uniform-backend-testing' of github.com:robin-aws/dafny …

c5f26cc

…into uniform-backend-testing

Convert all tests

3d08f4d

MikaelMayer reviewed Jun 13, 2023


Test/comp/ByMethodCompilation.dfy.expect

MikaelMayer reviewed Jun 13, 2023


robin-aws added 5 commits June 13, 2023 12:54

Revert "Convert all tests"

e09c231

This reverts commit 3d08f4d.

Force manually specifying the reason for tests with inconsistent output

6fc0364

Convert all tests

8852b25

Filling in NONUNIFORM reasons

788b199

Escape periods in CHECK directives

e9d0be9

MikaelMayer approved these changes Jun 13, 2023


robin-aws enabled auto-merge (squash) June 13, 2023 21:27

robin-aws merged commit b87a851 into dafny-lang:master Jun 13, 2023

robin-aws deleted the uniform-backend-testing branch June 13, 2023 22:11

robin-aws mentioned this pull request Jun 15, 2023

Document minimum requirements for compiled code in each language #1983

Open

robin-aws mentioned this pull request Jun 26, 2023

Run all applicable tests for all target languages #632

Closed
