Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reset GLOBAL_RNG state in/out at-testset for reproducibility #24445

Merged
merged 1 commit into from
Dec 17, 2017

Conversation

rfourquet
Copy link
Member

@rfourquet rfourquet commented Nov 2, 2017

This is a follow-up on the recently introduced guardsrand functionality,
first suggested at
#16940 (comment).
Each testset block is now implicitly wrapped in a guardsrand block, which
is more user-friendly and more systematic (only few users know about
guardsrand).

These are essentially two new features:

  1. "in": in base, tests are run with the global RNG randomly seeded,
    but the seed is printed in case of failure to allow reproducing the
    failure; but even if the failure occurs after many tests, when they
    alter the global RNG state, one has to re-run the whole file, which
    can be time-consuming; with this change, at the beginning of each
    testset, the global RNG is re-seeded with its own seed: this
    allows to re-run only the failing testset, which can be done
    easily in the REPL (the seeding occurs for each loop in a testset for;
    this also allows to re-arrange testsets in arbitrary order
    w.r.t. the global RNG;

  2. "out": a testset leaves no tracks of its use of srand or rand
    (this "feature" should be less and less needed/useful with the
    generalization of the use of testset blocks).

Example:

@testset begin
   srand(123)
   rand()
   @testset for T in (Int, Float64)
       # here is an implicit srand(123), by 1)
       rand()
   end
   rand() # this value will not be affected if the sub-`testset` block
          # above is removed, or if another rand() call is added therein, by 2)
end

By wrapping @testset bodies in a guardsrand do block, the re-use of a local variable as a for-loop variable started to generate a warning (in "test/sparse/sparse.jl"). It's easy to silence the warning, but I don't know why there was no warning before, and I wonder if this wrapping can have other unwanted consequences. If needed, it would be easy to copy/paste code to avoid the wrapping. EDIT: guardsrand can't be used directly, as then the testset's body is wrapped in a function, which causes problem with overwriting local variable in for loop (outer), using using, defining new methods, etc.

Apologies for the big diffs, github is not always good at spotting indentation-only changes.

@rfourquet rfourquet added the testsystem The unit testing framework and Test stdlib label Nov 2, 2017
@fredrikekre
Copy link
Member

Apologies for the big diffs, github is not always good at spotting indentation-only changes.

Append ?w=1 to the url: https://github.com/JuliaLang/julia/pull/24445/files?w=1

@rfourquet
Copy link
Member Author

I still have a failing test, at "test/linalg/arnoldi.jl:103". I had to set a fresh random seed, which works well on a 64 bits machine, but fails on 32 bits ones. Would someone with easy access to both architectures help me? Run this snippet with changin the srand seed (123 here) until it passes on both (I could use CI, but this is a waste of ressources which I'm relunctant to commit).

@testset "Symmetric generalized with singular B" begin
            srand(123)
            n = 10
            k = 3
            A = randn(n,n); A = A'A
            B = randn(n,k); B = B*B'
            @test sort(eigs(A, B, nev = k, sigma = 1.0)[1]) ≈ sort(eigvals(A, B)[1:k])
        end

@yuyichao
Copy link
Contributor

yuyichao commented Nov 2, 2017

AFAICT 124 works on x86, x64, armv7-a and aarch64

@rfourquet
Copy link
Member Author

AFAICT 124 works on x86, x64, armv7-a and aarch64

thanks a lot!

@rfourquet
Copy link
Member Author

Any opinion on this one? I find it pretty neat to have this easier reproducibility at the @testset level, but maybe I'm missing a flaw in this approach?

@rfourquet
Copy link
Member Author

Bump.

@StefanKarpinski
Copy link
Member

I'm not really sure what needs to be reviewed here. This just seems to be a broader application in test files of the guardsrand thing that you previously introduced? Tests all passed. What's the risk here? If none and you find this to be a better way to write these tests, then why not?

@StefanKarpinski
Copy link
Member

Oh I see. This automatically does the equivalent of guardsrand in testsets which is a significant change of behavior to the Test infrastructure. This seems like a potentially good idea, but it's definitely something that needs to be documented and probably discussed a bit more.

@rfourquet
Copy link
Member Author

Yes my bump is mainly to generate feed-back and opinions on the idea, as the implementation is rather straightforward. I think it's a good change but I'm not confident enough to take the decision to merge without some support, I may miss some drawback. Essentially this implements what you suggested there, but a bit differently: when a base test fails, the RNG seed is printed, and with this change, to reproduce the failure you only have to do srand(SEED), and then copy-paste the failing testset in your REPL without bothering for the preceding tests (except if they setup some other global state).

@rfourquet
Copy link
Member Author

@andreasnoack @kshyatt do you have opinions on this?

@rfourquet rfourquet added the triage This should be discussed on a triage call label Dec 13, 2017
@rfourquet
Copy link
Member Author

I put the triage label for getting a decision (it's slightly breaking), but I would really like to get this in, as it makes debugging much easier when you get a test error (I had such a use case today). Small recap for busy triage people: the test system prints the RNG seed when there is an error; with this PR, you can call srand with this seed, and then copy paste in your REPL only the failing @testset to reproduce the error, without having to re-run all previous tests in the file (which can use rand themselves and would modify the global RNG state and affect all remaining tests, without this PR).

@StefanKarpinski
Copy link
Member

I defer to your judgement here, @rfourquet, go ahead and just rebase this, run CI and we'll merge when it passes.

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Dec 14, 2017
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Dec 14, 2017
@rfourquet rfourquet force-pushed the rf/testset-seed branch 4 times, most recently from 81b143e to 47a21f5 Compare December 17, 2017 09:00
[ci skip]
[av skip]
[bsd skip]

This is a follow-up on the recently introduced `guardsrand` functionality,
first suggested at
#16940 (comment).
Each `testset` block is now implicitly wrapped in a `guardsrand` block, which
is more user-friendly and more systematic (only few users know about
`guardsrand`).

These are essentially two new features:

1) "in": in base, tests are run with the global RNG randomly seeded,
   but the seed is printed in case of failure to allow reproducing the
   failure; but even if the failure occurs after many tests, when they
   alter the global RNG state, one has to re-run the whole file, which
   can be time-consuming; with this change, at the beginning of each
   `testset`, the global RNG is re-seeded with its own seed: this
   allows to re-run only the failing `testset`, which can be done
   easily in the REPL (the seeding occurs for each loop in a `testset for`;
   this also allows to re-arrange `testset`s in arbitrary order
   w.r.t. the global RNG;

2) "out": a `testset` leaves no tracks of its use of `srand` or `rand`
   (this "feature" should be less and less needed/useful with the
   generalization of the use of `testset` blocks).

Example:
```
@testset begin
   srand(123)
   rand()
   @testset for T in (Int, Float64)
       # here is an implicit srand(123), by 1)
       rand()
   end
   rand() # this value will not be affected if the sub-`testset` block
          # above is removed, or if another rand() call is added therein, by 2)
end
```

Note that guardsrand can't be used directly, as then the testset's body is
wrapped in a function, which causes problem with overwriting loop variable
("outer"), using `using`, defining new methods, etc. So we need to duplicate
guardsrand's logic.
@rfourquet
Copy link
Member Author

CI all green, but unfortunately one conflict: it's trivial to resolve (rename a couple copy! to copyto!), so I will re-push the updated branch with [ci skip] and merge right away.
Here are the links to the builds for the record:

@rfourquet rfourquet merged commit a2c97c8 into master Dec 17, 2017
@rfourquet rfourquet deleted the rf/testset-seed branch December 17, 2017 17:35
@timholy
Copy link
Member

timholy commented Dec 20, 2017

One negative is that it breaks tests that look like this (observed in Revise.jl):

julia> using Test

julia> @testset begin
           workdir = joinpath(tempdir(), randstring(10))
           @show workdir
           mkdir(workdir)
       end

       @testset begin
           workdir = joinpath(tempdir(), randstring(10))
           @show workdir
           mkdir(workdir)
       end
workdir = "/tmp/xrOdLWavRh"
Test Summary: |
test set      | No tests
workdir = "/tmp/xrOdLWavRh"
test set: Error During Test at REPL[2]:7
  Got an exception of type SystemError outside of a @test
  SystemError (with /tmp/xrOdLWavRh): mkdir: File exists
  Stacktrace:
   [1] #systemerror#44(::String, ::Function, ::Symbol, ::Bool) at ./error.jl:106
   [2] #systemerror at ./<missing>:0 [inlined]
   [3] mkdir(::String, ::UInt16) at ./file.jl:99
   [4] mkdir(::String) at ./file.jl:94
   [5] macro expansion at ./REPL[2]:10 [inlined]
   [6] macro expansion at /home/tim/src/julia-1.0/usr/share/julia/site/v0.7/Test/src/Test.jl:978 [inlined]
   [7] top-level scope at ./REPL[2]:8
   [8] eval at ./boot.jl:291 [inlined]
   [9] eval(::Module, ::Expr) at ./repl/REPL.jl:3
   [10] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./repl/REPL.jl:70
   [11] macro expansion at /home/tim/.julia/v0.7/Revise/src/Revise.jl:482 [inlined]
   [12] (::getfield(Revise, Symbol("##15#16")){Base.REPL.REPLBackend})() at ./event.jl:92
Test Summary: | Error  Total
test set      |     1      1
ERROR: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.

@rfourquet
Copy link
Member Author

I see. A sort of mental model about the way of this PR is that each time you enter a @testset, you get a reset of the global RNG set, like if you would call twice in a row a C program without calling srand in it. Would it be a problem in your case to either call srand() at the begining of the @testset, or, if you'd prefer to keep reproducibility except for the randstring call, to call randstring(RandomDevice()) ? Yet another possibility would be to share an RNG object M between the 2 @testsets and to call randstring(M, 10). Hope this helps.

@StefanKarpinski
Copy link
Member

Would it make sense to make the different test sets independent but not have the same seed? I.e. generating more or less random numbers from one test set should not affect another one, but they they would not all start with the same seed. I guess the trouble then is how to pick the seed in a stable way. One option would be to base it on the testset name. Of course, in this case that wouldn't help.

@KristofferC
Copy link
Member

Something with the location info of the testset? Of course it will change if you move the testset but if you aren't calling srand then you shouldn't rely on getting specific numbers anyway.

@rfourquet
Copy link
Member Author

I have no idea yet for a solution. I think it would help to see the Revise problem or a specific instance where the current (new) behavior is a problem (which can't be solved easily). The testset name could inded work in some cases. But I also thought about the location, but this is exactly the problem this PR was trying to solve: to be able to re-arrange your testsets in any order without having your test depending on the global RNG affected, and even to paste it the REPL.

@StefanKarpinski
Copy link
Member

A hash of the test-set name with a per-name counter might be pretty good since test sets with different names would be isolated from each other and test sets with the same name would not all have the same seed, but their RNG streams would only be determined by their order.

@StefanKarpinski
Copy link
Member

Location info seems bad since adding or removing a line elsewhere in the file would change it. In that case, you may as well just always use different random seeds but choose a new random seed per testset so that seeding in one doesn't have the effect of inadvertently seeding the following ones.

@rfourquet
Copy link
Member Author

But I would like to understand what use case we are trying to solve exactly, which is not addressed by the work-around I mentioned to Tim.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Dec 20, 2017

To do the right thing more often by default – or at least to do something that's not surprising. If someone writes two testset blocks with no random seeds, I think they'd find it rather surprising that they both get the same sequence of random values.

@timholy
Copy link
Member

timholy commented Dec 20, 2017

It's not hard to work around, see timholy/Revise.jl@109cbd9. I could have alternatively used srand---I actually forgot you can call it without an argument, so based on that mistaken impression I decided to make sure the sequence would be different if I ran my tests twice (which it would not be if I used srand(i) with hard-coded i).

I think the only question is, what's more commonly the desired behavior? For example, should we view the fact that every @testset will generate the same random sequence as a strength or a weakness? (I don't think there's a clear answer here.) On one hand, preventing spurious failures is nice. On the other hand, I used to be able to do for i = 1:100; include("runtests.jl"); end and have extra confidence that there weren't corner cases that would cause test failures. Now that strategy will likely fail to reveal those corner cases, because each one of those iterates will use the exact same random numbers.

@StefanKarpinski
Copy link
Member

What is the benefit of each testset starting with the same seed? Serious question.

@rfourquet
Copy link
Member Author

I must say that I already started to enjoy the new behavior. For example at (55852ee), I feel super confident with this fuzzy tests (not sure it's the right name): it will be run by many different machines, with differenct seeds. And if a user finds a failure, she will give me the printed seed, and I will just be able to reproduce the failure by copy-pasting this single testset, without bothering with all what comes before in the file

For your example with the loop, it's very simple to work-around:

for i = 1:100
    srand(i)
    include("runtests.jl")
end

The big win is if that fails at iteration 90, you can easily start to debug directly at iteration 90.

@StefanKarpinski
Copy link
Member

@rfourquet, can you spell out what the benefit is? I think putting it in general words (not examples or use cases) will help me at least to figure this situation out.

@timholy
Copy link
Member

timholy commented Dec 20, 2017

Worth separating two points:

  • what's the benefit of printing the seed upon failure? (seems quite large!)
  • what's the benefit of resetting the seed with each test set? (less sure about this one)

If these are coupled, I don't understand why.

@rfourquet
Copy link
Member Author

can you spell out what the benefit is? (not examples or use cases)

It's tough question as the main motivation was to improve some use cases of the debugging experience. I will use Tim's separate questions as a starting point (at the risk of repeating myself)

what's the benefit of printing the seed upon failure?

Printing the seed upon failure on base tests, or seeding manually the global RNG before running some tests, allows (except in corner cases, e.g using RandomDevice), in case of failure of the test, to reproduce the failure which helps investigate the reason of it. Basically the alternative till now (seen in some places of base tests) was to run the tests with srand(fixed_seed) at the top of the file to be sure to be able to reproduce a failure. This is extremly limiting. By choosing randomly a seed each time the testsuite is run, and doing implicitly srand(seed) at the top of the test file, we can test algorithms with a wide variety of different inputs, without loosing the safety to track down a failure.

what's the benefit of resetting the seed with each test set?

There can be a benefit only if the tester has a way to know the seed. So it's coupled to your first item. The idea is to insulate each testset from each other. If a testset needs a particular seed to work correctly (like it happens a lot in linalg tests for example), it should state it explicitly. If it depends implicitly on the fact that a testset 20 lines above had a srand(123) in it, then they are coupled and it's more difficult to reproduce the failure.
Let's take a step back: if the global RNG was seeded only once at the beginning of running the whole base test suite (let's assume all files are run sequencially), and there is a failure in testfile 20, it's a nightmare to reproduce the failure, as to re-create the state of the RNG which caused the failure, there is no way around going through all the steps again, which can take many minutes. So my view was that testsets are to a testfile what a testfile is to the whole testsuite. You know the seed, so you know the state of the global RNG when the testset starts.

@StefanKarpinski
Copy link
Member

What I don't understand is why all test sets need the same seed. Why not just give each testset a different seed but make them independent of each other? If a testset fails, print the seed it failed with so that we know how to reproduce that particular testset failure. Printing this could be conditional on whether the testset has used any random values or not, to avoid printing irrelevant information for non-random testset failures.

@rfourquet
Copy link
Member Author

What I don't understand is why all test sets need the same seed

Oh ok, this is just the easiest implementation to keep track of the seed, but I agree this wouldn't need to be the same seed for each testset, as long as the user gets it in the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testsystem The unit testing framework and Test stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants