
[question] can we make mass_spring.py repeatable (deterministic) between runs? #1565

Open
ehannigan opened this issue Jul 22, 2020 · 10 comments


@ehannigan

I was debugging some modifications I made to mass_spring.py when I realized that the result of each run is non-deterministic. I went back to the original mass_spring.py and made sure the controller network weights were initialized to the same values each time. But even though I can guarantee that no random values are being assigned anywhere, the resulting loss differs between runs.

Here are two different runs of the exact same code. You can see that the controller weights are exactly the same, but the loss values begin to diverge.

Run 1: mass_spring.py 2 train

n_objects= 20    n_springs= 46
weights1[0,0] -0.23413006961345673    weights2[0,0] 0.46663400530815125
Iter= 0 Loss= -0.2193218171596527
0.19502715683487248
Iter= 1 Loss= -0.21754804253578186
0.07976935930575488
Iter= 2 Loss= -0.3397877812385559
0.055776006347379746
Iter= 3 Loss= -0.3514309227466583
0.03870257399629174

Run 2: mass_spring.py 2 train

n_objects= 20    n_springs= 46
weights1[0,0] -0.23413006961345673    weights2[0,0] 0.46663400530815125
Iter= 0 Loss= -0.21932175755500793
0.1950520028177551
Iter= 1 Loss= -0.21754644811153412
0.07983238023710348
Iter= 2 Loss= -0.3397367000579834
0.055822440269175766
Iter= 3 Loss= -0.3514898419380188

In my own modifications, this resulted in inconsistent failures of the simulation (v_inc explodes and all values go to nan). I assume this is due to instabilities in the Euler integration, but it would be nice to get consistent results each time to make debugging easier.

Where could the non-deterministic behavior be coming from? Is it something we can fix, or does the stochasticity come from the compiler itself?

@ehannigan ehannigan changed the title why is mass spring simulation non deterministic? [Question] why is mass spring simulation non deterministic? Jul 23, 2020
@ehannigan
Author

This may have been a better question to post in the DiffTaichi repo. I will post it there instead (taichi-dev/difftaichi#31 (comment)) and summarize any responses I get here.

@ehannigan ehannigan changed the title [Question] why is mass spring simulation non deterministic? [Question] can we make mass_spring.py repeatable (deterministic) between runs? Jul 23, 2020
@ehannigan ehannigan changed the title [Question] can we make mass_spring.py repeatable (deterministic) between runs? [question] can we make mass_spring.py repeatable (deterministic) between runs? Jul 27, 2020
samuela added a commit to samuela/difftaichi that referenced this issue Aug 6, 2020
@samuela
Contributor

samuela commented Aug 6, 2020

Hey @ehannigan! I just played around with this and found that there's randomness coming from both Python's stdlib random module and np.random. I tried setting random seeds on both of them in taichi-dev/difftaichi#34, and I'm now seeing deterministic results when running

python examples/mass_spring.py 0 train
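For completeness, the change boils down to seeding both RNG sources near the top of the script, before any weights are drawn. A minimal sketch of the idea (the seed value is arbitrary, and this is not necessarily the exact diff in that PR):

```python
# Seed both RNG sources that the weight initialization can draw from,
# before any weights are created. The seed value 0 is arbitrary.
import random

import numpy as np

random.seed(0)     # Python's stdlib RNG
np.random.seed(0)  # NumPy's global RNG
```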

@ehannigan
Author

Hey! Thank you @samuela! I thought I had tried setting a seed (using numpy's RandomState object), but I must have messed up somewhere. I'll go back and try running it with your fix.

@ehannigan
Author

I tried adding those two seeding lines, and I am still not getting repeatable results. Maybe you are using a different setup? What are you checking to see whether your results are the same each time? I'm looking at the loss values.

Here is my current setup:
python 3.7.3
taichi == 0.6.21
llvm == 10.0.0
numpy == 1.18.5

Here are the loss outputs after running the same command twice in a row:

First run: python mass_spring.py 2 train

n_objects= 20    n_springs= 46
Iter= 0 Loss= -0.21222630143165588
0.1129670010156612
Iter= 1 Loss= -0.21599465608596802
0.06594551447062441
Iter= 2 Loss= -0.25001487135887146
0.13671642659517222

Second run: python mass_spring.py 2 train

n_objects= 20    n_springs= 46
Iter= 0 Loss= -0.21222639083862305
0.11296541652168635
Iter= 1 Loss= -0.21599650382995605
0.0659292521378909
Iter= 2 Loss= -0.25000977516174316
0.1367099362027803
Iter= 3 Loss= -0.29366904497146606
0.10108566298371975

Is there anything else I could be missing to get the same results you are getting? I'm tearing my hair out on this one lol.

@yuanming-hu
Member

Sorry for my absence - recent days have been rather hectic for me.

Do you get any improvements if you use ti.f64 on this line?
https://github.com/yuanming-hu/difftaichi/blob/4742b1c84b045ea64da2eae99a3240b2ae0ebad0/examples/mass_spring.py#L11
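For concreteness, a rough sketch of what that change looks like, assuming the script still defines a `real` dtype alias that it passes to ti.init as default_fp:

```python
import taichi as ti

# Switch the simulation's floating-point type to double precision.
real = ti.f64  # was: real = ti.f32
ti.init(default_fp=real)
```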

@ehannigan
Author

ehannigan commented Aug 15, 2020

Some strange results:
I tried changing to f64 and that didn't change anything.

I also tried setting both f64 and i64, but I got this error:
Assertion failed: (S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"), function Create, file /Users/th3charlie/dev/taichi-exp/

So I started a Jupyter notebook to keep track of my debugging so I could post it here. But in the notebook, when I ran mass_spring.py 2 train three times in a row using

%run -i mass_spring.py 2 train

the output was deterministic; all the losses matched, even when I was using f32. (Edit: I tried this again and did not get the same result. Maybe I made a mistake, or read the results wrong.)
I will finish up my debugging notebook and post it tomorrow. If you have any insights, let me know.

@ehannigan
Author

I've created a jupyter notebook to outline my debugging process. Since there were some updates to difftaichi due to updates in taichi, I went ahead and updated my version just to make sure we weren't debugging old code.

Here is the notebook: https://github.com/ehannigan/difftaichi/blob/testing_determinism/examples/debug_determinism-current.ipynb

I tried running mass_spring.py without any modifications. I tried switching to f64. I also tried changing i32 -> i64 (which caused an error), and I tried using np.random.RandomState() instead of np.random.seed(). At least on my system, the results are still not deterministic.
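For reference, a minimal sketch of the np.random.RandomState() variant mentioned above; the seed and the weight shape are hypothetical, just to show the pattern:

```python
import numpy as np

# Dedicated generator, independent of np.random's global state.
rng = np.random.RandomState(0)

# Any weight init would need to draw from `rng` instead of np.random;
# the shape here is purely illustrative.
weights1 = rng.randn(32, 64) * 0.1
```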

Could someone try running my jupyter notebook on their machine to see if you get the same results?

@ehannigan
Author

Is there CUDA in the backend? Is it possible that a function similar to this one needs to be added?
torch.backends.cudnn.deterministic=True
pytorch/pytorch#7068

@samuela
Contributor

samuela commented Aug 24, 2020

Is there CUDA in the backend? Is it possible that a function similar to this one needs to be added?
torch.backends.cudnn.deterministic=True
pytorch/pytorch#7068

There is, although I think you need to explicitly select it for it to be enabled. The default for mass_spring is CPU-only, IIRC.
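If it helps rule the GPU out completely, you can pin the backend explicitly when initializing Taichi; a small sketch (whether mass_spring.py already passes arch= depends on the version in your checkout):

```python
import taichi as ti

# Force the CPU backend so any CUDA-related nondeterminism can be ruled out.
ti.init(arch=ti.cpu)
```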

@ehannigan
Author

Hmmm, then idk why I am still getting stochastic results. @samuela, you said you were able to get repeatable results? Were they just similar, or did you get losses that matched exactly? If so, what is your system setup?
