
[question] can we make mass_spring.py repeatable (deterministic) between runs? #1565

Open
ehannigan opened this issue Jul 22, 2020 · 10 comments


@ehannigan

I was debugging some modifications I made to mass_spring.py when I realized that the result of each run is non-deterministic. I went back to the original mass_spring.py and made sure the controller network weights were initialized to the same values each time. But even though I can guarantee that no random values are being assigned anywhere, the resulting loss differs between runs.

Here are two different runs of the exact same code. You can see that the controller weights are exactly the same, but the loss values begin to diverge.

Run 1: mass_spring.py 2 train

n_objects= 20    n_springs= 46
weights1[0,0] -0.23413006961345673    weights2[0,0] 0.46663400530815125
Iter= 0 Loss= -0.2193218171596527
0.19502715683487248
Iter= 1 Loss= -0.21754804253578186
0.07976935930575488
Iter= 2 Loss= -0.3397877812385559
0.055776006347379746
Iter= 3 Loss= -0.3514309227466583
0.03870257399629174

Run 2: mass_spring.py 2 train

n_objects= 20    n_springs= 46
weights1[0,0] -0.23413006961345673    weights2[0,0] 0.46663400530815125
Iter= 0 Loss= -0.21932175755500793
0.1950520028177551
Iter= 1 Loss= -0.21754644811153412
0.07983238023710348
Iter= 2 Loss= -0.3397367000579834
0.055822440269175766
Iter= 3 Loss= -0.3514898419380188

In my own modifications, this resulted in inconsistent failures of the simulation (v_inc explodes and all values go to nan). I assume this is due to instabilities in the Euler integration, but it would be nice to get consistent results each time to make debugging easier.

Where could the non-deterministic behavior be coming from? Is it something we can fix, or does the stochasticity come from the compiler itself?

@ehannigan ehannigan changed the title why is mass spring simulation non deterministic? [Question] why is mass spring simulation non deterministic? Jul 23, 2020
@ehannigan
Author

This may have been a better question to post in the DiffTaichi repo. I will post it there instead (taichi-dev/difftaichi#31 (comment)) and summarize any responses I get here.

@ehannigan ehannigan changed the title [Question] why is mass spring simulation non deterministic? [Question] can we make mass_spring.py repeatable (deterministic) between runs? Jul 23, 2020
@ehannigan ehannigan changed the title [Question] can we make mass_spring.py repeatable (deterministic) between runs? [question] can we make mass_spring.py repeatable (deterministic) between runs? Jul 27, 2020
samuela added a commit to samuela/difftaichi that referenced this issue Aug 6, 2020
@samuela
Contributor

samuela commented Aug 6, 2020

Hey @ehannigan! I just played around with this and found that there's randomness coming from both Python's stdlib random module and np.random. I tried setting random seeds on both of them in taichi-dev/difftaichi#34, and I'm now seeing deterministic results when running

python examples/mass_spring.py 0 train
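For completeness, the change boils down to seeding both RNG sources near the top of the script, before any weights are drawn. A minimal sketch of the idea (the seed value is arbitrary, and this is not necessarily the exact diff in that PR):

```python
# Seed both RNG sources that the weight initialization can draw from,
# before any weights are created. The seed value 0 is arbitrary.
import random

import numpy as np

random.seed(0)     # Python's stdlib RNG
np.random.seed(0)  # NumPy's global RNG
```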

@ehannigan
Author

Hey! Thank you @samuela! I thought I had tried setting a seed (using numpy's RandomState object), but I must have messed up somewhere. I'll go back and try running it with your fix.

@ehannigan
Author

I tried adding those two seeding lines, and I am still not getting repeatable results. Maybe you are using a different setup? What are you checking to see whether your results are the same each time? I'm looking at the loss values.

Here is my current setup:
python 3.7.3
taichi == 0.6.21
llvm == 10.0.0
numpy == 1.18.5

Here are the loss outputs after running the same command twice in a row:

First run: python mass_spring.py 2 train

n_objects= 20    n_springs= 46
Iter= 0 Loss= -0.21222630143165588
0.1129670010156612
Iter= 1 Loss= -0.21599465608596802
0.06594551447062441
Iter= 2 Loss= -0.25001487135887146
0.13671642659517222

Second run: python mass_spring.py 2 train

n_objects= 20    n_springs= 46
Iter= 0 Loss= -0.21222639083862305
0.11296541652168635
Iter= 1 Loss= -0.21599650382995605
0.0659292521378909
Iter= 2 Loss= -0.25000977516174316
0.1367099362027803
Iter= 3 Loss= -0.29366904497146606
0.10108566298371975

Is there anything else I could be missing to get the same results you are getting? I'm tearing my hair out on this one lol.

@yuanming-hu
Member

Sorry for my absence - recent days have been rather hectic for me.

Do you get any improvements if you use ti.f64 on this line?
https://github.com/yuanming-hu/difftaichi/blob/4742b1c84b045ea64da2eae99a3240b2ae0ebad0/examples/mass_spring.py#L11
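For concreteness, a rough sketch of what that change looks like, assuming the script still defines a `real` dtype alias that it passes to ti.init as default_fp:

```python
import taichi as ti

# Switch the simulation's floating-point type to double precision.
real = ti.f64  # was: real = ti.f32
ti.init(default_fp=real)
```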

@ehannigan
Author

ehannigan commented Aug 15, 2020

Some strange results:
I tried changing to f64 and that didn't change anything.

I also tried setting both f64 and i64, but I got this error:
Assertion failed: (S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"), function Create, file /Users/th3charlie/dev/taichi-exp/

So I started a Jupyter notebook to keep track of my debugging so I could post it here. But in the notebook, when I ran mass_spring.py 2 train three times in a row using

%run -i mass_spring.py 2 train

the output was deterministic; all the losses matched, even when I was using f32. (Edit: I tried this again and did not get the same result. Maybe I made a mistake, or read the results wrong.)
I will finish up my debugging notebook and post it tomorrow. If you have any insights, let me know.

@ehannigan
Author

I've created a jupyter notebook to outline my debugging process. Since there were some updates to difftaichi due to updates in taichi, I went ahead and updated my version just to make sure we weren't debugging old code.

Here is the notebook: https://github.com/ehannigan/difftaichi/blob/testing_determinism/examples/debug_determinism-current.ipynb

I tried running mass_spring.py without any modifications. I tried switching to f64. I also tried changing i32 -> i64 (which caused an error), and I tried using np.random.RandomState() instead of np.random.seed(). At least on my system, the results are still not deterministic.
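For reference, a minimal sketch of the np.random.RandomState() variant mentioned above; the seed and the weight shape are hypothetical, just to show the pattern:

```python
import numpy as np

# Dedicated generator, independent of np.random's global state.
rng = np.random.RandomState(0)

# Any weight init would need to draw from `rng` instead of np.random;
# the shape here is purely illustrative.
weights1 = rng.randn(32, 64) * 0.1
```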

Could someone try running my jupyter notebook on their machine to see if you get the same results?

@ehannigan
Author

Is there CUDA in the backend? Is it possible that a function similar to this one needs to be added?
torch.backends.cudnn.deterministic=True
pytorch/pytorch#7068

@samuela
Contributor

samuela commented Aug 24, 2020

Is there CUDA in the backend? Is it possible that a function similar to this one needs to be added?
torch.backends.cudnn.deterministic=True
pytorch/pytorch#7068

There is, although I think you need to explicitly select it for it to be enabled. The default for mass_spring is CPU-only, IIRC.
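If it helps rule the GPU out completely, you can pin the backend explicitly when initializing Taichi; a small sketch (whether mass_spring.py already passes arch= depends on the version in your checkout):

```python
import taichi as ti

# Force the CPU backend so any CUDA-related nondeterminism can be ruled out.
ti.init(arch=ti.cpu)
```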

@ehannigan
Author

Hmmm, then idk why I am still getting stochastic results. @samuela, you said you were able to get repeatable results? Were they just similar, or did you get losses that matched exactly? If so, what is your system setup?
