Thermostat noise for different seeds is correlated #3585
Comments
We could use the user-provided seed value as the offset of a Philox counter and take the random number at that index as the seed for the thermostats. |
we should start with writing a failing test |
The obvious and simple solution is to use the seed as part of the key. This would also allow using just one counter for all the RNGs (as a global propagation parameter). |
Also the salt should be part of the key, not the counter.... |
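To illustrate the difference between the two schemes discussed above, here is a minimal sketch. It does not use the real Philox wrapper: `counter_rng` is a hypothetical stand-in built on SHA-256 that only shares the relevant property (the output is a pure function of a key and a counter), and `SALT_LANGEVIN` is an assumed salt value.

```python
import hashlib
import struct


def counter_rng(key, counter):
    """Hypothetical stand-in for Philox: a pure function of (key, counter)."""
    digest = hashlib.sha256(struct.pack("<QQ", key, counter)).digest()
    # Keep 53 bits so the result is exactly representable in [0, 1).
    return (int.from_bytes(digest[:8], "little") >> 11) / 2**53


SALT_LANGEVIN = 1  # assumed salt identifying one thermostat


def noise_seed_as_offset(seed, step):
    # The seed shifts the counter; the key only holds the salt.
    return counter_rng(SALT_LANGEVIN, seed + step)


def noise_seed_in_key(seed, step):
    # Seed and salt both go into the key; the counter is the step alone.
    key = (seed << 32) | SALT_LANGEVIN
    return counter_rng(key, step)


offset_42 = [noise_seed_as_offset(42, t) for t in range(5)]
offset_43 = [noise_seed_as_offset(43, t) for t in range(5)]
keyed_42 = [noise_seed_in_key(42, t) for t in range(5)]
keyed_43 = [noise_seed_in_key(43, t) for t in range(5)]

# Offset scheme: seed 43 replays the seed 42 stream, shifted by one step.
print(offset_42[1:] == offset_43[:-1])
# Keyed scheme: the values the two streams have in common.
print(set(keyed_42) & set(keyed_43))
```

With the seed in the key, the counter no longer depends on the seed, so a single counter could in principle be shared by all RNGs.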
I'll propose a fix. |
There is a sketch in https://github.com/fweik/espresso/tree/counter, with which the thermostats can be turned into proper classes, because |
The branch lacks the python interface, which can also be considerably simplified. The counter needs to be added to the state (this is why the checkpointing tests fail), and the cython stuff needs to be adjusted. |
You've also added an extra |
Ah right. I'll finish this next week. |
Any news on this? The proposed fix simplifies the core a lot and would be a nice addition to 4.1.3. |
I forgot about this, sorry. I'll take a look and assess how much work this still is. Then we can decide on a course of action. |
The plan here was to use a single monotonic counter as parametric time, which is propagated once per integration step? Does anybody see a problem with such a design? @RudolfWeeber? |
Looking at your WIP, I don't see an issue on 4.2.0, but for 4.3.0 with walberla I'm not sure. The only side effect I can think of in 4.2.0 is that disabling a thermostat and re-enabling it later with the same seed will produce a different trajectory, if I understand your code correctly. This will make reproducing results more difficult, unless we provide a mechanism to reset the global counter. |
Apparently this is the desired behavior, so this is by design. |
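The side effect described above can be sketched as follows. This is only an illustration under assumptions: `GlobalCounter`, `Thermostat` and `run` are hypothetical names, and `counter_rng` is a SHA-256 stand-in for the Philox wrapper, not the actual core code.

```python
import hashlib
import struct


def counter_rng(key, counter):
    """Hypothetical stand-in for Philox: a pure function of (key, counter)."""
    digest = hashlib.sha256(struct.pack("<QQ", key, counter)).digest()
    return (int.from_bytes(digest[:8], "little") >> 11) / 2**53


class GlobalCounter:
    """Hypothetical parametric time, advanced once per integration step."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class Thermostat:
    """Hypothetical thermostat: stores only its seed, reads the shared counter."""

    def __init__(self, seed, counter):
        self.seed = seed
        self.counter = counter

    def noise(self):
        return counter_rng(self.seed, self.counter.value)


def run(counter, thermostat, steps):
    """Advance the counter once per step; collect noise if a thermostat is active."""
    values = []
    for _ in range(steps):
        counter.increment()
        if thermostat is not None:
            values.append(thermostat.noise())
    return values


# Uninterrupted run with seed 42.
c1 = GlobalCounter()
reference = run(c1, Thermostat(42, c1), 4)

# Same seed, but the thermostat is disabled for two steps in between.
c2 = GlobalCounter()
first = run(c2, Thermostat(42, c2), 2)
run(c2, None, 2)
resumed = run(c2, Thermostat(42, c2), 2)

print(first == reference[:2])  # same noise while both runs are in sync
print(resumed == first)        # re-enabling does not replay the old noise
```

In this sketch, the only way to get the old noise back is to start from a fresh counter, which matches the concern about reproducibility raised above.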
Looking at the thermostat code, I found more bugs. |
I won't have time to work on this. |
Could you tell me more about the other thermostat bug you found? I'd be interested in having a look at it. |
No unfortunately I don't remember. I should have taken notes. |
Let's talk on Monday. I have looked into this, and in principle it is solvable. The old WIP is of no use anymore, because too much of the relevant code has changed. I have started a new one that provides the infrastructure, e.g. exposing all bits of the key in the Philox wrapper and adding the evolution parameter as a new global. This could now be used for the thermostats, but I looked at design |
Not a unit test, but illustrates the issue:
diff --git a/src/core/unit_tests/thermostats_test.cpp b/src/core/unit_tests/thermostats_test.cpp
index 45eecdce6..dc30b7142 100644
--- a/src/core/unit_tests/thermostats_test.cpp
+++ b/src/core/unit_tests/thermostats_test.cpp
@@ -330,3 +330,18 @@ BOOST_AUTO_TEST_CASE(test_npt_iso_randomness) {
   }
 }
 #endif // NPT
+
+BOOST_AUTO_TEST_CASE(test_autocorrelation) {
+  auto thermostat1 = thermostat_factory<LangevinThermostat>();
+  auto thermostat2 = thermostat_factory<LangevinThermostat>();
+  thermostat1.rng_initialize(42);
+  thermostat2.rng_initialize(43);
+  auto p = particle_factory();
+  for (int i = 0; i < 10; ++i) {
+    thermostat1.rng_increment();
+    thermostat2.rng_increment();
+    auto p1 = friction_thermo_langevin(thermostat1, p);
+    auto p2 = friction_thermo_langevin(thermostat2, p);
+    printf("t = %d p1[0] = %+6.3f p2[0] = %+6.3f\n", i, p1[0], p2[0]);
+  }
+}
output:
The only way to test this is to compute an autocorrelation function, which isn't very convenient in a unit test. We should also check autocorrelation on the ordered random sequences to reveal bugs such as the NPT one, where we called the same random function twice in a single time step. That bug took me, Flo and Rudolf quite a while to figure out when refactoring the NPT code (fortunately, it never made it into a PR). This would need some special treatment, though, since ordered sequences are naturally autocorrelated... Or we just check that the random sequences match the Philox sequences, as suggested by Flo. |
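As a rough idea of what such a correlation check could look like, here is a sketch that detects the shifted sequences with a lagged Pearson correlation. `counter_rng` is again a hypothetical SHA-256 stand-in for Philox, and the seed-as-offset scheme is reproduced by hand. Nothing here is taken from the existing test suite.

```python
import hashlib
import math
import struct


def counter_rng(key, counter):
    """Hypothetical stand-in for Philox: a pure function of (key, counter)."""
    digest = hashlib.sha256(struct.pack("<QQ", key, counter)).digest()
    return (int.from_bytes(digest[:8], "little") >> 11) / 2**53


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(xs, ys, lag):
    """Correlation between xs[t + lag] and ys[t]."""
    return pearson(xs[lag:], ys[:len(ys) - lag])


n_steps = 1000
# Seed used as counter offset: seeds 42 and 43 give shifted copies.
seq_42 = [counter_rng(0, 42 + t) for t in range(n_steps)]
seq_43 = [counter_rng(0, 43 + t) for t in range(n_steps)]

same_step = lagged_correlation(seq_42, seq_43, 0)
shifted = lagged_correlation(seq_42, seq_43, 1)
print(f"lag 0: {same_step:+.3f}, lag 1: {shifted:+.3f}")
```

The lag 0 value looks harmless, which is why a check at a single lag would miss the problem. The shift only shows up at the lag that matches the seed difference.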
The bonded interactions checkpointing mechanism is set up in such a way that exceptions are silently ignored. This makes it really hard to track down a seed checkpointing bug I just ran into (maybe this is what @fweik mentioned above). The seed checkpointing mechanism is really cryptic.
--- a/src/python/espressomd/interactions.pyx
+++ b/src/python/espressomd/interactions.pyx
@@ -1720,6 +1720,10 @@ cdef class BondedInteraction:
     property params:
         def __get__(self):
+            print(">>> ", self._params)
+            if 'seed' in self._params and self._params['seed'] == 0:
+                print('The exception on the next line will be silently ignored')
+                raise TypeError()
             return self._params
         def __set__(self, p):
@@ -3358,10 +3362,14 @@ class BondedInteractions:
         params = {}
         for i, bonded_instance in enumerate(self):
             if hasattr(bonded_instance, 'params'):
+                print('Read parameters...')
                 params[i] = bonded_instance.params
+                print('Read successful')
                 params[i]['bond_type'] = bonded_instance.type_number()
             else:
+                print('Object has no parameters!')
                 params[i] = None
+        print('::: ', params)
         return params
     def __setstate__(self, params):
output when running into the seed checkpointing bug that resets the seed to 0:
|
why do you think that the |
I think we need the following checks. Uncorrelated results for:
* 2 particles, same counter value
* 1 particle, consecutive time steps
* 2 particles with the same id with 2 rngs with consecutive seeds
All of these are single value correlations.
In addition, we need to check the mean and variance of the outcome for a single particle.
That should be feasible, both in terms of runtime and complexity.
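The checks listed above could look roughly like this. It is a sketch under assumptions, not a proposal for the actual test: `noise` is a hypothetical function returning uniform noise in [-0.5, 0.5) from a SHA-256 stand-in for Philox, keyed by seed and particle id, and the sample count is arbitrary.

```python
import hashlib
import math
import struct


def noise(seed, particle_id, step):
    """Hypothetical uniform noise in [-0.5, 0.5), keyed by (seed, particle id)."""
    digest = hashlib.sha256(struct.pack("<QQQ", seed, particle_id, step)).digest()
    return (int.from_bytes(digest[:8], "little") >> 11) / 2**53 - 0.5


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


n = 4000  # arbitrary sample count
p0_seed42 = [noise(42, 0, t) for t in range(n)]
p1_seed42 = [noise(42, 1, t) for t in range(n)]
p0_seed43 = [noise(43, 0, t) for t in range(n)]

# 2 particles, same counter value
corr_particles = pearson(p0_seed42, p1_seed42)
# 1 particle, consecutive time steps
corr_steps = pearson(p0_seed42[:-1], p0_seed42[1:])
# same particle id, 2 RNGs with consecutive seeds
corr_seeds = pearson(p0_seed42, p0_seed43)

# Mean and variance for a single particle; a uniform distribution
# on [-0.5, 0.5) has mean 0 and variance 1/12.
mean = sum(p0_seed42) / n
variance = sum((x - mean) ** 2 for x in p0_seed42) / (n - 1)

print(corr_particles, corr_steps, corr_seeds, mean, variance)
```

A real test would compare each value against a tolerance chosen from the sample count, since the statistical error of these estimates scales roughly as one over the square root of `n`.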
The alternative is to
* Separate out the particle->key mapping
* Make the RNG and key generation injectable via templates into the thermostats
* Make a fake rng with deterministic outcomes
* Test with that
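The alternative above can be sketched with plain dependency injection. In Python there are no templates, so callables stand in for the template parameters. All names (`ThermostatNoise`, `particle_key`, `FakeRng`) are hypothetical and only illustrate the idea of separating key mapping, RNG and thermostat logic.

```python
class ThermostatNoise:
    """Hypothetical noise term with injectable key mapping and RNG."""

    def __init__(self, amplitude, seed, key_of_particle, rng):
        self.amplitude = amplitude
        self.seed = seed
        self.key_of_particle = key_of_particle
        self.rng = rng

    def force(self, particle_id, step):
        # The thermostat only combines the pieces; it knows neither how
        # keys are built nor how random numbers are produced.
        key = self.key_of_particle(self.seed, particle_id)
        return self.amplitude * (self.rng(key, step) - 0.5)


def particle_key(seed, particle_id):
    """Hypothetical particle -> key mapping, separated out for testing."""
    return (seed, particle_id)


class FakeRng:
    """Fake RNG with a deterministic outcome that records how it was called."""

    def __init__(self, value):
        self.value = value
        self.calls = []

    def __call__(self, key, counter):
        self.calls.append((key, counter))
        return self.value


fake = FakeRng(0.75)
thermostat = ThermostatNoise(
    amplitude=2.0, seed=42, key_of_particle=particle_key, rng=fake)
force = thermostat.force(particle_id=7, step=3)

print(force)       # amplitude * (0.75 - 0.5)
print(fake.calls)  # the key and counter the thermostat asked for
```

With the fake in place, the thermostat test checks exact values and the exact key and counter used, while the statistical properties only need to be tested once, on the real RNG.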
|
wouldn't it be better to have a single implementation of the generation of random numbers which can be unit tested? |
> wouldn't it be better to have a single implementation of the generation of random numbers which can be unit tested?
That would be the outcome of the steps I proposed in the second list.
However, there are two parts: key generation and random number generation. Both would need to be tested.
It is clear that that approach is better, but it is also much more work.
Since the thermalization is quite a central part, it might be worth it, though.
@jngrad, what would you prefer?
|
It looks like it's the mechanism currently used for checkpointing bonded interactions objects, but I could be wrong. From what I understood, the espresso/src/python/espressomd/interactions.pyx Lines 3357 to 3365 in 748e9d0
I'll have a look once the refactor is ready. Right now we have 3 different thermostat implementations in the core, so it's not clear to me yet which path is feasible. |
But if we decide to pull out the RNG and key generation, then THAT is the refactoring that should be done now. |
This is not the first bug we have encountered since we switched to the counter-based RNG, and we are still not sure whether it is worth pulling out the logic and writing unit tests? |
All functions generating uniform and gaussian noise are checked for correlation in
The RNG implementation is currently made of free functions in
I'm not sure what you mean here. My understanding is that we need the key mechanism to provide the same random numbers to the same particle id pair. |
By “key generation”, I mean: how a particle or a pair of particles maps to a key for the RNG.
I think, there are two kinds:
* Single particle (Langevin, SD, Brownian)
* Pair of particles (DPD, thermalized bond)
|
At the moment we use the same function for the single particles and particle pairs. In the particle pair case, either the particle ids order matters (thermalized bond), or it doesn't matter and the ids are sorted (DPD). We could refactor the free functions in |
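A possible shape for such a key generation layer, as a sketch only: the function names, the tuple layout of the key and the salt values are all assumptions, and `counter_rng` is a SHA-256 stand-in for the Philox wrapper.

```python
import hashlib
import struct

# Assumed salts; each thermostat gets its own so that keys never collide
# between a single-particle key (id, 0) and a pair key (id, 0).
SALT_LANGEVIN = 1
SALT_DPD = 2
SALT_THERMALIZED_BOND = 3


def key_single(seed, salt, pid):
    """Key for single-particle thermostats (Langevin, SD, Brownian)."""
    return (seed, salt, pid, 0)


def key_ordered_pair(seed, salt, pid1, pid2):
    """Key for pairs where the order of the ids matters (thermalized bond)."""
    return (seed, salt, pid1, pid2)


def key_unordered_pair(seed, salt, pid1, pid2):
    """Key for pairs where the order must not matter (DPD): ids are sorted."""
    low, high = sorted((pid1, pid2))
    return (seed, salt, low, high)


def counter_rng(key, counter):
    """Hypothetical stand-in for Philox: a pure function of (key, counter)."""
    digest = hashlib.sha256(struct.pack("<QQQQQ", *key, counter)).digest()
    return (int.from_bytes(digest[:8], "little") >> 11) / 2**53


step = 10
dpd_a = counter_rng(key_unordered_pair(42, SALT_DPD, 3, 5), step)
dpd_b = counter_rng(key_unordered_pair(42, SALT_DPD, 5, 3), step)
bond_a = counter_rng(key_ordered_pair(42, SALT_THERMALIZED_BOND, 3, 5), step)
bond_b = counter_rng(key_ordered_pair(42, SALT_THERMALIZED_BOND, 5, 3), step)

print(dpd_a == dpd_b)    # both particles of a DPD pair see the same noise
print(bond_a == bond_b)  # swapping the ids of a bond changes the noise
```

Keeping the mapping in small pure functions like these would make it testable on its own, independently of the RNG and of the thermostats.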
Alright, the WIP is in jngrad:thermostats-global-counter. All thermostats now derive from
We can now decide whether we want to abstract the key generation for single particles/particle pairs, replace thermostat unit tests by statistical python tests, pull the free functions for random number generation out of the core, refactor the python class for thermostats (e.g. like in integrate.pyx), split the core classes for thermostats into separate files so that they don't know about each other, etc. |
By the way, the This |
Not sure about the optional, probably, we need to look at it in a meeting.
`temperature` is one of the globals which should go away.
As Flo pointed out, the system doesn’t have a temperature. Algorithms which use one as parameter, should store it themselves.
|
Good point. Let's discuss the thermostat infrastructure in the next meeting. Going back to the original issue, here is a MWE to reproduce the bug in a simulation:
import espressomd
system = espressomd.System(box_l=[1.0, 1.0, 1.0])
system.cell_system.skin = 0
system.time_step = 0.01
system.part.add(pos=[0, 0, 0], ext_force=[0, 0, -1])
kT = 1.1
gamma = 3.5
print("seed = 42")
system.thermostat.set_langevin(kT=kT, gamma=gamma, seed=42)
for i in range(4):
    system.integrator.run(1)
    print(system.part[0].pos if i else '...skip...')
    system.part[0].pos = [0, 0, 0]
    system.part[0].v = [0, 0, 0]
print("seed = 43")
system.thermostat.set_langevin(kT=kT, gamma=gamma, seed=43)
for i in range(3):
    system.integrator.run(1)
    print(system.part[0].pos)
    system.part[0].pos = [0, 0, 0]
    system.part[0].v = [0, 0, 0]
output:
Although the random sequences are shifted between two simulations, the initial conditions (particle positions and velocities) are necessarily different, thus if at least two particles in the system are within interaction range, the initial forces will be different in Langevin Dynamics. Maybe this RNG correlation could be an issue in Stokesian Dynamics and Brownian Dynamics simulations? |
Description of changes:
- Remove RNG correlation stemming from seed offsets (fixes #3585)
  - seeds are now used as keys
  - a monotonically increasing counter is used in each thermostat
  - the only way to reset these counters is to create a new `System`
- Remove RNG correlation stemming from resetting `sim_time` or `time_step` during simulations with SD (fixes #3840)
  - the SD thermostat now uses the same RNG interface as other thermostats
- Accelerate RNG unit tests (fixes #3573)
  - they now take 2 seconds to run in coverage and sanitizer builds in CI
- Separate thermostats from integrators
  - better separation of concerns
Currently the thermostat noise for different seeds is correlated: the seed just shifts the sequence.
Currently the seed is used to offset the counter, but it should be used in the key to get independent
sequences. Also there should only be one counter...