Keep same random seed when restarting a run #2

Open
anaprietonem opened this issue Dec 19, 2024 · 0 comments
Labels
bug (Something isn't working), training

Comments

@anaprietonem
Contributor

What happened?

The current AnemoiTrainer sets the initial seed in def initial_seed(self) -> int:. When a run is resumed, this seed is not reloaded from the MLflow hyperparameters, so a new random seed is generated. For consistent and reproducible results, a resumed run should keep the seed of the original run.
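A minimal sketch of the behaviour requested above, not the AnemoiTrainer implementation. It assumes the seed was logged with the original run's hyperparameters under a key named "seed"; that key, the resolve_seed helper, and the plain dict standing in for the parameters fetched from the tracking server are all hypothetical.

```python
import random
from typing import Mapping, Optional

SEED_KEY = "seed"  # hypothetical parameter name, not taken from anemoi-training


def resolve_seed(logged_params: Optional[Mapping[str, str]] = None) -> int:
    """Return the seed stored with a previous run, or draw a fresh one.

    `logged_params` stands in for the hyperparameters fetched from the
    tracking server when a run is resumed (assumption for illustration).
    """
    if logged_params is not None and SEED_KEY in logged_params:
        # MLflow stores parameter values as strings, so convert back to int.
        return int(logged_params[SEED_KEY])
    # Fresh run (or no seed was logged): generate a new seed.
    return random.SystemRandom().randrange(2**31)


# First launch: nothing stored yet, so a seed is drawn and then logged.
first_seed = resolve_seed()
stored = {SEED_KEY: str(first_seed)}

# Resume: the stored value takes precedence over generating a new one.
resumed_seed = resolve_seed(stored)
assert resumed_seed == first_seed
```

The key design choice is that the stored value always wins on resume, so a new seed is only generated when no earlier one can be found.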

What are the steps to reproduce the bug?

Train a model on Leonardo and resume the run after 24 hours. The resumed job has a different seed from the original run.

Version

Current

Platform (OS and architecture)

ATOS

Relevant log output

No response

Accompanying data

No response

Organisation

No response

@anaprietonem anaprietonem added the bug Something isn't working label Dec 19, 2024
@JesperDramsch JesperDramsch transferred this issue from ecmwf/anemoi-training Dec 19, 2024