Densenet architectures providing non-deterministic results #5790
Replies: 3 comments
@ankur56 I ran your densenet code multiple times locally, but all the runs gave the same results in my env. Would you be able to reproduce the behaviour with BoringModel? my env
@akihironitta Thank you for trying out my code. Please correct me if I am wrong, but judging from your environment, you don't seem to be using GPUs with DDP. Are you running it only on CPUs? As far as I know, non-determinism mostly arises when using GPUs. As I mentioned in my question, I can obtain deterministic behavior with the simpler model (LitAutoEncoder) I posted, so it's highly likely that I will get deterministic results with the BoringModel as well. I will give that one a try. But if you have access to a multi-GPU node, please run the Densenet model on it and let me know what you get.
@ankur56 You're right, it worked correctly on CPUs, so the problem should arise when running on accelerators, as you mentioned. Unfortunately, I currently do not have access to any other computing resources, so I cannot investigate this further at the moment, but I'm sure other core members will. Sorry for the inconvenience.
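For reference, the settings usually involved in GPU determinism with PyTorch can be sketched as below. This is a minimal sketch, not the code from this thread: `seed_everything_sketch` and its `strict` parameter are hypothetical names, and the block falls back to seeding only Python's `random` when `torch` is not installed. PyTorch Lightning ships its own `seed_everything` function and a `Trainer(deterministic=True)` flag that cover similar ground; the thread does not show which of these the original code used. Even with all of these set, some CUDA operations have no deterministic implementation, so differences between GPU runs can remain.

```python
import os
import random


def seed_everything_sketch(seed, strict=False):
    """Hypothetical helper: seed the common RNGs and request deterministic kernels.

    Returns True if torch was found and configured, False otherwise.
    """
    # Some cuBLAS routines on CUDA 10.2+ need this to behave deterministically.
    # It should be set before any CUDA work starts.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    # Python's built-in RNG (used by some data pipelines).
    random.seed(seed)

    try:
        import torch
    except ImportError:
        # Assumption: torch may be absent where this sketch is tried out.
        return False

    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # all GPU RNGs; harmless without CUDA

    # Ask cuDNN for deterministic convolution algorithms and disable the
    # autotuner, which may pick different algorithms from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Newer PyTorch versions can raise an error when an op has no
    # deterministic implementation; guarded because older versions lack it.
    if strict and hasattr(torch, "use_deterministic_algorithms"):
        torch.use_deterministic_algorithms(True)

    return True


if __name__ == "__main__":
    configured = seed_everything_sketch(1234)
    print("torch configured:", configured)
```

Calling the helper once at the very start of the training script, before the model or data loaders are built, is the usual placement; under DDP every process would need to call it.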
❓ Questions and Help
Before asking:
I have tried looking for answers in other forums but couldn't find anything related to my question.
What is your question?
I can't seem to obtain deterministic results using Densenets (https://github.com/gpleiss/efficient_densenet_pytorch). I was able to obtain deterministic results with a simpler architecture, LitAutoEncoder. I was wondering whether the difference comes from the large number of convolution layers in Densenet models.
Code
The Densenet code I am using is as follows,
The code I am using for LitAutoEncoder is as follows,
What have you tried?
I am running all of my jobs on a supercomputer. I have tried running the code multiple times on the same node to rule out randomness caused by using different machines, but that doesn't seem to make any difference.
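One way to make the "run it several times and compare" check concrete is to log a value per step (for example the training loss) and compare the logs exactly. The sketch below is only an illustration under that assumption: `fingerprint` and `first_divergence` are hypothetical helper names, and the numbers in the usage section are made-up placeholders, not results from these runs.

```python
import hashlib
import struct


def fingerprint(values):
    """Hash a sequence of floats bit-for-bit, so tiny differences change the digest."""
    digest = hashlib.sha256()
    for value in values:
        # Pack each float as a little-endian 64-bit double.
        digest.update(struct.pack("<d", float(value)))
    return digest.hexdigest()


def first_divergence(run_a, run_b):
    """Return the index of the first step where two runs differ, or None.

    Only the overlapping prefix is compared if the runs have different lengths.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return step
    return None


if __name__ == "__main__":
    # Placeholder per-step losses for two hypothetical runs.
    losses_run_1 = [0.93, 0.81, 0.77]
    losses_run_2 = [0.93, 0.81, 0.78]
    print(fingerprint(losses_run_1) == fingerprint(losses_run_2))
    print(first_divergence(losses_run_1, losses_run_2))
```

Knowing the first step at which two runs diverge can help narrow down whether the difference appears at initialization, in the first forward pass, or only after a backward pass.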
What's your environment?