Neg partial log likelihood loss is 0 every time the batch size is 1 #35
Dear @mayurmallya,
Thank you for your interest and for your question!
Summary
More details
Including more subjects in your sample results in a more refined and accurate ranking, which improves the estimation of the log hazards. Conversely, with only one subject, the likelihood provides no information for parameter estimation (because the subject is compared to ... no one).
In maths
With only one subject
Side note
I hope this helps,
Melodie
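The single-subject case described above can be written out explicitly. The notation here is an assumption for illustration and is not taken from the thread: \eta_i is the log relative hazard the model predicts for subject i, \delta_i is the event indicator, t_i is the observed time, and R(t_i) is the risk set. Ties are ignored.

```latex
\log L(\theta) = \sum_{i:\,\delta_i = 1} \left[ \eta_i - \log \sum_{j \in R(t_i)} \exp(\eta_j) \right],
\qquad R(t_i) = \{\, j : t_j \ge t_i \,\}

% With a single subject (n = 1) who experiences the event, R(t_1) = \{1\}:
\log L(\theta) = \eta_1 - \log \exp(\eta_1) = 0 \quad \text{for every } \theta
```

If the single subject is censored, the sum is empty and the result is also 0. Either way the value does not depend on \theta, so the gradient is zero and nothing can be learned from a batch of one.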
Thank you @melodiemonod for the detailed answer, much appreciated! If I have 300 samples in the dataset, I believe the ideal scenario would be a batch size of 300 (right?). But because of computational constraints, the batch size would be lower, let's say 10. In that case, the likelihood would be calculated for subjects 1-10 and 11-20 separately, right? I'm just trying to understand what you meant by-
Also, based on your experience, what batch size would you recommend? Or is it simply the higher the better? Thank you once again :)
Hi @mayurmallya,
1/ You cannot decompose the log likelihood as follows when dealing with the Cox partial likelihood
2/ It's a tradeoff between faster convergence and computational cost. The primary constraint is often the memory available on your GPU or TPU. On the other hand, larger batch sizes provide more accurate estimates of the gradient, potentially leading to more stable and faster convergence. Practical guidelines advise starting small: begin with a small batch size (e.g., 32 or 64) to ensure your model trains correctly without memory issues. Monitor the loss and accuracy on both the training and validation sets to see whether increasing the batch size improves performance. If it does, and you have the memory capacity, gradually increase the batch size to see if it speeds up training without compromising the model's ability to generalize.
Best regards
Melodie
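Point 1/ can be checked numerically. The sketch below is plain Python and is not torchsurv's implementation: the function name `partial_log_likelihood`, the toy data, and the choice to ignore ties are all assumptions made for illustration. It compares the partial log likelihood of four subjects evaluated together against the sum of the values obtained from two batches of two.

```python
import math


def partial_log_likelihood(log_hz, event, time):
    """Cox partial log likelihood (summed over events, no tie handling).

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: every subject in THIS batch still at risk at time[i]
        risk = [log_hz[j] for j in range(len(time)) if time[j] >= time[i]]
        log_denominator = math.log(sum(math.exp(v) for v in risk))
        total += log_hz[i] - log_denominator
    return total


# Toy data (made up): four subjects, all with an observed event
log_hz = [0.5, -0.2, 0.1, 0.3]
event = [True, True, True, True]
time = [1.0, 2.0, 3.0, 4.0]

full = partial_log_likelihood(log_hz, event, time)
split = (
    partial_log_likelihood(log_hz[:2], event[:2], time[:2])
    + partial_log_likelihood(log_hz[2:], event[2:], time[2:])
)

# The two numbers differ: in the split version, subjects 0 and 1 are never
# compared against subjects 2 and 3, so their risk sets are too small.
print(full, split)
```

The risk set of each subject depends on who else is in the batch, which is why the per-batch values do not add up to the full-sample value.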
Thank you @ahmedhshahin and @melodiemonod
Hi! I was wondering: I am working with inputs that are tensors of dimension [x, 1024], with x being the number of patches in a histology image.
Hi there,
Thanks for sharing this wonderful library!
I was trying to run a survival analysis using the Cox proportional hazards model and, due to GPU constraints, I have to use a batch size of 1. Every time I run the model, the loss value is always 0 when I use cox.neg_partial_log_likelihood. I looked into the implementation of _partial_likelihood_cox, and it seems that log_denominator gets the same value as log_hz_sorted when the batch size is 1, which makes the loss 0.
I was wondering if there is a workaround for this issue; please let me know. I am also attaching the link to the corresponding code:
torchsurv/src/torchsurv/loss/cox.py
Line 174 in 799eb30
Thank you in advance!
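The behaviour described above can be reproduced without the library. The following is a minimal sketch in plain Python, not the code in cox.py: it reuses the names log_hz_sorted and log_denominator from the description, while the helper `log_cumsum_exp_reversed`, the averaging over events, and the example values are assumptions made for illustration.

```python
import math


def log_cumsum_exp_reversed(values):
    """For each position i, return log(sum(exp(values[i:])))."""
    out = []
    running = 0.0
    for v in reversed(values):
        running += math.exp(v)
        out.append(math.log(running))
    return out[::-1]


def neg_partial_log_likelihood(log_hz, event, time):
    """Negative Cox partial log likelihood, averaged over events (no ties)."""
    order = sorted(range(len(time)), key=lambda i: time[i])
    log_hz_sorted = [log_hz[i] for i in order]
    event_sorted = [event[i] for i in order]
    # After sorting by time, the risk set of subject i is everyone from i onward
    log_denominator = log_cumsum_exp_reversed(log_hz_sorted)
    terms = [
        lh - ld
        for lh, ld, e in zip(log_hz_sorted, log_denominator, event_sorted)
        if e
    ]
    return -sum(terms) / len(terms) if terms else 0.0


# Batch of one: log_denominator is log(exp(log_hz_sorted[0])), so the terms cancel
loss_one = neg_partial_log_likelihood([1.3], [True], [5.0])

# Batch of three (made-up values): the loss is strictly positive
loss_three = neg_partial_log_likelihood(
    [1.3, -0.4, 0.2], [True, True, False], [5.0, 2.0, 8.0]
)

print(loss_one, loss_three)
```

With a single subject the loss is zero (up to floating-point rounding) whatever the model predicts, so the loss only becomes informative once a batch holds at least two subjects, one of whom has an observed event while the other is still at risk.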