Optimize `fit_loop()` to reduce `train_dataloader()`'s memory footprint #20382
Description & Motivation
Hi,

I have noticed that the `train_dataloader()`'s workers were still up, idle but withholding resources, whilst the `val_dataloader()`'s would be actively delivering batches.

After some investigation, I found the (simplified) pseudo-code describing `fit()`, and the actual behaviour matches it, so this is not a bug and is working as intended.
However, I've been struggling to balance data-processing speed against memory footprint when running instance-segmentation jobs on large, dense, non-public datasets.
I understand that when `val_check_interval` is not `None`, running the `val_loop` within the `train_dataloader()` loop is necessary. However, when `val_check_interval` is `None`, I think it would be beneficial to modify the `fit_loop()` so that resources are freed as soon as they're no longer needed.
Pitch
Within the implementation, the `val_loop()` is called within `on_advance_end()`, and the `fit_loop()` within `run()` is considerably different from the pseudo-code.

I'm assuming that we need to modify and re-use `on_advance_end()` after the completion of the `while`-loop in `run()`. Is this correct?
Alternatives
No response
Additional context
I have made this `boring.py` script to illustrate the situation and to have a concrete example to debug on.

cc @Borda