Releases: Lightning-AI/pytorch-lightning
Standard weekly patch release
[1.3.6] - 2021-06-15
Fixed
- Fixed logs overwriting issue for remote filesystems (#7889)
- Fixed an issue where DataModule.prepare_data could only be called on the global rank 0 process (#7945)
- Fixed setting worker_init_fn to seed dataloaders correctly when using DDP (#7942)
- Fixed BaseFinetuning callback to properly handle parent modules with parameters (#7931)
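Background for the worker_init_fn fix: PyTorch's DataLoader calls worker_init_fn(worker_id) inside each worker process. Under DDP the same worker ids exist on every rank, so if every rank starts from the same base seed, a seed derived from the worker id alone repeats across processes. The sketch below is a minimal illustration that folds the global rank into the seed. derive_worker_seed and make_worker_rng are hypothetical helpers written for this example, not Lightning's implementation.

```python
import hashlib
import random


def derive_worker_seed(base_seed: int, worker_id: int, global_rank: int) -> int:
    """Hypothetical helper: derive a 32-bit seed unique to one (rank, worker) pair.

    Illustrative only; this is not Lightning's own worker_init_fn.
    """
    key = f"{base_seed}-{global_rank}-{worker_id}".encode()
    # Use the first 4 bytes of a SHA-256 digest as an unsigned 32-bit seed.
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def make_worker_rng(base_seed: int, worker_id: int, global_rank: int) -> random.Random:
    """Return a generator seeded for one dataloader worker on one DDP process."""
    return random.Random(derive_worker_seed(base_seed, worker_id, global_rank))


# Two DDP processes with two workers each: every (rank, worker) pair gets its own seed,
# so worker 0 on rank 0 no longer replays the same random stream as worker 0 on rank 1.
seeds = {
    (rank, worker): derive_worker_seed(42, worker, rank)
    for rank in range(2)
    for worker in range(2)
}
```

Hashing the tuple is just one way to mix the inputs. Any deterministic function that yields distinct seeds per (base seed, rank, worker) serves the same purpose.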
Contributors
@awaelchli @Borda @kaushikb11 @Queuecumber @SeanNaren @senarvi @speediedan
Standard weekly patch release
[1.3.5] - 2021-06-08
Added
- Added warning to Training Step output (#7779)
Fixed
- Fixed LearningRateMonitor + BackboneFinetuning (#7835)
- Minor improvements to apply_to_collection and type signature of log_dict (#7851)
- Fixed docker versions (#7834)
- Fixed sharded training check for fp16 precision (#7825)
- Fixed support for torch Module type hints in LightningCLI (#7807)
Changed
- Moved training_output validation to after train_step_end (#7868)
Contributors
@Borda, @justusschock, @kandluis, @mauvilsa, @shuyingsunshine21, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.3.3] - 2021-05-26
Changed
- Moved the call to untoggle_optimizer(opt_idx) out of the closure function (#7563)
Fixed
- Fixed ProgressBar pickling after calling trainer.predict (#7608)
- Fixed broadcasting in multi-node, multi-GPU DDP using torch 1.7 (#7592)
- Fixed dataloaders not being reset when tuning the model (#7566)
- Fixed print errors in ProgressBar when trainer.fit is not called (#7674)
- Fixed global step update when the epoch is skipped (#7677)
- Fixed training loop total batch counter when accumulate grad batches was enabled (#7692)
Contributors
@carmocca @kaushikb11 @ryanking13 @Lucklyric @ajtritt @yifuwang
Standard weekly patch release
[1.3.2] - 2021-05-18
Changed
- DataModules now avoid duplicate {setup,teardown,prepare_data} calls for the same stage (#7238)
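The DataModule change amounts to remembering which hook already ran for which stage. Here is a minimal sketch of that bookkeeping. StageGuard and call_once are hypothetical names invented for this example and do not necessarily reflect how LightningDataModule implements it.

```python
from typing import Callable, List, Optional, Set, Tuple


class StageGuard:
    """Hypothetical helper: run each (hook, stage) combination at most once."""

    def __init__(self) -> None:
        self._done: Set[Tuple[str, Optional[str]]] = set()

    def call_once(
        self, hook: str, stage: Optional[str], fn: Callable[[Optional[str]], None]
    ) -> bool:
        """Call fn(stage) unless this hook already ran for this stage; return whether it ran."""
        key = (hook, stage)
        if key in self._done:
            return False  # duplicate call for the same stage: skip it
        fn(stage)
        self._done.add(key)
        return True


calls: List[Optional[str]] = []
guard = StageGuard()
guard.call_once("setup", "fit", calls.append)   # runs
guard.call_once("setup", "fit", calls.append)   # skipped, same stage
guard.call_once("setup", "test", calls.append)  # runs, new stage
print(calls)  # ['fit', 'test']
```

Keying on the (hook, stage) pair rather than the hook name alone still lets setup run once for "fit" and once for "test".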
Fixed
- Fixed parsing of multiple training dataloaders (#7433)
- Fixed recursive passing of wrong_type keyword argument in pytorch_lightning.utilities.apply_to_collection (#7433)
- Fixed setting correct DistribType for ddp_cpu (spawn) backend (#7492)
- Fixed incorrect number of calls to LR scheduler when check_val_every_n_epoch > 1 (#7032)
Contributors
@alanhdu @carmocca @justusschock @tkng
Standard weekly patch release
[1.3.1] - 2021-05-11
Fixed
- Fixed DeepSpeed with IterableDatasets (#7362)
- Fixed Trainer.current_epoch not getting restored after tuning (#7434)
- Fixed local rank displayed in console log (#7395)
Contributors
@akihironitta @awaelchli @leezu
Lightning CLI, PyTorch Profiler, Improved Early Stopping
Today we are excited to announce Lightning 1.3, containing highly anticipated new features including a new Lightning CLI, improved TPU support, integrations such as PyTorch profiler, new early stopping strategies, predict and validate trainer routines, and more.
[1.3.0] - 2021-05-06
Added
- Added support for the EarlyStopping callback to run at the end of the training epoch (#6944)
- Added synchronization points before and after setup hooks are run (#7202)
- Added a teardown hook to ClusterEnvironment (#6942)
- Added utils for metrics to scalar conversions (#7180)
- Added utils for NaN/Inf detection for gradients and parameters (#6834)
- Added a more explicit exception message when trying to execute trainer.test() or trainer.validate() with fast_dev_run=True (#6667)
- Added the LightningCLI class to provide simple reproducibility with a minimal-boilerplate training CLI (#4492, #6862, #7156, #7299)
- Added gradient_clip_algorithm argument to Trainer for gradient clipping by value (#6123)
- Added a way to print to terminal without breaking up the progress bar (#5470)
- Added support to checkpoint after training steps in the ModelCheckpoint callback (#6146)
- Added TrainerStatus.{INITIALIZING,RUNNING,FINISHED,INTERRUPTED} (#7173)
- Added Trainer.validate() method to perform one evaluation epoch over the validation set (#4948)
- Added LightningEnvironment for Lightning-specific DDP (#5915)
- Added teardown() hook to LightningDataModule (#4673)
- Added auto_insert_metric_name parameter to ModelCheckpoint (#6277)
- Added an argument to self.log that enables users to give custom names when dealing with multiple dataloaders (#6274)
- Added teardown method to BaseProfiler to enable subclasses to define post-profiling steps outside of __del__ (#6370)
- Added setup method to BaseProfiler to enable subclasses to define pre-profiling steps for every process (#6633)
- Added no return warning to predict (#6139)
- Added Trainer.predict config validation (#6543)
- Added AbstractProfiler interface (#6621)
- Added support for including module names for forward in the autograd trace of PyTorchProfiler (#6349)
- Added support for the PyTorch 1.8.1 autograd profiler (#6618)
- Added outputs parameter to callback's on_validation_epoch_end & on_test_epoch_end hooks (#6120)
- Added configure_sharded_model hook (#6679)
- Added support for precision=64, enabling training with double precision (#6595)
- Added support for DDP communication hooks (#6736)
- Added artifact_location argument to MLFlowLogger which will be passed to the MlflowClient.create_experiment call (#6677)
- Added model parameter to precision plugins' clip_gradients signature (#6764, #7231)
- Added is_last_batch attribute to Trainer (#6825)
- Added LightningModule.lr_schedulers() for manual optimization (#6567)
- Added MpModelWrapper in TPU Spawn (#7045)
- Added max_time Trainer argument to limit training time (#6823)
- Added on_predict_{batch,epoch}_{start,end} hooks (#7141)
- Added new EarlyStopping parameters stopping_threshold and divergence_threshold (#6868)
- Added debug flag to TPU Training Plugins (PT_XLA_DEBUG) (#7219)
- Added new UnrepeatedDistributedSampler and IndexBatchSamplerWrapper for tracking distributed predictions (#7215)
- Added trainer.predict(return_predictions=None|False|True) (#7215)
- Added BasePredictionWriter callback to implement prediction saving (#7127)
- Added trainer.tune(scale_batch_size_kwargs, lr_find_kwargs) arguments to configure the tuning algorithms (#7258)
- Added tpu_distributed check for TPU Spawn barrier (#7241)
- Added device updates to TPU Spawn for Pod training (#7243)
- Added warning when missing Callback and using resume_from_checkpoint (#7254)
- Added DeepSpeed single file saving (#6900)
- Added Training type Plugins Registry (#6982, #7063, #7214, #7224)
- Added ignore param to save_hyperparameters (#6056)
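The new stopping_threshold and divergence_threshold parameters stop training based on the absolute value of the monitored metric, in addition to the usual patience-based check. In the library they are passed to the callback, e.g. EarlyStopping(monitor="val_loss", stopping_threshold=0.1, divergence_threshold=10.0). The following is a minimal pure-Python sketch of that decision, assuming strict comparisons. threshold_stop and its check_finite flag are hypothetical and are not the callback's actual code.

```python
import math
import operator
from typing import Optional, Tuple


def threshold_stop(
    current: float,
    mode: str = "min",
    stopping_threshold: Optional[float] = None,
    divergence_threshold: Optional[float] = None,
    check_finite: bool = True,
) -> Tuple[bool, str]:
    """Hypothetical sketch: decide whether to stop based on absolute thresholds.

    mode="min" means lower values are better, mode="max" means higher values are better.
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    # is_better(a, b) is True when a is strictly better than b under the chosen mode.
    is_better = operator.lt if mode == "min" else operator.gt

    if check_finite and not math.isfinite(current):
        return True, "monitored value is not finite"
    if stopping_threshold is not None and is_better(current, stopping_threshold):
        return True, "stopping_threshold reached"
    if divergence_threshold is not None and is_better(divergence_threshold, current):
        return True, "divergence_threshold crossed"
    return False, "continue"


print(threshold_stop(0.05, stopping_threshold=0.1))    # (True, 'stopping_threshold reached')
print(threshold_stop(12.0, divergence_threshold=10.0))  # (True, 'divergence_threshold crossed')
print(threshold_stop(0.5, stopping_threshold=0.1, divergence_threshold=10.0))  # (False, 'continue')
```

A good-enough metric and a diverging metric both end training immediately. Values between the two thresholds fall through to whatever patience logic runs afterwards.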
Changed
- Changed LightningModule.truncated_bptt_steps to be a property (#7323)
- Changed the EarlyStopping callback so that it no longer runs EarlyStopping.on_validation_end by default if only training is run. Set check_on_train_epoch_end to run the callback at the end of the train epoch instead of at the end of the validation epoch (#7069)
- Renamed pytorch_lightning.callbacks.swa to pytorch_lightning.callbacks.stochastic_weight_avg (#6259)
- Refactored RunningStage and TrainerState usage (#4945, #7173)
  - Added RunningStage.SANITY_CHECKING
  - Added TrainerFn.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}
  - Changed trainer.evaluating to return True if validating or testing
- Changed setup() and teardown() stage argument to take any of {fit,validate,test,predict} (#6386)
- Changed profilers to save separate report files per state and rank (#6621)
- The trainer no longer tries to save a checkpoint on exception or run callback's on_train_end functions (#6864)
- Changed PyTorchProfiler to use torch.autograd.profiler.record_function to record functions (#6349)
- Disabled lr_scheduler.step() in manual optimization (#6825)
- Changed warnings and recommendations for dataloaders in ddp_spawn (#6762)
- pl.seed_everything will now also set the seed on the DistributedSampler (#7024)
- Changed default setting for communication of multi-node training using DDPShardedPlugin (#6937)
- trainer.tune() now returns the tuning result (#7258)
- LightningModule.from_datasets() now accepts IterableDataset instances as training datasets (#7503)
- Changed resume_from_checkpoint warning to an error when the checkpoint file does not exist (#7075)
- Automatically set sync_batchnorm for training_type_plugin (#6536)
- Allowed training type plugin to delay optimizer creation (#6331)
- Removed ModelSummary validation from train loop on_trainer_init (#6610)
- Moved save_function to accelerator (#6689)
- Updated DeepSpeed ZeRO (#6546, #6752, #6142, #6321)
- Improved verbose logging for EarlyStopping callback (#6811)
- Ran ddp_spawn dataloader checks on Windows (#6930)
- Updated mlflow to use resolve_tags (#6746)
- Moved save_hyperparameters to its own function (#7119)
- Replaced _DataModuleWrapper with __new__ (#7289)
- Reset current_fx properties on lightning module in teardown (#7247)
- Auto-set DataLoader.worker_init_fn with seed_everything (#6960)
- Removed model.trainer call inside of dataloading mixin (#7317)
- Split profilers module (#6261)
- Ensured accelerator is valid if running interactively (#5970)
- Disabled batch transfer in DP mode (#6098)
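Why seed the DistributedSampler? With shuffle=True, torch.utils.data.DistributedSampler shuffles the full index list on every rank with a generator seeded from seed + epoch, then hands each rank a strided slice. The shards therefore only partition the dataset if every rank uses the same seed. The sketch below mimics that scheme in pure Python. random.Random is swapped in for PyTorch's generator, so the concrete order differs from PyTorch's, and shard_indices is a hypothetical name.

```python
import math
import random
from typing import List


def shard_indices(
    dataset_len: int, num_replicas: int, rank: int, seed: int, epoch: int = 0
) -> List[int]:
    """Pure-Python sketch of DistributedSampler-style partitioning (shuffle=True, drop_last=False).

    random.Random stands in for torch.randperm, so the order differs from PyTorch's.
    """
    indices = list(range(dataset_len))
    # Every rank must use the same seed (+ epoch) so they all see the same permutation.
    random.Random(seed + epoch).shuffle(indices)
    total_size = math.ceil(dataset_len / num_replicas) * num_replicas
    # Pad by repeating indices from the front so the data splits evenly across ranks.
    while len(indices) < total_size:
        indices += indices[: total_size - len(indices)]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]


shards = [shard_indices(10, num_replicas=3, rank=r, seed=7) for r in range(3)]
# Same seed on every rank: together the shards cover all 10 samples (2 repeats as padding).
covered = sorted(set(i for shard in shards for i in shard))
print(covered == list(range(10)))  # True
```

If each rank instead shuffled with a different seed, the strided slices would come from different permutations and could overlap or skip samples. This is the situation that seeding the sampler from seed_everything avoids.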
Deprecated
- Deprecated outputs in both LightningModule.on_train_epoch_end and Callback.on_train_epoch_end hooks (#7339)
- Deprecated Trainer.truncated_bptt_steps in favor of LightningModule.truncated_bptt_steps (#7323)
- Deprecated LightningModule.grad_norm in favor of pytorch_lightning.utilities.grads.grad_norm (#7292)
- Deprecated the save_function property from the ModelCheckpoint callback (#7201)
- Deprecated LightningModule.write_predictions and LightningModule.write_predictions_dict (#7066)
- Deprecated TrainerLoggingMixin in favor of a separate utilities module for metric handling (#7180)
- Deprecated TrainerTrainingTricksMixin in favor of a separate utilities module for NaN/Inf detection for gradients and parameters (#6834)
- period has been deprecated in favor of every_n_val_epochs in the ModelCheckpoint callback (#6146)
- Deprecated trainer.running_sanity_check in favor of trainer.sanity_checking (#4945)
- Deprecated Profiler(output_filename) in favor of dirpath and filename (#6621)
- Deprecated PytorchProfiler(profiled_functions) in favor of record_functions (#6349)
- Deprecated @auto_move_data in favor of trainer.predict (#6993)
- Deprecated Callback.on_load_checkpoint(checkpoint) in favor of Callback.on_load_checkpoint(trainer, pl_module, checkpoint) (#7253)
- Deprecated metrics in favor of torchmetrics (#6505, #6530, #6540, #6547, #6515, #6572, #6573, #6584, #6636, #6637, #6649, #6659, #7131)
- Deprecated the LightningModule.datamodule getter and setter methods; access them through Trainer.datamodule instead (#7168)
- Deprecated the use of Trainer(gpus="i") (string) for selecting the i-th GPU; from v1.5 this will set the number of GPUs instead of the index (#6388)
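The gpus="i" deprecation concerns how a single-integer string is read. The sketch below contrasts the two readings. gpus_from_string is a hypothetical helper, and mapping "use i GPUs" to indices 0..i-1 is an assumption made only for illustration.

```python
from typing import List


def gpus_from_string(gpus: str, legacy: bool) -> List[int]:
    """Hypothetical helper contrasting two readings of a single-integer gpus string.

    legacy=True:  "i" selects the GPU with index i (the behaviour deprecated in 1.3).
    legacy=False: "i" requests i GPUs; picking indices 0..i-1 is an assumption of this sketch.
    """
    value = int(gpus.strip())
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return [value] if legacy else list(range(value))


print(gpus_from_string("2", legacy=True))   # [2]
print(gpus_from_string("2", legacy=False))  # [0, 1]
```

Passing an int for a GPU count or a list of indices sidesteps the ambiguity, e.g. Trainer(gpus=2) versus Trainer(gpus=[2]).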
Removed
- Removed the exp_save_path property from the LightningModule (#7266)
- Removed training loop explicitly calling EarlyStopping.on_validation_end if no validation is run (#7069)
- Removed automatic_optimization as a property from the training loop in favor of LightningModule.automatic_optimization (#7130)
- Removed evaluation loop legacy returns for *_epoch_end hooks (#6973)
- Removed support for passing a bool value to profiler argument of Trainer (#6164)
- Removed no return warning from val/test step (#6139)
- Removed passing a ModelCheckpoint instance to Trainer(checkpoint_callback) (#6166)
- Removed deprecated Trainer arguments enable_pl_optimizer and automatic_optimization (#6163)
- Removed deprecated metrics (#6161)
  - from pytorch_lightning.metrics.functional.classification removed to_onehot, to_categorical, get_num_classes, roc, multiclass_roc, average_precision, precision_recall_curve, multiclass_precision_recall_curve
  - from pytorch_lightning.metrics.functional.reduction removed reduce, class_reduce
  - from
- Removed deprecated ModelCheckpoint arguments prefix, mode="auto" (#6162)
- Removed mode='auto' from EarlyStopping (#6167)
- Removed epoch and step argume...
Quick patch release
Fixed the missing packaging package in the dependencies, which affected only installations on a completely blank system.
Standard weekly patch release
[1.2.8] - 2021-04-14
Added
- Added TPUSpawn + IterableDataset error message (#6875)
Fixed
- Fixed process rank not being available right away after Trainer instantiation (#6941)
- Fixed sync_dist for TPUs (#6950)
- Fixed AttributeError for require_backward_grad_sync when running manual optimization with sharded plugin (#6915)
- Fixed --gpus default for parser returned by Trainer.add_argparse_args (#6898)
- Fixed TPU Spawn all gather (#6896)
- Fixed EarlyStopping logic when min_epochs or min_steps requirement is not met (#6705)
- Fixed csv extension check (#6436)
- Fixed checkpoint issue when using Horovod distributed backend (#6958)
- Fixed tensorboard exception raising (#6901)
- Fixed setting the eval/train flag correctly on accelerator model (#6983)
- Fixed DDP_SPAWN compatibility with bug_report_model.py (#6892)
- Fixed bug where BaseFinetuning.flatten_modules() was duplicating leaf node parameters (#6879)
- Set better defaults for rank_zero_only.rank when training is launched with SLURM and torchelastic:
Contributors
@ananthsub @awaelchli @ethanwharris @justusschock @kandluis @kaushikb11 @liob @SeanNaren @skmatz