Release Model Parallelism Training and More Logging Options · Lightning-AI/pytorch-lightning

Overview

Lightning 1.1 is out! You can now train models with twice the parameters and zero code changes with the new sharded model training! We also have a new plugin for sequential model parallelism, more logging options, and a lot of improvements!
Release highlights: https://bit.ly/3gyLZpP

Learn more about sharded training: https://bit.ly/2W3hgI0

Detailed changes

Added

Added "monitor" key to saved ModelCheckpoints (#4383)
Added ConfusionMatrix class interface (#4348)
Added multiclass AUROC metric (#4236)
Added global step indexing to the checkpoint name for a better sub-epoch checkpointing experience (#3807)
Added optimizer hooks in callbacks (#4379)
Added option to log momentum (#4384)
Added current_score to ModelCheckpoint.on_save_checkpoint (#4721)
Added logging using self.log in train and evaluation for epoch end hooks (#4913)
Added ability for DDP plugin to modify optimizer state saving (#4675)
Added casting to python types for NumPy scalars when logging hparams (#4647)
Added prefix argument in loggers (#4557)
Added printing of the total, trainable, and non-trainable parameter counts in ModelSummary (#4521)
Added PrecisionRecallCurve, ROC, AveragePrecision class metrics (#4549)
Added custom Apex and NativeAMP as Precision plugins (#4355)
Added DALI MNIST example (#3721)
Added sharded plugin for DDP for multi-GPU training memory optimizations (#4773)
Added experiment_id to the NeptuneLogger (#3462)
Added PyTorch Geometric integration example with Lightning (#4568)
Added all_gather method to LightningModule which allows gradient-based tensor synchronizations for use-cases such as negative sampling. (#5012)
Enabled self.log in most functions (#4969)
Added changeable extension variable for ModelCheckpoint (#4977)
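
The global step indexing from #3807 matters most when checkpoints are saved more than once per epoch. The following is a minimal, library-free sketch of the idea, assuming an `epoch=...-step=...` naming pattern. The helper `checkpoint_name` is hypothetical and is not part of Lightning's API.

```python
def checkpoint_name(epoch: int, global_step: int, with_step: bool = True) -> str:
    """Build a checkpoint file name (hypothetical helper, not Lightning's API)."""
    name = f"epoch={epoch}"
    if with_step:
        # Appending the global step keeps names unique within a single epoch.
        name += f"-step={global_step}"
    return name + ".ckpt"


# Three saves during epoch 0, one every 500 optimizer steps.
steps = [499, 999, 1499]
epoch_only = {checkpoint_name(0, s, with_step=False) for s in steps}
epoch_and_step = {checkpoint_name(0, s) for s in steps}

print(len(epoch_only))      # all three saves share one file name
print(len(epoch_and_step))  # each save gets its own file name
```

With epoch-only names, later saves in the same epoch would overwrite or collide with earlier ones. Including the step avoids that.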

Changed

Removed multiclass_roc and multiclass_precision_recall_curve, use roc and precision_recall_curve instead (#4549)
Tuner algorithms will be skipped if fast_dev_run=True (#3903)
WandbLogger does not force wandb reinit arg to True anymore and creates a run only when needed (#4648)
Changed automatic_optimization to be a model attribute (#4602)
Changed Simple Profiler report to order by percentage time spent + num calls (#4880)
Simplified optimization logic (#4984)
Classification metrics overhaul (#4837)
Updated fast_dev_run to accept integer representing num_batches (#4629)
Refactored optimizer (#4658)
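
To illustrate the fast_dev_run change (#4629), here is a rough sketch of how a flag that accepts either a boolean or an integer might be normalized into a batch count. It assumes that True means a single batch and that negative values are rejected. The function `resolve_fast_dev_run` is hypothetical and does not show Lightning's internal code.

```python
from typing import Union


def resolve_fast_dev_run(fast_dev_run: Union[bool, int]) -> int:
    """Return how many batches to run; 0 means the option is off.

    Hypothetical helper for illustration only.
    """
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(fast_dev_run, bool):
        return 1 if fast_dev_run else 0
    if fast_dev_run < 0:
        raise ValueError(f"fast_dev_run must be >= 0, got {fast_dev_run}")
    return fast_dev_run


for value in (False, True, 7):
    print(value, "->", resolve_fast_dev_run(value))
```

Checking for bool before int is the key design choice here, since `isinstance(True, int)` is also true in Python.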

Deprecated

Deprecated prefix argument in ModelCheckpoint (#4765)
Deprecated the old way of assigning hyper-parameters through self.hparams = ... (#4813)
Deprecated mode='auto' from ModelCheckpoint and EarlyStopping (#4695)
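
One way to see why an explicit mode is safer than mode='auto' (#4695): any rule that guesses the direction from the metric name can guess wrong. The sketch below assumes a simple substring heuristic. The function `infer_mode` is hypothetical and is not taken from Lightning's source.

```python
def infer_mode(monitor: str) -> str:
    """Guess whether a metric should be maximized or minimized from its name.

    Hypothetical heuristic for illustration: names containing "acc" are
    treated as higher-is-better, everything else as lower-is-better.
    """
    return "max" if "acc" in monitor else "min"


for metric in ("val_loss", "val_acc", "val_f1"):
    print(metric, "->", infer_mode(metric))
```

Under this assumed heuristic, `val_f1` is treated as a metric to minimize, which is the wrong direction. Passing mode='min' or mode='max' explicitly removes the guesswork.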

Removed

Removed reorder parameter of the auc metric (#5004)

Fixed

Added feature to move tensors to CPU before saving (#4309)
Fixed LoggerConnector to have logged metrics on root device in DP (#4138)
Tensors are now automatically converted to contiguous format in gather_all (#4907)
Fixed PYTHONPATH for DDP test model (#4528)
Fixed logger to support indexing (#4595)
Fixed DDP and manual_optimization (#4976)

Contributors

@ananyahjha93, @awaelchli, @blatr, @Borda, @borisdayma, @carmocca, @ddrevicky, @george-gca, @gianscarpe, @irustandi, @janhenriklambrechts, @jeremyjordan, @justusschock, @lezwon, @rohitgr7, @s-rog, @SeanNaren, @SkafteNicki, @tadejsv, @tchaton, @williamFalcon, @zippeurfou

If we forgot someone due to not matching commit email with GitHub account, let us know :]
