
Releases: Lightning-AI/pytorch-lightning

Standard weekly patch release

07 Apr 17:58

[1.2.7] - 2021-04-06

Fixed

  • Fixed a bug with omegaconf and xm.save (#6741)
  • Fixed an issue with IterableDataset when __len__ is not defined (#6828)
  • Sanitize None params during pruning (#6836)
  • Enforce an epoch scheduler interval when using SWA (#6588)
  • Fixed TPU Colab hang issue, post training (#6816)
  • Fixed a bug where TensorBoardLogger would give a warning and not log correctly to a symbolic link save_dir (#6730)

Contributors

@awaelchli, @ethanwharris, @karthikprasad, @kaushikb11, @mibaumgartner, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

30 Mar 14:46

[1.2.6] - 2021-03-30

Changed

  • Changed the behavior of on_epoch_start to run at the beginning of validation & test epoch (#6498)
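
A minimal sketch of the change above, assuming a standard LightningModule subclass: the same hook now fires at the beginning of validation and test epochs, not only training epochs.

```python
import pytorch_lightning as pl

class VerboseModel(pl.LightningModule):
    def on_epoch_start(self):
        # As of 1.2.6, this hook runs at the start of training,
        # validation, and test epochs alike.
        print(f"epoch {self.current_epoch} starting")
```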

Removed

  • Removed legacy code to include step dictionary returns in callback_metrics. Use self.log_dict instead. (#6682)
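
A minimal migration sketch for the removal above, assuming metrics were previously returned from training_step as a step dictionary; compute_loss and compute_acc are hypothetical helpers standing in for your own code:

```python
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        acc = self.compute_acc(batch)    # hypothetical helper
        # Previously, returning {"loss": loss, "log": {...}} populated
        # callback_metrics implicitly. Log explicitly instead:
        self.log_dict({"train_loss": loss, "train_acc": acc})
        return loss
```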

Fixed

  • Fixed DummyLogger.log_hyperparams raising a TypeError when running with fast_dev_run=True (#6398)
  • Fixed error on TPUs when there was no ModelCheckpoint (#6654)
  • Fixed trainer.test freeze on TPUs (#6654)
  • Fixed a bug where gradients were disabled after calling Trainer.predict (#6657)
  • Fixed bug where no TPUs were detected in a TPU pod env (#6719)

Contributors

@awaelchli, @carmocca, @ethanwharris, @kaushikb11, @rohitgr7, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Weekly patch release - torchmetrics compatibility

24 Mar 15:17

[1.2.5] - 2021-03-23

Changed

  • Added Autocast in validation, test and predict modes for Native AMP (#6565)
  • Updated gradient clipping for the TPU accelerator (#6576)
  • Refactored setup to be typing-friendly (#6590)

Fixed

  • Fixed a bug where all_gather would not work correctly with tpu_cores=8 (#6587)
  • Fixed comparing required versions (#6434)
  • Fixed duplicate logs appearing in console when using the python logging module (#6275)

Contributors

@awaelchli, @Borda, @ethanwharris, @justusschock, @kaushikb11

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

16 Mar 20:29

[1.2.4] - 2021-03-16

Changed

  • Changed the default of find_unused_parameters back to True in DDP and DDP Spawn (#6438)
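
A sketch of how to opt back out of the restored default, assuming the 1.2-era DDPPlugin; disabling find_unused_parameters can speed up DDP when every parameter receives a gradient:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin

# find_unused_parameters defaults to True again as of 1.2.4; pass the
# plugin explicitly to recover the faster setting.
trainer = pl.Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=DDPPlugin(find_unused_parameters=False),
)
```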

Fixed

  • Exposed DeepSpeed loss parameters to allow users to fix loss instability (#6115)
  • Fixed DP reduction with collection (#6324)
  • Fixed an issue where the tuner would not tune the learning rate if also tuning the batch size (#4688)
  • Fixed broadcast to use PyTorch broadcast_object_list and add reduce_decision (#6410)
  • Fixed logger creating directory structure too early in DDP (#6380)
  • Fixed DeepSpeed additional memory use on rank 0 when default device not set early enough (#6460)
  • Fixed DummyLogger.log_hyperparams raising a TypeError when running with fast_dev_run=True (#6398)
  • Fixed an issue with Tuner.scale_batch_size not finding the batch size attribute in the datamodule (#5968)
  • Fixed an exception in the layer summary when the model contains torch.jit scripted submodules (#6511)
  • Fixed the train loop config being run during Trainer.predict (#6541)

Contributors

@awaelchli, @kaushikb11, @Palzer, @SeanNaren, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

09 Mar 17:28

[1.2.3] - 2021-03-09

Fixed

  • Fixed ModelPruning(make_pruning_permanent=True) pruning buffers getting removed when saved during training (#6073)
  • Fixed _stable_1d_sort to work when n >= N (#6177)
  • Fixed AttributeError when logger=None on TPU (#6221)
  • Fixed PyTorch Profiler with emit_nvtx (#6260)
  • Fixed trainer.test from best_path hanging after calling trainer.fit (#6272)
  • Fixed SingleTPU calling all_gather (#6296)
  • Ensured we check deepspeed/sharded in multi-node DDP (#6297)
  • Ensured LightningOptimizer doesn't delete optimizer hooks (#6305)
  • Resolved a memory leak in evaluation (#6326)
  • Ensured that gradient clipping is only called if the value is greater than 0 (#6330)
  • Fixed Trainer not resetting lightning_optimizers when calling Trainer.fit() multiple times (#6372)

Contributors

@awaelchli, @carmocca, @chizuchizu, @frankier, @SeanNaren, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

05 Mar 15:12

[1.2.2] - 2021-03-02

Added

  • Added checkpoint parameter to callback's on_save_checkpoint hook (#6072)
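
A minimal sketch of the extended hook, assuming the 1.2-era Callback API: the full checkpoint dictionary is now passed in, so callbacks can inspect what is about to be written to disk.

```python
import pytorch_lightning as pl

class CheckpointInspector(pl.Callback):
    def on_save_checkpoint(self, trainer, pl_module, checkpoint):
        # `checkpoint` is the parameter added in 1.2.2: the state dict
        # about to be saved.
        print("checkpoint keys:", list(checkpoint.keys()))
        return {}  # this callback has no state of its own to persist
```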

Changed

  • Changed the order of backward, step, zero_grad to zero_grad, backward, step (#6147); see the sketch after this list
  • Changed default for DeepSpeed CPU Offload to False, due to prohibitively slow speeds at smaller scale (#6262)
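
To make the reordering concrete, a standalone sketch of the new per-step sequence, using a plain torch model and a dummy loss (both illustrative, not Lightning internals):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batch = torch.randn(8, 4)

# New order in 1.2.2: zero_grad -> backward -> step
optimizer.zero_grad()              # 1. clear stale gradients first
loss = model(batch).pow(2).mean()  # dummy loss for illustration
loss.backward()                    # 2. accumulate fresh gradients
optimizer.step()                   # 3. apply the update
```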

Fixed

  • Fixed epoch level schedulers not being called when val_check_interval < 1.0 (#6075)
  • Fixed multiple early stopping callbacks (#6197)
  • Fixed incorrect usage of detach(), cpu(), to() (#6216)
  • Fixed support for the LBFGS optimizer, which didn't converge in automatic optimization (#6147)
  • Prevented WandbLogger from dropping values (#5931)
  • Fixed an error thrown when using a valid distributed mode in multi-node (#6297)

Contributors

@akihironitta, @borisdayma, @carmocca, @dvolgyes, @SeanNaren, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

24 Feb 17:13

[1.2.1] - 2021-02-23

Fixed

  • Fixed incorrect yield logic for the amp autocast context manager (#6080)
  • Fixed priority of plugin/accelerator when setting distributed mode (#6089)
  • Fixed error message for AMP + CPU incompatibility (#6107)

Contributors

@awaelchli, @SeanNaren, @carmocca

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Pruning & Quantization & SWA

18 Feb 23:04

[1.2.0] - 2021-02-18

Added

  • Added DataType, AverageMethod and MDMCAverageMethod enum in metrics (#5657)
  • Added support for summarized model total params size in megabytes (#5590)
  • Added support for multiple train loaders (#1959)
  • The Accuracy metric now generalizes to Top-k accuracy for (multi-dimensional) multi-class inputs using the top_k parameter (#4838)
  • The Accuracy metric now enables the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs with the subset_accuracy parameter (#4838)
  • Added HammingDistance metric to compute the Hamming distance (loss) (#4838)
  • Added max_fpr parameter to the auroc metric for computing partial AUROC (#3790)
  • Added StatScores metric to compute the number of true positives, false positives, true negatives and false negatives (#4839)
  • Added R2Score metric (#5241)
  • Added LambdaCallback (#5347)
  • Added BackboneLambdaFinetuningCallback (#5377)
  • Accelerator all_gather supports collection (#5221)
  • Added image_gradients functional metric to compute the image gradients of a given input image. (#5056)
  • Added MetricCollection (#4318)
  • Added .clone() method to metrics (#4318)
  • Added IoU class interface (#4704)
  • Added support for tying weights after moving the model to TPU via the on_post_move_to_device hook
  • Added missing val/test hooks in LightningModule (#5467)
  • The Recall and Precision metrics (and their functional counterparts recall and precision) can now be generalized to Recall@K and Precision@K with the use of top_k parameter (#4842)
  • Added ModelPruning Callback (#5618, #5825, #6045)
  • Added PyTorchProfiler (#5560)
  • Added compositional metrics (#5464)
  • Added Trainer method predict(...) for high-performance predictions (#5579); sketched after this list
  • Added on_before_batch_transfer and on_after_batch_transfer data hooks (#3671)
  • Added AUC/AUROC class interface (#5479)
  • Added PredictLoop object (#5752)
  • Added QuantizationAwareTraining callback (#5706, #6040)
  • Added LightningModule.configure_callbacks to enable the definition of model-specific callbacks (#5621)
  • Added dim to PSNR metric for mean-squared-error reduction (#5957)
  • Added proximal policy optimization template to pl_examples (#5394)
  • Added log_graph to CometLogger (#5295)
  • Added possibility for nested loaders (#5404)
  • Added sync_step to Wandb logger (#5351)
  • Added StochasticWeightAveraging callback (#5640)
  • Added LightningDataModule.from_datasets(...) (#5133)
  • Added PL_TORCH_DISTRIBUTED_BACKEND env variable to select backend (#5981)
  • Added a Trainer flag to activate Stochastic Weight Averaging (SWA), Trainer(stochastic_weight_avg=True) (#6038); sketched after this list
  • Added DeepSpeed integration (#5954, #6042)
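
A minimal end-to-end sketch combining two of the additions above, the stochastic_weight_avg flag and Trainer.predict; TinyModel and the synthetic data are illustrative, and the SWA schedule is assumed to follow the 1.2.0 defaults:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

train_loader = DataLoader(
    TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8
)

model = TinyModel()
# New in 1.2.0: Stochastic Weight Averaging behind a single Trainer flag ...
trainer = pl.Trainer(max_epochs=2, stochastic_weight_avg=True)
trainer.fit(model, train_loader)

# ... and a dedicated prediction entry point.
predictions = trainer.predict(model, DataLoader(torch.randn(16, 4), batch_size=8))
```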

Changed

  • The stat_scores metric now calculates stat scores over all classes and gains new parameters, in line with the new StatScores metric (#4839)
  • Changed computer_vision_fine_tunning example to use BackboneLambdaFinetuningCallback (#5377)
  • Changed automatic casting for LoggerConnector metrics (#5218)
  • Changed iou [func] to allow float input (#4704)
  • Metric compute() method will no longer automatically call reset() (#5409)
  • Set PyTorch 1.4 as the minimum requirement; testing and examples also require torchvision>=0.5 and torchtext>=0.5 (#5418)
  • Changed the callbacks argument in Trainer to also allow a single Callback input (#5446); see the sketch after this list
  • Changed the default of find_unused_parameters to False in DDP (#5185)
  • Changed ModelCheckpoint version suffixes to start at 1 (#5008)
  • Progress bar metrics tensors are now converted to float (#5692)
  • Changed the default value for the progress_bar_refresh_rate Trainer argument in Google COLAB notebooks to 20 (#5516)
  • Extended support for purely iteration-based training (#5726)
  • Made LightningModule.global_rank, LightningModule.local_rank and LightningModule.logger read-only properties (#5730)
  • Forced ModelCheckpoint callbacks to run after all others to guarantee all states are saved to the checkpoint (#5731)
  • Refactored Accelerators and Plugins (#5743)
    • Added base classes for plugins (#5715)
    • Added parallel plugins for DP, DDP, DDPSpawn, DDP2 and Horovod (#5714)
    • Precision Plugins (#5718)
    • Added new Accelerators for CPU, GPU and TPU (#5719)
    • Added Plugins for TPU training (#5719)
    • Added RPC and Sharded plugins (#5732)
    • Added missing LightningModule-wrapper logic to new plugins and accelerator (#5734)
    • Moved device-specific teardown logic from training loop to accelerator (#5973)
    • Moved accelerator_connector.py to the connectors subfolder (#6033)
    • Trainer only references accelerator (#6039)
    • Made parallel devices optional across all plugins (#6051)
    • Cleaning (#5948, #5949, #5950)
  • Enabled self.log in callbacks (#5094)
  • Renamed xxx_AVAILABLE as protected (#5082)
  • Unified module names in Utils (#5199)
  • Separated utils: imports & enums (#5256, #5874)
  • Refactor: clean trainer device & distributed getters (#5300)
  • Simplified training phase as LightningEnum (#5419)
  • Updated metrics to use LightningEnum (#5689)
  • Changed the sequence of on_train_batch_end, on_batch_end & on_train_epoch_end, on_epoch_end hooks (#5688)
  • Refactored setup_training and removed test_mode (#5388)
  • Disabled training when num_training_batches is zero due to an insufficient limit_train_batches (#5703)
  • Refactored EpochResultStore (#5522)
  • Updated lr_finder to check for the attribute if not running fast_dev_run (#5990)
  • Made the LightningOptimizer manual optimizer more flexible and exposed toggle_model (#5771)
  • MLFlowLogger now limits parameter value length to 250 characters (#5893)
  • Re-introduced fix for Hydra directory sync with multiple processes (#5993)
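
One of the ergonomic changes above, sketched briefly: the callbacks argument now accepts a bare Callback instance as well as a list.

```python
import pytorch_lightning as pl

class PrintingCallback(pl.Callback):
    def on_train_start(self, trainer, pl_module):
        print("training is starting")

# Before 1.2.0 this had to be callbacks=[PrintingCallback()].
trainer = pl.Trainer(callbacks=PrintingCallback())
```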

Deprecated

  • Function stat_scores_multiple_classes is deprecated in favor of stat_scores (#4839)
  • Moved accelerators and plugins to their legacy package (#5645)
  • Deprecated LightningDistributedDataParallel in favor of new wrapper module LightningDistributedModule (#5185)
  • Deprecated LightningDataParallel in favor of new wrapper module LightningParallelModule (#5670)
  • Renamed utils modules (#5199)
    • argparse_utils >> argparse
    • model_utils >> model_helpers
    • warning_utils >> warnings
    • xla_device_utils >> xla_device
  • Deprecated using 'val_loss' to set the ModelCheckpoint monitor (#6012)
  • Deprecated .get_model() in favor of the explicit .lightning_module property (#6035)
  • Deprecated Trainer attribute accelerator_backend in favor of accelerator (#6034)
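
A migration sketch for the deprecations above; the old spellings still work in 1.2 but emit deprecation warnings:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Deprecated: relying on 'val_loss' being monitored implicitly.
# Preferred: name the monitored metric explicitly.
checkpoint_cb = ModelCheckpoint(monitor="val_loss")

# Deprecated trainer attributes and their replacements:
#   trainer.get_model()          ->  trainer.lightning_module
#   trainer.accelerator_backend  ->  trainer.accelerator
```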

Removed

  • Removed deprecated checkpoint argument filepath (#5321)
  • Removed deprecated Fbeta, f1_score and fbeta_score metrics (#5322)
  • Removed deprecated TrainResult (#5323)
  • Removed deprecated EvalResult (#5633)
  • Removed LoggerStages (#5673)

Fixed

  • Fixed distributed setting and ddp_cpu only working with num_processes>1 (#5297)
  • Fixed the saved filename in ModelCheckpoint when it already exists (#4861)
  • Fixed DDPHPCAccelerator hangs in DDP construction by calling init_device (#5157)
  • Fixed num_workers for Windows example (#5375)
  • Fixed loading yaml (#5619)
  • Fixed support for custom DataLoaders with DDP if they can be re-instantiated (#5745)
  • Fixed repeated .fit() calls ignoring the max_steps iteration bound (#5936)
  • Fixed throwing MisconfigurationError on unknown mode (#5255)
  • Resolved a bug with finetuning (#5744)
  • Fixed ModelCheckpoint race condition in file existence check (#5155)
  • Fixed some compatibility with PyTorch 1.8 (#5864)
  • Fixed forward cache (#5895)
  • Fixed recursive detach of tensors to CPU (#6007)
  • Fixed passing wrong strings for the scheduler interval not throwing an error (#5923)
  • Fixed wrong requires_grad state after returning None with multiple optimizers (#5738)
  • Fixed missing on_epoch_end hook call at the end of validation and test epochs (#5986)
  • Fixed missing process_dataloader call for TPUSpawn when in distributed mode (#6015)
  • Fixed progress bar flickering by appending 0 to floats/strings (#6009)
  • Fixed synchronization issues with TPU training (#6027)
  • Fixed hparams.yaml saved twice when using TensorBoardLogger (#5953)
  • Fixed basic examples (#5912, #5985)
  • Fixed fairscale compatibility with PyTorch 1.8 (#5996)
  • Ensured process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader (#6015)
  • Attempted SLURM auto resume call when non-shell call fails (#6002)
  • Fixed wrapping optimizers upon assignment (#6006)
  • Fixed allowing hashing of metrics with lists in their state (#5939)

Contributors

@alanhdu, @ananthsub, @awaelchli, @Borda, @borisdayma, @carmocca, @ddrevicky, @deng-cy, @ducthienbui97, @justusschock, @kartik4949, @kaushikb11, @manipopopo, @marload, @neighthan, @peblair, @prampey, @pranjaldatta, @rohitgr7, @SeanNaren, @sid-sundrani, @SkafteNicki, @tadejsv, @tchaton, @teddykoker, @titu1994, @yuntai

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

08 Feb 08:49

[1.1.8] - 2021-02-08

Fixed

  • Separated epoch validation from step validation (#5208)
  • Fixed toggle_optimizers not handling all optimizer parameters (#5775)

Contributors

@ananthsub, @rohitgr7

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

03 Feb 18:10

[1.1.7] - 2021-02-03

Fixed

  • Fixed TensorBoardLogger not closing SummaryWriter on finalize (#5696)
  • Fixed filtering of pytorch "unsqueeze" warning when using DP (#5622)
  • Fixed num_classes argument in F1 metric (#5663)
  • Fixed log_dir property (#5537)
  • Fixed a race condition in ModelCheckpoint when checking if a checkpoint file exists (#5144)
  • Removed unnecessary intermediate layers in Dockerfiles (#5697)
  • Fixed auto learning rate ordering (#5638)

Contributors

@awaelchli, @guillochon, @noamzilo, @rohitgr7, @SkafteNicki, @sumanthratna

If we forgot someone due to not matching commit email with GitHub account, let us know :]