Releases: Lightning-AI/pytorch-lightning

Weekly patch release

14 Sep 19:22

App

Fixed

  • Replace LightningClient with import from lightning_cloud (#18544)

Fabric

Fixed

  • Fixed an issue causing the _FabricOptimizer.state to remain outdated after loading with load_state_dict (#18488)

PyTorch

Fixed

  • Fixed an issue that prevented the user from setting the log_model parameter in WandbLogger via the LightningCLI (#18458)
  • Fixed the display of v_num in the progress bar when running with Trainer(fast_dev_run=True) (#18491)
  • Fixed UnboundLocalError when running with python -O (#18496)
  • Fixed visual glitch with the TQDM progress bar leaving the validation bar incomplete before switching back to the training display (#18503)
  • Fixed false-positive warning about logging interval when running with Trainer(fast_dev_run=True) (#18550); see the sketch below
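
For the fast_dev_run entries above, a minimal sketch of how such a debug run is typically launched; the tiny dataset, model, and sizes are illustrative and not taken from these notes:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from lightning.pytorch import LightningModule, Trainer


class RandomDataset(Dataset):
    # Tiny synthetic dataset, only for illustration.
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.randn(4), torch.randn(1)


class TinyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("train_loss", loss)  # logging and the v_num display are what the fixes touch
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    # fast_dev_run=True runs a single batch through the loops as a smoke test
    trainer = Trainer(fast_dev_run=True)
    trainer.fit(TinyModel(), DataLoader(RandomDataset(), batch_size=4))
```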

Contributors

@awaelchli, @Borda, @justusschock, @SebastianGer

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Weekly patch release

30 Aug 12:29

App

Changed

  • Change top folder (#18212)
  • Remove _handle_is_headless calls in app run loop (#18362)

Fixed

  • Refactored path to root to prevent a circular import (#18357)

Fabric

Changed

  • On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)

Fixed

  • Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
  • Removed false-positive warning when using fabric.no_backward_sync with XLA strategies (#17761); see the sketch below this list
  • Fixed issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
  • Fixed FSDP full-precision param_dtype training (16-mixed, bf16-mixed and 32-true configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
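
For the no_backward_sync entry, a minimal single-device sketch of the API in question (with one device the context manager is effectively a no-op; the model, optimizer, and data are placeholders):

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)  # accelerator/devices are illustrative
fabric.launch()

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

batch = torch.randn(8, 4)
target = torch.randn(8, 1)

# Under a distributed strategy this skips gradient synchronization for
# accumulation steps; the fix above removed a false-positive warning under XLA.
with fabric.no_backward_sync(model, enabled=True):
    loss = torch.nn.functional.mse_loss(model(batch), target)
    fabric.backward(loss)

optimizer.step()
optimizer.zero_grad()
```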

PyTorch

Changed

  • On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
  • Fix inefficiency in rich progress bar (#18369)

Fixed

  • Fixed FSDP full-precision param_dtype training (16-mixed and bf16-mixed configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
  • Fixed an issue that prevented the use of custom logger classes without an experiment property defined (#18093)
  • Fixed setting the tracking URI in MLFlowLogger for logging artifacts to the MLflow server (#18395); see the example below this list
  • Fixed redundant iter() call to dataloader when checking dataloading configuration (#18415)
  • Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
  • Properly manage fetcher.done with dataloader_iter (#18376)
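
For the MLFlowLogger entry, a short configuration sketch assuming the mlflow package is installed; the tracking URI and experiment name are placeholders:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import MLFlowLogger

# tracking_uri points artifact logging at a remote MLflow server;
# the URI and experiment name below are illustrative only.
mlf_logger = MLFlowLogger(
    experiment_name="demo",
    tracking_uri="http://localhost:5000",
    log_model=True,  # upload checkpoints as MLflow artifacts
)

trainer = Trainer(logger=mlf_logger, max_epochs=1)
# trainer.fit(model, datamodule)  # model/datamodule defined elsewhere
```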

Contributors

@awaelchli, @Borda, @carmocca, @quintenroets, @rlizzo, @speediedan, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Weekly patch release

16 Aug 07:30

App

Changed

  • Removed the top-level import lightning.pdb; import lightning.app.pdb instead (#18177)
  • Client retries forever (#18065)

Fixed

  • Fixed an issue that prevented the user from setting the multiprocessing start method after importing lightning (#18177)

Fabric

Changed

  • Disabled the auto-detection of the Kubeflow environment (#18137)

Fixed

  • Fixed issue where DDP subprocesses that used Hydra would set hydra's working directory to current directory (#18145)
  • Fixed an issue that prevented the user from setting the multiprocessing start method after importing lightning (#18177)
  • Fixed an issue with Fabric.all_reduce() not performing an inplace operation for all backends consistently (#18235); see the example below this list
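
For the Fabric.all_reduce() entry, a minimal sketch of how the collective is typically called; the strategy, device count, and values are illustrative:

```python
import torch
from lightning.fabric import Fabric


def run(fabric: Fabric) -> None:
    # Each process contributes its rank + 1; the fix made the in-place
    # behaviour of the reduction consistent across backends.
    local = torch.tensor([float(fabric.global_rank + 1)])
    reduced = fabric.all_reduce(local, reduce_op="mean")
    fabric.print(f"reduced value: {reduced.item()}")


if __name__ == "__main__":
    # accelerator/devices/strategy are illustrative
    fabric = Fabric(accelerator="cpu", devices=2, strategy="ddp_spawn")
    fabric.launch(run)
```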

PyTorch

Added

  • Added LightningOptimizer.refresh() to update the __dict__ in case the optimizer it wraps has changed its internal state (#18280)

Changed

  • Disabled the auto-detection of the Kubeflow environment (#18137)

Fixed

  • Fixed a Missing folder exception when using a Google Storage URL as a default_root_dir (#18088)
  • Fixed an issue that prevented the user from setting the multiprocessing start method after importing lightning (#18177)
  • Fixed the gradient unscaling logic if the training step skipped backward (by returning None) (#18267)
  • Ensure that the closure running inside the optimizer step has gradients enabled, even if the optimizer step has it disabled (#18268)
  • Fixed an issue that could cause the LightningOptimizer wrapper returned by LightningModule.optimizers() to have a different internal state than the optimizer it wraps (#18280)

Contributors

@0x404, @awaelchli, @bilelomrani1, @Borda, @ethanwharris, @nisheethlahoti

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

24 Jul 21:36

2.0.6

App

  • Fixed handling a None request in the file orchestration queue (#18111)

Fabric

  • Fixed TensorBoardLogger.log_graph not unwrapping the _FabricModule (#17844)

PyTorch

  • Fixed LightningCLI not correctly saving seed_everything when run=True and seed_everything=True (#18056)
  • Fixed validation of non-PyTorch LR schedulers in manual optimization mode (#18092)
  • Fixed an attribute error for _FaultTolerantMode when loading an old checkpoint that pickled the enum (#18094)

Contributors

@awaelchli, @lantiga, @mauvilsa, @shihaoyin

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

10 Jul 16:09

App

Added

  • plugin: store source app (#17892)
  • Added colocation identifier (#16796)
  • Added exponential backoff to HTTPQueue put (#18013)
  • Content for plugins (#17243)

Changed

  • Save a reference to created tasks, to avoid tasks disappearing (#17946)

Fabric

Added

  • Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)

Changed

  • Avoid info message when loading 0 entry point callbacks (#17990)

Fixed

  • Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
  • Fixed check for FSDP's flat parameters in all parameter groups (#17914)
  • Fixed automatic step tracking in Fabric's CSVLogger (#17942)
  • Fixed an issue causing the torch.set_float32_matmul_precision info message to show multiple times (#17960)
  • Fixed loading model state when Fabric.load() is called after Fabric.setup() (#17997); see the sketch below this list
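
For the Fabric.load() entry, a minimal round-trip sketch that saves and then restores state into objects that were already set up; the file name and model are placeholders:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu")
fabric.launch()

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

state = {"model": model, "optimizer": optimizer, "step": 0}
fabric.save("checkpoint.ckpt", state)  # write a checkpoint first so the demo is self-contained

# Restoring into objects that were already set up is the scenario the fix addresses.
fabric.load("checkpoint.ckpt", state)  # loads in place into model/optimizer
```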

PyTorch

Fixed

  • Fixed delayed creation of experiment metadata and checkpoint/log dir name when using WandbLogger (#17818)
  • Fixed incorrect parsing of arguments when augmenting exception messages in DDP (#17948)
  • Fixed an issue causing the torch.set_float32_matmul_precision info message to show multiple times (#17960)
  • Added missing map_location argument for the LightningDataModule.load_from_checkpoint function (#17950)
  • Fix support for neptune-client (#17939)

Contributors

@anio, @awaelchli, @Borda, @ethanwharris, @lantiga, @nicolai86, @rjarun8, @schmidt-ai, @schuhschuh, @wouterzwerink, @yurijmikhalevich

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

22 Jun 18:23

App

Fixed

  • Bumped several dependencies to address security vulnerabilities.

Fabric

Fixed

  • Fixed validation of parameters of plugins.precision.MixedPrecision (#17687)
  • Fixed an issue with HPU imports leading to performance degradation (#17788)

PyTorch

Changed

  • Changes to the NeptuneLogger (#16761):
    • It now supports neptune-client 0.16.16 and neptune >=1.0, and we have replaced the log() method with append() and extend().
    • It now accepts a namespace Handler as an alternative to Run for the run argument. This means that you can call it like NeptuneLogger(run=run["some/namespace"]) to log everything to the some/namespace/ location of the run; see the sketch below this list.
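
A short sketch of the new run-argument behaviour, assuming the neptune package (>=1.0) is installed; the project name is a placeholder:

```python
import neptune
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import NeptuneLogger

# Project name and credentials are placeholders.
run = neptune.init_run(project="my-workspace/my-project")

# Passing a namespace handler logs everything under some/namespace/ of the run.
neptune_logger = NeptuneLogger(run=run["some/namespace"])

trainer = Trainer(logger=neptune_logger, max_epochs=1)
# trainer.fit(model, datamodule)  # model/datamodule defined elsewhere
```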

Fixed

  • Fixed validation of parameters of plugins.precision.MixedPrecisionPlugin (#17687)
  • Fixed deriving default map location in LightningModule.load_from_checkpoint when there is an extra state (#17812)

Contributors

@akreuzer, @awaelchli, @Borda, @jerome-habana, @kshitij12345

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

07 Jun 17:09
7e0b1a1

App

Added

  • Added the property LightningWork.public_ip that exposes the public IP of the LightningWork instance (#17742); see the sketch below this list
  • Add missing python-multipart dependency (#17244)
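
A minimal sketch of the new public_ip property next to internal_ip (fixed further below); the work and flow classes are illustrative, and the app would be started with the usual lightning run app command:

```python
from lightning.app import LightningApp, LightningFlow, LightningWork


class AddressWork(LightningWork):
    def run(self):
        # public_ip is the new property; internal_ip now returns the
        # private/internal address (see the fix further below).
        print("public IP:", self.public_ip)
        print("internal IP:", self.internal_ip)


class RootFlow(LightningFlow):
    def __init__(self):
        super().__init__()
        self.work = AddressWork()

    def run(self):
        self.work.run()


app = LightningApp(RootFlow())
```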

Changed

  • Made type hints public (#17100)

Fixed

  • Fixed LightningWork.internal_ip that was mistakenly exposing the public IP instead; now exposes the private/internal IP address (#17742)
  • Fixed resolution of the latest version in CLI (#17351)
  • Fixed a property being raised instead of returned (#17595)
  • Fixed get project (#17617, #17666)

Fabric

Added

  • Added support for Callback registration through entry points (#17756)

Changed

  • Made type hints public (#17100)
  • Support compiling a module after it was set up by Fabric (#17529); see the sketch below this list
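
A minimal sketch of compiling after setup, assuming torch >= 2.0; the model and input shapes are placeholders:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu")
fabric.launch()

model = torch.nn.Linear(4, 1)
model = fabric.setup(model)

# Compiling after setup is what this change enables.
compiled_model = torch.compile(model)
out = compiled_model(torch.randn(2, 4))
```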

Fixed

  • Fixed computing the next version folder in CSVLogger (#17139)
  • Fixed inconsistent settings for FSDP Precision (#17670)

PyTorch

Changed

  • Made type hints public (#17100)

Fixed

  • CombinedLoader only starts DataLoader workers when necessary when operating in sequential mode (#17639)
  • Fixed a potential bug with uploading model checkpoints to Neptune.ai by uploading files from stream (#17430)
  • Fixed signature inspection of decorated hooks (#17507)
  • The WandbLogger no longer flattens dictionaries in the hyperparameters logged to the dashboard (#17574)
  • Fixed computing the next version folder in CSVLogger (#17139)
  • Fixed a formatting issue when the filename in ModelCheckpoint contained metrics that were substrings of each other (#17610); see the example below this list
  • Fixed WandbLogger ignoring the WANDB_PROJECT environment variable (#16222)
  • Fixed inconsistent settings for FSDP Precision (#17670)
  • Fixed an edge case causing overlapping samples in DDP when no global seed is set (#17713)
  • Fallback to module available check for mlflow (#17467)
  • Fixed LR finder max val batches (#17636)
  • Fixed multithreading checkpoint loading (#17678)
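
For the ModelCheckpoint filename entry, a small sketch with two metric names where one ("loss") is a substring of the other ("val_loss"); the metric names and format are illustrative:

```python
from lightning.pytorch.callbacks import ModelCheckpoint

# "loss" is a substring of "val_loss"; the formatting fix keeps the two
# placeholders from clobbering each other in the resulting filename.
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",
    filename="epoch={epoch}-loss={loss:.2f}-val_loss={val_loss:.2f}",
    auto_insert_metric_name=False,  # the pattern above already names the metrics
)
# Pass via Trainer(callbacks=[checkpoint_cb]) and log "loss" and "val_loss" in the module.
```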

Contributors

@adamjstewart, @AleksanderWWW, @awaelchli, @baskrahmer, @bkiat1123, @Borda, @carmocca, @ethanwharris, @leng-yue, @lightningforever, @manangoel99, @mukhery, @Quasar-Kim, @water-vapor, @yurijmikhalevich

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release: App jobs

24 Apr 13:56
682d7ef

App

Fixed

  • Resolved Lightning App with remote storage (#17426)
  • Fixed AppState, streamlit example (#17452)

Fabric

Changed

  • Enable precision autocast for LightningModule step methods in Fabric (#17439)

Fixed

  • Fixed an issue with LightningModule.*_step methods bypassing the DDP/FSDP wrapper (#17424)
  • Fixed device handling in Fabric.setup() when the model has no parameters (#17441)

PyTorch

Fixed

  • Fixed Model.load_from_checkpoint("checkpoint.ckpt", map_location=map_location) always returning the model on CPU (#17308); see the sketch below this list
  • Fixed syncing of module states during non-fit stages (#17370)
  • Fixed an issue that caused num_nodes not to be set correctly for FSDPStrategy (#17438)
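
For the load_from_checkpoint entry, a minimal sketch of passing map_location explicitly; "checkpoint.ckpt" is a placeholder path and BoringModel is just a stand-in module:

```python
import torch
from lightning.pytorch.demos.boring_classes import BoringModel

# "checkpoint.ckpt" is a placeholder for a checkpoint saved earlier; the fix
# ensures map_location is honoured instead of always returning the model on CPU.
model = BoringModel.load_from_checkpoint(
    "checkpoint.ckpt",
    map_location=torch.device("cpu"),  # or e.g. torch.device("cuda:0")
)
```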

Contributors

@awaelchli, @Borda, @carmocca, @ethanwharris, @ryan597, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

12 Apr 15:31
a020506

App

Changed

  • Added healthz endpoint to plugin server (#16882)
  • System customization syncing for jobs run (#16932)

Fabric

Changed

  • Let TorchCollective work on the torch.distributed WORLD process group by default (#16995)

Fixed

  • Fixed calling _cuda_clearCublasWorkspaces on teardown (#16907)
  • Improved the error message for installing tensorboard or tensorboardx (#17053)

PyTorch

Changed

  • Changes to the NeptuneLogger (#16761):
    • It now supports neptune-client 0.16.16 and neptune >=1.0, and we have replaced the log() method with append() and extend().
    • It now accepts a namespace Handler as an alternative to Run for the run argument. This means that you can call it like NeptuneLogger(run=run["some/namespace"]) to log everything to the some/namespace/ location of the run.
  • Allow passing arguments to LightningCLI via sys.argv or the args parameter (#16808); see the sketch below this list
  • Moved HPU broadcast override to the HPU strategy file (#17011)
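
For the LightningCLI entry, a minimal sketch of passing an explicit args list instead of relying on sys.argv; the subcommand and trainer flags are illustrative:

```python
from lightning.pytorch.cli import LightningCLI
from lightning.pytorch.demos.boring_classes import BoringDataModule, BoringModel

if __name__ == "__main__":
    # The args list replaces sys.argv; values below are illustrative only.
    cli = LightningCLI(
        BoringModel,
        BoringDataModule,
        args=["fit", "--trainer.max_epochs=1", "--trainer.limit_train_batches=2"],
    )
```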

Deprecated

  • Removed registration of ShardedTensor state dict hooks in LightningModule.__init__ with torch>=2.1 (#16892)
  • Removed the lightning.pytorch.core.saving.ModelIO class interface (#16974)

Fixed

  • Fixed num_nodes not being set for DDPFullyShardedNativeStrategy (#17160)
  • Fixed parsing the precision config for inference in DeepSpeedStrategy (#16973)
  • Fixed the availability check for rich that prevented Lightning from being imported in Google Colab (#17156)
  • Fixed calling _cuda_clearCublasWorkspaces on teardown (#16907)
  • The psutil package is now required for CPU monitoring (#17010)
  • Improved the error message for installing tensorboard or tensorboardx (#17053)

Contributors

@awaelchli, @belerico, @carmocca, @colehawkins, @dmitsf, @Erotemic, @ethanwharris, @kshitij12345, @Borda

If we forgot someone due to not matching commit email with GitHub account, let us know :]

2.0.1 appendix

11 Apr 18:43
38933be

App

Fixed

  • Fix frontend hosts when running with multi-process in the cloud (#17324)

Fabric

No changes.


PyTorch

Fixed

  • Make the is_picklable function more robust (#17270)

Contributors

@eng-yue, @ethanwharris, @Borda, @awaelchli, @carmocca

If we forgot someone due to not matching commit email with GitHub account, let us know :]