
make activation checkpointing configurable #211


Draft

ssmmnn11 wants to merge 2 commits into main
Conversation

ssmmnn11 (Member)

This is an effort to make activation checkpointing configurable. On GPUs with more memory, small models can run without full checkpointing, which increases throughput by avoiding activation recomputation.

Also sets the processor chunking values to saner defaults.
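
As a rough illustration of the idea (not the code in this PR), the sketch below shows how a per-submodule flag such as `training.activation_checkpointing.encoder` could toggle PyTorch activation checkpointing. The `CheckpointWrapper` and `apply_checkpointing` names are hypothetical; only the config key layout is taken from the diff below.

```python
# A minimal sketch, not the actual anemoi-training implementation.
# `CheckpointWrapper` and `apply_checkpointing` are hypothetical names;
# only the `training.activation_checkpointing.*` config layout mirrors
# what appears in this PR's diff.
import torch
from torch.utils.checkpoint import checkpoint


class CheckpointWrapper(torch.nn.Module):
    """Recompute the wrapped module's activations during backward."""

    def __init__(self, module: torch.nn.Module) -> None:
        super().__init__()
        self.module = module

    def forward(self, *args, **kwargs):
        # use_reentrant=False is the variant recommended by recent
        # PyTorch releases and supports keyword arguments.
        return checkpoint(self.module, *args, use_reentrant=False, **kwargs)


def apply_checkpointing(model: torch.nn.Module, ac_config) -> None:
    """Wrap only the submodules enabled in the config.

    Skipping the wrap avoids the recompute cost, trading GPU memory for
    throughput, which is the point of making this configurable.
    """
    for name in ("encoder", "processor", "decoder"):
        if getattr(ac_config, name, False):
            setattr(model, name, CheckpointWrapper(getattr(model, name)))
```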

```diff
@@ -189,6 +189,16 @@ def model(self) -> GraphForecaster:
             freeze_submodule_by_name(model, submodule_name)
             LOGGER.info("%s frozen successfully.", submodule_name.upper())

+        if self.config.training.activation_checkpointing.encoder:
```
Collaborator

I think we could get rid of this bit by using the model_config in the encoder_processor_decoder.
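
If I read the suggestion right, the flag would be consumed where the submodules are built rather than patched onto the model afterwards. A hedged sketch of that alternative, using stand-in `Encoder` and `build_encoder` names and reusing the `CheckpointWrapper` from the sketch above:

```python
# Hypothetical sketch of the suggestion: read the flag from model_config at
# construction time, so the trainer no longer needs the `if self.config...`
# branch added in this diff. `Encoder` and `build_encoder` are stand-ins.
import torch


class Encoder(torch.nn.Module):
    """Stand-in encoder for illustration only."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def build_encoder(model_config) -> torch.nn.Module:
    encoder = Encoder()
    if model_config.training.activation_checkpointing.encoder:
        # CheckpointWrapper as sketched in the PR description above.
        encoder = CheckpointWrapper(encoder)
    return encoder
```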

anaprietonem marked this pull request as draft on July 4, 2025 at 13:07
Projects
Status: Now In Progress
2 participants