diff --git a/docs/source/en/main_classes/deepspeed.mdx b/docs/source/en/main_classes/deepspeed.mdx
index 11831dbdc401da..a0d6dcc7769e79 100644
--- a/docs/source/en/main_classes/deepspeed.mdx
+++ b/docs/source/en/main_classes/deepspeed.mdx
@@ -37,7 +37,7 @@ won't be possible on a single GPU.
 2. If you don't use [`Trainer`] and want to use your own Trainer where you integrated DeepSpeed
    yourself, core functionality functions like `from_pretrained` and `from_config` include integration of essential
    parts of DeepSpeed like `zero.Init` for ZeRO stage 3 and higher. To tap into this feature read the docs on
-   [deepspeed-non-trainer-integration](#deepspeed-non-trainer-integration).
+   [non-Trainer DeepSpeed Integration](#nontrainer-deepspeed-integration).
 
 What is integrated:
 
@@ -1849,7 +1849,6 @@ In this case you usually need to raise the value of `initial_scale_power`. Setti
 
 
 
-
 ## Non-Trainer Deepspeed Integration