Fix Mistral memory consumption with JAX and default dtype bug #1460

Merged
merged 3 commits into keras-team:master on Feb 27, 2024

Conversation

tirthasheshpatel (Contributor)

Fixes #1458

This PR updates the presets for the Mistral model. The configs have been updated on Kaggle to not set a default dtype. The JAX memory consumption bug should also be fixed now.
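As an illustration (not part of the PR), here is a minimal sketch of what removing the default dtype enables; the preset name and the `dtype` keyword argument are assumptions based on the keras_nlp preset API:

```python
import keras_nlp

# With no "dtype" pinned in the preset's config.json, the caller chooses the
# precision policy at load time instead of inheriting a hard-coded default.
backbone = keras_nlp.models.MistralBackbone.from_preset(
    "mistral_7b_en",   # assumed preset name
    dtype="bfloat16",  # assumed kwarg, forwarded to the backbone constructor
)
```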

tirthasheshpatel added the type:Bug (Something isn't working) label on Feb 22, 2024
@mattdangerw (Member) left a comment:

Looks good! One comment on the conversion script.

```diff
@@ -300,7 +300,7 @@ def main(_):
     print("-> Saved the model weights in float16")
 
     # === Save the model config ===
-    keras_nlp_config["dtype"] = "bfloat16"
+    keras_nlp_config.pop("dtype")  # We don't want a default dtype
```
mattdangerw (Member) commented on the diff:

I would actually rename keras_nlp_config -> backbone_kwargs, and save using

```python
keras_nlp.src.utils.preset_utils.save_to_preset(
    keras_nlp_model, preset
)
keras_nlp.src.utils.preset_utils.save_to_preset(
    keras_nlp_tokenizer, preset, config_filename="tokenizer.json"
)
```

Those will do the right thing and always call model.get_config() to create the config.json file (which does not include dtype for this reason). No need to regenerate presets if things look good in testing.
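As a hypothetical check of the behavior described above (not from the PR; the tiny constructor arguments are made up for a quick local test), one could verify that the backbone's config omits dtype:

```python
import keras_nlp

# Made-up, tiny dimensions so the check runs quickly on CPU.
backbone = keras_nlp.models.MistralBackbone(
    vocabulary_size=1000,
    num_layers=2,
    num_query_heads=4,
    num_key_value_heads=2,
    hidden_dim=64,
    intermediate_dim=128,
)

# Per the comment above, get_config() does not record a dtype, so the
# config.json written by save_to_preset() leaves precision to the loader.
assert "dtype" not in backbone.get_config()
```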

tirthasheshpatel (Contributor, Author) replied:

Done, uploaded the new presets to Kaggle.

@mattdangerw (Member) left a comment:

lgtm!

tirthasheshpatel merged commit 7a66555 into keras-team:master on Feb 27, 2024
7 of 10 checks passed
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request on Apr 2, 2024
Labels
type:Bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Mistral kills the process by taking too much RAM (#1458)