Fix Mistral memory consumption with JAX and default dtype bug #1460

Merged
merged 3 commits into keras-team:master on Feb 27, 2024

Conversation

tirthasheshpatel (Contributor)

Fixes #1458

This PR updates the presets for the Mistral model. The configs have been updated on Kaggle to not set a default dtype. The JAX memory consumption bug should also be fixed now.
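As an illustration (not part of the PR), here is a minimal sketch of what removing the default dtype enables; the preset name and the `dtype` keyword argument are assumptions based on the keras_nlp preset API:

```python
import keras_nlp

# With no "dtype" pinned in the preset's config.json, the caller chooses the
# precision policy at load time instead of inheriting a hard-coded default.
backbone = keras_nlp.models.MistralBackbone.from_preset(
    "mistral_7b_en",   # assumed preset name
    dtype="bfloat16",  # assumed kwarg, forwarded to the backbone constructor
)
```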

tirthasheshpatel added the type:Bug (Something isn't working) label on Feb 22, 2024
@mattdangerw (Member) left a comment:

Looks good! One comment on the conversion script.

```diff
@@ -300,7 +300,7 @@ def main(_):
     print("-> Saved the model weights in float16")
 
     # === Save the model config ===
-    keras_nlp_config["dtype"] = "bfloat16"
+    keras_nlp_config.pop("dtype")  # We don't want a default dtype
```
mattdangerw (Member) commented on the diff:

I would actually rename keras_nlp_config -> backbone_kwargs, and save using

```python
keras_nlp.src.utils.preset_utils.save_to_preset(
    keras_nlp_model, preset
)
keras_nlp.src.utils.preset_utils.save_to_preset(
    keras_nlp_tokenizer, preset, config_filename="tokenizer.json"
)
```

Those will do the right thing and always call model.get_config() to create the config.json file (which does not include dtype for this reason). No need to regenerate presets if things look good in testing.
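As a hypothetical check of the behavior described above (not from the PR; the tiny constructor arguments are made up for a quick local test), one could verify that the backbone's config omits dtype:

```python
import keras_nlp

# Made-up, tiny dimensions so the check runs quickly on CPU.
backbone = keras_nlp.models.MistralBackbone(
    vocabulary_size=1000,
    num_layers=2,
    num_query_heads=4,
    num_key_value_heads=2,
    hidden_dim=64,
    intermediate_dim=128,
)

# Per the comment above, get_config() does not record a dtype, so the
# config.json written by save_to_preset() leaves precision to the loader.
assert "dtype" not in backbone.get_config()
```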

tirthasheshpatel (Contributor, Author) replied:

Done, uploaded the new presets to Kaggle.

@mattdangerw (Member) left a comment:

lgtm!

tirthasheshpatel merged commit 7a66555 into keras-team:master on Feb 27, 2024
7 of 10 checks passed
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request on Apr 2, 2024
Labels
type:Bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Mistral kills the process by taking too much RAM (#1458)