
Mistral kills the process by taking too much RAM #1458

Closed
deep-diver opened this issue Feb 22, 2024 · 2 comments · Fixed by #1460
Labels
type:Bug Something isn't working

Comments

@deep-diver

preprocessor = keras_nlp.models.MistralCausalLMPreprocessor.from_preset(
    "mistral_instruct_7b_en",
    sequence_length=128,
)
mistral_lm = keras_nlp.models.MistralCausalLM.from_preset(
    "mistral_instruct_7b_en", preprocessor=preprocessor
)

output = mistral_lm.generate("My trip to Yosemite was", max_length=64)
print("\nMistral output:")
print(output)
[Screenshot: Colab runtime crash after RAM usage spikes during generation]

I was running the Mistral model in a Colab environment with an A100 (40 GB) and 80 GB of RAM. The model loaded successfully. However, when generating text, RAM usage spiked and the runtime restarted.

Is this expected behavior, or could there be a bug?

@deep-diver deep-diver added the type:Bug Something isn't working label Feb 22, 2024
@tirthasheshpatel
Contributor

Is this expected behavior, or could there be a bug?

Mistral loads in bfloat16 by default. I noticed this causes issues in Colab with the JAX backend (TensorFlow runs fine, using about 16.5 GB of RAM).

The workaround is to pass dtype=None to the from_preset method, which makes the model use the dtype set via the keras.mixed_precision module:

mistral_lm = keras_nlp.models.MistralCausalLM.from_preset('mistral_instruct_7b_en', preprocessor=preprocessor, dtype=None)

Let me know if this lowers the RAM usage. This will be fixed in the next release.
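To see why the load dtype matters so much here, a rough weights-only memory estimate for a 7B-parameter model (this ignores the KV cache, activations, and framework overhead, which is where the extra JAX-backend usage comes from):

```python
# Rough weights-only memory estimate for a 7B-parameter model.
# Ignores KV cache, activations, and framework overhead.
PARAMS = 7_000_000_000

bytes_per_param = {"float32": 4, "bfloat16": 2, "float16": 2}
for dtype, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{dtype}: ~{gib:.1f} GiB")
```

Weights alone come to roughly 26 GiB in float32 versus roughly 13 GiB in bfloat16, so either should fit in 80 GB of host RAM; the crash points to memory consumed beyond the weights themselves.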

@mattdangerw
Member

Thanks for the bug report! Just synced up with @tirthasheshpatel. We want to change two things here:

  1. By default, Mistral should follow the global Keras settings: keras.mixed_precision.set_global_policy("mixed...") -> variables load as float32; keras.config.set_floatx("bfloat16") -> variables load as bfloat16.
  2. There is a bug, on the JAX backend only, where generation with Mistral consumes far too much CPU and GPU memory. It should be a one-line fix on our side.
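Point 1 amounts to a small dtype-resolution rule. A minimal sketch in plain Python (resolve_load_dtype is a hypothetical illustration of the proposed behavior, not a KerasNLP API):

```python
def resolve_load_dtype(explicit_dtype=None, global_policy=None, floatx="float32"):
    """Hypothetical sketch of the proposed default-dtype behavior.

    - An explicit dtype passed to from_preset always wins.
    - Under a mixed-precision policy, variables load as float32.
    - Otherwise, fall back to the global floatx setting.
    """
    if explicit_dtype is not None:
        return explicit_dtype
    if global_policy is not None and global_policy.startswith("mixed_"):
        return "float32"
    return floatx
```

For example, resolve_load_dtype(global_policy="mixed_bfloat16") returns "float32", while resolve_load_dtype(floatx="bfloat16") returns "bfloat16".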

These are both simple but important fixes; we should have a patch release out in a couple of days. Thanks @deep-diver!
