Add Multi-head Latent Attention (DeepSeekv2) #1945

Open · wants to merge 14 commits into main

Conversation

simoneangarano commented:

This pull request adds support for the Multi-Head Latent Attention (MLA) block from DeepSeek-V2 to the litgpt codebase.

Changes

  • Configuration: Added a latent_attention: Optional[bool] = False field to the model configuration to enable the MLA block (see the config sketch after this list).
  • MLA module: Implemented the MLA block as a standalone module in the litgpt codebase (sketched below).
  • KVCacheCompressed: Added a KVCacheCompressed class that stores the compressed latent key-value representations for the MLA block (sketched below).
  • Model: Modified the GPT class so that, when latent_attention is enabled, transformer blocks use the MLA module in place of standard causal self-attention.
  • Training: Updated the training script to support the MLA block and added a new example configuration file, config_hub/pretrain/cfg.yaml.
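A minimal sketch of the configuration change, assuming litgpt's dataclass-based Config in litgpt/config.py (the field name and default are taken from this PR's description; the surrounding class is abbreviated):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Config:
        # ... existing litgpt fields ...
        # When True, transformer blocks use Multi-Head Latent Attention
        # (MLA, DeepSeek-V2) instead of standard multi-head attention.
        latent_attention: Optional[bool] = False

Inside the model, each block would then pick its attention implementation from this flag, e.g. MultiHeadLatentAttention(config) if config.latent_attention else CausalSelfAttention(config).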
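The core idea of MLA is to down-project the hidden state into a small shared latent, cache only that latent, and up-project keys and values from it on the fly. The following is an illustrative PyTorch sketch, not the PR's actual code; the names (MultiHeadLatentAttention, kv_down, k_up, v_up, d_latent) are hypothetical, and DeepSeek-V2's query compression and decoupled RoPE are omitted for brevity:

    from typing import Optional

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadLatentAttention(nn.Module):
        """Illustrative MLA block: K and V are decompressed from a shared
        low-rank latent, so the cache holds one d_latent vector per token."""

        def __init__(self, d_model: int, n_head: int, d_latent: int) -> None:
            super().__init__()
            assert d_model % n_head == 0
            self.n_head, self.d_head = n_head, d_model // n_head
            self.q_proj = nn.Linear(d_model, d_model, bias=False)
            self.kv_down = nn.Linear(d_model, d_latent, bias=False)  # compress
            self.k_up = nn.Linear(d_latent, d_model, bias=False)     # decompress keys
            self.v_up = nn.Linear(d_latent, d_model, bias=False)     # decompress values
            self.out_proj = nn.Linear(d_model, d_model, bias=False)

        def forward(self, x: torch.Tensor,
                    cached_latent: Optional[torch.Tensor] = None):
            B, T, C = x.shape
            latent = self.kv_down(x)                    # (B, T, d_latent)
            if cached_latent is not None:
                # Single-token decode assumed: extend the cached latents.
                latent = torch.cat([cached_latent, latent], dim=1)
            q = self.q_proj(x).view(B, T, self.n_head, self.d_head).transpose(1, 2)
            k = self.k_up(latent).view(B, -1, self.n_head, self.d_head).transpose(1, 2)
            v = self.v_up(latent).view(B, -1, self.n_head, self.d_head).transpose(1, 2)
            y = F.scaled_dot_product_attention(q, k, v, is_causal=cached_latent is None)
            y = y.transpose(1, 2).reshape(B, T, C)
            return self.out_proj(y), latent             # latent is the new cache

With d_latent much smaller than 2 * d_model, the per-token cache shrinks accordingly, which is the main inference-time benefit of MLA.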
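The KVCacheCompressed class then only needs to store that latent. A sketch under the same assumptions (the class name comes from the PR; this implementation is hypothetical):

    import torch

    class KVCacheCompressed:
        """Preallocated cache holding one d_latent vector per token,
        rather than per-head key and value tensors."""

        def __init__(self, batch_size: int, max_seq_length: int,
                     d_latent: int, device=None, dtype=None) -> None:
            self.latent = torch.zeros(batch_size, max_seq_length, d_latent,
                                      device=device, dtype=dtype)
            self.length = 0  # number of tokens cached so far

        def append(self, new_latent: torch.Tensor) -> torch.Tensor:
            # new_latent: (batch_size, T, d_latent) for the T new tokens
            T = new_latent.size(1)
            self.latent[:, self.length:self.length + T] = new_latent
            self.length += T
            return self.latent[:, :self.length]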

Usage

  • Configuration: Set the latent_attention parameter to True in the configuration file to enable the MLA block.
  • Training: Run the training script with the updated configuration file.
    litgpt pretrain --config config_hub/pretrain/cfg.yaml
  • Inference: Use the trained model for inference as follows (a Python API sketch follows this list):
    litgpt generate out/pretrain/mla/final/
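
For programmatic use, litgpt also exposes a Python API; a sketch assuming the checkpoint directory from the command above (adjust the path to your run):

    from litgpt import LLM

    # Load the MLA-pretrained checkpoint produced by the training run above.
    llm = LLM.load("out/pretrain/mla/final/")
    print(llm.generate("What does multi-head latent attention change?"))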

Borda added the enhancement (New feature or request) label on Mar 12, 2025.
Borda (Member) commented Mar 20, 2025:

@simoneangarano, mind checking the failing tests? :)

ysjprojects (Contributor) commented:

@Borda @simoneangarano I can look into the failing tests if that's okay with everyone.
