Barebones Hymba

This is a minimal implementation of the Hymba model.

NOTE: This version is for demonstration purposes only and does not include all the features of the full model. For the full implementation, please refer to HF/Hymba-1.5B-Instruct.

This minimal implementation includes (all three pieces are sketched in the toy example below):

  1. Parallel fused attention and Mamba heads
  2. Meta tokens and how to use them with sliding-window attention
  3. A mix of local and global attention

and does not include:

  1. Cache management for generation during inference
  2. Cross-layer KV cache reuse
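
The sketch below is not the repository code; it is a toy illustration of how the three pieces above can fit together: learnable meta tokens prepended to the input, a per-layer choice between sliding-window (local) and full (global) attention, and a block that runs an attention head and an SSM-style head in parallel and fuses their normalized outputs. The names build_attention_mask, HybridBlock, and ToyHymba are made up for this example, and the SSM branch here is a simple causal-convolution stand-in rather than a real Mamba scan.

import torch
import torch.nn as nn
import torch.nn.functional as F

def build_attention_mask(seq_len, num_meta, window, is_global):
    """Boolean mask (True = may attend) over the meta-token + text sequence."""
    total = num_meta + seq_len
    idx = torch.arange(total)
    causal = idx.unsqueeze(1) >= idx.unsqueeze(0)   # no attending to future positions
    if is_global:
        return causal
    # Local layers: tokens only see a sliding window of recent positions ...
    mask = causal & ((idx.unsqueeze(1) - idx.unsqueeze(0)) < window)
    # ... but the meta tokens at the front stay visible to every later position.
    mask |= causal & (idx.unsqueeze(0) < num_meta)
    return mask

class HybridBlock(nn.Module):
    """Attention and an SSM-like branch run in parallel; their normalized outputs are fused."""
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Stand-in for the Mamba head: a causal depthwise conv with a SiLU gate.
        self.conv = nn.Conv1d(hidden_size, hidden_size, kernel_size=3, padding=2, groups=hidden_size)
        self.gate = nn.Linear(hidden_size, hidden_size)
        self.norm_attn = nn.LayerNorm(hidden_size)
        self.norm_ssm = nn.LayerNorm(hidden_size)

    def forward(self, x, allowed):
        # nn.MultiheadAttention blocks positions where the bool mask is True, so invert.
        a, _ = self.attn(x, x, x, attn_mask=~allowed, need_weights=False)
        s = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        s = s * F.silu(self.gate(x))
        # Fuse the two parallel heads: normalize each branch, then average.
        return x + 0.5 * (self.norm_attn(a) + self.norm_ssm(s))

class ToyHymba(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        self.embed = nn.Embedding(cfg.vocab_size, cfg.hidden_size)
        # Learnable meta tokens, prepended to every input sequence.
        self.meta = nn.Parameter(torch.randn(cfg.num_meta_tokens, cfg.hidden_size) * 0.02)
        self.blocks = nn.ModuleList(
            [HybridBlock(cfg.hidden_size, cfg.num_attention_heads) for _ in range(cfg.num_layers)]
        )

    def forward(self, input_ids):
        bsz, seq_len = input_ids.shape
        x = torch.cat([self.meta.expand(bsz, -1, -1), self.embed(input_ids)], dim=1)
        for i, block in enumerate(self.blocks):
            mask = build_attention_mask(
                seq_len, self.cfg.num_meta_tokens, self.cfg.attention_window_size,
                is_global=i in self.cfg.global_layer_list,
            )
            x = block(x, mask)
        return x[:, self.cfg.num_meta_tokens:]      # drop meta positions from the output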

Run Barebones Hymba

python test_barebones_hymba.py

Feel free to change the HymbaConfig to build your own Hymba model.

class HymbaConfig:
    num_layers = 32
    global_layer_list = [5,11,18,25,31]   # layers that use global (full) attention; all others use sliding-window attention
    hidden_size = 1536
    vocab_size = 151936
    num_meta_tokens = 256                 # learnable meta tokens prepended to the input sequence
    mamba_expand = 2                      # expansion factor of the Mamba (SSM) branch
    num_attention_heads = 12
    num_key_value_heads = 2
    conv_kernel_size = 3                  # kernel size of the causal conv inside the Mamba branch
    time_step_rank = 8
    ssm_state_size = 16
    attention_window_size = 2048          # window size for the local (sliding-window) attention layers
    mlp_hidden_act = "silu"
    intermediate_size = 4608
    modify_attention_mask = True          # set this to False if you want to use FlashAttention for simplicity
    seq_length = 4096
    use_positional_embedding = True
    rope_base = 10000                     # RoPE base frequency
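
For example, a much smaller configuration for quick experiments could look like the following (the TinyHymbaConfig name and its values are arbitrary and only meant to keep the model tiny); any field that is not overridden keeps the default above:

class TinyHymbaConfig(HymbaConfig):
    num_layers = 4
    global_layer_list = [1, 3]        # keep at least one global-attention layer
    hidden_size = 256
    num_attention_heads = 4
    num_key_value_heads = 2
    intermediate_size = 768
    num_meta_tokens = 32
    attention_window_size = 256
    seq_length = 512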