Jen Wei JenWei0312

Working from home

Pinned

All_things_attention Public

Comparison of different kinds of attention mechanisms

Jupyter Notebook 1

deepseek-moe Public

Python

OLMo Public

Forked from allenai/OLMo

Modeling, training, eval, and inference code for OLMo

Python

huggingface/trl Public

Train transformer language models with reinforcement learning.

Python, 16.4k stars, 2.3k forks

allenai/OLMo Public

Modeling, training, eval, and inference code for OLMo

Python, 6.2k stars, 677 forks

deepseek-mla Public

Implementation of DeepSeek's Multi-head Latent Attention architecture

Python