For more details about the method, please take a look at our paper.
In this section, we provide an overview of the main components of the repository.
LLMs: This section includes experiments tuning Llama on Alpaca. We use OpenCompass for evaluation.
Vision: This section includes experiments involving full fine-tuning on DomainBed. It focuses on vision-related tasks and evaluations.
Language: This section contains experiments on full fine-tuning and LoRA tuning on the GLUE benchmark, covering both in-distribution (ID) evaluation on GLUE and out-of-distribution (OOD) evaluation on GLUE-X.
Demons: This section provides the data sources and plotting code for all figures in the paper.
Tutel: Modified from Tutel MoE to support the added avg-k gating.
For specific instructions on running the code for each component, please refer to the README.md file within the corresponding folder.
Please prepare the Vision and Language environments separately and follow the instructions in each part. Importantly, we add functions such as avg-k gating to the original Tutel MoE, so please install Tutel from the local `tutel` folder via `cd ./tutel` and `pip install ./`.
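For intuition, below is a minimal sketch of avg-k gating as described in the paper: each expert is scored by the average pre-activation of its neurons, and every token is routed to its top-k experts. This is an illustration only, not the repository's implementation; the function name `avg_k_gate`, the contiguous grouping of neurons by expert, and the softmax normalization of the selected scores are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def avg_k_gate(x: torch.Tensor, w1: torch.Tensor, num_experts: int, k: int):
    """Hypothetical sketch of avg-k gating (not the repo's actual code).

    x:  (num_tokens, d_model) token representations
    w1: (d_model, d_ffn) first FFN weight matrix; its columns (neuron keys)
        are assumed to be grouped contiguously by expert, so d_ffn must be
        divisible by num_experts.
    """
    num_tokens = x.size(0)
    # Pre-activation of every FFN neuron for every token: (num_tokens, d_ffn)
    neuron_scores = x @ w1
    # Average the neuron scores within each expert: (num_tokens, num_experts)
    expert_scores = neuron_scores.view(num_tokens, num_experts, -1).mean(dim=-1)
    # Route each token to its k highest-scoring experts
    topk_scores, topk_idx = expert_scores.topk(k, dim=-1)
    # Normalize the selected scores into routing weights (an assumption here)
    routing_weights = F.softmax(topk_scores, dim=-1)
    return topk_idx, routing_weights

# Example: 4 tokens, d_model=8, d_ffn=16 split into 4 experts, route to top-2
x = torch.randn(4, 8)
w1 = torch.randn(8, 16)
idx, weights = avg_k_gate(x, w1, num_experts=4, k=2)
```

In the repository itself, this logic lives inside the modified gate of the local `tutel` package; refer to the paper for the exact formulation.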
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.
@article{qiu2023emergent,
  title={Emergent Mixture-of-Experts: Can Dense Pre-trained Transformers Benefit from Emergent Modular Structures?},
  author={Qiu, Zihan and Huang, Zeyu and Fu, Jie},
  journal={arXiv preprint arXiv:2310.10908},
  year={2023}
}
The MoE module is built on Tutel MoE. Note that we have added the avg-k gating function to the original gate, so please install Tutel from the local `tutel` folder and follow the instructions above.
The Vision codebase is built on GMoE and the original DomainBed.
The language training module is built on Transformers, and the OOD evaluation is built on GLUE-X.
The MoE split method is built on MoEfication.
This source code is released under the MIT license, which is included in this repository.