
LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy

The official implementation of the paper LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy (Findings of NAACL 2025).

Overview

Abstract

Despite being pretrained on multilingual corpora, large language models (LLMs) exhibit suboptimal performance on low-resource languages. Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models. However, these methods typically focus on the encoder's output, overlooking valuable information from other layers. We propose Layer-Wise Adaptive Fusion and Alignment Strategy (LayAlign), a framework that integrates representations from all encoder layers, coupled with the adaptive fusion-enhanced attention mechanism to enable layer-wise interaction between the LLM and the multilingual encoder. Extensive experiments on multilingual reasoning tasks, along with analyses of learned representations, show that our approach consistently outperforms existing baselines.
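
To make the layer-wise fusion idea concrete, below is a minimal PyTorch sketch (not the official LayAlign module; all names are illustrative) that combines the hidden states from every encoder layer with learnable, softmax-normalized weights and projects the fused representation into the LLM's hidden size:

import torch
import torch.nn as nn

class LayerwiseFusion(nn.Module):
    # Illustrative sketch of layer-wise adaptive fusion: one learnable scalar
    # weight per encoder layer, normalized with softmax, then a projection
    # into the LLM hidden size. Not the official implementation.
    def __init__(self, num_layers, enc_hidden_size, llm_hidden_size):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(enc_hidden_size, llm_hidden_size)

    def forward(self, all_hidden_states):
        # all_hidden_states: list of (batch, seq_len, enc_hidden_size), one per layer
        stacked = torch.stack(all_hidden_states, dim=0)           # (L, B, S, H_enc)
        weights = torch.softmax(self.layer_weights, dim=0)        # (L,)
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # (B, S, H_enc)
        return self.proj(fused)                                   # (B, S, H_llm)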

Table of Contents

  • Overview
  • Environment Setup
  • Datasets
  • Models
  • Experiments
  • Acknowledgement
  • License
  • Citation

Environment Setup

We run the experiments on 8 NVIDIA L40 GPUs with 48 GB of memory each. The code is developed and tested on Ubuntu 22.04.4 with Python 3.10 and CUDA 12.1.

To install the required packages, please follow the instructions below.

conda create -n LayAlign python=3.10 -y
conda activate LayAlign
# if your CUDA version is 12.1
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
cd peft
pip install -e ".[train]"

For PEFT, we use layalign_prompt to implement the Adaptive Fusion-Enhanced Attention. layalign_prompt (peft/src/peft/tuners/layalign_prompt) is modified from the adaptation_prompt tuner.
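
For reference, the snippet below shows how the upstream adaption-prompt tuner is configured in stock PEFT; treat it as a rough sketch of the pattern that layalign_prompt follows, since the LayAlign-specific config class and fields in the bundled peft fork may differ.

from transformers import AutoModelForCausalLM
from peft import AdaptionPromptConfig, get_peft_model

# Stock PEFT adaption-prompt usage; layalign_prompt (peft/src/peft/tuners/layalign_prompt)
# is a modified variant of this tuner with the fusion-enhanced attention.
model = AutoModelForCausalLM.from_pretrained("meta-math/MetaMath-7B-V1.0")
config = AdaptionPromptConfig(
    adapter_len=10,       # number of learnable prompt tokens per adapted layer
    adapter_layers=30,    # number of transformer layers to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()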

Datasets

We utilize the MindMerger dataset for our experiments. You can download the dataset here and place it in the current directory.

Note that MindMerger later modified its mathematical training dataset; our study is based on the initial version of that dataset. Since this version is no longer available in the official documentation, we provide access to it here.

Models

We use the LayAlign checkpoints based on MetaMath-7B-V1.0 for math, LLaMAX-7B-X-CSQA for X-CSQA, and LLaMAX-7B-X-XNLI for XNLI. mT5-xl is used as the multilingual encoder.

You can also download the checkpoint that we trained on the math tasks with MetaMath-7B-V1.0 and mT5-xl to evaluate on MGSM.
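
As a hedged illustration (the training and evaluation scripts handle checkpoint loading themselves), the base encoder and LLM can be pulled from the Hugging Face Hub roughly as follows; the Hub IDs shown are the standard ones for mT5-xl and MetaMath-7B-V1.0, and the LLaMAX checkpoints are omitted.

from transformers import AutoModelForCausalLM, AutoTokenizer, MT5EncoderModel

# Multilingual encoder; hidden states from every layer are needed for layer-wise fusion.
encoder = MT5EncoderModel.from_pretrained("google/mt5-xl", output_hidden_states=True)
enc_tokenizer = AutoTokenizer.from_pretrained("google/mt5-xl")

# LLM backbone for the math setting.
llm = AutoModelForCausalLM.from_pretrained("meta-math/MetaMath-7B-V1.0")
llm_tokenizer = AutoTokenizer.from_pretrained("meta-math/MetaMath-7B-V1.0")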

Experiments

Training (math)

We use a two-stage training procedure to train LayAlign:

bash scripts/finetune.sh

Evaluation (math)

bash scripts/evaluation_mgsm.sh

Acknowledgement

The code is based on the following repositories; we greatly appreciate the authors for their contributions.

License

This project is licensed under the MIT License.

Citation

If you find this code useful, please consider citing our paper:

@inproceedings{zhiwenruan2025layalign,
  title={LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy},
  author={Zhiwen Ruan and Yixia Li and He Zhu and Longyue Wang and Weihua Luo and Kaifu Zhang and Yun Chen and Guanhua Chen},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2025},
  year={2025},
  url={https://openreview.net/forum?id=KmRjOLJISJ}
}
