LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy
The official implementation of the paper LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy
Despite being pretrained on multilingual corpora, large language models (LLMs) exhibit suboptimal performance on low-resource languages. Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models. However, these methods typically focus on the encoder's output, overlooking valuable information from other layers. We propose the Layer-Wise Adaptive Fusion and Alignment Strategy (LayAlign), a framework that integrates representations from all encoder layers, coupled with an adaptive fusion-enhanced attention mechanism that enables layer-wise interaction between the LLM and the multilingual encoder. Extensive experiments on multilingual reasoning tasks, along with analyses of learned representations, show that our approach consistently outperforms existing baselines.
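As a rough illustration of the layer-wise fusion idea, the sketch below combines the hidden states from every encoder layer with learnable, softmax-normalized weights (one mixture per LLM layer). All module and tensor names are illustrative assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn

class LayerwiseFusion(nn.Module):
    """Illustrative sketch: fuse the hidden states of every encoder layer
    with learnable, softmax-normalized weights, one weight set per LLM layer."""

    def __init__(self, num_encoder_layers: int, num_llm_layers: int):
        super().__init__()
        # one mixing vector over encoder layers for each LLM layer
        self.mix_logits = nn.Parameter(torch.zeros(num_llm_layers, num_encoder_layers))

    def forward(self, encoder_hidden_states):
        # encoder_hidden_states: list of (batch, src_len, d_enc), one per encoder layer
        stacked = torch.stack(encoder_hidden_states, dim=0)      # (L_enc, B, S, D)
        weights = torch.softmax(self.mix_logits, dim=-1)         # (L_llm, L_enc)
        # one fused encoder representation per LLM layer
        fused = torch.einsum("ke,ebsd->kbsd", weights, stacked)  # (L_llm, B, S, D)
        # a projection to the LLM hidden size would typically follow
        return fused
```

Each fused representation would then be injected into the corresponding LLM layer through the fusion-enhanced attention (see the sketch under the peft notes below).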
We run the experiments on 8 NVIDIA L40 GPUs with 48GB of memory each. The code is developed and tested on Ubuntu 22.04.4 with Python 3.10 and CUDA 12.1.
To install the required packages, please follow the instructions below.
conda create -n LayAlign python=3.10 -y
conda activate LayAlign
# if your CUDA version is 12.1
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
cd peft
pip install -e ".[train]"
For peft, we use layalign_prompt to implement the Adaptive Fusion-Enhanced Attention. layalign_prompt (peft/src/peft/tuners/layalign_prompt) is modified from adaptation_prompt.
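For intuition, adaptation_prompt-style tuners blend extra states into attention through a zero-initialized gate; the sketch below applies the same idea with fused encoder states as the keys and values. It is a minimal sketch under these assumptions, not the actual layalign_prompt code.

```python
import torch
import torch.nn as nn

class FusionEnhancedAttention(nn.Module):
    """Sketch of gated cross-attention onto fused encoder states,
    in the spirit of peft's adaptation_prompt (LLaMA-Adapter) gating."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # zero-initialized gate: at the start of training the LLM layer's
        # original self-attention output passes through unchanged
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, llm_hidden, fused_encoder_states, self_attn_output):
        # llm_hidden:           (B, T, D) queries from the current LLM layer
        # fused_encoder_states: (B, S, D) fused encoder states for this layer,
        #                       assumed already projected to the LLM hidden size
        # self_attn_output:     (B, T, D) output of the LLM's ordinary self-attention
        cross, _ = self.attn(llm_hidden, fused_encoder_states, fused_encoder_states)
        return self_attn_output + torch.tanh(self.gate) * cross
```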
We utilize the MindMerger dataset for our experiments. You can download the dataset here and place it in the current directory.
Note that MindMerger later modified its mathematical training dataset; our study is based on the initial version. Since that version is no longer available in the official documentation, we have provided access to it here.
The LayAlign checkpoints are based on MetaMath-7B-V1.0 for math, LLaMAX-7B-X-CSQA for X-CSQA, and LLaMAX-7B-X-XNLI for XNLI. mT5-xl is used as the multilingual encoder.
You can also download the checkpoint that we trained on the math tasks with MetaMath-7B-V1.0 and mT5-xl to evaluate MGSM.
We train LayAlign in two stages; a rough sketch of the schedule is shown after the commands below.
bash scripts/finetune.sh
bash scripts/evaluation_mgsm.sh
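The exact stage configuration lives in scripts/finetune.sh and the training code; the snippet below is only a minimal sketch of a two-stage schedule, assuming stage 1 aligns the multilingual encoder with the LLM (e.g. on translation-style data) and stage 2 continues on the downstream task data. `alignment_loader`, `task_loader`, and the hyperparameters are hypothetical.

```python
import torch

def run_stage(model, dataloader, optimizer, num_epochs=1):
    """Generic inner loop shared by both stages (illustrative only)."""
    model.train()
    for _ in range(num_epochs):
        for batch in dataloader:
            loss = model(**batch).loss  # assumes an HF-style model returning .loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

def train_layalign(model, alignment_loader, task_loader):
    # only the trainable (fusion/alignment) parameters are optimized in this sketch
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=2e-5)
    run_stage(model, alignment_loader, optimizer)  # stage 1: encoder-LLM alignment
    run_stage(model, task_loader, optimizer)       # stage 2: downstream task data
```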
The code is based on the following repositories; we greatly appreciate the authors' contributions.
- MindMerger: a new method for multilingual reasoning.
This project is licensed under the MIT License.
If you find this code useful, please consider citing our paper:
@inproceedings{zhiwenruan2025layalign,
title={LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy},
author={Zhiwen Ruan and Yixia Li and He Zhu and Longyue Wang and Weihua Luo and Kaifu Zhang and Yun Chen and Guanhua Chen},
booktitle={The 2025 Annual Conference of the Nations of the Americas Chapter of the ACL},
year={2025},
url={https://openreview.net/forum?id=KmRjOLJISJ}
}