Merge pull request #68 from tanganke/EnnengYang-patch-1
Update README.md
tanganke authored Jan 17, 2025
2 parents a983ea2 + 0021aa2 commit 51ac73c
44 changes: 10 additions & 34 deletions README.md
@@ -22,45 +22,21 @@ FusionBench is a benchmark suite designed to evaluate the performance of various
Projects based on FusionBench and news from the community (descending order of date):

<details>
<summary>Anke Tang, et al. Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging. Jan 2025. https://arxiv.org/pdf/2501.09522</summary>

Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their specialized capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight interpolation-based methods being the predominant approaches. However, these conventional approaches are not well-suited for scenarios where models become available sequentially, and they often suffer from high memory requirements and potential interference between tasks. In this study, we propose a training-free projection-based continual merging method that processes models sequentially through orthogonal projections of weight matrices and adaptive scaling mechanisms. Our method operates by projecting new parameter updates onto subspaces orthogonal to existing merged parameter updates while using an adaptive scaling mechanism to maintain stable parameter distances, enabling efficient sequential integration of task-specific knowledge. Our approach maintains constant memory complexity with respect to the number of models, minimizes interference between tasks through orthogonal projections, and retains the performance of previously merged models through adaptive task vector scaling. Extensive experiments on CLIP-ViT models demonstrate that our method achieves a 5-8% average accuracy improvement while maintaining robust performance in different task orderings.
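
As a rough illustration of the projection idea summarized above (not the paper's actual implementation; the function and variable names here are made up), a NumPy sketch of sequentially merging flattened task vectors could look like this. Only the running merged update is stored, so memory does not grow with the number of models processed.

```python
import numpy as np

def continual_merge(pretrained, finetuned_stream):
    """Sketch of projection-based continual merging (illustrative only).

    pretrained      : flattened pretrained weights as a 1-D float array.
    finetuned_stream: fine-tuned weight arrays of the same shape, arriving one at a time.
    """
    merged_update = np.zeros_like(pretrained)

    for finetuned in finetuned_stream:
        tau = finetuned - pretrained                    # task vector of the new model
        norm_m = np.linalg.norm(merged_update)
        if norm_m > 0:
            # Project the new task vector onto the orthogonal complement of the
            # existing merged update, limiting interference with earlier tasks.
            direction = merged_update / norm_m
            tau = tau - np.dot(tau, direction) * direction
        combined = merged_update + tau
        # Crude stand-in for the paper's adaptive scaling mechanism: rescale the
        # combined update so the merged model stays at a stable parameter
        # distance from the pretrained model as more tasks are absorbed.
        target = max(norm_m, np.linalg.norm(tau))
        norm_c = np.linalg.norm(combined)
        if norm_c > 0 and target > 0:
            combined = combined * (target / norm_c)
        merged_update = combined

    return pretrained + merged_update
```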
</details>

<details>
<summary>Yongxian Wei, et al. Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent. Jan 2025. https://arxiv.org/abs/2501.01230</summary>

Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. Existing methods attempt to alleviate task conflicts by sparsifying task vectors or promoting orthogonality among them. However, they overlook the fundamental requirement of model merging: ensuring the merged model performs comparably to task-specific models on respective tasks. We find these methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Based on our findings, we frame model merging as a constrained optimization problem (i.e., minimizing the gap between the merged model and individual models, subject to the constraint of retaining shared knowledge) and solve it via adaptive projective gradient descent. Specifically, we align the merged model with individual models by decomposing and reconstituting the loss function, alleviating conflicts through data-free optimization of task vectors. To retain shared knowledge, we optimize this objective by projecting gradients within a shared subspace spanning all tasks. Moreover, we view merging coefficients as adaptive learning rates and propose a task-aware, training-free strategy. Experiments show that our plug-and-play approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains. Our code is available here.
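
For intuition only, here is a minimal NumPy sketch of the shared-subspace projection step described above; the data-free objective and the adaptive merging coefficients are omitted, and all names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def shared_subspace_projector(task_vectors):
    """Orthonormal basis for the subspace spanned by all task vectors."""
    T = np.stack(task_vectors, axis=1)   # columns = flattened task vectors (finetuned - pretrained)
    Q, _ = np.linalg.qr(T)               # columns of Q are an orthonormal basis of span(T)
    return Q

def project_into_shared_subspace(update, Q):
    """Keep only the component of an update that lies inside the shared subspace."""
    return Q @ (Q.T @ update)

# Illustrative usage: restrict an update step for the merged model to the shared subspace.
# task_vectors = [w - pretrained for w in expert_weights]   # flattened arrays
# Q = shared_subspace_projector(task_vectors)
# projected_step = project_into_shared_subspace(raw_step, Q)
```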
</details>

<details>
<summary>Hongling Zheng, Li Shen, Anke Tang, Yong Luo et al. Learn From Model Beyond Fine-Tuning: A Survey. Nature Machine Intelligence. Jan 2025. https://www.nature.com/articles/s42256-024-00961-0</summary>

> Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at https://github.com/ruthless-man/Awesome-Learn-from-Model.
</details>

