Update CPT documentation #2229

Merged
merged 40 commits into from
Nov 29, 2024
92b9e1a
added CPT model to peft
tsachiblau Oct 22, 2024
e54d380
Merge branch 'huggingface:main' into main
tsachiblau Oct 22, 2024
023f071
Merge branch 'huggingface:main' into main
tsachiblau Oct 24, 2024
54cddaf
Merge branch 'huggingface:main' into main
tsachiblau Oct 25, 2024
2dfe70f
Added arXiv link to the paper, integrated CPT into testing framework,…
tsachiblau Oct 25, 2024
ba4b115
Merge branch 'huggingface:main' into main
tsachiblau Oct 25, 2024
f8c8317
Merge branch 'huggingface:main' into main
tsachiblau Oct 30, 2024
bd2fc70
config: Added config check in __post_init__. Removed redundant initia…
tsachiblau Oct 30, 2024
b01b214
Merge branch 'main' of https://github.com/tsachiblau/peft_CPT
tsachiblau Oct 30, 2024
6ed1723
Merge branch 'huggingface:main' into main
tsachiblau Nov 3, 2024
77bb0b9
tests: Updated test_cpt and testing_common as per the PR requirements.
tsachiblau Nov 3, 2024
dbcdedf
Created cpt.md in package_regerence. Updated the prompting.md file. a…
tsachiblau Nov 3, 2024
f7138d4
Merge branch 'huggingface:main' into main
tsachiblau Nov 5, 2024
0a5fb20
verifying that the model is causal LM
tsachiblau Nov 5, 2024
7206db5
Changed CPTModel to CPTEmbedding
tsachiblau Nov 5, 2024
24b0af9
merge with main branch
tsachiblau Nov 5, 2024
81ffa09
make style
tsachiblau Nov 7, 2024
130ec76
make style
tsachiblau Nov 7, 2024
70067d8
make style
tsachiblau Nov 7, 2024
9397314
make doc
tsachiblau Nov 8, 2024
249713c
Merge branch 'huggingface:main' into main
tsachiblau Nov 10, 2024
0a43473
Removed redundant checks
tsachiblau Nov 10, 2024
144f042
Fixed errors
tsachiblau Nov 13, 2024
97449da
merge with peft
tsachiblau Nov 13, 2024
dacb400
Minor code updates.
tsachiblau Nov 13, 2024
cc348a4
Minor code updates.
tsachiblau Nov 17, 2024
79959d1
Merge branch 'huggingface:main' into main
tsachiblau Nov 18, 2024
7eea892
Minor code updates.
tsachiblau Nov 18, 2024
6d625c0
Merge branch 'huggingface:main' into main
tsachiblau Nov 21, 2024
d120d13
Merge branch 'huggingface:main' into main
tsachiblau Nov 21, 2024
9ae9939
Update Doc
tsachiblau Nov 21, 2024
2fada31
Update Doc
tsachiblau Nov 21, 2024
43260c7
Merge remote-tracking branch 'origin/main'
tsachiblau Nov 21, 2024
ebf5aaa
Update Doc
tsachiblau Nov 25, 2024
b3b5f6e
Update notebook (works on colab)
tsachiblau Nov 26, 2024
e7de80e
Merge branch 'huggingface:main' into main
tsachiblau Nov 28, 2024
41b382d
update doc
tsachiblau Nov 28, 2024
604da6c
update doc
tsachiblau Nov 28, 2024
122567c
update doc
tsachiblau Nov 28, 2024
9ab5078
update doc
tsachiblau Nov 28, 2024
2 changes: 1 addition & 1 deletion docs/source/conceptual_guides/prompting.md
@@ -90,4 +90,4 @@ In CPT, only specific context token embeddings are optimized, while the rest of
To prevent overfitting and maintain stability, CPT uses controlled perturbations to limit the allowed changes to context embeddings within a defined range.
Additionally, to address the phenomenon of recency bias—where examples near the end of the context tend to be prioritized over earlier ones—CPT applies a decay loss factor.

Take a look at [Context-Aware Prompt Tuning for few-shot classification](../task_guides/cpt-few-shot-classification) for a step-by-step guide on how to train a model with CPT.
Take a look at [Example](https://github.com/huggingface/peft/blob/main/examples/cpt_finetuning/README.md) for a step-by-step guide on how to train a model with CPT.
15 changes: 9 additions & 6 deletions docs/source/package_reference/cpt.md
@@ -5,21 +5,24 @@ the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Context-Aware Prompt Tuning (CPT)
# Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods

[Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods (CPT)](https://huggingface.co/papers/2410.17222) combines In-Context Learning (ICL) with Prompt Tuning (PT) and adversarial optimization to improve few-shot learning by refining context embeddings. CPT optimizes only context tokens, which minimizes overfitting and enhances performance on classification tasks.
[CPT](https://huggingface.co/papers/2410.17222) combines In-Context Learning (ICL), Prompt Tuning (PT), and adversarial optimization to improve few-shot learning by refining context embeddings. CPT updates the context tokens by optimizing both the context and the training examples, encapsulating them into a novel loss design that minimizes overfitting, enables more effective optimization, and drives significant improvements in classification tasks.

[//]: # ([CPT](https://huggingface.co/papers/2410.17222) for the paper)

The abstract from the paper is:

*Traditional fine-tuning is effective but computationally intensive, as it requires updating billions of parameters. CPT, inspired by ICL, PT, and adversarial attacks, refines context embeddings in a parameter-efficient manner. By optimizing context tokens and applying a controlled gradient descent, CPT achieves superior accuracy across various few-shot classification tasks, showing significant improvement over existing methods such as LoRA, PT, and ICL.*
> Large Language Models (LLMs) can perform few-shot learning using either optimization-based approaches or In-Context Learning (ICL). Optimization-based methods often suffer from overfitting, as they require updating a large number of parameters with limited data. In contrast, ICL avoids overfitting but typically underperforms compared to optimization-based methods and is highly sensitive to the selection, order, and format of demonstration examples. To overcome these challenges, we introduce Context-aware Prompt Tuning (CPT), a method inspired by ICL, Prompt Tuning (PT), and adversarial attacks. CPT builds on the ICL strategy of concatenating examples before the input, extending it by incorporating PT-like learning to refine the context embedding through iterative optimization, extracting deeper insights from the training examples. Our approach carefully modifies specific context tokens, considering the unique structure of the examples within the context. In addition to updating the context with PT-like optimization, CPT draws inspiration from adversarial attacks, adjusting the input based on the labels present in the context while preserving the inherent value of the user-provided data. To ensure robustness and stability during optimization, we employ a projected gradient descent algorithm, constraining token embeddings to remain close to their original values and safeguarding the quality of the context. Our method has demonstrated superior accuracy across multiple classification tasks using various LLM models, outperforming existing baselines and effectively addressing the overfitting challenge in few-shot learning.


Take a look at [Example](https://github.com/huggingface/peft/blob/main/examples/cpt_finetuning/README.md) for a step-by-step guide on how to train a model with CPT.


## CPTConfig

64 changes: 64 additions & 0 deletions examples/cpt_finetuning/README.md
@@ -0,0 +1,64 @@

# Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
## Introduction ([Paper](https://arxiv.org/abs/2410.17222), [Code](https://github.com/tsachiblau/Context-aware-Prompt-Tuning-Advancing-In-Context-Learning-with-Adversarial-Methods), [Notebook](cpt_train_and_inference.ipynb), [Colab](https://colab.research.google.com/drive/1UhQDVhZ9bDlSk1551SuJV8tIUmlIayta?usp=sharing))

> Large Language Models (LLMs) can perform few-shot learning using either optimization-based approaches or In-Context Learning (ICL). Optimization-based methods often suffer from overfitting, as they require updating a large number of parameters with limited data. In contrast, ICL avoids overfitting but typically underperforms compared to optimization-based methods and is highly sensitive to the selection, order, and format of demonstration examples. To overcome these challenges, we introduce Context-aware Prompt Tuning (CPT), a method inspired by ICL, Prompt Tuning (PT), and adversarial attacks. CPT builds on the ICL strategy of concatenating examples before the input, extending it by incorporating PT-like learning to refine the context embedding through iterative optimization, extracting deeper insights from the training examples. Our approach carefully modifies specific context tokens, considering the unique structure of the examples within the context. In addition to updating the context with PT-like optimization, CPT draws inspiration from adversarial attacks, adjusting the input based on the labels present in the context while preserving the inherent value of the user-provided data. To ensure robustness and stability during optimization, we employ a projected gradient descent algorithm, constraining token embeddings to remain close to their original values and safeguarding the quality of the context. Our method has demonstrated superior accuracy across multiple classification tasks using various LLM models, outperforming existing baselines and effectively addressing the overfitting challenge in few-shot learning.



<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/cpt.png"/>
</div>
<small>CPT optimizing only specific token embeddings while keeping the rest of the model frozen <a href="https://huggingface.co/papers/2410.17222">(image source)</a>.</small>

---

## Dataset Creation and Collation for CPT

This document explains how to prepare datasets for CPT, linking the dataset preparation processes in the code to the methods and principles described in the CPT paper, specifically in **Sections 3.1**, **3.2**, and **3.3**.

---

### Template-Based Tokenization

#### The Role of Templates
Templates define the structure of the input-output pairs, enabling the model to interpret the task within a unified context.

- **Input Templates**:
Templates like `"input: {sentence}"` structure raw input sentences. The `{sentence}` placeholder is replaced with the actual input text.

- **Output Templates**:
Templates such as `"output: {label}"` format the labels (e.g., `positive`, `negative`, etc.).

- **Separator Tokens**:
Separators distinguish different parts of the input, such as the input text and labels, as well as separate examples within the context.
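
As a rough illustration of the templates above, the following sketch fills the input and output templates and joins several examples into one context string. The helper names and the two separator values are assumptions made for this example; they are not taken from the PEFT code or the notebook.

```python
# Hypothetical helpers for illustration; not part of the PEFT API.
INPUT_TEMPLATE = "input: {sentence}"
OUTPUT_TEMPLATE = "output: {label}"
PART_SEPARATOR = " "      # assumed separator between the input text and its label
EXAMPLE_SEPARATOR = "\n"  # assumed separator between consecutive examples


def format_example(sentence: str, label: str) -> str:
    """Fill both templates and join them with the part separator."""
    filled_input = INPUT_TEMPLATE.format(sentence=sentence)
    filled_output = OUTPUT_TEMPLATE.format(label=label)
    return filled_input + PART_SEPARATOR + filled_output


def build_context(examples: list[tuple[str, str]]) -> str:
    """Concatenate formatted examples into a single context string."""
    return EXAMPLE_SEPARATOR.join(
        format_example(sentence, label) for sentence, label in examples
    )


context = build_context([("I loved it", "positive"), ("Too slow", "negative")])
print(context)
```

In a real pipeline the resulting string would then be tokenized, and each token would be tagged by the part of the template it came from.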


#### How CPT Utilizes Context Structure

CPT leverages the context structure, encoded within the `cpt_tokens_type_mask`, to optimize the context effectively. By treating different token types based on their roles, the model updates some tokens while using others solely for optimization:

1. **Refrain from Updating Label Tokens**:
Some context tokens represent label tokens, which contain valuable, unmodifiable information. By excluding these tokens from updates during training, CPT ensures that the labels remain fixed, preserving their integrity.

2. **Apply Type-Specific Projection Norms**:
CPT employs Projected Gradient Descent (PGD) to update context embeddings, applying tailored norms to different context parts. This approach reduces overfitting while maintaining robustness and generalization by preserving the integrity of user-provided examples.
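
The two rules above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the PEFT implementation: the type ids, the per-type radii in `eps_by_type`, and the choice of an L2 ball for the projection are all hypothetical and only meant to show the shape of the update.

```python
import math

# Hypothetical token type ids, chosen for this sketch only.
INPUT_TEXT, LABEL = 2, 4


def project_delta(delta, epsilon):
    """Rescale delta so that its L2 norm is at most epsilon."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= epsilon:
        return list(delta)
    scale = epsilon / norm
    return [d * scale for d in delta]


def pgd_step(original, current, grads, type_mask, eps_by_type, lr=0.1):
    """One projected gradient step over per-token embeddings.

    Label tokens are reset to their original embedding (never updated);
    every other token is moved along the negative gradient and then pulled
    back into a ball around its original embedding whose radius depends on
    the token type.
    """
    updated = []
    for orig, cur, grad, tok_type in zip(original, current, grads, type_mask):
        if tok_type == LABEL:
            updated.append(list(orig))
            continue
        stepped = [c - lr * g for c, g in zip(cur, grad)]
        delta = [s - o for s, o in zip(stepped, orig)]
        delta = project_delta(delta, eps_by_type[tok_type])
        updated.append([o + d for o, d in zip(orig, delta)])
    return updated


original = [[0.0, 0.0], [0.0, 0.0]]
grads = [[-30.0, -40.0], [-30.0, -40.0]]
type_mask = [INPUT_TEXT, LABEL]
updated = pgd_step(original, original, grads, type_mask, {INPUT_TEXT: 1.0})
print(updated)
```

Here the first token takes a step of norm 5 and is projected back onto the radius-1 ball, while the label token stays at its original embedding.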



#### Limitations
CPT is designed for few-shot scenarios, as concatenating more examples increases memory usage due to the self-attention mechanism and additional loss terms. For larger datasets, users can limit the number of context examples and use the remaining samples solely for optimization to manage memory efficiently.
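
One way to apply this advice is to cap the number of examples placed in the context and keep the rest for optimization only. The helper below is a hypothetical sketch (its name and signature are not from PEFT) that performs this split on any list of examples.

```python
def split_context_and_optimization(examples, max_context_examples):
    """Split examples into those placed in the context and the remainder.

    The first `max_context_examples` items form the context; the remaining
    items are intended to be used solely as training samples.
    """
    if max_context_examples < 0:
        raise ValueError("max_context_examples must be non-negative")
    context_examples = list(examples[:max_context_examples])
    optimization_only = list(examples[max_context_examples:])
    return context_examples, optimization_only


data = [("good", "positive"), ("bad", "negative"), ("fine", "positive")]
context_examples, optimization_only = split_context_and_optimization(data, 2)
print(len(context_examples), len(optimization_only))
```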




Comment on lines +51 to +54
Member

Remove.

## Citation
```bib
@article{blau2025cpt,
  title={Context-Aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods},
  author={Tsachi Blau and Moshe Kimhi and Yonatan Belinkov and Alexander Bronstein and Chaim Baskin},
  journal={arXiv preprint arXiv:2410.17222},
  year={2025}
}
```