Why does relaxing the constraint (RepeatModification) lead to less successful augmentation? #789

Open
YanghaoZYH opened this issue May 3, 2024 · 0 comments


To Reproduce
Run the following code:

from textattack.augmentation import Augmenter
from textattack.transformations import WordSwapEmbedding
from textattack.constraints.semantics import WordEmbeddingDistance
from textattack.constraints.grammaticality import PartOfSpeech
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.shared import AttackedText

text_sample = "woody , what happened ?"
num_words_to_swap = len(AttackedText(text_sample).words) - 1  # subtract 1 because "what" is a stop word
max_candidates = 50

num_samples = max_candidates**num_words_to_swap
print('max num_samples:', num_samples)

# Define constraints to ensure quality of perturbations
constraints = [StopwordModification(), RepeatModification()]
constraints.append(WordEmbeddingDistance(min_cos_sim=0.5))
constraints.append(PartOfSpeech(allow_verb_noun_swap=True))

# Define the transformation method
transformation = WordSwapEmbedding(
    max_candidates=max_candidates  # Number of candidates to generate per word
)

# Combine transformation and constraints in an Augmenter
augmenter = Augmenter(
    transformation=transformation,
    constraints=constraints,
    pct_words_to_swap=1,  # Fraction of words to swap per perturbation (1 = 100%)
    transformations_per_example=num_samples  # Number of perturbations to generate per input
)

perturbations = augmenter.augment(text_sample)
actural_num_samples = len(perturbations)
print('actural_num_samples: ',actural_num_samples)

Which gives me the output:

max num_samples: 2500
actural_num_samples:  532
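
For reference, the 2500 upper bound is just max_candidates raised to the number of swappable words. Below is a minimal sketch of that arithmetic. It uses a plain whitespace split as a stand-in for AttackedText tokenization and a one-word stop word set; both are assumptions for illustration, not TextAttack's actual tokenizer or stop word list. The bound also ignores any candidates that the constraints filter out.

```python
text_sample = "woody , what happened ?"

# Hypothetical stand-in for AttackedText(text_sample).words:
# keep only alphabetic tokens, dropping punctuation.
words = [tok for tok in text_sample.split() if tok.isalpha()]

# Assumption taken from the snippet above: "what" is the only stop word here.
stopwords = {"what"}
swappable = [w for w in words if w not in stopwords]

max_candidates = 50
# Every swappable word can take up to max_candidates replacements.
upper_bound = max_candidates ** len(swappable)
print(swappable, upper_bound)
```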

But when I remove the RepeatModification constraint and leave the other constraints and the rest of the code unchanged:

constraints = [StopwordModification()]

I get the output:

max num_samples: 2500
actural_num_samples:  277

Expected behavior
I expected that relaxing the constraints would increase the number of augmented samples returned (or at least not reduce it), but the count drops from 532 to 277.
Is there anything I misunderstood or is there a bug?
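
One way to narrow this down might be to compare how many words each perturbation actually changes in the two runs. Below is a minimal sketch of such a check. `count_changed_words` and `changed_word_histogram` are hypothetical helpers written for this issue, not part of TextAttack, and the sample perturbations are made up for illustration rather than taken from real output. The comparison assumes whitespace tokenization and that swaps keep the number of tokens unchanged.

```python
from collections import Counter


def count_changed_words(original: str, perturbed: str) -> int:
    """Count token positions that differ between two strings (hypothetical helper)."""
    orig_tokens = original.split()
    pert_tokens = perturbed.split()
    # Compare position by position; any length mismatch counts as extra changes.
    changed = sum(a != b for a, b in zip(orig_tokens, pert_tokens))
    return changed + abs(len(orig_tokens) - len(pert_tokens))


def changed_word_histogram(original: str, perturbations: list) -> Counter:
    """Map 'number of changed words' to 'how many perturbations changed that many'."""
    return Counter(count_changed_words(original, p) for p in perturbations)


original = "woody , what happened ?"
# Made-up perturbations for illustration only, not real TextAttack output.
examples = [
    "woody , what occurred ?",
    "buzz , what occurred ?",
    "buzz , what happened ?",
]
histogram = changed_word_histogram(original, examples)
print(dict(histogram))
```

In the reproduction script, the same call would be `changed_word_histogram(text_sample, perturbations)`, run once with and once without RepeatModification. If the two histograms differ noticeably, that might help show where the missing samples go.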

System Information:

  • OS: Linux
  • Library versions: torch==2.3.0, transformers==4.40.1
  • TextAttack version: 0.3.10