"IndexError: list assignment index out of range" performing delete_words in a text #308

Closed
jonsnowseven opened this issue Aug 27, 2020 · 1 comment

Comments

@jonsnowseven

Hello.

I am having an issue augmenting some text (namely, deleting some random words).

Code to reproduce the error:

import random
import math
from functools import partial
from textacy import make_spacy_doc
from textacy.augmentation.augmenter import Augmenter
from textacy.augmentation.transforms import (
    delete_words,
    insert_word_synonyms,
    substitute_word_synonyms,
    swap_words,
)

random.seed(42)

doc = make_spacy_doc(
    """My name is NAME and I am a NAME NAME with NAME, 
    looking after requirement fulfillment for our clients in the NAME. 
    We provide top skilled resources in NAME/ Non NAME, NAME, NAME, NAME NAME and 
    NAME, NAME, NAME, NAME NAME, and others roles. My company, NAME NAME NAME is a 
    NAME / NAME certified staffing supplier headquartered out of NAME, NAME. 
""".strip(),
    lang="en",
)

tfs = [
    partial(delete_words, num=math.ceil(0.05 * len(doc))),
]
augmenter = Augmenter(tfs, num=None)
augmenter.apply_transforms(doc)

Error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-11-a8a4acb0dba3> in <module>
     27 ]
     28 augmenter = Augmenter(tfs, num=None)
---> 29 augmenter.apply_transforms(doc)

~/anaconda3/envs/.../lib/python3.6/site-packages/textacy/augmentation/augmenter.py in apply_transforms(self, doc, **kwargs)
    105             else:
    106                 for tf in tfs:
--> 107                     aug_toks = tf(aug_toks)
    108             new_nested_aug_toks.append(aug_toks)
    109         return self._make_new_spacy_doc(new_nested_aug_toks, lang)

~/anaconda3/envs/.../lib/python3.6/site-packages/textacy/augmentation/transforms.py in delete_words(aug_toks, num, pos)
    231                     pos=prev_tok.pos,
    232                     is_word=prev_tok.is_word,
--> 233                     syns=prev_tok.syns,
    234                 )
    235         else:

IndexError: list assignment index out of range
@bdewilde
Collaborator

Hi @jonsnowseven, thanks for the detailed code example! It looks like you got snagged by a bug that only shows up with an unlucky random draw: it fails with random.seed(42) but works fine with random.seed(41), though of course neither is a special number :) I believe I have a fix and will commit it to the dev branch shortly. I'm planning to publish a new release of textacy sometime next week, so the fix should be "live" soon.
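
In the meantime, a minimal workaround sketch (reusing the delete_words pipeline from your report; the seed value and the sample text here are just illustrative, and any seed that avoids the bad draw should behave the same) is to re-seed before applying the transforms:

import random
import math
from functools import partial

from textacy import make_spacy_doc
from textacy.augmentation.augmenter import Augmenter
from textacy.augmentation.transforms import delete_words

# Re-seed with a value that happens to avoid the faulty branch;
# 42 triggers the bug in this example, 41 does not.
random.seed(41)

doc = make_spacy_doc(
    "A short example sentence to augment by deleting a few words.",
    lang="en",
)

# Same transform setup as in the report: delete ~5% of the doc's tokens.
tfs = [partial(delete_words, num=math.ceil(0.05 * len(doc)))]
augmenter = Augmenter(tfs, num=None)
aug_doc = augmenter.apply_transforms(doc)
print(aug_doc.text)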
