I am compiling the latest build from GitHub itself and have encountered another issue!
What if there is no 'index' for a particular word? This was my first thought on seeing the error below.
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/flow/sometimes.py", line 22, in augment
    augmented_text = aug.augment(augmented_text)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/base_augmenter.py", line 65, in augment
    return self.substitute(data)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/augmenter/word/word_embs_aug.py", line 57, in substitute
    candidate_words = self.model.predict(original_word, top_n=self.aug_n)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/model/word_embs/word_embeddings.py", line 48, in predict
    source_id = self.word2idx(word)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/model/word_embs/word_embeddings.py", line 26, in word2idx
    return self.w2i[word]
KeyError: 'thethe'
The above is acceptable, although it should be handled and is fixable. But consider the error below:
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/flow/sometimes.py", line 22, in augment
    augmented_text = aug.augment(augmented_text)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/base_augmenter.py", line 65, in augment
    return self.substitute(data)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/augmenter/word/word_embs_aug.py", line 57, in substitute
    candidate_words = self.model.predict(original_word, top_n=self.aug_n)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/model/word_embs/word_embeddings.py", line 48, in predict
    source_id = self.word2idx(word)
  File "/home/projects/bot/bot_py3/faq/bot-ml/lib/python3.6/site-packages/nlpaug/model/word_embs/word_embeddings.py", line 26, in word2idx
    return self.w2i[word]
KeyError: 'How'
This should not happen, as there is definitely an embedding for 'How'.
I've also seen this behavior. Perhaps this is due to cased/uncased embeddings.
For now I'm just lowercasing everything and retrying if anything goes wrong.
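To illustrate the lowercase-and-retry workaround, here is a minimal sketch. It assumes the vocabulary is a plain `dict` mapping words to indices, like the `w2i` attribute visible in the traceback; the helper name `lookup_index` is hypothetical and not part of nlpaug.

```python
def lookup_index(w2i, word):
    """Return the vocabulary index for `word`, or None if it is unknown.

    `w2i` is assumed to be a plain dict of word -> index (hypothetical
    stand-in for the model's vocabulary). The exact form is tried first,
    then the lowercased form, which covers uncased embeddings.
    """
    if word in w2i:
        return w2i[word]
    # dict.get returns None instead of raising KeyError for unknown words.
    return w2i.get(word.lower())


# Toy uncased vocabulary, for illustration only.
w2i = {"how": 0, "are": 1, "you": 2}
print(lookup_index(w2i, "How"))     # found via the lowercase fallback
print(lookup_index(w2i, "thethe"))  # truly unknown word
```

Returning `None` rather than raising lets the caller decide whether to skip the word or fall back to something else.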
@neerajvashistha @ricardopieper
This happens when using any of the traditional word embedding augmenters (word2vec, GloVe, fastText). The root cause is that those words are out-of-vocabulary (OOV, or unknown words).
The plan is to exclude OOV words during augmentation. In other words, OOV words will not be picked for augmentation. Although it is possible to calculate the "most" similar word for an OOV word, I prefer to exclude them. The upcoming release will exclude OOV words.
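Until that release is available, excluding OOV words can be sketched on the caller's side roughly as follows. This is only an illustration of the idea, not the library's actual implementation: `pick_augmentable_positions` is a hypothetical helper, and the vocabulary is assumed to be a plain `dict` of word to index.

```python
import random


def pick_augmentable_positions(tokens, w2i, aug_n, rng=None):
    """Choose up to `aug_n` token positions whose words are in the vocabulary.

    OOV tokens are never candidates, so a later w2i[word] lookup on the
    chosen positions cannot raise KeyError. Hypothetical helper; `w2i` is
    assumed to be a plain dict of word -> index.
    """
    rng = rng or random.Random()
    in_vocab = [i for i, tok in enumerate(tokens) if tok in w2i]
    # Never ask for more positions than there are in-vocabulary tokens.
    k = min(aug_n, len(in_vocab))
    return sorted(rng.sample(in_vocab, k))


tokens = ["How", "are", "thethe", "you"]
w2i = {"are": 0, "you": 1}  # toy vocabulary without 'How' or 'thethe'
positions = pick_augmentable_positions(tokens, w2i, aug_n=5)
print([tokens[i] for i in positions])
```

Filtering before sampling, rather than catching `KeyError` afterwards, keeps the number of augmented words predictable when a sentence contains many unknown tokens.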