Hi @DTchebotarev, this is not currently a feature, but I appreciate that padding sequences is a common task in deep learning. I've been dragging my feet on getting DL models into textacy, but when I do, I'd expect to include useful adjacent functionality like this as well.
lwords, word, rwords = window[:window_size], window[window_size], window[window_size + 1:]
Unlike extract.ngrams(), this method produces Tuple[Token] rather than Span objects, so it doesn't work in the context of to_terms_list(). But maybe it's helpful.
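For anyone who wants to try the windowing idea without textacy objects, here is a minimal, self-contained sketch of the same unpacking step. It assumes the input is a plain list of strings (e.g. token texts or lemmas you have already pulled out of a parsed doc), and it pads both ends with a placeholder so edge tokens still get full context. The helper name `token_windows` and the `"<pad>"` marker are hypothetical, not part of textacy.

```python
from typing import Iterator, List, Sequence, Tuple


def token_windows(
    tokens: Sequence[str], window_size: int, pad: str = "<pad>"
) -> Iterator[Tuple[List[str], str, List[str]]]:
    """Yield (left_context, word, right_context) for every token.

    Hypothetical helper: pads the sequence with `pad` on both sides so that
    tokens near the edges still get `window_size` items of context per side.
    """
    padded = [pad] * window_size + list(tokens) + [pad] * window_size
    width = 2 * window_size + 1  # left context + center word + right context
    for start in range(len(tokens)):
        window = padded[start : start + width]
        # Same split as in the snippet above: left words, center word, right words.
        lwords, word, rwords = window[:window_size], window[window_size], window[window_size + 1 :]
        yield lwords, word, rwords
```

For example, `list(token_windows(["a", "b", "c"], 1))` gives one `(left, word, right)` triple per token, with `"<pad>"` filling in the missing neighbors of the first and last tokens.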
Is it possible to add context to ngram extraction?
For example, currently running
list(textacy.Doc('I like green eggs and ham.').to_terms_list(ngrams=3, as_strings=True))
returns a list
['-PRON- like green', 'like green egg', 'egg and ham']
But I would ideally like to have the option to specify something like
list(textacy.Doc('I like green eggs and ham.').to_terms_list(ngrams=3, as_strings=True, left_pad=True, right_pad=True))
and have it return something along the lines of
['<s2> <s1> -PRON-', '<s1> -PRON- like', '-PRON- like green', 'like green egg', 'egg and ham', 'and ham </s1>', 'ham </s1> </s2>']
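As a possible workaround, here is a rough sketch of the requested padding done outside textacy, on a plain list of token strings. It prepends `n - 1` start markers (`<s2> <s1>` for n=3) and appends `n - 1` end markers (`</s1> </s2>`), then slides an n-wide window across the result. The function `padded_ngrams` is hypothetical, and note one difference from the output above: it does no stop-word or punctuation filtering, so it also keeps n-grams like `'green egg and'` that `to_terms_list()` dropped in the example.

```python
from typing import List, Sequence


def padded_ngrams(
    tokens: Sequence[str], n: int, left_pad: bool = True, right_pad: bool = True
) -> List[str]:
    """Return space-joined n-grams, optionally padded with sentence markers.

    Hypothetical helper: for n=3, the left padding is ['<s2>', '<s1>'] and the
    right padding is ['</s1>', '</s2>']. No stop-word filtering is applied.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    left = [f"<s{i}>" for i in range(n - 1, 0, -1)] if left_pad else []
    right = [f"</s{i}>" for i in range(1, n)] if right_pad else []
    seq = left + list(tokens) + right
    # Slide an n-wide window over the padded sequence.
    return [" ".join(seq[i : i + n]) for i in range(len(seq) - n + 1)]
```

Calling `padded_ngrams(["-PRON-", "like", "green", "egg", "and", "ham"], 3)` starts with `'<s2> <s1> -PRON-'` and ends with `'ham </s1> </s2>'`; any filtering you want to match textacy's behavior would have to be applied to the result separately.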
I don't think this is possible in textacy currently, so I guess this is a feature request.
Also, any ideas for a workaround are greatly appreciated :)