Hi, loading data in CoNLL format fails on my custom dataset, which contains non-ASCII characters. When I read the data with encoding 'utf-8' set, I get the following error:
File "/usr/local/lib/python2.7/dist-packages/seqlearn/datasets.py", line 65, in <genexpr>
lines = (str.split(line) for line in f)
TypeError: descriptor 'split' requires a 'str' object but received a 'unicode'
def _conll_sequences(f, features, labels, lengths, split):
    # Divide input into blocks of empty and non-empty lines.
    lines = (str.strip(line) for line in f)
Everything works when I change the last line to:
lines = (line.strip() for line in f)
Is there any reason this fix would be unwanted?
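For reference, here is a minimal sketch of why the change helps, as far as I understand it. The unbound call `str.strip(line)` only accepts an object of exactly the `str` type, while `line.strip()` dispatches on whatever string type the line actually is. The helper names (`strip_lines`, `nonempty_blocks`) and the sample data are hypothetical, made up for illustration, and are not part of seqlearn.

```python
import io
from itertools import groupby


def strip_lines(f):
    """Strip each line via its own method, so any string type works."""
    return (line.strip() for line in f)


def nonempty_blocks(f):
    """Group stripped lines into blocks separated by empty lines."""
    lines = strip_lines(f)
    return [list(grp) for nonempty, grp in groupby(lines, bool) if nonempty]


# Hypothetical CoNLL-style sample with non-ASCII tokens (escaped for portability).
sample = u"M\u00fcller B-PER\nwohnt O\n\nZ\u00fcrich B-LOC\n"
blocks = nonempty_blocks(io.StringIO(sample))

# Pick a string-like object that is NOT of type `str` on this interpreter:
# unicode text on Python 2, bytes on Python 3.
other_type = b" x " if str is not bytes else u" x "

# The unbound descriptor call rejects it with a TypeError.
try:
    str.strip(other_type)
    unbound_failed = False
except TypeError:
    unbound_failed = True
```

With the instance-method form, the same generator should work whether the file yields byte strings or decoded text, which is why I would expect the change to be safe for existing ASCII input as well.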