non-ascii characters throw errors in FB13 #2

Open
talkhaldi opened this issue Nov 8, 2019 · 2 comments

Comments

@talkhaldi

Hi,

Thank you for your work and publishing the code. I'm trying to run triple_classification example for FB13 as in the readme file, but I'm getting the following error:

Traceback (most recent call last):
  File "run_bert_triple_classifier.py", line 847, in <module>
    main()
  File "run_bert_triple_classifier.py", line 556, in main
    train_examples = processor.get_train_examples(args.data_dir)
  File "run_bert_triple_classifier.py", line 120, in get_train_examples
    self._read_tsv(os.path.join(data_dir, "train.tsv")), "train", data_dir)
  File "run_bert_triple_classifier.py", line 173, in _create_examples
    ent_lines = f.readlines()
  File "/mnt/orange/ubrew/data/opt/python/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128)

Is this behavior expected?
I can pass encoding="utf-8" to the open() call, but then it becomes extremely slow. How did you deal with this issue?
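
For context: in Python 3, open() without an encoding argument falls back to locale.getpreferredencoding(False), which can be ASCII on machines running a C/POSIX locale. That would explain why the same script runs fine in one environment and fails in another. Below is a minimal sketch that reproduces the same kind of error on in-memory bytes. The sample line and the try_decode helper are made up for illustration and are not taken from FB13 or the repository.

```python
import locale

# Encoding that open() falls back to when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)

# Hypothetical sample line; "\u00e9" (e with acute accent) is stored in UTF-8
# as the two bytes 0xc3 0xa9, so the first offending byte is 0xc3.
raw = "/m/0abc\tJos\u00e9".encode("utf-8")


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as ASCII fails on the 0xc3 byte; decoding as UTF-8 succeeds.
ascii_text, ascii_error = try_decode(raw, "ascii")
utf8_text, utf8_error = try_decode(raw, "utf-8")
```

If default_encoding is not a UTF-8 variant on the affected machine, the locale is the likely cause rather than the data files.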

@yao8839836
Owner

yao8839836 commented Nov 12, 2019

@talkhaldi

Hi, I didn't see this problem in my Python 3.5 or 3.6 environment.

You may want to try this:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

@Zhizhizhi997

@yao8839836 I ran into this problem too, but I think the method you suggested only works in Python 2, not Python 3. I changed

with open(os.path.join(data_dir, "entity2text.txt"), 'r')
to
with open(os.path.join(data_dir, "entity2text.txt"), 'r', encoding="utf")

It seems to work
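
As a rough sketch of that approach in Python 3, the read can be wrapped in a small helper that always decodes as UTF-8, independent of the machine's locale. The helper name read_lines_utf8 and the demo file contents are assumptions for illustration; only the file name entity2text.txt comes from the thread.

```python
import os
import tempfile


def read_lines_utf8(path):
    """Read all lines of a text file, decoding as UTF-8 regardless of locale."""
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


# Self-contained demo: write a made-up entity file, then read it back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "entity2text.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("/m/0abc\tJos\u00e9\n")

ent_lines = read_lines_utf8(demo_path)
```

An alternative that avoids touching the script is to run it under a UTF-8 locale (for example by setting LANG or LC_ALL to a UTF-8 locale that is installed on the machine), since that changes the default encoding open() picks up.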
