The line f = open(path, 'w') needs encoding='utf-8' to work properly on Windows. Without it, writing fails with the following error when the output contains characters that the default Windows encoding cannot represent:
UnicodeEncodeError: 'charmap' codec can't encode character <unhandled char> in position 18458: character maps to <undefined>
Modified below:
def to_raw_file(self, path, file_format=None, format_fn=None, header=None, n=None, seed=None, new_sample=True):
    """Flatten all tests into individual examples and print them to file.
    Indices of example to test case will be stored in each test.
    If n is not None, test.run_idxs will store the test case indexes.
    The line ranges for each test will be saved in self.test_ranges.

    Parameters
    ----------
    path : string
        File path
    file_format : string, must be one of 'jsonl', 'squad', 'qqp_test', or None
        None just calls str(x) for each example in self.data
        squad assumes x has x['question'] and x['passage'], or that format_fn does this
    format_fn : function or None
        If not None, call this function to format each example in self.data
    header : string
        If not None, first line of file
    n : int
        If not None, number of samples to draw
    seed : int
        Seed to use if n is not None
    new_sample: bool
        If False, will rely on a previous sample and ignore the 'n' and 'seed' parameters
    """
    ret = ''
    all_examples = []
    add_id = False
    if file_format == 'qqp_test':
        add_id = True
        file_format = 'tsv'
        header = 'id\tquestion1\tquestion2'
    if header is not None:
        ret += header.strip('\n') + '\n'
    all_examples = self.get_raw_examples(file_format=file_format, format_fn=format_fn, n=n, seed=seed, new_sample=new_sample)
    if add_id and file_format == 'tsv':
        all_examples = ['%d\t%s' % (i, x) for i, x in enumerate(all_examples)]
    if file_format == 'squad':
        ret_map = {'version': 'fake',
                   'data': []}
        for i, x in enumerate(all_examples):
            r = {'title': '',
                 'paragraphs': [{
                     'context': x['passage'],
                     'qas': [{'question': x['question'],
                              'id': str(i)
                              }]
                 }]
                 }
            ret_map['data'].append(r)
        ret = json.dumps(ret_map)
    else:
        ret += '\n'.join(all_examples)
    f = open(path, 'w', encoding='utf-8')
    f.write(ret)
    f.close()
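
For anyone who wants to see the failure without a Windows machine, here is a minimal sketch. It assumes the Windows default encoding is cp1252, which is common but depends on the system locale, so the script requests that codec explicitly to simulate it. The sample text, the temporary file name, and the variable names are illustrative only and are not taken from the library.

```python
import os
import tempfile

# Sample text with characters outside cp1252 (an arrow and CJK characters).
text = "caf\u00e9 \u2192 \u4f60\u597d"

# Hypothetical output location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "examples.txt")

# Simulate a Windows default by asking for cp1252 explicitly.
# Python implements cp1252 as a charmap codec, so the failure is a
# UnicodeEncodeError mentioning 'charmap'.
try:
    with open(path, "w", encoding="cp1252") as f:
        f.write(text)
    failed = False
except UnicodeEncodeError as err:
    failed = True
    print("cp1252 write failed:", err.reason)

# With an explicit UTF-8 encoding the same text is written regardless
# of the platform's default encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

The sketch also uses a with block instead of separate open and close calls, so the file is closed even if write raises. That is an optional design choice and not part of the fix proposed above.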