Unsupported tokenizer 'OpenAI.BPE' #1049
Comments
Comment by iftenney @pyeres @pruksmhc maybe this got renamed recently? In the meantime, unless you're trying to probe GPT-1 you can just comment out this line: https://github.com/nyu-mll/jiant/blob/master/probing/get_and_process_all_data.sh#L35
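(For anyone hitting this, a minimal sketch of that workaround, run from the jiant repo root. It assumes line 35 of the script is still the GPT/"OpenAI.BPE" retokenization call, as in the link above, and that GNU sed is available; verify both before running.)

# Workaround sketch: comment out the GPT ("OpenAI.BPE") retokenization step so the
# rest of the preprocessing script can run. The line number (35) comes from the link
# above and may have shifted; check the script first.
sed -i '35s/^/# /' probing/get_and_process_all_data.sh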
Comment by pyeres Looks like this is the result of PR nyu-mll/jiant#881 "Replacing old GPT implement with the one from huggingface pytorch transformers". @HaokunLiu, can you take a look?
Comment by HaokunLiu Okay. You can either choose auto as your tokenizer or simply use the same string as your input_module.
Comment by lovodkin93
I didn't quite follow you - where should I choose the auto tokenizer? Also, what is the auto-tokenizer? And what do you mean by using the same string as my input_module?
Comment by HaokunLiu Oh, sorry. I thought it was the main program. For this preprocessing script, just replace 'OpenAI.BPE' with the tokenizer name you want to use.
Comment by lovodkin93
You mean replace the "OpenAI.BPE" in that line of get_and_process_all_data.sh with "openai-gpt" or with "gpt2-medium"?
Comment by HaokunLiu Exactly.
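(To make the fix concrete, a hedged sketch of the corrected call. It assumes retokenize_edge_data.py accepts the tokenizer name via a -t flag and that the edge-probing JSON glob below matches your data layout; check the script's argument parser and your directories before running.)

# Sketch of the fix in probing/get_and_process_all_data.sh: replace the "OpenAI.BPE"
# argument with the same string you use for input_module, e.g. openai-gpt or gpt2-medium.
# The -t flag and the JSON glob below are assumptions; adapt them to the actual script.
python retokenize_edge_data.py -t "openai-gpt" "$JIANT_DATA_DIR"/edges/*/*.json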
Issue by lovodkin93
Thursday Apr 02, 2020 at 15:05 GMT
Originally opened as nyu-mll/jiant#1049
Hello,
I've been trying to preprocess the data, as described in the README file located in the probing directory.
I ran the following commands (which I took from the README mentioned above):
mkdir -p $JIANT_DATA_DIR
./get_and_process_all_data.sh $JIANT_DATA_DIR
and got the following error message:
Traceback (most recent call last):
  File "./retokenize_edge_data.py", line 97, in <module>
    main(sys.argv[1:])
  File "./retokenize_edge_data.py", line 93, in main
    retokenize_file(fname, args.tokenizer_name, worker_pool=worker_pool)
  File "./retokenize_edge_data.py", line 83, in retokenize_file
    for line in tqdm(worker_pool.imap(map_fn, inputs, chunksize=500), total=len(inputs)):
  File "/cs/labs/oabend/lovodkin93/anaconda3/envs/jiant/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1022, in __iter__
    for obj in iterable:
  File "/cs/labs/oabend/lovodkin93/anaconda3/envs/jiant/lib/python3.6/multiprocessing/pool.py", line 320, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/cs/labs/oabend/lovodkin93/anaconda3/envs/jiant/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
ValueError: Unsupported tokenizer 'OpenAI.BPE'
I tried to download the "openai gpt-2" model, which I saw had the tokenizer mentioned, but it appears to require Python 3.7, while the jiant environment uses an older version.
Has anyone seen this error before, or does anyone know how to solve it?
@iftenney