Failing to train the openai-community/gpt2 model for custom NER with the spaCy framework #13349
Hi! There's a bit of confusion here between spaCy's tokenizer and the Hugging Face transformer tokenizer. spaCy's tokenizer is really only used to determine token boundaries in the text, and it's defined under the [nlp] block of the config. The error message that you're getting ("Asking to pad but the tokenizer does not have a padding token ..."), on the other hand, is thrown by the Hugging Face tokenizer, which is a different object entirely. The tool that bridges between spaCy and Hugging Face here is spacy-transformers: it wraps the Hugging Face model and tokenizer inside the transformer pipeline component. Note in particular the [components.transformer.model.tokenizer_config] block, whose settings are passed straight through to the Hugging Face tokenizer. Now you can extend that tokenizer config to define the pad_token there, instead of under [nlp]. Let us know if that works!
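A sketch of what that could look like (the architecture name and version here are the usual spacy-transformers defaults and may need adjusting to match the rest of your config):

```ini
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "openai-community/gpt2"

[components.transformer.model.tokenizer_config]
use_fast = true
# GPT-2 ships without a dedicated padding token, so reuse its
# end-of-text token as the pad token:
pad_token = "<|endoftext|>"
```

This is the config-file equivalent of setting tokenizer.pad_token = tokenizer.eos_token in Python.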
I want to train a custom NER component using GPT2, and I want to use this model. I am familiar with spaCy's custom training framework, and I formatted the config.cfg file as per the requirements. The config.cfg file is as follows:
I received an error "ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'})."
I posted this issue on the OpenAI forum and learned from @younesbelkada that I need to set tokenizer.pad_token = tokenizer.eos_token before launching the training.
I modified the config.cfg file accordingly. The modified portion is as follows:
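Judging from the validation error below, the modification presumably added the pad_token under the [nlp] block, roughly like this (a reconstruction, not the exact file):

```ini
[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 128
tokenizer.pad_token = "tokenizer.eos_token"
```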
Now I received the error "✘ Config validation error
nlp -> tokenizer.pad_token extra fields not permitted
{'lang': 'en', 'pipeline': ['transformer', 'ner'], 'batch_size': 128, 'disabled': [], 'before_creation': None, 'after_creation': None, 'after_pipeline_creation': None, 'tokenizer': {'@Tokenizers': 'spacy.Tokenizer.v1'}, 'vectors': {'@vectors': 'spacy.Vectors.v1'}, 'tokenizer.pad_token': 'tokenizer.eos_token'}"
Could you please let me know how I can fix this issue and train the GPT2 model?
Thank you in advance.