wikiner datasets / NER training and add to model #13381
Unanswered
CarloPederiva asked this question in Help: Coding & Implementations
Replies: 0 comments
Dear all,

I have a question about training an NER model, or more precisely, about adding an NER component to de_dep_news_trf. I have a pre-annotated WikiNER dataset that I want to use for training, so I created the base_config file and ran the command to generate the full config file.

My problem is that the WikiNER data comes as a .bz2 archive which unpacks to a .txt file. Using spaCy 3.0, do I first have to convert this .txt file to JSON (or JSONL), since the spacy convert command does not support .txt as an input format? If a JSON file is required, it seems the data has to follow a specific structure rather than being an ordinary .json file (which is what I have by now), because my current file cannot be converted. Do you have any ideas on how to do that?

Also, is my overall approach correct: first train the NER on WikiNER, add it to the existing de_dep_news_trf pipeline, save it to disk as a new model, and then keep training/correcting it on other datasets? Thanks heaps!
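In case it clarifies what I mean, here is a rough, untested sketch of the conversion I had in mind. It assumes each non-empty line of the unpacked WikiNER file is one sentence with tokens written as word|POS|IOB-tag (e.g. `Die|ART|O Schweiz|NE|I-LOC`), and the file name is just a placeholder. The idea is to skip the JSON step entirely and write a binary .spacy DocBin directly:

```python
# Minimal sketch (untested): convert a WikiNER .txt dump into spaCy's
# binary .spacy format so it can be used with `spacy train`.
# Assumes one sentence per line, tokens formatted as "word|POS|IOB-tag".
import spacy
from spacy.tokens import Doc, DocBin
from spacy.training import iob_to_biluo, biluo_tags_to_spans

nlp = spacy.blank("de")
doc_bin = DocBin()

with open("aij-wikiner-de-wp2.txt", encoding="utf-8") as f:  # placeholder name
    for line in f:
        line = line.strip()
        if not line:
            continue
        words, iob_tags = [], []
        for token in line.split(" "):
            word, _pos, tag = token.rsplit("|", 2)
            words.append(word)
            iob_tags.append(tag)
        doc = Doc(nlp.vocab, words=words)
        # Convert the IOB tags to entity spans and attach them to the Doc
        doc.ents = biluo_tags_to_spans(doc, iob_to_biluo(iob_tags))
        doc_bin.add(doc)

doc_bin.to_disk("wikiner_train.spacy")
```

From there, training should just be a matter of pointing the config at the generated file, e.g. `python -m spacy train config.cfg --paths.train ./wikiner_train.spacy --paths.dev ./wikiner_dev.spacy --output ./wikiner_ner_model` (with a dev split carved out of the data first).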
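And for the second part of my question, this is roughly what I mean by adding the NER to de_dep_news_trf and saving it as a new model. Again untested, and `wikiner_ner_model/model-best` is just my hypothetical training output directory:

```python
# Rough sketch (untested): combine the freshly trained NER with the existing
# de_dep_news_trf pipeline and save the result to disk as a new model.
import spacy

dep_nlp = spacy.load("de_dep_news_trf")
ner_nlp = spacy.load("wikiner_ner_model/model-best")  # hypothetical path

# Source the trained NER component into the existing pipeline.
# If the NER was trained with a listener to a separate tok2vec/transformer
# component, that component would have to be sourced (and renamed) as well,
# since it cannot share the de_dep_news_trf transformer it was not trained with.
dep_nlp.add_pipe("ner", source=ner_nlp, last=True)

# Save the combined pipeline as a new model directory.
dep_nlp.to_disk("de_dep_news_trf_with_ner")
```

Is that the right direction, or is there a better-supported way to do this?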