Preprocessing script that converts txt file into tacred dataset format #9

BoPengGit · 2019-06-25T23:49:53Z

Is there a preprocessing script that can convert a plain txt file into the TACRED data format, so that it can then be used for prediction with a pre-trained model?

Thanks and God bless,

yuhaozhang · 2019-07-17T07:09:25Z

To do that, you'll basically need a pipeline for tokenization, POS tagging, and named entity recognition. I don't have a script readily available, but it should not be too hard to create one with an existing NLP toolkit such as Stanford CoreNLP.

On a related note, have you looked at the KBP annotator in CoreNLP (with KBP standing for Knowledge Base Population)? It is a well-packaged pipeline that takes a piece of text as input and outputs relation triples. One difference is that this KBP system is a combination of rules, patterns and a logistic regression classifier, unlike the neural network system in this repo, but the logistic regression classifier is indeed trained on the TACRED dataset, so you should expect decent results from it. More details of this KBP system can be found in this paper.
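As a rough illustration of the tokenization / POS / NER route, here is a minimal sketch of the last step: packing already-annotated tokens into a TACRED-style JSON record. It is not a script from this repo. The function name `to_tacred_record` and the example sentence are hypothetical, the field names (`token`, `subj_start`, `stanford_pos`, and so on) and the inclusive end indices are assumed from the released TACRED JSON files, and the tags are written by hand here where a toolkit such as CoreNLP would normally supply them. Dependency fields (`stanford_head`, `stanford_deprel`) are left out and may also be needed, so check against a real TACRED sample before relying on it.

```python
import json


def to_tacred_record(example_id, tokens, pos_tags, ner_tags,
                     subj_span, obj_span, relation="no_relation"):
    """Pack one annotated sentence into a TACRED-style dict.

    Assumptions (not confirmed in this thread): the key names follow the
    released TACRED JSON, and span ends are inclusive token indices.
    """
    if not (len(tokens) == len(pos_tags) == len(ner_tags)):
        raise ValueError("tokens, pos_tags and ner_tags must have equal length")
    for start, end in (subj_span, obj_span):
        if not 0 <= start <= end < len(tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")

    subj_start, subj_end = subj_span
    obj_start, obj_end = obj_span
    return {
        "id": example_id,
        # Placeholder label; at prediction time the true relation is unknown.
        "relation": relation,
        "token": list(tokens),
        "subj_start": subj_start,
        "subj_end": subj_end,
        "obj_start": obj_start,
        "obj_end": obj_end,
        # Entity type is taken from the NER tag of the first token in the span.
        "subj_type": ner_tags[subj_start],
        "obj_type": ner_tags[obj_start],
        "stanford_pos": list(pos_tags),
        "stanford_ner": list(ner_tags),
    }


# Hand-written annotations standing in for the output of an NLP toolkit.
tokens = ["Bill", "Gates", "founded", "Microsoft", "."]
pos_tags = ["NNP", "NNP", "VBD", "NNP", "."]
ner_tags = ["PERSON", "PERSON", "O", "ORGANIZATION", "O"]

record = to_tacred_record("example-0", tokens, pos_tags, ner_tags,
                          subj_span=(0, 1), obj_span=(3, 3))

# A TACRED-style data file is assumed to be a JSON list of such records.
print(json.dumps([record], indent=2))
```

One record is needed per candidate (subject, object) pair, so a sentence with several entity mentions would produce several records that share the same tokens.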

BoPengGit · 2019-07-17T21:57:00Z

Hi Yuhao,

That's very interesting and thanks for the reference to the KBP annotator. I will look into it in the next month or so.

If you have any other ideas or suggestions for producing relation triplets from an input text, please feel free to post them here.

Thanks and I'll get back to you once I look at it.

Best and God bless,

onehaitao · 2020-02-22T09:23:22Z

I wrote a script to convert SemEval2010 to the TACRED dataset format. Maybe it will help you.

BoPengGit changed the title ~~Convert new txt file into tacred data format~~ Preprocessing script that converts txt file into tacred dataset format Jun 25, 2019

yuhaozhang mentioned this issue Jul 17, 2019

Why replace subject and object entity with special token? #11

Closed
