Preprocessing script that converts txt file into tacred dataset format #9

Open
BoPengGit opened this issue Jun 25, 2019 · 3 comments

Comments

@BoPengGit

Is there a preprocessing script that can convert a plain txt file into the TACRED dataset format, so that it can then be used for prediction with a pre-trained model?

Thanks and God bless,

@BoPengGit BoPengGit changed the title from "Convert new txt file into tacred data format" to "Preprocessing script that converts txt file into tacred dataset format" on Jun 25, 2019
@yuhaozhang
Owner

To do that, you'll basically need a pipeline for tokenization, POS tagging, and named entity recognition. I don't have a script readily available, but it should not be too hard to create one with an existing NLP toolkit such as Stanford CoreNLP.
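
As a rough illustration of what such a script would need to produce, here is a minimal sketch that assembles one TACRED-style JSON record from tokens that have already been tokenized and tagged (for example by CoreNLP). The helper name `to_tacred_record`, the placeholder `no_relation` label, and the hand-written sample tags are assumptions for illustration. The field names and the inclusive end indices follow the layout of the released TACRED JSON files, but they should be checked against a real example before use. The dependency fields (`stanford_head`, `stanford_deprel`) are left out, since the pipeline described above covers only tokens, POS tags, and NER tags.

```python
import json


def to_tacred_record(ex_id, tokens, pos_tags, ner_tags, subj_span, obj_span,
                     relation="no_relation"):
    """Build one TACRED-style example from pre-annotated tokens.

    subj_span and obj_span are (start, end) token indices, end inclusive.
    """
    if not (len(tokens) == len(pos_tags) == len(ner_tags)):
        raise ValueError("tokens, pos_tags and ner_tags must have equal length")
    for start, end in (subj_span, obj_span):
        if not (0 <= start <= end < len(tokens)):
            raise ValueError("entity span out of range")
    subj_start, subj_end = subj_span
    obj_start, obj_end = obj_span
    return {
        "id": ex_id,
        # Placeholder label: the true relation is unknown at prediction time.
        "relation": relation,
        "token": list(tokens),
        "subj_start": subj_start,
        "subj_end": subj_end,
        "obj_start": obj_start,
        "obj_end": obj_end,
        # Assumption: the entity type is the NER tag of the entity's first token.
        "subj_type": ner_tags[subj_start],
        "obj_type": ner_tags[obj_start],
        "stanford_pos": list(pos_tags),
        "stanford_ner": list(ner_tags),
    }


# Hand-written annotations; in practice these would come from the NLP pipeline.
tokens = ["Bill", "Gates", "founded", "Microsoft", "."]
pos_tags = ["NNP", "NNP", "VBD", "NNP", "."]
ner_tags = ["PERSON", "PERSON", "O", "ORGANIZATION", "O"]

record = to_tacred_record("example-0", tokens, pos_tags, ner_tags, (0, 1), (3, 3))
print(json.dumps([record], indent=2))
```

Choosing which entity pair is the subject and which is the object still has to happen before this step, for instance by enumerating pairs of entity mentions found by the NER tagger.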

On a related note, have you looked at the KBP annotator in CoreNLP (KBP stands for Knowledge Base Population)? It is a well-packaged pipeline that takes a piece of text as input and outputs relation triples. Unlike the neural network system in this repo, the KBP system combines rules, patterns, and a logistic regression classifier. That classifier is trained on the TACRED dataset, though, so you should expect decent results from it. More details of this KBP system can be found in this paper.
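
If you go the KBP route, the triples have to be read out of the annotator's output. The sketch below assumes the JSON output of a CoreNLP server run with the `kbp` annotator, where each sentence carries a `kbp` list whose entries have `subject`, `relation`, and `object` keys. That layout is an assumption to verify against your CoreNLP version, and the sample response is hand-written rather than real output. The function name `extract_kbp_triples` is hypothetical.

```python
def extract_kbp_triples(response):
    """Collect (subject, relation, object) tuples from a parsed JSON response."""
    triples = []
    for sentence in response.get("sentences", []):
        # Sentences without any extracted relation may have no "kbp" key.
        for triple in sentence.get("kbp", []):
            triples.append(
                (triple["subject"], triple["relation"], triple["object"])
            )
    return triples


# Hand-written stand-in for a server response (assumed layout, not real output).
sample_response = {
    "sentences": [
        {
            "index": 0,
            "kbp": [
                {
                    "subject": "Microsoft",
                    "relation": "org:founded_by",
                    "object": "Bill Gates",
                }
            ],
        },
        {"index": 1},
    ]
}

for subj, rel, obj in extract_kbp_triples(sample_response):
    print(subj, rel, obj)
```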

@BoPengGit
Author

Hi Yuhao,

That's very interesting and thanks for the reference to the KBP annotator. I will look into it in the next month or so.

If you have any other ideas or suggestions for extracting relation triples from an input text, please feel free to post them here.

Thanks and I'll get back to you once I look at it.

Best and God bless,

@onehaitao

I wrote a script that converts SemEval2010 to the TACRED dataset format. Maybe it will help you.
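
For readers without access to that script, here is a minimal sketch of how such a conversion could work. It is not the script mentioned above. It assumes the SemEval-2010 Task 8 convention of marking the two entities with `<e1>` and `<e2>` tags, treats `e1` as the subject and `e2` as the object, keeps the relation label verbatim, and uses naive whitespace tokenization. The function name `semeval_to_tacred` is hypothetical, and POS tags, NER tags, and entity types would still have to be added by a tagger.

```python
import re


def semeval_to_tacred(ex_id, sentence, relation):
    """Convert one SemEval-style tagged sentence into a TACRED-style dict."""
    # Pad the entity tags with spaces so they split off as separate pieces.
    spaced = re.sub(r"(</?e[12]>)", r" \1 ", sentence.strip().strip('"'))
    tokens, open_at, spans = [], {}, {}
    for piece in spaced.split():
        match = re.fullmatch(r"<(/?)(e[12])>", piece)
        if match is None:
            tokens.append(piece)
        elif match.group(1) == "":
            # Opening tag: the entity starts at the next token index.
            open_at[match.group(2)] = len(tokens)
        else:
            # Closing tag: the entity ends at the last token seen (inclusive).
            spans[match.group(2)] = (open_at[match.group(2)], len(tokens) - 1)
    return {
        "id": ex_id,
        "relation": relation,
        "token": tokens,
        "subj_start": spans["e1"][0],
        "subj_end": spans["e1"][1],
        "obj_start": spans["e2"][0],
        "obj_end": spans["e2"][1],
    }


# Sample input in the SemEval-2010 Task 8 style.
line = ('"The system as described above has its greatest application in an '
        'arrayed <e1>configuration</e1> of antenna <e2>elements</e2>."')
example = semeval_to_tacred("semeval-1", line, "Component-Whole(e2,e1)")
print(example["token"][example["subj_start"]], example["token"][example["obj_start"]])
```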
