Relation Extractor custom entities #359

Open
aoldoni opened this issue Feb 22, 2017 · 8 comments

Comments

@aoldoni

aoldoni commented Feb 22, 2017

Hi All,

Thanks for the great software. I would like to ask you the following please.

When training the Relation Extractor on specific relations between custom entity types, I noticed that the set of possible entities is "hard-coded" in some places, e.g.:

By modifying these 2 bits, one can successfully re-use the Relation Extractor with custom entities if needed, but this requires recompilation and some initial troubleshooting to figure it out.

Would you be interested in a pull request that refactors these hard-coded methods into something that is read from the properties file? E.g. the properties file could specify an "entitiesPath" option pointing to a tab-separated file whose columns hold the normalised and non-normalised values of these entities.

If this option is not provided, the default hard-coded entities would still be used, which maintains the current behaviour.

This would make Relation Extractor workflows with custom entities possible without recompiling the code.
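
To make the proposal concrete, here is a minimal sketch of how such an option might be read. It is not CoreNLP code: the class `EntityTableLoader`, its method names, the column order (raw label first, normalised label second) and the fallback entries are all assumptions made for illustration. Only the "entitiesPath" option name comes from the proposal above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; not part of CoreNLP.
final class EntityTableLoader {

  // Illustrative fallback entries only; NOT copied from the CoreNLP source.
  static Map<String, String> defaultEntities() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("Loc", "LOCATION");
    m.put("Org", "ORGANIZATION");
    return m;
  }

  // Parses lines of the assumed form: rawLabel<TAB>normalisedLabel.
  static Map<String, String> parse(Reader in) {
    Map<String, String> m = new LinkedHashMap<>();
    try (BufferedReader br = new BufferedReader(in)) {
      String line;
      int lineNo = 0;
      while ((line = br.readLine()) != null) {
        lineNo++;
        if (line.trim().isEmpty()) continue; // skip blank lines
        String[] cols = line.split("\t");
        if (cols.length != 2) {
          throw new IllegalArgumentException(
              "Line " + lineNo + ": expected 2 tab-separated columns");
        }
        m.put(cols[0].trim(), cols[1].trim());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return m;
  }

  // Uses the file named by "entitiesPath" if set, otherwise the defaults.
  static Map<String, String> load(Properties props) {
    String path = props.getProperty("entitiesPath");
    if (path == null || path.trim().isEmpty()) {
      return defaultEntities();
    }
    try {
      return parse(Files.newBufferedReader(Paths.get(path.trim()), StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this shape, a properties file that omits "entitiesPath" behaves exactly as before, and one that sets it swaps in the custom table without touching the code.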

Please advise.

Again, thanks!

aoldoni changed the title from "Relation Extractor custom relations" to "Relation Extractor custom entities" on Feb 22, 2017
@J38
Contributor

J38 commented Feb 22, 2017

There has been a lot of interest in making custom relation extraction training available, but I think the path forward is to make it easier to train models for new relations that work with the KBPAnnotator. I'm going to try to make sure that clear documentation and any necessary code changes are in place for Stanford CoreNLP 3.8.0.

@aoldoni
Author

aoldoni commented Feb 22, 2017

OK, cool, thanks for the prompt response.

So my understanding is that the training input format will be migrated from the Roth CONLL04 format to the KBP format, and at that point this will become flexible.

For now I have a small local customisation to adjust this, and I will continue to use that approach.

@J38
Contributor

J38 commented Mar 10, 2017

Yes, that would be the plan. I'm going to start working on this and hopefully it won't take too long. By the way, if you happen to have any sample training data I could look at, that would help: I am looking for an example so I can make sure my modifications are working properly.

@aoldoni
Author

aoldoni commented May 31, 2017

Hi @J38 - sorry it took a while for me to respond to you.

  1. Would you have an email address I could send the data to? Unfortunately I cannot post it publicly on the internet.

  2. Moreover, I would like to point out that I modified the code slightly for my own use, adding the ability to parametrise NER tags for RE training in this commit.
    It adds a new parameter to the MachineReadingProperties properties file that takes a comma-separated list of values for the NER tag entity normalization needed by the RE machine reading classes. This is an example of the end result.
    This approach surely differs from your intentions as explained above, but I would be happy to open a pull request if you believe such parametrisation is useful for anyone who needs this customisation in the meantime, before the KBPAnnotator supports Relation Extractor training.
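
For readers without access to the commit, the idea of a comma-separated tag list can be sketched roughly as follows. This is not the code from the commit: the class `PossibleEntities`, its methods, and the rule of falling back to "O" for unlisted tags are assumptions for illustration. The property name "possibleEntities" is the one the fork uses according to this thread, but the exact value format it expects is not shown here, so treat the plain tag list as a guess.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper; not the code from the commit mentioned above.
final class PossibleEntities {
  private final Set<String> tags = new LinkedHashSet<>();

  // Reads a comma-separated tag list, e.g. possibleEntities = GENE, DRUG, Disease
  PossibleEntities(Properties props) {
    String raw = props.getProperty("possibleEntities", "");
    for (String piece : raw.split(",")) {
      String tag = piece.trim();
      if (!tag.isEmpty()) {
        tags.add(tag); // ignore empty entries such as "A,,B"
      }
    }
  }

  Set<String> tags() {
    return tags;
  }

  // Returns the configured spelling of a tag, or "O" when it is not listed
  // (the "O" fallback is an assumption of this sketch).
  String normalize(String ner) {
    for (String tag : tags) {
      if (tag.equalsIgnoreCase(ner)) {
        return tag;
      }
    }
    return "O";
  }
}
```

The point of the design is that the tag inventory lives in configuration, so changing the entity types means editing the properties file, not recompiling.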

Please let me know your thoughts! 👍

@rpalenik

Hello,

Thanks for the great Stanford tools!

I badly need to be able to train the RE with custom entities for my project. I am not a professional (Java) programmer, though (I can compile from source if proper instructions are available), and I do not fully understand how to "change the code" as aoldoni suggested. Is training custom relations with custom entities possible in 3.8? If not, how could I use the approach aoldoni suggested? I have a training corpus in the original Roth format. Many thanks for any reply! I am attaching a small sample training file.

rel_train.txt

@aoldoni
Author

aoldoni commented Jun 12, 2017

Hi @rpalenik ,

If not, how could I use the approach aoldoni suggested?

Regarding this question specifically, please note:

  1. You could use this fork: https://github.com/aoldoni/stanford-corenlp - it contains the change that I did.
  2. Re-compile it; the build instructions are here: https://github.com/stanfordnlp/CoreNLP#build-instructions
  3. Then use the new "possibleEntities" attribute in the properties file that is now available in this fork, as per this example https://github.com/aoldoni/tetre/blob/develop/config/relation.properties#L54

@rpalenik

Hi @aoldoni ,

Many thanks. I would need some more help, though. I understand I need to:

  1. Clone the current source from https://github.com/stanfordnlp/CoreNLP
  2. Replace respective files from your repository
  3. Compile

However, I got numerous compilation errors. Have I done something wrong? Could you please help with the right approach?

Thanks.
R.

@rpalenik

Here is the output from the compiler......
ant_error.txt

J38 added this to the v.3.9.0 milestone on Nov 1, 2017
manning modified the milestones: v.3.9.0, v.4.3 on May 13, 2021
4 participants