Stanford CoreNLP
- First, download the Stanford CoreNLP jar from its webpage.
- Navigate to the path of the stanford-corenlp.jar.
- Run command
java -cp stanford-corenlp.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop <file.prop>
- This command trains a CRF model according to file.prop and serializes it to disk.
- file.prop specifies the training file and the features to be used in the training process.
- Run command
java -cp stanford-corenlp.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier <ner-model.ser.gz> -testFile <file_test.txt>
- This command tags file_test.txt using the CRF model generated in the previous step.
- It also reports the Precision, Recall and F1 evaluation for each class.
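The file.prop passed to the training command could look like the following minimal sketch. The property keys are standard CRFClassifier options; the file names are placeholders, and the feature set shown here is only an example, not necessarily the one used for the results below:

```
# Hypothetical file names - adjust to your dataset
trainFile = harem_train.txt
serializeTo = ner-model.ser.gz
# Column layout of the training file: token, then entity type
map = word=0,answer=1

# Example feature set (standard CRFClassifier options)
useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
useDisjunctive = true
```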
Check folder for more information.
A file with one token and its entity type per line, where O marks tokens that are not part of any entity. Example:
"I complained to Microsoft about Bill Gates."
I O
complained O
to O
Microsoft ORGANIZATION
about O
Bill PERSON
Gates PERSON
. O
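Producing this format from tokenized, tagged data is straightforward. A minimal sketch (the helper name and the tab separator are my own choices; the separator must match the `map` setting in the prop file):

```python
def to_two_column(tagged_tokens):
    """Write (token, entity_type) pairs in the one-token-per-line
    format expected by CRFClassifier (tab-separated here)."""
    return "\n".join(f"{tok}\t{tag}" for tok, tag in tagged_tokens)

# The example sentence from above
sentence = [
    ("I", "O"), ("complained", "O"), ("to", "O"),
    ("Microsoft", "ORGANIZATION"), ("about", "O"),
    ("Bill", "PERSON"), ("Gates", "PERSON"), (".", "O"),
]

print(to_two_column(sentence))
```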
In order to evaluate with the conlleval script, the gold data and Stanford's output must share the same tokenization. For that, I used the Stanford CoreNLP tokenizer (edu.stanford.nlp.process.PTBTokenizer) on both the training and testing (gold and output) datasets. I also converted the tokenized text into CoNLL format using this script and added IOB tags using this script.
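The idea behind the IOB conversion can be sketched as follows: a token gets B- when it starts an entity (its type differs from the previous token's type) and I- when it continues one. This is a hypothetical reimplementation for illustration, not the actual script used; note that with raw types alone, two adjacent entities of the same type cannot be told apart:

```python
def add_iob_tags(pairs):
    """Convert (token, type) pairs with raw types (e.g. PERSON)
    into IOB-tagged pairs (B-PERSON / I-PERSON / O)."""
    out, prev = [], "O"
    for tok, tag in pairs:
        if tag == "O":
            out.append((tok, "O"))
        elif tag == prev:
            out.append((tok, "I-" + tag))  # continues the previous entity
        else:
            out.append((tok, "B-" + tag))  # starts a new entity
        prev = tag
    return out

print(add_iob_tags([("Bill", "PERSON"), ("Gates", "PERSON"), (".", "O")]))
# [('Bill', 'B-PERSON'), ('Gates', 'I-PERSON'), ('.', 'O')]
```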
In order to run Stanford NER with the HAREM dataset as input, the dataset has to be converted into the correct format. For the conversion, I used corpus-processor.
Steps:
- Install ruby
- Install corpus-processor ruby-gem
- Change categories to be recognised (example)
- Run command:
corpus-processor process <input-file> <output-file> --categories=<file.yml>
Check folder for more information.
Check all the results here.
Results after 4 repeats:
Level | Precision | Recall | F-measure |
---|---|---|---|
Categories | 58.84% | 53.60% | 56.10% |
Types | - | - | - |
Subtypes | - | - | - |
Filtered | 69.97% | 54.23% | 61.10% |
Note: Since running with types and subtypes was too computationally demanding, a different prop file with fewer features was used to reduce the computational load. However, because the features differ, those results would not be comparable to the other tools, so they are not displayed here.
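As a sanity check, the F-measure column is the harmonic mean of precision and recall, so each row of the table above can be reproduced:

```python
def f_measure(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Categories row: P = 58.84%, R = 53.60%
print(f_measure(0.5884, 0.5360) * 100)  # ~ 56.10
# Filtered row: P = 69.97%, R = 54.23%
print(f_measure(0.6997, 0.5423) * 100)  # ~ 61.10
```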
For this tool, I decided to check the influence of the following hyperparameters: tolerance, epsilon, MaxNGramLeng. The results are the following:
Tolerance (default: 1e-4)
Value | Categories | Filtered |
---|---|---|
1e-5 | 54.07% | 58.94% |
5e-5 | 54.02% | 59.00% |
1e-4 | 54.15% | 58.84% |
5e-4 | 54.02% | 58.72% |
1e-3 | 54.31% | 58.86% |
5e-3 | 54.12% | 58.81% |
Epsilon (default: 0.01)
Value | Categories | Filtered |
---|---|---|
0.005 | 54.15% | 58.84% |
0.01 | 54.15% | 58.84% |
0.015 | 54.15% | 58.84% |
0.02 | 54.15% | 58.84% |
MaxNGramLeng (default: 6)
Value | Categories | Filtered |
---|---|---|
4 | 53.47% | 58.31% |
5 | 53.77% | 58.66% |
6 | 54.15% | 58.84% |
7 | 54.37% | 58.97% |
Repeated holdout
Tolerance | Precision | Recall | F-measure |
---|---|---|---|
1e-4 | 90.09% | 83.41% | 86.62% |
1e-3 | 90.26% | 83.31% | 86.64% |
Repeated 10-fold cross validation
Tolerance | Precision | Recall | F-measure |
---|---|---|---|
1e-4 | 89.80% | 84.10% | 86.86% |
1e-3 | 89.81% | 83.95% | 86.78% |
Get the generated models in the Resources page.