- Quick summary
A research prototype that induces script knowledge by mining relations among general discourse components (events, entities) in text, either within a single document or across multiple documents.
The project is developed under the following environment:
- Java 1.8 (A Java 1.6 branch is available, but not frequently maintained).
- Maven 3.2.1
In addition, it depends on the following modules:
- UIMA Tools : https://github.com/hunterhector/uima-base-tools
- Other Utilities : https://github.com/hunterhector/zl-utils
Building with Maven is simple. First build the two modules listed above in the same way, then run the following from the root directory of the project:
mvn clean install
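The build order described above can be sketched as a small shell function. This is only a sketch: it assumes the three repositories are already checked out side by side under directory names matching the repository names, and `DRY_RUN` is a hypothetical variable introduced here so the commands can be previewed without invoking Maven.

```shell
# Build the two dependency modules first, then this project.
# Assumption: the checkouts live side by side in the current directory
# under these names (taken from the repository names, not verified).
build_all() {
    for dir in uima-base-tools zl-utils cmu-script; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Preview mode: only print what would be run.
            echo "(cd $dir && mvn clean install)"
        else
            # Stop at the first failing build so later modules
            # are not built against stale dependencies.
            (cd "$dir" && mvn clean install) || return 1
        fi
    done
}

# Usage (commented out so that sourcing this file has no side effects):
#   DRY_RUN=1 build_all   # print the three commands
#   build_all             # actually build
```

Building the dependencies first matters because `mvn clean install` places their artifacts in the local Maven repository, where the main project's build then finds them.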
- Now try running the system using the example data:
- First, get the models and resources:
- English: http://accra.sp.cs.cmu.edu/~zhengzhl/event_models/downloads/event_models/event_english_run_resources.tar.gz
- Chinese: http://accra.sp.cs.cmu.edu/~zhengzhl/downloads/event_models/event_chinese_run_resources.tar.gz
- Unpack it into a directory, referred to below as <models_and_resources>
- Locate the setting file:
- English setting: settings/nugget/event-run.en.properties
- Chinese setting: settings/nugget/event-run.zh.properties
- Modify the settings:
- edu.cmu.cs.lti.model.dir=<models_and_resources>
- edu.cmu.cs.lti.resource.dir=<models_and_resources>
- For English: edu.cmu.cs.lti.model.event.dir=<models_and_resources>/EventMention/english
- For Chinese: edu.cmu.cs.lti.model.event.dir=<models_and_resources>/EventMention/chinese
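If you prefer to script the settings changes rather than edit the file by hand, something like the following could work. It is a minimal sketch: `set_prop` is a hypothetical helper, not part of the project, and it assumes each setting is written as `key=value` with no spaces around `=` and that the value contains no `|` or `&` characters.

```shell
# Hypothetical helper: set key=value in a Java .properties file,
# replacing an existing line or appending one if the key is absent.
set_prop() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally by grep and sed.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${key_re}=" "$file"; then
        # -i.bak edits in place and keeps a backup copy as <file>.bak
        sed -i.bak "s|^${key_re}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage for the English settings (commented out; adjust MODELS first):
#   MODELS=/path/to/models_and_resources
#   SETTINGS=settings/nugget/event-run.en.properties
#   set_prop "$SETTINGS" edu.cmu.cs.lti.model.dir "$MODELS"
#   set_prop "$SETTINGS" edu.cmu.cs.lti.resource.dir "$MODELS"
#   set_prop "$SETTINGS" edu.cmu.cs.lti.model.event.dir "$MODELS/EventMention/english"
```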
- Run English:
- Just Run it:
bin/kbp/run_only_pipeline.sh settings/nugget/event-run.en.properties data/samples/en data/samples/en/output experiment_en_01
- Run Chinese:
- Prerequisites:
- Add the LTP JNI library directory to the LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=<models_and_resources>/ltp/lib:$LD_LIBRARY_PATH
- Run it:
bin/kbp/run_only_pipeline.sh settings/nugget/event-run.zh.properties data/samples/zh data/samples/zh/output experiment_zh_01
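One detail of the export above: if LD_LIBRARY_PATH is empty beforehand, the result ends with a trailing `:`, which the dynamic linker treats as the current directory. A slightly more defensive variant is sketched below; `prepend_ld_path` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH
# without leaving a trailing ':' when the variable was empty.
prepend_ld_path() {
    # ${VAR:+text} expands to text only if VAR is set and non-empty,
    # so the separator is added only when there is something to separate.
    export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Usage (commented out; replace the placeholder with your directory):
#   prepend_ld_path "<models_and_resources>/ltp/lib"
```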
- The results can be found in two formats:
- JSON format in: data/samples/en/output/rich/test_run
- TBF format in: data/samples/en/output/experiments/test_run/results/all/vanillaMention.tbf
- Notes:
- The last two parameters of the shell script specify the output directory and the experiment name.
- The Chinese system relies on some external tools that require C++ binaries, which may cause problems on some platforms.
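To make the relation between the script arguments and the result locations concrete, here is a small sketch. The layout it encodes is an assumption inferred from the example paths above, where `test_run` appears in the position the experiment name would presumably occupy; it has not been checked against the pipeline code, and `result_paths` is a hypothetical helper.

```shell
# Hypothetical helper: print where results are expected for a given
# output directory and experiment name.
# Assumed layout (inferred from the example paths, not verified):
#   <output>/rich/<experiment>                                      JSON
#   <output>/experiments/<experiment>/results/all/vanillaMention.tbf TBF
result_paths() {
    out_dir=$1 experiment=$2
    echo "$out_dir/rich/$experiment"
    echo "$out_dir/experiments/$experiment/results/all/vanillaMention.tbf"
}

# Usage (commented out):
#   result_paths data/samples/en/output experiment_en_01
```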
- Download a copy of the full models package and unpack it:
http://cairo.lti.cs.cmu.edu/~hector/models/EventMentionModelsAndResources20160411.tar.gz
- The project has since been heavily refactored:
- To run these old models, try an earlier branch:
- https://bitbucket.org/hunterhector/cmu-script/branch/model0411
- Modify the kbp.properties file as follows:
- point edu.cmu.cs.lti.model.dir to the unpacked directory
- point edu.cmu.cs.lti.model.event.dir to the unpacked directory
- point edu.cmu.cs.lti.resource.dir to the unpacked directory
- Test it by running the following command in the cmu-script project directory:
bin/test/coref_plain_text.sh settings/kbp.properties event-coref/src/test/resources/sample-input ../sample-output
- You should be able to find the annotation in TBF format in the following file:
../sample-output/eval/full_run/lv1_coref.tbf
For details about the TBF format and scoring, visit the TAC KBP event task website (look for Task Definition).
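For a quick sanity check of an output file, a rough summary can be computed with grep. This sketch assumes the TBF conventions as I understand them from the TAC KBP event task: documents delimited by `#BeginOfDocument` and `#EndOfDocument` lines, one tab-separated line per event mention, and coreference clusters on lines starting with `@Coreference`. Please confirm against the Task Definition; `tbf_summary` is a hypothetical helper.

```shell
# Hypothetical helper: count documents, mentions, and coreference
# clusters in a TBF file (format markers are assumptions, see above).
tbf_summary() {
    file=$1
    # grep -c prints 0 but exits non-zero when nothing matches,
    # hence the '|| true' to keep the function usable under 'set -e'.
    docs=$(grep -c '^#BeginOfDocument' "$file" || true)
    corefs=$(grep -c '^@Coreference' "$file" || true)
    # Mention lines: anything that is not a '#' marker, an '@' relation,
    # or a blank line.
    mentions=$(grep -c -v -e '^#' -e '^@' -e '^$' "$file" || true)
    echo "documents=$docs mentions=$mentions coreference_clusters=$corefs"
}

# Usage (commented out):
#   tbf_summary ../sample-output/eval/full_run/lv1_coref.tbf
```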
Note: The current models are not the best available, because the project has been changing frequently. I will try to update them as soon as possible.
Warning about IllegalThreadStateException: If you see a java.lang.IllegalThreadStateException saying that some threads were not terminated, together with a "BUILD SUCCESS" message, the run should still be fine. The cause is not known at this point.
- The kbp.properties file contains most of the configuration. Most entries are pointers to resources; some numeric values control training parameters.
- Most boolean settings with "skip" in their name make the pipeline skip a step when that step's output already exists. Be sure to turn them off when you want fresh results.
- A detailed explanation of the parameters will come later.
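Before a run that should produce fresh results, it can help to list which skip settings are currently enabled. The sketch below does this with grep; it assumes the settings are plain `key=value` lines with the literal value `true`, and `list_enabled_skips` is a hypothetical helper, not something shipped with the project.

```shell
# Hypothetical helper: print every non-comment line of a .properties
# file whose key contains "skip" (case-insensitive) and whose value
# is "true".
list_enabled_skips() {
    # [^#=]* before "skip" excludes commented-out lines and ensures
    # the match is in the key, not in the value.
    grep -i -E '^[^#=]*skip[^=]*=[[:space:]]*true[[:space:]]*$' "$1" || true
}

# Usage (commented out):
#   list_enabled_skips settings/kbp.properties
```

Any line this prints is a step that may be skipped when its output already exists, so set it to false when you want that step recomputed.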
To train the model, the easiest way is to use the data provided by TAC-KBP 2015. You can also create files in a similar format and train on those.