Step1. Install the required packages by running the following command:
pip install -r requirements.txt
Step2. Specify the gpt-api-key
in the config.py
file with your openai API key.
Step3. Execute the following command to run LLM4EA on the D-Y-15K dataset:
python infer.py --dataset_name D-Y-15K
If you do not have access to the OpenAI API, you can run a simulation instead. The following command synthesizes pseudo-labels for the dataset with a true positive rate of 0.5:
python infer.py --dataset_name D-Y-15K --simulate --tpr 0.5
Here, the argument --tpr specifies the true positive rate of the synthesized pseudo-labels.
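To make the role of the true positive rate concrete, here is a minimal sketch of how pseudo-labels with a given rate could be synthesized from ground-truth pairs. It is an illustration under stated assumptions, not the code in infer.py: the function names, the entity identifiers, and the choice to replace an incorrect label with a uniformly sampled wrong target are all hypothetical.

```python
import random


def synthesize_pseudo_labels(true_pairs, tpr, seed=0):
    """Return one pseudo-label per source entity.

    Each label is the true counterpart with probability `tpr`;
    otherwise it is a randomly chosen wrong target (an assumption
    of this sketch, not necessarily what infer.py does).
    Requires at least two distinct targets.
    """
    rng = random.Random(seed)
    targets = [target for _, target in true_pairs]
    pseudo = []
    for source, true_target in true_pairs:
        if rng.random() < tpr:
            pseudo.append((source, true_target))
        else:
            wrong = rng.choice([t for t in targets if t != true_target])
            pseudo.append((source, wrong))
    return pseudo


def true_positive_rate(pseudo, true_pairs):
    """Fraction of pseudo-labels that match the ground truth."""
    truth = dict(true_pairs)
    correct = sum(1 for source, target in pseudo if truth[source] == target)
    return correct / len(pseudo)


# Hypothetical toy data: 1000 aligned entity pairs.
true_pairs = [(f"src_{i}", f"tgt_{i}") for i in range(1000)]
pseudo = synthesize_pseudo_labels(true_pairs, tpr=0.5)
print(round(true_positive_rate(pseudo, true_pairs), 2))
```

With --tpr 0.5, roughly half of the synthesized labels would be correct, so the measured rate in this sketch lands close to, but not exactly at, 0.5.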
There are three optional scripts, infer-baseline.py, infer-active-only.py, and infer-lr-only.py, which are variants of the infer.py script.
- The infer-baseline.py script deactivates both the label refinement and active learning components of the framework and directly trains the base EA model, Dual-AMN. This corresponds to the Dual-AMN baseline in the main table.
- The infer-active-only.py script deactivates the label refinement component of the model. This corresponds to the w/o LR ablation setting in the paper.
- The infer-lr-only.py script deactivates the active learning component of the model. This corresponds to the w/o Act ablation setting in the paper.
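The mapping between settings and scripts can be sketched as a small helper that builds the command for each one. This assumes that the variant scripts accept the same --dataset_name argument as infer.py, which is not stated above; the SCRIPTS table and the build_command helper are hypothetical names used only for illustration.

```python
import sys

# Setting name -> script, following the descriptions above.
SCRIPTS = {
    "full": "infer.py",
    "Dual-AMN baseline": "infer-baseline.py",
    "w/o LR": "infer-active-only.py",
    "w/o Act": "infer-lr-only.py",
}


def build_command(setting, dataset_name="D-Y-15K"):
    """Build the argument list for one setting.

    Assumption: every variant takes --dataset_name like infer.py.
    """
    return [sys.executable, SCRIPTS[setting], "--dataset_name", dataset_name]


# Print the commands without executing them.
for setting in SCRIPTS:
    print(setting, "->", " ".join(build_command(setting)[1:]))
```

To actually launch a run, the list returned by build_command can be passed to subprocess.run(cmd, check=True) from the repository root.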
The code is based on PRASE and Dual-AMN. The dataset is from the OpenEA benchmark and was preprocessed using the dump file wikidatawiki-20160801-abstract.xml from Wikidata. The OpenEA dataset is licensed under the GPLv3 License.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.