The Role of Information Extraction Tasks in Automatic Literary Character Network Construction

CompNet/Splice

Splice

The Role of Information Extraction Tasks in Automatic Literary Character Network Construction

Reproducing Results

Before running any experiment:

  • Install the dependencies: run poetry install if you use poetry, or pip install -r requirements.txt otherwise.
  • Download the litbank dataset and note its location; the path is passed to the scripts as litbank.root.
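The choice between the two install commands can be sketched as a small helper. This is only an illustration of the rule stated above; the function name install_command is hypothetical and is not part of the repository.

```python
import shutil
from typing import List, Optional


def install_command(poetry_path: Optional[str]) -> List[str]:
    """Return the install command for the dependencies.

    poetry_path is the location of the poetry executable, or None
    when poetry is not installed (hypothetical helper, not from the repo).
    """
    if poetry_path:
        return ["poetry", "install"]
    return ["pip", "install", "-r", "requirements.txt"]


if __name__ == "__main__":
    # shutil.which returns None when poetry is not on PATH.
    print(" ".join(install_command(shutil.which("poetry"))))
```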

The main experiment can be run with xp.py:

python xp.py with\
	   min_graph_nodes=10\
	   co_occurrences_dist=32\
	   litbank.root="/path/to/litbank"

Degradation Experiments

The following commands run all of the degradation experiments, taking the run directory of the main experiment as input:

MAIN_XP_RUN="/path/to/main/xp/run"

python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=NER degradation_name=add_wrong_entity degradation_steps=1000 degradation_report_frequency=0.05
python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=NER degradation_name=remove_correct_entity degradation_steps=200 degradation_report_frequency=0.5
python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=coref degradation_name=add_wrong_mention degradation_steps=200 degradation_report_frequency=0.05
python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=coref degradation_name=remove_correct_mention degradation_steps=1000 degradation_report_frequency=0.05
python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=coref degradation_name=add_wrong_link degradation_steps=500 degradation_report_frequency=0.05
python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=coref degradation_name=remove_correct_link degradation_steps=1000 degradation_report_frequency=0.05
python xp_metrics_over_degradation.py with input_dir="${MAIN_XP_RUN}" task_name=coref degradation_name=coref_all degradation_steps=1000 degradation_report_frequency=0.05
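Since the seven commands differ only in four parameters, they can also be generated from a table. The sketch below rebuilds the same argument lists and prints them as a dry run; the helper name degradation_commands is an assumption made for illustration, while the parameter values are copied from the commands above.

```python
import shlex
from typing import List, Tuple

# (task_name, degradation_name, degradation_steps, degradation_report_frequency),
# copied verbatim from the commands listed in this README.
DEGRADATIONS: List[Tuple[str, str, int, float]] = [
    ("NER", "add_wrong_entity", 1000, 0.05),
    ("NER", "remove_correct_entity", 200, 0.5),
    ("coref", "add_wrong_mention", 200, 0.05),
    ("coref", "remove_correct_mention", 1000, 0.05),
    ("coref", "add_wrong_link", 500, 0.05),
    ("coref", "remove_correct_link", 1000, 0.05),
    ("coref", "coref_all", 1000, 0.05),
]


def degradation_commands(main_xp_run: str) -> List[List[str]]:
    """Build one argument list per degradation experiment (hypothetical helper)."""
    commands = []
    for task, name, steps, freq in DEGRADATIONS:
        commands.append([
            "python", "xp_metrics_over_degradation.py", "with",
            f"input_dir={main_xp_run}",
            f"task_name={task}",
            f"degradation_name={name}",
            f"degradation_steps={steps}",
            f"degradation_report_frequency={freq}",
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in degradation_commands("/path/to/main/xp/run"):
        print(shlex.join(cmd))
```

To execute the commands instead of printing them, each argument list could be passed to subprocess.run with check=True, so that a failing experiment stops the sweep.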

End-to-end LLM-based Pipelines

The E2E-Coref experiment can be reproduced with the xp_e2e_llm_coref.py script:

MAIN_XP_RUN="/path/to/main/xp/run"
LITBANK_PATH="/path/to/litbank"

python xp_e2e_llm_coref.py with\
	   input_dir="${MAIN_XP_RUN}"\
	   model="gpt3.5"\
	   openAI_API_key="insert your openAI key"\
	   litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_coref.py with\
	   input_dir="${MAIN_XP_RUN}"\
	   model="gpt40"\
	   openAI_API_key="insert your openAI key"\
	   litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_coref.py with\
	   input_dir="${MAIN_XP_RUN}"\
	   model="llama3-8b-instruct"\
	   hg_access_token="insert your Huggingface access token"\
	   device="cuda"\
	   litbank.root="${LITBANK_PATH}"

Similarly, the E2E-Graphml experiment can be reproduced with the xp_e2e_llm_graphml.py script:

MAIN_XP_RUN="/path/to/main/xp/run"
LITBANK_PATH="/path/to/litbank"

python xp_e2e_llm_graphml.py with\
	   input_dir="${MAIN_XP_RUN}"\
	   model="gpt3.5"\
	   openAI_API_key="insert your openAI key"\
	   litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_graphml.py with\
	   input_dir="${MAIN_XP_RUN}"\
	   model="gpt40"\
	   openAI_API_key="insert your openAI key"\
	   litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_graphml.py with\
	   input_dir="${MAIN_XP_RUN}"\
	   model="llama3-8b-instruct"\
	   hg_access_token="insert your Huggingface access token"\
	   device="cuda"\
	   litbank.root="${LITBANK_PATH}"
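The six end-to-end commands follow one pattern: the GPT models take openAI_API_key, while the Llama model takes hg_access_token and device. The sketch below builds the argument lists from that pattern and reads the credentials from a mapping such as os.environ rather than typing them on the command line. The helper name e2e_command and the environment variable names OPENAI_API_KEY and HF_TOKEN are assumptions made for illustration; the scripts themselves only see the configuration keys shown above.

```python
import os
from typing import List, Mapping

# Script and model names copied verbatim from the commands in this README.
SCRIPTS = ["xp_e2e_llm_coref.py", "xp_e2e_llm_graphml.py"]
MODELS = ["gpt3.5", "gpt40", "llama3-8b-instruct"]


def e2e_command(
    script: str,
    model: str,
    main_xp_run: str,
    litbank_path: str,
    env: Mapping[str, str],
) -> List[str]:
    """Build the argument list for one end-to-end run (hypothetical helper)."""
    cmd = ["python", script, "with", f"input_dir={main_xp_run}", f"model={model}"]
    if model.startswith("gpt"):
        # OPENAI_API_KEY is an assumed variable name, not one the scripts read.
        cmd.append(f"openAI_API_key={env['OPENAI_API_KEY']}")
    else:
        # HF_TOKEN is likewise an assumed variable name.
        cmd.append(f"hg_access_token={env['HF_TOKEN']}")
        cmd.append("device=cuda")
    cmd.append(f"litbank.root={litbank_path}")
    return cmd


if __name__ == "__main__":
    # Fall back to placeholders so the listing works without real credentials.
    env = {
        "OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY", "missing"),
        "HF_TOKEN": os.environ.get("HF_TOKEN", "missing"),
    }
    for script in SCRIPTS:
        for model in MODELS:
            cmd = e2e_command(script, model, "/path/to/main/xp/run", "/path/to/litbank", env)
            # Print only the script and model so that no credential is echoed.
            print(cmd[1], cmd[4])
```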

Printing / Plotting Results

Figure / Table    Corresponding Script
Table 1           print_main_task_results.py
Table 2           print_main_graph_results.py
Table 3
Figure 1          plot_degradation_metrics.py
Figure 2          plot_ner_degradation_metrics.py
Figure 3          plot_coref_degradation_metrics.py
Table 4           print_e2e_graph_results.py
