
Code for the arXiv paper: Complex Claim Verification with Evidence Retrieved in the Wild


jifan-chen/Fact-checking-via-Raw-Evidence


Complex Claim Verification with Evidence Retrieved in the Wild

Getting started

Clone the repository and install the requirements:

pip install -r requirements.txt

Download the data to the local directory via: https://drive.google.com/file/d/1YMpr5hnJqzrXcp3kBwpVsAK0uv9q_Xf3/view?usp=sharing

Data format

The data files are formatted as jsonlines. The description of each field is as follows:

| Field | Type | Description |
| --- | --- | --- |
| example_id | string | Example ID |
| claim | string | Claim |
| label | string | Label: pants-fire, false, barely-true, half-true, mostly-true, true |
| person | string | Person who made the claim |
| venue | string | Date and venue of the claim |
| url | string | PolitiFact URL of the claim |
| justification | List[string] | Justification paragraph written by the fact-checkers |
| qg-output | List[string] | Sub-questions generated by claim decomposition |
| search_results | List[dict] | Bing search results without timestamp |
| search_results_timestamp | List[dict] | Bing search results with timestamp |
| summary | string | Summary generated by synthesizing the results from second-stage retrieval |
| summarization_prompt | string | Prompt used for generating the claim-focused summary |
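Since the files are jsonlines, each line is one JSON object with these fields. A minimal round-trip sketch (all field values below are made up for illustration):

```python
import json

# Illustrative record matching the fields above; values are placeholders.
record = {
    "example_id": "ex-001",
    "claim": "An example claim.",
    "label": "half-true",
    "person": "Example Speaker",
    "venue": "stated in a speech",
    "url": "https://www.politifact.com/",
    "justification": ["Justification paragraph written by the fact-checkers."],
    "qg-output": ["What does the claim assert?"],
    "search_results": [],
    "search_results_timestamp": [],
    "summary": "A claim-focused summary.",
    "summarization_prompt": "Summarize the evidence relevant to the claim.",
}

# jsonlines stores one such JSON object per line.
line = json.dumps(record)
parsed = json.loads(line)
assert parsed["label"] == "half-true"
```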

Each entry in search_results is a dictionary with the following fields:

search_results = {
    "entities_info": [
        {
            "name": ...,         # name of the entity included in the search results
            "description": ...,  # description of the entity
        },
        ...
    ],
    "pages_info": [
        {
            "page_name": ...,       # name of the page included in the search results
            "page_url": ...,        # URL of the page
            "page_timestamp": ...,  # timestamp of the page
            "page_snippet": ...,    # snippet of the page
        },
        ...
    ]
}
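As a concrete illustration of this schema (all values below are placeholders, not real data), one way to pull out (URL, snippet) pairs as candidate evidence:

```python
# Hypothetical search-results entry following the schema above.
search_result = {
    "entities_info": [
        {"name": "Example Entity", "description": "A short entity description."}
    ],
    "pages_info": [
        {
            "page_name": "Example Page",
            "page_url": "https://example.com/page",
            "page_timestamp": "2021-05-01",
            "page_snippet": "A snippet of text from the page.",
        }
    ],
}

# Collect (url, snippet) pairs as candidate evidence for downstream retrieval.
evidence = [(p["page_url"], p["page_snippet"]) for p in search_result["pages_info"]]
```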

Generate sub-questions using claim

To decompose the claim into a set of sub-questions, you can run the following command:

python generate_subquestions.py --input_file ./data/dev-site-restricted.jsonl --output_file OUTPUT_FILE_PATH

Use Bing API to retrieve evidence

To use the Bing API, you first need to register for a Bing Search API key (available through Microsoft Azure). Then, you can run the following command to retrieve evidence using Bing:

python evidence_retrieval.py \
--input_url data/train.jsonl \
--output_file OUTPUT_FILE_PATH \
--use_time_stamp 1 \
--sites_constrain 1 \
--use_annotation 0 \
--use_claim 0 \
--question_num 10 \
--answer_count 10 \
--chunk_size 50 \
--time_offset 1

Check the argument description in evidence_retrieval.py for more details.
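For orientation, a single query against the Bing Web Search v7 REST API looks roughly like the sketch below. The endpoint and subscription-key header are standard for that API, but this helper is a stand-in, not code from evidence_retrieval.py:

```python
import json
import urllib.parse
import urllib.request

# Standard Bing Web Search v7 endpoint; the API key comes from your Azure resource.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def bing_search(query, api_key, count=10):
    """Run one Bing Web Search v7 query and return the list of web-page results."""
    url = BING_ENDPOINT + "?" + urllib.parse.urlencode({"q": query, "count": count})
    req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    # Page results live under webPages.value in the v7 response schema.
    return data.get("webPages", {}).get("value", [])
```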

Second stage retrieval + summarization

To generate the claim-focused summary, you can run the following command:

python generate_summarization.py \
--input_path ./data/train-site-restricted.jsonl \
--corpus_path ./data/corpus/train.json \
--output_path OUTPUT_FILE_PATH \
--num 5 \
--window_size 30 \
--topk_units 10 \
--topk_docs 5 \
--stride 15 \
--use_claim 0 \
--use_annotation 0 \
--use_justification 0 \
--text_unit span \
--time_restricted 1 \
--time_window 1 \
--summ_type multi_summs \
--use_ChatGPT 0 \
--use_fewshot 0 \
--filter_politifact 0

Check the argument description in generate_summarization.py for more details.
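The --window_size and --stride flags suggest that second-stage retrieval chunks each document into overlapping text units. A hedged sketch of that chunking, assuming the parameters count tokens (the actual unit in generate_summarization.py may differ):

```python
def sliding_windows(tokens, window_size=30, stride=15):
    """Split a token list into overlapping windows of window_size, advancing by stride."""
    windows = []
    # Always emit at least one window, even for documents shorter than window_size.
    for start in range(0, max(len(tokens) - window_size, 0) + 1, stride):
        windows.append(tokens[start:start + window_size])
    return windows


doc = [f"tok{i}" for i in range(60)]
spans = sliding_windows(doc, window_size=30, stride=15)  # 3 overlapping spans
```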

Veracity prediction

To predict the veracity of the claim, you can run the following command:

bash run_veracity_classification.sh
