Skip to content

Official implementation of the ACL paper "Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking"

Notifications You must be signed in to change notification settings

RUCKBReasoning/PiNose

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

PiNose

This repo is the official implementation of the paper "Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking".

This repository contains two main directories: offline_consistency and llm_probe. Each directory contains scripts for different stages of the data processing and model training pipeline.
The relevant data and checkpoint can be downloaded from here.

offline_consistency

The offline_consistency directory contains scripts for constructing the training data offline. The scripts should be run in the following order:

  1. question_bootstrapping.py: This script is used to generate the initial set of questions.
  2. diverse_response_generation.py: This script generates a diverse set of responses for each question.
  3. peer_review_gathering.py: This script collects peer reviews for each response.
  4. peer_review_filtering.py: This script filters the peer reviews to obtain the final training data.

Each sample in the training data is in the format of [Question, Response, Label].

llm_probe

The llm_probe directory contains scripts for probing the language model (LLM). The scripts should be run in the following order:

  1. extract_hidden_state.py: This script extracts the hidden states of the LLM when it reads the training samples. These hidden states are used as features.
  2. train_classifier.py: This script trains a classifier using the extracted features.
  3. test_classifier.py: This script tests the trained classifier on a test dataset.

Please ensure that you have the necessary permissions and environment set up before running these scripts.

About

Official implementation of the ACL paper "Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%