Guidelines for Reproduction Methods

This document describes how to reproduce, under a unified setting, the results of the methods listed in our table. For the specific settings and an explanation of each method, please refer to implementation details. We recommend gaining a basic understanding of the repository beforehand, which is covered in introduction for beginners.

Preliminary

Reproduce Step

All the code used is based on the repository's example/methods. We have set appropriate hyperparameters for each method. If you need to adjust them, refer to the config dictionary provided for each method and to the method's original paper.

1. Set Basic Config

First, fill in the paths of your downloaded resources in my_config.yaml. Specifically, the following four fields are required:

  • model2path: Replace the paths of E5 and Llama3-8B-instruct models with your own paths
  • method2index: Fill in the path of the index file built using E5
  • corpus_path: Fill in the path of the Wikipedia corpus file in jsonl format
  • data_dir: Change to the download path of your own dataset
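
As a quick sanity check before running anything, the four fields above can be validated once the YAML file has been loaded into a dictionary. The sketch below is only an illustration and is not part of the repository: the helper name `missing_fields`, the nested keys such as `e5`, and every path are placeholder assumptions.

```python
# Hypothetical helper for illustration; it is not part of the repository.
REQUIRED_FIELDS = ("model2path", "method2index", "corpus_path", "data_dir")


def missing_fields(config):
    """Return the required fields that are absent or empty in a loaded config."""
    return [name for name in REQUIRED_FIELDS if not config.get(name)]


# Example of what a loaded my_config.yaml might look like.
# The nested keys and all paths are placeholder assumptions.
example_config = {
    "model2path": {
        "e5": "/path/to/e5",
        "llama3-8B-instruct": "/path/to/llama3-8b-instruct",
    },
    "method2index": {"e5": "/path/to/e5_index"},
    "corpus_path": "/path/to/wiki_corpus.jsonl",
    "data_dir": "",  # left empty on purpose to show what the check reports
}

print(missing_fields(example_config))
```

Running the check right after loading the config surfaces an unfilled field before any model or index is loaded.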

2. Set Config for Specific Method

Some methods rely on additional models and therefore need extra setup steps, which are described below. If the method you want to run needs none of these steps, you can skip directly to the third section.

AAR

This method requires using a new retriever, so you need to download the retriever and build the index.

  • Additional Step1: Download AAR-Contriever (from here)
  • Additional Step2: Build the index for AAR-Contriever (note that the pooling method should be 'mean')
  • Additional Step3: Modify the index_path and model2path in the AAR function in run_exp.py.
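
For readers unfamiliar with the term, 'mean' pooling turns the per-token embeddings of a passage into a single vector by averaging them over the non-padding tokens. The sketch below illustrates the idea in plain Python; it is not the repository's index-building code, and the function name `mean_pool` is an assumption made for this example.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of real tokens, skipping padded positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # a mask value of 0 marks a padding token
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in summed]


# Three token vectors; the last one is padding and must be ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

An index built with a different pooling method would produce vectors that do not match the ones the retriever was trained to produce, which is why the setting matters here.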

LongLLMLingua

This method requires downloading Llama2-7B.

  • Additional Step1: Download Llama2-7B (from here)
  • Additional Step2: Modify the refiner_model_path in the llmlingua function in run_exp.py

RECOMP

This method requires downloading three checkpoints trained by the authors (trained on NQ, TQA, and HotpotQA respectively).

  • Additional Step1: Download the authors' checkpoints (NQ Model, TQA Model, HotpotQA Model)
  • Additional Step2: Fill in the downloaded model paths in the model_dict of the recomp function
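
Because each checkpoint was trained on a different dataset, the model_dict is effectively a mapping from dataset to checkpoint path. The following is a minimal sketch of such a mapping with an explicit lookup; the dataset keys, the paths, and the helper `pick_checkpoint` are assumptions for illustration and may not match what the recomp function in run_exp.py actually does.

```python
# Placeholder paths: point these at the checkpoints you downloaded.
# The dataset keys are assumptions chosen for this illustration.
model_dict = {
    "nq": "/path/to/recomp_nq",
    "triviaqa": "/path/to/recomp_tqa",
    "hotpotqa": "/path/to/recomp_hotpotqa",
}


def pick_checkpoint(dataset_name, checkpoints):
    """Return the checkpoint path for a dataset, failing loudly if none exists."""
    if dataset_name not in checkpoints:
        known = ", ".join(sorted(checkpoints))
        raise KeyError(f"no checkpoint for {dataset_name!r}; available: {known}")
    return checkpoints[dataset_name]


print(pick_checkpoint("nq", model_dict))
```

Failing loudly on an unknown dataset is a design choice of this sketch: it avoids silently evaluating with a checkpoint trained on a different dataset.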

Selective-Context

This method requires downloading GPT2.

  • Additional Step1: Download GPT2 (from here)
  • Additional Step2: Modify the refiner_model_path in the sc function in run_exp.py

Ret-Robust

This method requires downloading the LoRA weights trained by the authors, as well as the Llama2-13B model onto which the LoRA weights are loaded.

  • Additional Step1: Download Llama2-13B (from here)
  • Additional Step2: Download the authors' trained LoRA weights, trained on NQ (from here) and trained on 2WikiMultihopQA (from here)
  • Additional Step3: Modify the corresponding LoRA paths in the model_dict of the retrobust function and the Llama2-13B path in my_config.yaml

We recommend adjusting the single_hop parameter in the SelfAskPipeline according to different datasets, which controls whether to decompose the query. For NQ, TQA, PopQA, WebQ, we set single_hop to True.
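
The recommendation above can be captured in a small helper that picks the single_hop value from the dataset name. This is only a sketch: the helper name `choose_single_hop` and the lowercase dataset identifiers are assumptions, and returning False for every other dataset is our reading of the recommendation rather than something the repository enforces.

```python
# Datasets for which the text above sets single_hop to True.
# The lowercase identifiers are assumptions; adapt them to your dataset names.
SINGLE_HOP_DATASETS = {"nq", "tqa", "popqa", "webq"}


def choose_single_hop(dataset_name):
    """Return True when the query should not be decomposed for this dataset."""
    return dataset_name.lower() in SINGLE_HOP_DATASETS


for name in ("NQ", "PopQA", "2wikimultihopqa"):
    print(name, choose_single_hop(name))
```

The returned value would then be passed as the single_hop argument when constructing the SelfAskPipeline.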

SKR

This method requires an embedding model and training data, both of which are used at inference time. We provide the training data given by the authors. If you wish to use your own training data, you can generate it by following the format of the provided data and the original paper.

  • Additional Step1: Download the embedding model (from here)
  • Additional Step2: Download the training data (from here)
  • Additional Step3: Fill in the embedding model path in the model_path of the skr function
  • Additional Step4: Fill in the training data path in the training_data_path of the skr function

Self-RAG

This method requires using a trained model and currently only supports running in the vllm framework.

  • Additional Step1: Download the Self-RAG model (from 7B model, 13B model)
  • Additional Step2: Modify the generator_model_path in the selfrag function.

Spring

This method requires a virtual token embedding file and currently only supports running in the hf framework.

  • Additional Step1: Download the virtual token embedding file from the official repo
  • Additional Step2: Modify the token_embedding_path in the spring function.

Adaptive-RAG

This method requires a classifier to classify the query. Since the authors did not provide an official checkpoint, we used a checkpoint trained by others and published on Hugging Face for the experiment (which may lead to results that are inconsistent with those reported by the authors).

If the official open-source checkpoint is released in the future, we will update the experimental results.

RQRAG

This method requires downloading the RQRAG model.

  • Additional Step1: Download the RQRAG model from the Hugging Face repo: zorowin123/rq_rag_llama2_7B
  • Additional Step2: Modify the generator_model_path in the rqrag function.

3. Run methods

Run the experiment on the NQ dataset using the following command.

python run_exp.py --method_name 'naive' \
                  --split 'test' \
                  --dataset_name 'nq' \
                  --gpu_id '0,1,2,3'

The method can be selected from the following:

naive zero-shot AAR-contriever llmlingua recomp selective-context sure replug skr flare iterretgen ircot trace
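
To run several methods in sequence, the command above can be assembled programmatically. The sketch below only builds and prints the commands, using the flags shown above; the helper name `build_command` and the chosen subset of methods are assumptions for illustration.

```python
import shlex


def build_command(method_name, dataset_name="nq", split="test", gpu_id="0,1,2,3"):
    """Build the run_exp.py invocation shown above as an argument list."""
    return [
        "python", "run_exp.py",
        "--method_name", method_name,
        "--split", split,
        "--dataset_name", dataset_name,
        "--gpu_id", gpu_id,
    ]


# A few method names taken from the list above.
methods = ["naive", "zero-shot", "selective-context"]

for method in methods:
    command = build_command(method)
    # Printed only; pass `command` to subprocess.run(command, check=True) to execute it.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems when a value such as the GPU list contains commas.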