
paper-PTSKC

This repository contains the code and access to the dataset used for our paper titled Prompt-Time Symbolic Knowledge Capture with Large Language Models. This document is intended for researchers, developers, and anyone who would like to build, run, and experiment with paper-PTSKC.

Prerequisites and Dependencies

  • Requires an Apple M-series (Apple silicon) machine.
  • Requires native Python 3.8 - 3.11. In our work, we used the system-managed Python installation on macOS, located at /usr/bin/python3, which is pre-installed with the operating system. To use it, either create aliases for /usr/bin/python3 and /usr/bin/pip3 as python and pip (see the example after this list) or reference them directly in every command involving python or pip. Note that if you opt to use a different Python installation, you will need to update the Python path in the `runBenchmarks.py` file.
  • Requires macOS >= 13.3. Please note that the MLX platform developers highly recommend using macOS 14 (Sonoma).
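
For example, the aliases can be added to your shell startup file (e.g. ~/.zshrc; the exact file depends on your shell and is not prescribed by this repository):

alias python=/usr/bin/python3
alias pip=/usr/bin/pip3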

Installation

mlx-lm is available on PyPI. Please refer to the official MLX documentation and MLX examples for more details on the MLX platform.
To install the Python API, run:

pip install mlx-lm

How To Use

Generating test, train, and validation files

To generate the data/test.jsonl, data/train.jsonl, and data/valid.jsonl files, run the following command:

python scripts/generateTestTrainValid.py

Details about the dataset generation are as follows:

  • data/base.jsonl is the fundamental dataset file; it holds 1,600 user-prompt and prompt-response pairs.
  • The generateTestTrainValid.py script parses the base file and generates the files required for (Q)LoRA fine-tuning and performance evaluation. Please note that the generated output format is compatible with the Mistral-7B-Instruct model (an illustrative line is shown after this list); modifications might be required for different instruction formats.
  • The number of lines each output file will contain can be configured in generateTestTrainValid.py.
  • All generated files are written under the data directory.
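
Purely as an illustration (the prompt below is hypothetical and the response is elided; the actual pairs live in data/base.jsonl), a generated training line follows the single-field {"text": ...} JSONL layout expected by mlx_lm.lora, with each pair wrapped in Mistral's [INST] instruction tags:

{"text": "<s>[INST] My sister's name is Alice. [/INST] ...expected prompt response... </s>"}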

Generating ground-truth file

To generate the results/test_ground_truth.jsonl file, run the following command:

python scripts/generateGroundTruth.py 

The generateGroundTruth.py script processes the data/test.jsonl file and writes the expected prompt response for each user input. The generated ground-truth file is used in the performance evaluations.

Model file

In our work, we utilize the 4-bit quantized and mlx-converted version of the Mistral-7B-Instruct-v0.2 model. All model files must be placed under the Mistral-7B-Instruct-v0.2-4bit-mlx folder in the main directory of our repository. To replicate our test results accurately, please download the mlx-community/Mistral-7B-Instruct-v0.2-4bit-mlx model from the mlx-community organization on Hugging Face and ensure it is placed in the specified path.
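
For example, with the Hugging Face CLI installed (pip install -U huggingface_hub), the model files can be fetched into the expected folder as follows; this is just one download option and is not prescribed by the repository:

huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.2-4bit-mlx --local-dir Mistral-7B-Instruct-v0.2-4bit-mlx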

Fine-tuning

In our paper, we ran QLoRA fine-tuning with the following parameters and generated the adapter file adapters_b4_l16_1000.npz. Please use the same name for the adapter file so that the following scripts run without any changes.

python -m mlx_lm.lora --train --model Mistral-7B-Instruct-v0.2-4bit-mlx --iters 1000 --data ./data --batch-size 4 --lora-layers 16 --adapter-file adapters_b4_l16_1000.npz

Running the benchmarks

The proposed zero-shot prompting, few-shot prompting, and fine-tuning methods are implemented in the files zeroShot.py, fewShot.py, and fineTunedShot.py, respectively.
The runBenchmarks.py script calls these methods, reading input from data/test.jsonl and writing the results to the results directory.

python scripts/runBenchmarks.py
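
As a minimal sketch of how one generation pass can be wired up with the mlx-lm Python API (an illustrative assumption, not the repository's actual implementation; the prompt below is hypothetical):

# Load the locally stored 4-bit MLX model and run a single generation.
from mlx_lm import load, generate

model, tokenizer = load("Mistral-7B-Instruct-v0.2-4bit-mlx")
prompt = "[INST] My sister's name is Alice. [/INST]"  # hypothetical test prompt
response = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(response)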

Evaluation

The calculateF1Score.py script compares each method's result file with the ground-truth file and calculates precision, recall, and F1-score. All results are written to the evaluation_results.txt file under the results directory.

python scripts/calculateF1Score.py
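
For reference, the precision/recall/F1 arithmetic behind the evaluation is the standard one sketched below (a simplified illustration; calculateF1Score.py handles the actual parsing, matching, and file I/O):

# Standard precision, recall, and F1 computed from raw counts.
def f1_metrics(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1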

Troubleshooting

  • If any script complains about the jinja2 library, please install it separately using the following command:
pip install jinja2

Cite

@misc{coplu2024prompttime,
      title={Prompt-Time Symbolic Knowledge Capture with Large Language Models}, 
      author={Tolga Çöplü and Arto Bendiken and Andrii Skomorokhov and Eduard Bateiko and Stephen Cobb and Joshua J. Bouw},
      year={2024},
      eprint={2402.00414},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
