This repository contains the accompanying code for the paper *Manipulating Large Language Models to Increase Product Visibility*.
Large language models (LLMs) are increasingly being integrated into search engines to provide natural language responses tailored to user queries. Customers and end-users are becoming more dependent on these models for purchase decisions and access to new information. In this work, we investigate whether an LLM can be manipulated to enhance the visibility of specific content or products in its recommendations. We demonstrate that adding a strategic text sequence (STS), a carefully crafted message, to a product's information page or a website's content can significantly increase the product's likelihood of being listed as the LLM's top recommendation. We develop a framework to optimize the STS to improve the target product's rank in the LLM's recommendations while being robust to variations in the order of the products in the LLM's input.
To understand the impact of the strategic text sequences, we conduct empirical analyses using datasets comprising catalogs of consumer products (such as coffee machines, books, and cameras) and a collection of political articles. We measure the change in visibility of a product or an article before and after the inclusion of the STS. We observe that the STS significantly enhances the visibility of several products and articles by increasing their chances of appearing as the LLM's top recommendation. This ability to manipulate LLM-generated search responses provides vendors and political entities with a considerable competitive advantage, posing potential risks to fair market competition and the impartiality of public opinion.
The following figure shows the impact of adding an STS to a product's information page. In the "Before" scenario, the target product is not mentioned in the LLM's recommendations. In the "After" scenario, the STS on the product's information page causes the target product to appear in the first position, significantly improving its visibility in the LLM's recommendations.
**Generating STS:** The file `rank_opt.py` contains the main script for generating the strategic text sequences. It uses the list of products in `data/coffee_machines.jsonl` as the catalog and optimizes the probability of the target product's rank being 1 (a conceptual sketch of this objective is given after the options list below).

Following is an example command for running this script:

```
python rank_opt.py --results_dir [path/to/save/results] --target_product_idx [num] --num_iter [num] --test_iter [num] --random_order --mode [self or transfer]
```
Options:

- `--results_dir`: To specify the location to save the outputs of the script, such as the STS of the target product.
- `--target_product_idx`: To specify the index of the target product in the list of products in `data/coffee_machines.jsonl`.
- `--num_iter`: Number of iterations of the optimization algorithm.
- `--test_iter`: Interval (in iterations) at which to test the STS.
- `--random_order`: To optimize the STS to tolerate variations in the product order.
- `--mode`: Mode in which to generate the STS:
  - a. `self`: Optimize and test the STS on the same LLM (applicable to open-access LLMs like Llama).
  - b. `transfer`: Optimize the STS to transfer to a different LLM (applicable to API-access models like GPT-3.5), e.g., optimize using Llama and Vicuna, and test on GPT-3.5.
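To make the objective concrete, the following is a minimal sketch, assuming a Hugging Face causal LM: it computes the negative log-likelihood of the target product's tokens at the positions where the top recommendation would appear, so minimizing this loss over the STS tokens pushes the target product toward rank 1. The function name and the `target_slice` convention are illustrative assumptions, not the actual interface of `rank_opt.py`.

```python
# Illustrative sketch of the rank-1 objective (not the actual rank_opt.py code).
# Lower loss => higher probability that the target product is listed first.
import torch.nn.functional as F

def rank_one_loss(model, input_ids, target_slice):
    """input_ids: prompt + desired completion, shape (1, seq_len).
    target_slice: positions of the target product's tokens in the completion."""
    logits = model(input_ids=input_ids).logits  # (1, seq_len, vocab_size)
    # Logits at position i predict token i + 1, so shift the slice left by one.
    pred = logits[:, target_slice.start - 1 : target_slice.stop - 1, :]
    labels = input_ids[:, target_slice]
    return F.cross_entropy(pred.transpose(1, 2), labels)
```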
`rank_opt.py` generates the STS for the target product and plots the target loss and the rank of the target product in the results directory.
See `self.sh` and `transfer.sh` in `bash script` for usage of the above options.
`coffee_machines.jsonl` in `data` contains a catalog of ten fictitious coffee machines listed in increasing order of price.
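For reference, each line in a JSONL catalog is a single JSON object describing one product. A hypothetical entry might look like the one below; the field names are illustrative assumptions, so consult `data/coffee_machines.jsonl` for the actual schema:

```
{"Name": "BrewMaster Basic", "Price": "$59", "Description": "A no-frills drip coffee machine with a 12-cup glass carafe and a programmable timer."}
```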
**Evaluating STS:** `evaluate.py` evaluates the STS generated by `rank_opt.py`. We obtain product recommendations from an LLM with and without the STS in the target product's description in the catalog. We then compare the rank of the target product in the LLM's recommendations in the two scenarios (a simplified sketch of this comparison is given after the options list below). We repeat this experiment several times to quantify the advantage obtained from using the STS.

Following is an example command for running the evaluation script:

```
python evaluate.py --model_path [LLM for STS evaluation] --prod_idx [num] --sts_dir [path/to/STS] --num_iter [num] --prod_ord [random or fixed]
```
Options:

- `--model_path`: Path to the LLM to use for STS evaluation.
- `--prod_idx`: Target product index.
- `--sts_dir`: Path to the STS to evaluate. Same as `--results_dir` for `rank_opt.py`.
- `--num_iter`: To specify the number of evaluations.
- `--prod_ord`: To specify the product order (`random` or `fixed`) in the LLM's input.
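Conceptually, each evaluation round reads off where the target product appears in the LLM's numbered recommendation list, once with the STS in its description and once without. The helper below is a simplified sketch of that step; the function name and the assumption of a numbered-list response format are illustrative and do not mirror `evaluate.py`:

```python
# Simplified sketch of rank extraction (illustrative; not from evaluate.py).
import re

def target_rank(response: str, product_name: str) -> int | None:
    """Return the 1-based rank of the product in a numbered list, or None
    if the product is not mentioned in the recommendations."""
    for line in response.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s*(.*)", line)
        if match and product_name.lower() in match.group(2).lower():
            return int(match.group(1))
    return None
```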
**Plotting Results:** `plot_dist.py` plots the distribution of the target product's rank before and after STS insertion. It also plots the advantage obtained by using the STS, i.e., the percentage of evaluations in which the target product ranks higher with the STS than without it (see the sketch below).
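A minimal sketch of this advantage metric, assuming paired rank observations from the two scenarios, with `None` marking runs where the product was not listed (the exact definition in `plot_dist.py` may differ):

```python
# Sketch of the advantage metric (the definition in plot_dist.py may differ).
def advantage(ranks_with_sts, ranks_without_sts, not_listed=float("inf")):
    """Percentage of paired trials in which the target product ranks strictly
    higher (i.e., has a smaller rank number) with the STS than without it."""
    assert len(ranks_with_sts) == len(ranks_without_sts)
    wins = sum(
        (rw if rw is not None else not_listed)
        < (ro if ro is not None else not_listed)
        for rw, ro in zip(ranks_with_sts, ranks_without_sts)
    )
    return 100.0 * wins / len(ranks_with_sts)
```

For example, `advantage([1, 1, 2], [None, 3, 2])` returns about 66.7: the STS wins in the first two paired trials and ties in the third.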
See the scripts `eval_self.sh` and `eval_transfer.sh` for usage of `evaluate.py` and `plot_dist.py`.
**System Requirements:** The strategic text sequences were optimized using NVIDIA A100 GPUs with 80GB memory. When run in transfer mode, `rank_opt.py` requires access to GPUs. All the above scripts need to be run in a Conda environment created as per the instructions below.
Follow the instructions below to set up the environment for the experiments.

- Install Anaconda:
  - Download the `.sh` installer file from https://www.anaconda.com/products/distribution
  - Run: `bash Anaconda3-2023.03-Linux-x86_64.sh`
- Set up the conda environment `llm-rank` with the required packages: `conda env create -f env.yml`
- Activate the environment: `conda activate llm-rank`
If setting up the environment using `env.yml` does not work, manually build an environment with the required packages using the following steps:

- Create a conda environment with Python: `conda create -n [env] python=3.10`
- Activate the environment: `conda activate [env]`
- Install PyTorch with CUDA from https://pytorch.org/: `conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia`
- Install transformers from Hugging Face: `pip install transformers`
- Install accelerate: `conda install -c conda-forge accelerate`
- Install seaborn: `conda install anaconda::seaborn`
- Install termcolor: `conda install -c conda-forge termcolor`
- Install the OpenAI Python package: `conda install conda-forge::openai`
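Once everything is installed, a quick sanity check (not part of the repository's scripts) can confirm that PyTorch sees a GPU and the key packages import correctly:

```python
# Quick environment sanity check (illustrative; not part of this repository).
import torch
import transformers

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # expect True on a GPU node
print("Transformers:", transformers.__version__)
```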