This Python script benchmarks the latency improvement from prompt caching for the OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet language models. It measures the latency of API calls with and without prompt caching, requesting only a small number of output tokens so that the measurement approximates the time to first byte. It then calculates the percentage improvement in latency.
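The measurement described above can be sketched roughly as follows. This is an illustration only, not the code in benchmark.py: the helper names `measure_latency` and `percent_improvement`, the use of a median over several runs, and the stand-in callable are all assumptions made for the example.

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 3) -> float:
    """Return the median wall-clock latency of `call` in seconds.

    `call` is a hypothetical stand-in for one API request that asks for
    only a few output tokens, so the total time approximates the time to
    first byte.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_improvement(uncached_s: float, cached_s: float) -> float:
    """Percentage reduction in latency relative to the uncached call."""
    if uncached_s <= 0:
        raise ValueError("uncached latency must be positive")
    return (uncached_s - cached_s) / uncached_s * 100.0
```

For example, if an uncached call takes 2.0 seconds and the cached call takes 1.5 seconds, `percent_improvement(2.0, 1.5)` returns `25.0`.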
- Install the required Python packages:
pip install -r requirements.txt
- Set the API keys for OpenAI and Anthropic in the benchmark.py script.
- Run the script:
python benchmark.py