🗺 prompt playground #3435
Hi!

Enhancement proposal
This feature would be similar to #2462 but with more depth. It would add a simple button to replicate a query into an edit mode so it can be replayed. It should also allow adding notes on each result iteration, such as rating the output quality, format, etc., on a scale of 1 to 10.

Goal
The goal is to enable quick testing of prompts and inputs, making it possible to evaluate results and visualize progression.

Thank you,
Alexandre
@heralight Hey! Thanks for the feedback! We have a ton of features coming out with regard to prompt iteration, notably prompt experiments, which have evaluations built in. Stay tuned. Noted on the replay and the annotations :) Will give it some thought. We have a few ideas around replaying data through prompts, but haven't thought much about human annotations on different prompt versions. Would love to hear more.
Very nice!
best,

🛝🎉
As a user of Phoenix, I don't want to have to go back to my IDE to iterate on a prompt. I want to be able to use the data stored in Phoenix (spans, datasets) and run it through a prompt.
Use-cases
Milestone 1 - LLM Replay
The most important capability is to take a specific LLM step (e.g. a span) and re-play that execution. When re-executing the step, we should record the new response so that we can compare and evaluate it.
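A minimal sketch of what re-playing a span could look like, assuming a simplified span record and the OpenAI Python client; the `replay_span` helper and the span fields are hypothetical, not Phoenix's actual span schema or API:

```python
# Illustrative only: replay a recorded LLM span and capture the new response
# for side-by-side comparison. The span dict shape is an assumption.
from openai import OpenAI

client = OpenAI()

def replay_span(span: dict, edited_messages: list[dict] | None = None) -> dict:
    """Re-execute the LLM call captured by a span, optionally with edited messages."""
    messages = edited_messages or span["messages"]
    response = client.chat.completions.create(
        model=span.get("model", "gpt-4o-mini"),
        messages=messages,
    )
    return {
        "span_id": span["span_id"],
        "original_output": span["output"],
        "replayed_output": response.choices[0].message.content,
    }

# Example: replay a recorded span and compare the old and new outputs.
recorded_span = {
    "span_id": "abc123",
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize this ticket: ..."}],
    "output": "The customer reports a login failure.",
}
result = replay_span(recorded_span)
```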
Leads: @axiomofjoy @anticorrelator
Out of Scope
Planning
UI
`endpoint` and `api version` #5023
API
`wss` when `https` is used to initiate websockets connection #4930
Testing
Fix
Instrumentation
OpenInference
Milestone 2 - Datasets on Playgrounds
Add support to run a set of dataset examples through a prompt
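A rough sketch of running a set of dataset examples through a prompt template; the example shape, the template, and the `run_dataset` helper are assumptions for illustration, not Phoenix's dataset API:

```python
# Illustrative only: format each dataset example into a prompt, run it through
# the model, and collect the outputs for later evaluation.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = "Summarize the following support ticket in one sentence:\n\n{ticket}"

def run_dataset(examples: list[dict], model: str = "gpt-4o-mini") -> list[dict]:
    results = []
    for example in examples:
        # Each example is assumed to have an "id" and an "input" dict of template variables.
        prompt = PROMPT_TEMPLATE.format(**example["input"])
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results.append(
            {
                "example_id": example["id"],
                "input": example["input"],
                "output": response.choices[0].message.content,
            }
        )
    return results
```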
API
UI
Milestone 3 - Playground run annotations and evaluators
Add support for easily annotating playground runs via human annotations or evaluators.
Note: Annotations for playground spans are already supported as of the milestones above
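A hypothetical sketch of what an annotation on a playground run might capture (e.g. the 1-10 quality rating from the original proposal); the `Annotation` dataclass and `annotate_run` helper are illustrative only, not part of Phoenix's public API:

```python
# Illustrative only: a simple record for human or evaluator annotations
# attached to a playground run.
from dataclasses import dataclass

@dataclass
class Annotation:
    run_id: str
    name: str          # e.g. "output_quality"
    score: float       # e.g. a 1-10 rating
    annotator: str     # "human" or an evaluator name
    explanation: str = ""

def annotate_run(run_id: str, score: float, note: str = "") -> Annotation:
    """Record a human rating on a playground run."""
    return Annotation(
        run_id=run_id,
        name="output_quality",
        score=score,
        annotator="human",
        explanation=note,
    )

# Example: rate a replayed run 7/10 for format adherence.
annotation = annotate_run("run-123", score=7, note="Good content, formatting is off.")
```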
Dogfooding
UI
Azure
Anthropic
`latest` suffix #5406
`stop_sequences` invocation parameter is not populated immediately at playground initialization #5409
Server
Dogfooding issues round 2
Presentations
Documentation
Post-Launch
Punt