🗺 prompt playground #3435
Hi!

Enhancement proposal
This feature would be similar to #2462 but with more depth. It would add a simple button to replicate a query into an edit mode so it can be replayed. It should also allow adding notes on each result iteration, such as rating the output quality, format, etc., on a scale of 1 to 10.

Goal
The goal is to enable quick testing of prompts and inputs, making it possible to evaluate results and visualize progression.

Thank you,
Alexandre
@heralight Hey! Thanks for the feedback! We have a ton of features coming out with regard to prompt iteration, notably prompt experiments, which have evaluations built in. Stay tuned. Noted on the replay and the annotations :) Will give it some thought. We have a few ideas around replaying data through prompts, but haven't thought much about human annotations on different prompt versions. Would love to hear more.
Very nice!
best,

🛝🎉
As a user of Phoenix, I don't want to have to go back to my IDE to iterate on a prompt. I want to be able to use the data stored in Phoenix (spans, datasets) and run it through a prompt.
Use-cases
Milestone 1 - LLM Replay
The most important capability is to take a specific LLM step (e.g. a span) and re-play that execution. When re-executing the step, we should record the new response so that we can compare and evaluate it.
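A minimal sketch of what re-playing a span could look like, assuming a simplified span record and the OpenAI Python client; the `replay_span` helper and the span fields are hypothetical, not Phoenix's actual span schema or API:

```python
# Illustrative only: replay a recorded LLM span and capture the new response
# for side-by-side comparison. The span dict shape is an assumption.
from openai import OpenAI

client = OpenAI()

def replay_span(span: dict, edited_messages: list[dict] | None = None) -> dict:
    """Re-execute the LLM call captured by a span, optionally with edited messages."""
    messages = edited_messages or span["messages"]
    response = client.chat.completions.create(
        model=span.get("model", "gpt-4o-mini"),
        messages=messages,
    )
    return {
        "span_id": span["span_id"],
        "original_output": span["output"],
        "replayed_output": response.choices[0].message.content,
    }

# Example: replay a recorded span and compare the old and new outputs.
recorded_span = {
    "span_id": "abc123",
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize this ticket: ..."}],
    "output": "The customer reports a login failure.",
}
result = replay_span(recorded_span)
```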
Leads: @axiomofjoy @anticorrelator
Out of Scope
Planning
UI
`endpoint` and `api version` #5023
API
`wss` when `https` is used to initiate websockets connection #4930
Testing
Fix
Instrumentation
OpenInference
Milestone 2 - Datasets on Playgrounds
Add support to run a set of dataset examples through a prompt
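A rough sketch of running a set of dataset examples through a prompt template; the example shape, the template, and the `run_dataset` helper are assumptions for illustration, not Phoenix's dataset API:

```python
# Illustrative only: format each dataset example into a prompt, run it through
# the model, and collect the outputs for later evaluation.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = "Summarize the following support ticket in one sentence:\n\n{ticket}"

def run_dataset(examples: list[dict], model: str = "gpt-4o-mini") -> list[dict]:
    results = []
    for example in examples:
        # Each example is assumed to have an "id" and an "input" dict of template variables.
        prompt = PROMPT_TEMPLATE.format(**example["input"])
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results.append(
            {
                "example_id": example["id"],
                "input": example["input"],
                "output": response.choices[0].message.content,
            }
        )
    return results
```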
API
UI
Milestone 3 - Playground run annotations and evaluators
Add support for easily annotating playground runs via human annotations or evaluators.
Note: Annotations for playground spans are already supported as of the milestones above
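A hypothetical sketch of what an annotation on a playground run might capture (e.g. the 1-10 quality rating from the original proposal); the `Annotation` dataclass and `annotate_run` helper are illustrative only, not part of Phoenix's public API:

```python
# Illustrative only: a simple record for human or evaluator annotations
# attached to a playground run.
from dataclasses import dataclass

@dataclass
class Annotation:
    run_id: str
    name: str          # e.g. "output_quality"
    score: float       # e.g. a 1-10 rating
    annotator: str     # "human" or an evaluator name
    explanation: str = ""

def annotate_run(run_id: str, score: float, note: str = "") -> Annotation:
    """Record a human rating on a playground run."""
    return Annotation(
        run_id=run_id,
        name="output_quality",
        score=score,
        annotator="human",
        explanation=note,
    )

# Example: rate a replayed run 7/10 for format adherence.
annotation = annotate_run("run-123", score=7, note="Good content, formatting is off.")
```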
Dogfooding
UI
Azure
Anthropic
`latest` suffix #5406
`stop_sequences` invocation parameter is not populated immediately at playground initialization #5409
Server
Dogfooding issues round 2
Presentations
Documentation
Post-Launch
Punt