Cell Outputs aren't actually included in the LLM request. #286
Comments
Here's my Runme doc looking at some sample data. It looks like in the case of a non-interactive cell the stdout is available in one of the cell output items. For interactive cells it doesn't look like the cell output is available in either of the two output items.

Non-interactive cells
It looks like the mime types of the output items are

Interactive cells
It looks like the mime types of the output items are

Questions
For non-interactive cells does it matter which output item we use to get the output from? i.e. whether we use the item with mime type application/vnd.code.notebook.stdout or the mime type stateful.runme/output-items? How would stderr end up being encoded? For interactive cells how difficult would it be to fetch stdout and stderr and include that in the requests that get sent to Foyle?
I dug into the RunMe code to try to figure out what's going on. Output types are defined here. There doesn't appear to be one for stderr.

It looks like the function generateOutputUnsafe is responsible for generating the CellOutputItems when a cell is executed. It looks like it goes down the code branch for

It looks like this is the line where it tries to generate a serialized representation of the terminal contents. In the case of a non-interactive terminal, terminalState is an instance of LocalBufferTermState and serialize returns the actual stdout. For an interactive terminal, terminalState is an instance of XtermState and Serialize returns an empty string. It looks like this is using xterm-addon-serialize to get the serialized data.

Interestingly, if I save the session outputs, the outputs of interactive cells are saved in the markdown file. Does that use a different code path?

It looks like in all code paths for terminal it uses "NotebookCellOutputItem.stdout". So it looks like we can use the output item with mime type application/vnd.code.notebook.stdout to get the stdout.
On the Foyle side, we convert from Cell protos to Block protos here
This should preserve the mime type and data contents. foyle/app/pkg/docs/converters.go Line 34 in 5eafef9
So my suspicion is that the output does get included in the prompt for non-interactive cells but not for interactive cells, and most cells are interactive since that appears to be the default.
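To make the "preserve the mime type and data contents" point concrete, here is a rough sketch of that conversion step. The `Cell` and `Block` shapes below are simplified, hypothetical stand-ins (the real protos have more fields and the real converter is Go code in converters.go), so treat this as an illustration of the idea rather than Foyle's actual code.

```typescript
// Hypothetical, simplified stand-ins for the Cell and Block protos.
interface CellOutputItem {
  mime: string;
  data: string;
}
interface Cell {
  kind: "code" | "markup";
  value: string;
  outputs: { items: CellOutputItem[] }[];
}

interface BlockOutputItem {
  mime: string;
  textData: string;
}
interface Block {
  kind: "CODE" | "MARKUP";
  contents: string;
  outputs: { items: BlockOutputItem[] }[];
}

// Copy every output item across unchanged so that later stages can still
// decide what to do with each mime type.
function cellToBlock(cell: Cell): Block {
  return {
    kind: cell.kind === "code" ? "CODE" : "MARKUP",
    contents: cell.value,
    outputs: cell.outputs.map((output) => ({
      items: output.items.map((item) => ({
        mime: item.mime,
        textData: item.data,
      })),
    })),
  };
}

const block = cellToBlock({
  kind: "code",
  value: "ls",
  outputs: [
    { items: [{ mime: "application/vnd.code.notebook.stdout", data: "a.txt\n" }] },
  ],
});
```

If the conversion really is this direct, an empty stdout item on the Block side would point at the data being empty before it ever reaches Foyle.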
So I tested it with a non-interactive cell. The command in the cell was
If you execute it and then generate a completion the markdown that gets sent to the LLM is
So the output is included. Although it looks like we might want to filter out the mime type stateful.runme/output-items.

What to do about interactive cells?
I suspect there is a bug in how I'm serializing the notebook in the vscode-runme frontend before sending it over the wire to Foyle. The code does this by calling marshalNotebook https://github.com/stateful/vscode-runme/blob/3e36b16e3c41ad0fa38f0197f1713135e5edb27b/src/extension/ai/generate.ts#L46 @sourishkrout Am I missing a call to addExecInfo before serializing the notebook? It seems like that function might be synchronizing information from the terminal to the notebookData structure.
…stateful.runme/output-items in prompt (#293)
* Runme has multiple output items for each cell. It looks like the mime type application/vnd.code.notebook.stdout is usually the one that includes stdout
* Items with mime types stateful.runme/terminal and stateful.runme/output-items include JSON objects that don't seem relevant to getting help from the AI, so we want to filter them out when converting to markdown so they don't confuse the AI.

Related to: #286
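A minimal sketch of the filtering described in this commit, assuming a simplified block shape. The `CodeBlock` and `OutputItem` types, the function name, and the `bash`/`output` fence labels are all hypothetical; only the three mime type strings come from the thread.

```typescript
// Mime type that, per the investigation above, usually carries stdout.
const STDOUT_MIME = "application/vnd.code.notebook.stdout";

// Mime types whose payloads are Runme JSON metadata rather than command output.
const IGNORED_MIMES = ["stateful.runme/terminal", "stateful.runme/output-items"];

// Hypothetical, simplified shapes; Foyle's real Block proto has more fields.
interface OutputItem {
  mime: string;
  text: string;
}
interface CodeBlock {
  contents: string;
  outputItems: OutputItem[];
}

// Three backticks, built piecewise so this sketch can sit inside a fenced block.
const FENCE = "`" + "`" + "`";

// Render a code block plus its outputs as markdown, skipping ignored mime
// types and items that have no visible text.
function blockToMarkdown(block: CodeBlock): string {
  const parts = [FENCE + "bash\n" + block.contents + "\n" + FENCE];
  for (const item of block.outputItems) {
    if (IGNORED_MIMES.indexOf(item.mime) !== -1) continue;
    if (item.text.trim() === "") continue;
    parts.push(FENCE + "output\n" + item.text.trim() + "\n" + FENCE);
  }
  return parts.join("\n");
}

const markdown = blockToMarkdown({
  contents: "echo hi",
  outputItems: [
    { mime: "stateful.runme/output-items", text: '{"some":"json"}' },
    { mime: STDOUT_MIME, text: "hi\n" },
    { mime: "stateful.runme/terminal", text: '{"other":"json"}' },
  ],
});
console.log(markdown);
```

A deny list like this keeps any unknown mime types in the prompt. An allow list containing only the stdout mime type would be stricter, at the risk of dropping output that arrives under another type.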
This likely relates to how the notebook state is snapshotted in the AI parts of the extension. I'll look into the details next week. We've recently done some work to "centralize" this so consumers other than the actual notebook serialization can use it. However, I have to make sure it supports the use case. For non-interactive cells, this is also associated with #143.
* Force interactive to false on the returned cells as a temporary workaround for #286
So for non-interactive cells, please use Re, I am continuing with interactive exec in a separate comment/analysis.
Similarly for interactive execution, So the question here is: why is
# Experiment Report
After running an evaluation experiment, we compute a report that contains the key metrics we want to track. To start with this is
* Number of cell match results
* Number of errors and examples
* Generate latency measured as percentiles
* Level1 assertion stats

# Level 1 Assertion stats
* Add a level 1 assertion to test whether the document is zero.
* I believe I observed this started happening when we included a fix to outputs not being included (#286) in #285.
* I think the problem is the cell outputs could be very long and this could end up eating all the available context buffer

# Reintegrate Level 1 Assertions Into Evaluation
* Fix #261
* We start computing level 1 assertions at RunTime so that they are available in production and evaluation
* Level1 assertions are computed and then logged
* Our Analyzer pipeline reads the assertions from the logs and adds them to the trace
* Our evaluation report accumulates assertion statistics and reports them
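For the "generate latency measured as percentiles" metric, a small sketch using the nearest-rank method is below. The commit message doesn't say which percentile definition the report actually uses, so the method, the function name, and the sample values are assumptions for illustration only.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of the samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p < 0 || p > 100) {
    throw new Error("p must be between 0 and 100");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up generate latencies in milliseconds.
const latenciesMs = [120, 80, 200, 95, 150];
const p50 = percentile(latenciesMs, 50);
const p90 = percentile(latenciesMs, 90);
console.log("p50=" + p50 + "ms p90=" + p90 + "ms");
```

Nearest-rank always returns one of the observed samples, which is convenient for small evaluation runs. An interpolating definition would give slightly different numbers.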
Shouldn't this go through serialization before logging it, @jlewi? Please see There might also be a secondary issue that
Update: Yes, you're right. I also totally overlooked that For a different use case, I see the team's used a cache, which we could generalize, but I worry about synchronization issues. The cache might be behind by a version since it's built for saving, not editing.
As a workaround, I modified Foyle to return all code cells as non-interactive cells.
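Roughly, the workaround amounts to something like the sketch below. The real change lives in Foyle's Go code. The `ReturnedCell` shape, the function name, and the assumption that `interactive` is stored as the string `"false"` in cell metadata are all illustrative guesses rather than the actual implementation.

```typescript
// Hypothetical, simplified shape of a cell returned in a completion response.
interface ReturnedCell {
  kind: "code" | "markup";
  value: string;
  metadata: Record<string, string>;
}

// Return copies of the cells with every code cell marked non-interactive.
// Markup cells are passed through unchanged.
function forceNonInteractive(cells: ReturnedCell[]): ReturnedCell[] {
  return cells.map((cell) =>
    cell.kind === "code"
      ? { ...cell, metadata: { ...cell.metadata, interactive: "false" } }
      : cell
  );
}

const patched = forceNonInteractive([
  { kind: "markup", value: "List the pods", metadata: {} },
  { kind: "code", value: "kubectl get pods", metadata: { name: "list-pods" } },
]);
```

The catch with this approach is that it changes how the user's cells run just so their output can be captured.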
This fixes a bug in the serialization of the notebook before sending it to Foyle that caused the output of interactive cells not to be included in the requests.

The problem is that we need to call addExecInfo before converting the VSCode NotebookData representation to the proto. That handles copying the output of the interactive terminals into the NotebookData structure.

This necessitated some code refactoring. In order to call addExecInfo we need an instance of the kernel. We create a new Converter class to keep track of the kernel and also provide reuse in the logic for converting notebook data to protos for Foyle.

Since addExecInfo is async we need to change buildReq to return a promise and refactor some of the logic to be non-blocking.
* Fix jlewi/foyle#286
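The ordering described above can be sketched as follows. Everything here is a hypothetical stand-in: `KernelLike`, the `addExecInfo` signature, the `NotebookData` shape, and the JSON "proto" are simplifications, and the real vscode-runme code and signatures may well differ. The sketch only shows that the terminal output has to be copied into the notebook data before it is serialized, and that this makes building the request asynchronous.

```typescript
// Hypothetical, simplified notebook data.
interface NotebookCellData {
  value: string;
  outputs: string[];
}
interface NotebookData {
  cells: NotebookCellData[];
}

// Stand-in for the part of the kernel that knows the terminal contents.
interface KernelLike {
  addExecInfo(data: NotebookData): Promise<NotebookData>;
}

// Keeps hold of the kernel so every conversion goes through addExecInfo first.
class Converter {
  private readonly kernel: KernelLike;

  constructor(kernel: KernelLike) {
    this.kernel = kernel;
  }

  async notebookDataToRequest(data: NotebookData): Promise<string> {
    // Copy interactive terminal output into the notebook data first.
    const withOutputs = await this.kernel.addExecInfo(data);
    // Only then serialize; JSON stands in for the real proto here.
    return JSON.stringify(withOutputs);
  }
}

// buildReq now has to return a promise because addExecInfo is async.
async function buildReq(converter: Converter, data: NotebookData): Promise<string> {
  return converter.notebookDataToRequest(data);
}

// Fake kernel that fills in placeholder output for cells that have none.
const fakeKernel: KernelLike = {
  async addExecInfo(data: NotebookData): Promise<NotebookData> {
    return {
      cells: data.cells.map((cell) => ({
        value: cell.value,
        outputs: cell.outputs.length > 0 ? cell.outputs : ["<terminal contents>"],
      })),
    };
  },
};

const pendingReq = buildReq(new Converter(fakeKernel), {
  cells: [{ value: "kubectl get pods", outputs: [] }],
});
```

Serializing `data` directly, without the `await`ed call, would reproduce the original symptom: interactive cells show up in the request with empty outputs.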
stateful#1756
branch: jlewi/outputs
commit d279e74
Author: Jeremy Lewi <jeremy@lewi.us>
Date: Wed Oct 23 15:25:38 2024 -0700

Include output from interactive cells in Foyle requests
* Fix jlewi/foyle#286
* Include output from interactive cells in Foyle requests
* Fix jlewi/foyle#286
* Update to use await.
* Add a comment.
* Use await.
* Forcing cells to be non-interactive was a temporary fix for the fact that the output wasn't included in Foyle completion requests (#286)
* This is now fixed in the frontend so that the output of interactive cells is included in the requests to Foyle
* We don't want to default to non-interactive cells because non-interactive cells don't show stderr

Related to /issues/286
I believe this is fixed by the linked PRs.
Here's a notebook visualizing the LLM requests for a lot of examples.
https://gist.github.com/jlewi/b9247059ebb3cb323eee10504ffc9f6c
It doesn't look like the cell output of previous cells is actually part of the LLM request. Rather, what we have is some RunMe metadata.