Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell Outputs aren't actually included in the LLM request. #286

Closed
jlewi opened this issue Oct 8, 2024 · 11 comments · Fixed by stateful/vscode-runme#1756
Closed

Cell Outputs aren't actually included in the LLM request. #286

jlewi opened this issue Oct 8, 2024 · 11 comments · Fixed by stateful/vscode-runme#1756
Labels
ai-quality likelihood/4-most-users https://lostgarden.com/2008/05/20/improving-bug-triage-with-user-pain/ type/6-major-usability

Comments

@jlewi
Copy link
Owner

jlewi commented Oct 8, 2024

Here's a notebook visualizing the LLM requests for a lot of examples.
https://gist.github.com/jlewi/b9247059ebb3cb323eee10504ffc9f6c

It doesn't look like the cell output of previous cells is actually part of the LLM request. Rather, what we have is some RunMe metadata.

{
	"type": "stateful.runme/terminal",
	"output": {
		"runme.dev/id": "01J7Y831N6DC58716N7ZM5P2HC",
		"content": "",
		"initialRows": 10,
		"enableShareButton": true,
		"isAutoSaveEnabled": false,
		"isPlatformAuthEnabled": false,
		"isSessionOutputsEnabled": true,
		"backgroundTask": true,
		"nonInteractive": false,
		"interactive": true,
		"fontSize": 16,
		"fontFamily": "Menlo, Monaco, 'Courier New', monospace",
		"rows": 10,
		"cursorStyle": "bar",
		"cursorBlink": true,
		"cursorWidth": 1,
		"smoothScrollDuration": 0,
		"scrollback": 1000,
		"closeOnSuccess": true
	}
}
@jlewi jlewi added ai-quality likelihood/4-most-users https://lostgarden.com/2008/05/20/improving-bug-triage-with-user-pain/ type/6-major-usability labels Oct 8, 2024
@jlewi
Copy link
Owner Author

jlewi commented Oct 10, 2024

Here's my Runme doc looking at some sample data. It looks like in the case of a non-interactive cell the stdout is available in one of the cell output items. For interactive cells it doesn't look like the cell output is available in either of the two output items.

Non-interactive cells

It looks like the mime types of the output items are

  • stateful.runme/output-items
    • data is a base64 encoded JSON dictionary
    • This dictionary contains the field output which is itself a dictionary which contains a field "content" which contains the base64 encoded stdout
         "type": "stateful.runme/output-items",
         "output": {
                 "content": "VGhpcyBjZWxsIGlzIGV4ZWN1dGVkIGluIG5vbiBpbnRlcmFjdGl2ZSBtb2RlIFdlZCBPY3QgIDkgMjA6Mjg6NTUgUERUIDIwMjQNCg==",
                 "mime": "text/plain",
                 "id": "01J9T7H2DRRW89QJMQNV0SS3YM"
         }
      
  • application/vnd.code.notebook.stdout
    • data is the base64 version of the stdout

Interactive Cells

It looks like the mime types of the output items are

  • application/vnd.code.notebook.stdout
    • But there is no actual data
  • stateful.runme/terminal
    • Data is a base64 encoded json dictionary that doesn't have stdout or stderr

Questions

@sourishkrout

For non-interactive cells does it matter which output item we use to get the output from? i.e. whether we use the item with mime type application/vnd.code.notebook.stdout or the mime type stateful.runme/output-items? How would stderr end up being encoded?

For interactive cells how difficult would it be to fetch stdout and stderr and include that in the requests that get sent to Foyle?

@jlewi
Copy link
Owner Author

jlewi commented Oct 10, 2024

I dug into the RunMe code to try to figure out what's going on.

Output types are defined here
https://github.com/stateful/vscode-runme/blob/3e36b16e3c41ad0fa38f0197f1713135e5edb27b/src/constants.ts#L6

There doesn't appear to be one for stderr.

It looks like the function generateOutputUnsafe is responsible for generating the CellOutputItems when a cell is executed.

It looks like it goes down the code branch for case OutputType.terminal regardless of whether its interactive or non-interactive terminal.

It looks like this is the line where it tries to generate a serialized representation of the terminal contents.
https://github.com/stateful/vscode-runme/blob/3e36b16e3c41ad0fa38f0197f1713135e5edb27b/src/extension/cell.ts#L239

It looks like in the case of non-interactive terminal terminalState is an instance of LocalBufferTermState and serialize returns the actual stdout.

For interactive terminal it looks like terminalState is an instance of XtermState and Serialize returns an empty string. It looks like this is using xterm-addon-serialize to get the serialized data.

Interestingly, if I save the session outputs, the outputs of interactive cells is saved in the markdown file. Does that use a different code path?

It looks like in all code paths for terminal it uses "NotebookCellOutputItem.stdout"
https://github.com/stateful/vscode-runme/blob/3e36b16e3c41ad0fa38f0197f1713135e5edb27b/src/extension/cell.ts#L292 to create an output item containing stdout
https://github.com/microsoft/vscode/blob/f5946cee2e01f1702426efd954ecc7250d8e5075/src/vs/workbench/api/common/extHostTypes.ts#L3818

So it looks like we can use the output item with mime type application/vnd.code.notebook.stdout to get the stdout.

@jlewi
Copy link
Owner Author

jlewi commented Oct 10, 2024

On the Foyle side; we convert from Cell protos to Block protos here

func CellOutputToBlockOutput(output *parserv1.CellOutput) (*v1alpha1.BlockOutput, error) {

This should preserve the mime type and data contents.
It looks like if there are multiple output items all of them should be included when the block is converted to markdown.

for _, output := range block.GetOutputs() {

So my suspicion is that the prompt does get included in non-interactive cells but not interactive cells but most cells are interactive as that appears to be the default.

@jlewi
Copy link
Owner Author

jlewi commented Oct 10, 2024

So I tested it with a non-interactive cell. The command in the cell was

echo "The current time is $(date)."

If you execute it and then generate a completion the markdown that gets sent to the LLM is

echo "The current time is $(date)."
{
	"type": "stateful.runme/output-items",
	"output": {
		"content": "VGhlIGN1cnJlbnQgdGltZSBpcyBUaHUgT2N0IDEwIDE2OjQxOjQ3IFBEVCAyMDI0Lg0K",
		"mime": "text/plain",
		"id": "01J9WCSV2CAQVERX6JT7T8DZ5G"
	}
}
The current time is Thu Oct 10 16:41:47 PDT 2024.

So the output is included. Although it looks like we might want to filter out the mime type stateful.runme/output-items.

What to do about interactive cells?

I suspect there is a bug in how I'm serializing the notebook in the vscode-runme frontend before sending it over the wire to Foyle. The code does this by calling marshalNotebook

https://github.com/stateful/vscode-runme/blob/3e36b16e3c41ad0fa38f0197f1713135e5edb27b/src/extension/ai/generate.ts#L46
https://github.com/stateful/vscode-runme/blob/3e36b16e3c41ad0fa38f0197f1713135e5edb27b/src/extension/ai/ghost.ts#L111

@sourishkrout Am I missing a call to addExecInfo before serializing the notebook? It seems like that function might be synchronizing information from the terminal to the notebookData structure?

jlewi added a commit that referenced this issue Oct 11, 2024
…stateful.runme/output-items in prompt

* Runme has multiple output items for each cell. It looks like the mime type
  application/vnd.code.notebook.stdout is usually the one that includes
  stdout

* Items with stateful.runme/terminal and stateful.runme/output-items in prompt include
  JSON objects that don't seem relevant to getting help from the AI
  so we want to filter them out when converting to markdown so they don't confuse the AI.

Related to: #286
jlewi added a commit that referenced this issue Oct 11, 2024
#293)

…stateful.runme/output-items in prompt

* Runme has multiple output items for each cell. It looks like the mime
type application/vnd.code.notebook.stdout is usually the one that
includes stdout

* Items with stateful.runme/terminal and stateful.runme/output-items in
prompt include JSON objects that don't seem relevant to getting help
from the AI so we want to filter them out when converting to markdown so
they don't confuse the AI.

Related to: #286
@sourishkrout
Copy link

This likely relates to how the notebook state is snapshotted in the AI parts of the extension. I'll look into the details next week. We've recently done some work to "centralize" this so consumers other than the actual notebook serialization can use it. However, I have to make sure it supports the use case.

For non-interactive cells, this is also associated with #143.

jlewi added a commit that referenced this issue Oct 14, 2024
* Force interactive to false on the returned cells as a temporary work around for #286
@sourishkrout
Copy link

sourishkrout commented Oct 15, 2024

Questions

@sourishkrout

For non-interactive cells does it matter which output item we use to get the output from? i.e. whether we use the item with mime type application/vnd.code.notebook.stdout or the mime type stateful.runme/output-items? How would stderr end up being encoded?

For interactive cells how difficult would it be to fetch stdout and stderr and include that in the requests that get sent to Foyle?

So for non-interactive cells, please use application/vnd.code.notebook.stdout which is the VS Code API default for stdout/stderr. The reason we have stateful.runme/output-items and some form of duplication is that VS Code's notebook APIs tie web-components tasked with rendering the UI to a media-type and to have the custom UI/UX such as action buttons (Copy, Save, etc) available we are using stateful.runme/output-items. While stateful.runme/output-items is subject to UI-driven changes, application/vnd.code.notebook.stdout will stay stable and only have the output.

Re, stderr in non-interactive, I've not been happy with how we're handling stdout vs stderr. I have a refactor on my list that will only send stdout into the notebook (primarily because third-party rich renderers are stdio-unaware) and, unless otherwise configured on the cell or doc, will omit stderr in the notebook-integrated terminal. However, with the same refactor, we'll introduce a multiplexed panel-based terminal that will receive both stdout and stderr. The idea is to enable the UI/UX to highlight the panel-based terminal for non-zero exits, either auto-focusing the terminal or providing a button to call the user's attention to the error.

I am continuing with interactive exec in a separate comment/analysis.

@sourishkrout
Copy link

Interactive Cells

It looks like the mime types of the output items are

  • application/vnd.code.notebook.stdout

    • But there is no actual data
  • stateful.runme/terminal

    • Data is a base64 encoded json dictionary that doesn't have stdout or stderr

Similarly for interactive execution, stateful.runme/terminal renders a web-component with an instance of xterm.js that's configured according to the payload. Studio is bidirectionally streamed directly between kernel <> terminal web-component because the notebook APIs are not capable of terminal emulation, and replacing outputs too frequently causes side-effects that make the notebook unusable (flickering, excessive resource consumption, etc).

So the question here's: why is application/vnd.code.notebook.stdout empty?

jlewi added a commit that referenced this issue Oct 15, 2024
# Experiment Report

After running an evaluation experiment, we compute a report that
contains the key metrics we want to track. To start with this is

* Number of cell match results
* Number of errors and examples
* Generate latency measured as percentiles
* Level1 assertion stats

# Level 1 Assertion stats

* Add a level 1 assertion to test whether the document is zero.
* I believe I observed this started happening when we included a fix to
outputs not being included (#286) in #285.
* I think the problem is the cell outputs could be very long and this
could end up eating all the available context buffer

# Reintegrate Level 1 Assertions Into Evaluation
* Fix #261 
* We start computing level 1 assertions at RunTime so that they are
available in production and evaluation
* Level1 assertions are computed and then logged
* Our Analyzer pipeline reads the assertions from the logs and adds them
to the trace
* Our evaluation report accumulates assertion statistics and reports
them
@sourishkrout
Copy link

sourishkrout commented Oct 15, 2024

Shouldn't this go through serialization before logging it, @jlewi? Please see GrpcSerializer.marshalNotebook calls in both generate.ts or ghost.ts.

https://github.com/stateful/vscode-runme/blob/6eaa8a2d30c29cc7e9c5e84f800b94ad6076758f/src/extension/kernel.ts#L777

There might also be a secondary issue that Auto-save (on) has to be active because otherwise, the terminal output will not be processed. The UI/UX state is awkwardly coupled here, which wasn't/isn't a problem for what outputs have been used for non-AI. We can fix that separately.

@sourishkrout
Copy link

sourishkrout commented Oct 15, 2024

@sourishkrout Am I missing a call to addExecInfo before serializing the notebook? It seems like that function might be synchronizing information from the terminal to the notebookData structure?

Update: Yes, you're right. I also totally overlooked that marshalNotebook is called and not serializeNotebook, which includes the logic to fill the terminal buffer. We could attempt moving this business logic up the call chain.

For a different use case, I see the team's used a cache, which we could generalize, but I worry about synchronization issues. The cache might be behind by a version since it's built for saving, not editing.

https://github.com/stateful/vscode-runme/blob/65eac909b37922eddac1ed3df4f3d1200a2e71c9/src/extension/messages/platformRequest/saveCellExecution.ts#L112-L116

@jlewi
Copy link
Owner Author

jlewi commented Oct 22, 2024

As a work around, I modified Foyle to return all code cells as non-interactive cells.
This works in terms of the output being included in the subsequent prompts.
However, non-interactive cells don't show stderr which make them unusable in many instances.

stateful/runme#684

jlewi added a commit to stateful/vscode-runme that referenced this issue Oct 23, 2024
This fixes a bug in the serializaiton of the notebook before sending
it to Foyle that caused the output of interactive cells not to be
included in the requests.

The problem is that we need to call addExecInfo before converting
the VSCode NotebookData representation to the proto. That
handles copying the output of the interactive terminals into
the NotebookData structure.

This necessitated some code refactoring. In order to call
addExecInfo we need an instance of the kernel.

We create a new Converter class to keep track of the kernel
and also provide reuse in the logic for converting notebook data to
protos for Foyle.

Since addExecInfo is async we need to change buildReq to return
a promise and refactor some of the logic to be non blocking.

* Fix jlewi/foyle#286
jlewi added a commit to jlewi/vscode-runme that referenced this issue Oct 24, 2024
stateful#1756
branch: jlewi/outputs

commit d279e74
Author: Jeremy Lewi <jeremy@lewi.us>
Date:   Wed Oct 23 15:25:38 2024 -0700

    Include output from interactive cells in Foyle requests

    This fixes a bug in the serializaiton of the notebook before sending
    it to Foyle that caused the output of interactive cells not to be
    included in the requests.

    The problem is that we need to call addExecInfo before converting
    the VSCode NotebookData representation to the proto. That
    handles copying the output of the interactive terminals into
    the NotebookData structure.

    This necessitated some code refactoring. In order to call
    addExecInfo we need an instance of the kernel.

    We create a new Converter class to keep track of the kernel
    and also provide reuse in the logic for converting notebook data to
    protos for Foyle.

    Since addExecInfo is async we need to change buildReq to return
    a promise and refactor some of the logic to be non blocking.

    * Fix jlewi/foyle#286
sourishkrout pushed a commit to stateful/vscode-runme that referenced this issue Oct 30, 2024
* Include output from interactive cells in Foyle requests

This fixes a bug in the serializaiton of the notebook before sending
it to Foyle that caused the output of interactive cells not to be
included in the requests.

The problem is that we need to call addExecInfo before converting
the VSCode NotebookData representation to the proto. That
handles copying the output of the interactive terminals into
the NotebookData structure.

This necessitated some code refactoring. In order to call
addExecInfo we need an instance of the kernel.

We create a new Converter class to keep track of the kernel
and also provide reuse in the logic for converting notebook data to
protos for Foyle.

Since addExecInfo is async we need to change buildReq to return
a promise and refactor some of the logic to be non blocking.

* Fix jlewi/foyle#286

* Update to use await.

* Add a comment.

* Use await.
jlewi added a commit that referenced this issue Oct 30, 2024
* Forcing cells to be non-interactive was a temporary fix for the
  fact that the output wasn't included in Foyle completion requests (#286)

* This is now fixed in the frontend so that the output of interactive cells
  is included in the requests to Foyle

* We don't want to default to non-interactive cells because non-interactive
  cells don't show stderr

Related to /issues/286
jlewi added a commit that referenced this issue Oct 30, 2024
* Forcing cells to be non-interactive was a temporary fix for the fact
that the output wasn't included in Foyle completion requests (#286)

* This is now fixed in the frontend so that the output of interactive
cells is included in the requests to Foyle

* We don't want to default to non-interactive cells because
non-interactive cells don't show stderr

Related to /issues/286
hotpocket pushed a commit to hotpocket/vscode-runme that referenced this issue Nov 5, 2024
* Include output from interactive cells in Foyle requests

This fixes a bug in the serializaiton of the notebook before sending
it to Foyle that caused the output of interactive cells not to be
included in the requests.

The problem is that we need to call addExecInfo before converting
the VSCode NotebookData representation to the proto. That
handles copying the output of the interactive terminals into
the NotebookData structure.

This necessitated some code refactoring. In order to call
addExecInfo we need an instance of the kernel.

We create a new Converter class to keep track of the kernel
and also provide reuse in the logic for converting notebook data to
protos for Foyle.

Since addExecInfo is async we need to change buildReq to return
a promise and refactor some of the logic to be non blocking.

* Fix jlewi/foyle#286

* Update to use await.

* Add a comment.

* Use await.
@jlewi
Copy link
Owner Author

jlewi commented Dec 13, 2024

I belief this is fixed by the linked PRs.

@jlewi jlewi closed this as completed Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ai-quality likelihood/4-most-users https://lostgarden.com/2008/05/20/improving-bug-triage-with-user-pain/ type/6-major-usability
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants