Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the Foyle API to report Log Events #1589

Merged
merged 21 commits into from
Sep 4, 2024
Merged

Use the Foyle API to report Log Events #1589

merged 21 commits into from
Sep 4, 2024

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented Aug 28, 2024

  • For motivation and description see: Move AI Logging Of Execution Into VSCode Frontend and out of RunMe GRPC Server jlewi/foyle#211

  • This PR allows us to log all executions along with some context for the execution. This will allow us to improve learning. Right now we don't learn unless a user accepts a suggestion and edits it. This will allow us to learn even if the user didn't use a suggested cell.

  • This PR introduces a SessionManager for the AI. A session is initiated each time the user switches focus to a different cell. The session corresponds to all the activity related to generating completions as the user edits that cell. A session is associated with a context (basically the notebook) as well as events (cell executions, suggestion acceptances). We log the start and end of the session.

  • Logging the start and end of the session should simplify log processing on the backend because now we know when a session is closed and there can be no more events associated with the session. Notably, we want to be able to determine which suggestions were rejected rather than accepted. Once a session is closed we know the suggestion was either accepted (there will be an accepted event) or the suggestion was not accepted.

@jlewi
Copy link
Contributor Author

jlewi commented Aug 30, 2024

@sourishkrout PTAL when you have a second.

J

Copy link
Member

@sourishkrout sourishkrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM. The only nit I have is perhaps adding a test to guard against accidental removal/change of the event reporter.

tests/extension/kernel.test.ts should have sufficiently mocked examples to test for it. I'm happy to push a test on top if you want me to write one. Lemme know.

@sourishkrout
Copy link
Member

@jlewi, will this make the server side AI user logs entirely obsolete? I'm asking because I have a todo to port them from v1 to the v2 runner. Will that be moot now?

Copy link
Member

@sourishkrout sourishkrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I did find some issues with async/await.

@@ -691,6 +692,7 @@ export class Kernel implements Disposable {
}

TelemetryReporter.sendTelemetryEvent('cell.startExecute')
getEventReporter().reportExecution(cell)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reporter call is async, we should probably await it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that block execution below? My thinking was that we want LogEvent Reporting to be out of band and not block critical processing. If I was doing this in go I'd fire off a go routine. Is calling an async function and not awaiting it similar to doing go somefunc?

event.type = LogEventType.EXECUTE
event.cells = cells
event.contextId = SessionManager.getManager().getID()
this.reportEvents([event])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be returned to pass up the promise, expects Promise<void>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an await should to it too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it.

event.eventId = ulid()
}
}
await this.client.logEvents(req).catch((e) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return instead of await here otherwise the call stack won't account for asynchronicity

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually if the return value is irrelevant the await will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we use a return here it will change the return type to Promise
I think Promise is the better return type because I don't think the caller of the reporter should be trying to process any results of the reporter.

@jlewi
Copy link
Contributor Author

jlewi commented Aug 30, 2024

@jlewi, will this make the server side AI user logs entirely obsolete?

Yes. I will send you a follow on PR to remove the EnableAILogs option in the frontend.

I was going to ask you if you wanted me to send you a PR to revert/cleanup much of stateful/runme#585

@sourishkrout
Copy link
Member

Yes. I will send you a follow on PR to remove the EnableAILogs option in the frontend.

I was going to ask you if you wanted me to send you a PR to revert/cleanup much of stateful/runme#585

Yes, on the cleanup PR, please.

Btw, I just pushed #1602 to add testing, which I needed for the review anyway.

@jlewi
Copy link
Contributor Author

jlewi commented Aug 30, 2024

Btw, I just pushed #1602 to add testing, which I needed for the review anyway

Thanks!

@jlewi
Copy link
Contributor Author

jlewi commented Aug 30, 2024

stateful/runme#661 has the gRPC server changes

@@ -691,6 +692,7 @@ export class Kernel implements Disposable {
}

TelemetryReporter.sendTelemetryEvent('cell.startExecute')
getEventReporter().reportExecution(cell)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourishkrout I'm replying to your comment
#1602 (review)
about adding an await here; on the original PR.

Yes. I incorrectly assumed the noop event reporter would be unless the "AI experiment" is turned on.

That was the intent. Is there a bug here that I need to fix? This is how I intended it to work.

The reporter should be the null op reporter by default
https://github.com/stateful/vscode-runme/pull/1589/files#diff-f0a5c022488c3922874e791475fce22933ea551ffbc4a37c414de2831f8af593R89

and then when the extension loads if the AutoCellExperiment is turned on it get changed to the actual event reproter.
https://github.com/stateful/vscode-runme/pull/1589/files#diff-d1df40b876292a3e89a6d7d6d6e7496db98fa7ad03028b3b0f78bc90487749bdR30

One potential issue I see is that if you disable AIAutoCell the reporter won't be switched back to the null op reporter until the extension is reloaded. I assume that can be triggered by doing a reload window?
That's not great but I didn't know how to subscribe to notifications when options changed and I figured its probably good enough for now.

I can rewrite this reasonably quickly at a later point. Wdyt @jlewi?
If your good with the current implementation then I'm happy.

For users trying out AI and Foyle I think this fixes a big UX issue. In particular, this will remove the need to explicitly add the location of the RunMe logs to the Foyle config. This also means we can turn on learning by default in Foyle (AiAutoCell would still be disabled by default in RunMe). So I see this is as a strict improvement.

The risk is that reporting cell executions causes a degraded experience. I've tried to mitigate that with the NullOp reporter. If users don't enable AI via the AutoCell option then no additional logic (modulo the actuall noop invocation) should be invoked.

Copy link
Member

@sourishkrout sourishkrout Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a // todo(sebastian): rewrite to use non-blocking impl on top of the reportExecution call and merge for now. 👍

My bad. I turned off AI logs not aiAutoCell to check the default behavior. In any case, as long as it's not default, we're good to go.

Re, settings requiring a hard reload: that's consistent with all other settings and not something to worry about in this context. Ghost cell completing might be something we want to move into the notebook toolbar and have a setting for the default state—however, one thing at a time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go: bc2b0ac

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourishkrout Thanks. I patched it. Your patch has an await in it. Did you mean to include the await?

* Cherry-pick Sebastian's change bc2b0ac
@@ -691,6 +692,8 @@ export class Kernel implements Disposable {
}

TelemetryReporter.sendTelemetryEvent('cell.startExecute')
// todo(sebastian): rewrite to use non-blocking impl
await getEventReporter().reportExecution(cell)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourishkrout The await here came from your patch. Did you mean to include an await and make it blocking? If we call await here then I assume we block until the event is full reported? Whereas I thought if we don't use await here then reportExecution would perform the RPC asynchronously and not block actual execution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's intentional, @jlewi. the way the node eventloop works is that without await here the call isn't just asynchronous, it can be terminate/dropped prematurely. Since I/O is involved, that's highly likely. If the delivery succeeds it'll likely just be due to a "lucky race" and as a result unreliable. I'm happy to rewrite this but as long as it's behind a experiment/feature flag, it's safe to merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourishkrout Is this good to merge?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, approving the PR now.

@sourishkrout sourishkrout self-requested a review September 3, 2024 22:10
* Add to tests

* Add todo

* Let vs var
Copy link
Member

@sourishkrout sourishkrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM. Will take care of a non-blocking event reporter at a later time.

Copy link

sonarqubecloud bot commented Sep 3, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
37.4% Coverage on New Code (required ≥ 45%)

See analysis details on SonarCloud

@jlewi jlewi merged commit dda7d8a into main Sep 4, 2024
2 of 3 checks passed
@jlewi jlewi deleted the jlewi/logevents branch September 4, 2024 04:54
sourishkrout pushed a commit to stateful/runme that referenced this pull request Sep 5, 2024
As described in jlewi/foyle#211 we will no longer rely on processing
RunMe grpc logs to train the AI. This means we can simplify the Runme
logging code and revert some of the changes in #585

* Related to stateful/vscode-runme#1589
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants