-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create machine logs to support AI retraining #585
Conversation
To support collecting implicit feedback and retraining an AI we need to create machine readable logs that contain the actual command executed. The design is described in stateful#574. With this change and ai logs enabled here is what the logs look like. On the console a sample log message looks like the following 1.716425780157274e+09 info runner/service.go:233 received initial request {"_id": "01HYHF5TXWTPV0ZGSQP9WKMZP9", "req": "{\"commands\":[\"ls\"]}"} In the JSON log we get a log message like the following {"severity":"info","time":1716425780.157274,"caller":"runner/service.go:233","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","message":"received initial request","_id":"01HYHF5TXWTPV0ZGSQP9WKMZP9","req":"{\"commands\":[\"ls\"]}"} Related to stateful#574
@sourishkrout This is ready for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense 👍; I've added some comments I'd like to address before merging.
internal/cmd/server.go
Outdated
cmd.Flags().StringVar(&tlsDir, "tls", defaultTLSDir, "Directory in which to generate TLS certificates & use for all incoming and outgoing messages") | ||
|
||
cmd.Flags().StringVar(&configDir, configDirF, GetDefaultConfigHome(), "If ai logs is enabled logs will be written to ${config-dir}/logs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to reuse fLogFilePath
as base? The config home is a bit weird on darwin
(and frankly, tentative). Also, as part of the beta
commands (incomplete alpha) we are migrating config away from a central location to a runme.yaml
that resides inside a project/repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we can break this down into separate questions
- What's the default location for logs? e.g. is it ${CONFIGDIR}/logs?
- What flags control the location of the logs?
Regarding the first question.
- I don't think we want the default value to be a temporary directory because the logs are too valuable to be lost when the temporary directory gets cleaned up
we are migrating config away from a central location to a runme.yaml that resides inside a project/repository
Are you suggesting the logs get stored relative to the project/repo specific runme.yaml? Or is this just an indication that the runme user configuration is going away in favor of per project/repository configuration.
Either way; I think of log location as being configured per user not per project/repository. Which implies we need some per user directory.
The config home is a bit weird on darwin
What about ${HOME}/.runme for runme user configuration and ${HOME}/.runme/logs
Two; I think once we agree on where logs should be stored we can decide if it makes sense to change fLogFilePath to use it and if so use it to compute the location of the AI Logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point here, especially that it's a user
vs a server
log. That really answers both questions 1. & 2. for me. Here's my suggestion, we roll with your code change as is, however, we rename GetDefaultConfigHome
to GetUserConfigHome
to make the difference in base directories more apparent and future proof @jlewi?
Ideally GetDefaultConfigHome
/ GetUserConfigHome
would actually be rooted in the home directory (on all platforms) but we can fix that at a later time. I'll create a bug ticket for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
logger.Error("failed to marshal request", zap.Error(err)) | ||
} | ||
|
||
// We want to always log the request because it is used for AI training. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to add a tests for his in https://github.com/jlewi/runme/blob/jlewi/logs/internal/runner/service_test.go - the reason being is that I'm currently in the middle of charting a migration from runnerv1
(this impl) to runnerv2
(alpha; future) and the tests serve as a spec to not have things fall thru the cracks.
runnerv2
lives over here https://github.com/jlewi/runme/tree/jlewi/logs/internal/runnerv2service but instead of syncing the service impl, I'd rather have a comprehensive test suite in runnerv1
as guard rails for filling in all the gaps.
I'm happy to take a stab at adding a test case if you need a template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a look at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sourishkrout added a test. The test passes locally when I run it individually and the tests appear to be green in CI. However, as noted in #587 I'm seeing test failures when running locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at the test failures. Some of the newly introduced tests for runnerv2 have flaking issue which is something we're working on separately. They haven't gone through the hardening of runnerv1 tests. I'll take a look to get some clarity here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the #587 expedition. After rerunning the C/I job it looks like it's indeed the flaking issue we are working on.
@sourishkrout PTAL; I did the rename of the logging directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ✅
As described in jlewi/foyle#211 we will no longer rely on processing RunMe grpc logs to train the AI. This means we can simplify the Runme logging code and revert some of the changes in #585 * Related to stateful/vscode-runme#1589
To support collecting implicit feedback and retraining an AI we need to create machine readable logs that contain the actual command executed.
The design is described in #574.
With this change and ai logs enabled here is what the logs look like.
On the console a sample log message looks like the following in dev mode
In non dev mode the console output is
In the JSON log we get a log message like the following
Note that "req" is logged as a nested json object; its not string field containing the serialized JSON of the request.
Prior to this change here's what the console logs look like.
In dev mode:
In non dev mode:
Related to #574