-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VS Code: improve characters logger #5855
Conversation
@kelsey-brown @Tobiryd, let me know if you think this is too restrictive. I plan to make a patch release once this is merged and observe changes in the events we receive while looking at what's required for the agent fix. |
@valerybugakov thank you so much for working on this! I think my only concerns with this approach is the "automated or bulk edits" restrictions. I worry this will exclude changes cody made? 15-20% of fixups are >1000 characters, for example...would we not log that cody written code in the characters logger? Probably not the end of the world if that's the only way to make this work, we could exclude it from our calculation of "cody written code" as well so we don't throw the percentages off. But I just want to understand what the implications of that are so we can account for them in the metric! |
@valerybugakov another idea @dadlerj had is just to log as much metadata as possible about the type of insertion (ex. a bulk change, an undo action, a redo action, etc) and then we can make a call on our end about which insertions to include exclude from the metric based on how we want to define the metric, rather than doing it all in the code? It would allow us to get as much info as possible, and then use it to make a decision about the metric definition. It also means it would be easy to change/update the definition if we need to in the future. Not sure if that's easier or harder on your end, just another thing to consider. |
const DOCUMENT_CHANGE_TYPES = [ | ||
'normal', | ||
'undo', | ||
'redo', | ||
'windowNotFocused', | ||
'nonVisibleDocument', | ||
'inactiveSelection', | ||
'rapidLargeChange', | ||
] as const | ||
|
||
type DocumentChangeType = (typeof DOCUMENT_CHANGE_TYPES)[number] | ||
|
||
// This flat structure is required by the 'metadata' field type in the telemetry event. | ||
export type DocumentChangeCounters = { | ||
[K in `${DocumentChangeType}_${'inserted' | 'deleted'}`]: number | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log as much metadata as possible about the type of insertion
@kelsey-brown, I love this idea. Here's the new event structure to capture all document change sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such and interesting PR.
My main thing is that When Kelsey and Tobi mentioned the skewed metrics in the original Linear issue we were not able to make sense of the metrics because Vscode doesn't really have something like "these changes were specifically made by Cody alone" so having these heuristics can be very helpful to discern the difference between "We legitimately added this using Cody" vs "someone did a crazy git pull" or maybe someone is using something else like say supermaven for completions and then cody for chat(I am not even sure if that's possible but just making an example).
That being said I think we gotta be honest about taking this metric a little less seriously and perhaps having this as a decent first iteration that might require some tuning so that this makes more sense would be warranted. Coz what if certain kinds of paste/delete operations are harder to differentiate from operations done by Cody.
Alternatively perhaps we can do some sort of overengineering on the side of the queries that we use with this so perhaps we can make an educated guess from this query and then cross check that with the results of the autocomplete metrics separately. Just thinking outloud here.
I think we might learn the best from this AFTER we merge this so I am unblocking this right now. I like the heuristics and can't really suggest anything more.
Hey @kelsey-brown, I will merge this one because I plan to cut a patch release later today. If you have any suggestions, I can make additional changes in a follow-up PR. |
This update fixes data discrepancies in calculating the percentage of code written by Cody events by increasing the accuracy of the `CharactersLogger`. Previously, the logger was counting character changes not directly typed by the user. Now, document changes are grouped by sources, providing flexibility in handling this data as needed. The new `cody.characters` telemetry event structure: ```json { "normal_inserted": 0, "normal_deleted": 0, "undo_inserted": 0, "undo_deleted": 0, "redo_inserted": 0, "redo_deleted": 0, "windowNotFocused_inserted": 0, "windowNotFocused_deleted": 0, "nonVisibleDocument_inserted": 3, "nonVisibleDocument_deleted": 0, "inactiveSelection_inserted": 0, "inactiveSelection_deleted": 0, "rapidLargeChange_inserted": 0, "rapidLargeChange_deleted": 0 } ```
This update fixes data discrepancies in calculating the percentage of code written by Cody events by increasing the accuracy of the
CharactersLogger
. Previously, the logger was counting character changes not directly typed by the user. Now, document changes are grouped by sources, providing flexibility in handling this data as needed.The new
cody.characters
telemetry event structure:Test plan
CI with new unit tests.