Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VS Code: improve characters logger #5855

Merged
merged 3 commits into from
Oct 14, 2024
Merged

VS Code: improve characters logger #5855

merged 3 commits into from
Oct 14, 2024

Conversation

valerybugakov
Copy link
Member

@valerybugakov valerybugakov commented Oct 10, 2024

This update fixes data discrepancies in calculating the percentage of code written by Cody events by increasing the accuracy of the CharactersLogger. Previously, the logger was counting character changes not directly typed by the user. Now, document changes are grouped by sources, providing flexibility in handling this data as needed.

The new cody.characters telemetry event structure:

{
  "normal_inserted": 0,
  "normal_deleted": 0,
  "undo_inserted": 0,
  "undo_deleted": 0,
  "redo_inserted": 0,
  "redo_deleted": 0,
  "windowNotFocused_inserted": 0,
  "windowNotFocused_deleted": 0,
  "nonVisibleDocument_inserted": 3,
  "nonVisibleDocument_deleted": 0,
  "inactiveSelection_inserted": 0,
  "inactiveSelection_deleted": 0,
  "rapidLargeChange_inserted": 0,
  "rapidLargeChange_deleted": 0
}

Test plan

CI with new unit tests.

@valerybugakov
Copy link
Member Author

@kelsey-brown @Tobiryd, let me know if you think this is too restrictive. I plan to make a patch release once this is merged and observe changes in the events we receive while looking at what's required for the agent fix.

@kelsey-brown
Copy link

kelsey-brown commented Oct 10, 2024

@valerybugakov thank you so much for working on this! I think my only concerns with this approach is the "automated or bulk edits" restrictions. I worry this will exclude changes cody made? 15-20% of fixups are >1000 characters, for example...would we not log that cody written code in the characters logger?

Probably not the end of the world if that's the only way to make this work, we could exclude it from our calculation of "cody written code" as well so we don't throw the percentages off. But I just want to understand what the implications of that are so we can account for them in the metric!

@kelsey-brown
Copy link

@valerybugakov another idea @dadlerj had is just to log as much metadata as possible about the type of insertion (ex. a bulk change, an undo action, a redo action, etc) and then we can make a call on our end about which insertions to include exclude from the metric based on how we want to define the metric, rather than doing it all in the code? It would allow us to get as much info as possible, and then use it to make a decision about the metric definition. It also means it would be easy to change/update the definition if we need to in the future. Not sure if that's easier or harder on your end, just another thing to consider.

Comment on lines +11 to +26
const DOCUMENT_CHANGE_TYPES = [
'normal',
'undo',
'redo',
'windowNotFocused',
'nonVisibleDocument',
'inactiveSelection',
'rapidLargeChange',
] as const

type DocumentChangeType = (typeof DOCUMENT_CHANGE_TYPES)[number]

// This flat structure is required by the 'metadata' field type in the telemetry event.
export type DocumentChangeCounters = {
[K in `${DocumentChangeType}_${'inserted' | 'deleted'}`]: number
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log as much metadata as possible about the type of insertion

@kelsey-brown, I love this idea. Here's the new event structure to capture all document change sources.

@valerybugakov valerybugakov marked this pull request as ready for review October 11, 2024 05:54
Copy link
Contributor

@arafatkatze arafatkatze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such and interesting PR.

My main thing is that When Kelsey and Tobi mentioned the skewed metrics in the original Linear issue we were not able to make sense of the metrics because Vscode doesn't really have something like "these changes were specifically made by Cody alone" so having these heuristics can be very helpful to discern the difference between "We legitimately added this using Cody" vs "someone did a crazy git pull" or maybe someone is using something else like say supermaven for completions and then cody for chat(I am not even sure if that's possible but just making an example).

That being said I think we gotta be honest about taking this metric a little less seriously and perhaps having this as a decent first iteration that might require some tuning so that this makes more sense would be warranted. Coz what if certain kinds of paste/delete operations are harder to differentiate from operations done by Cody.

Alternatively perhaps we can do some sort of overengineering on the side of the queries that we use with this so perhaps we can make an educated guess from this query and then cross check that with the results of the autocomplete metrics separately. Just thinking outloud here.

I think we might learn the best from this AFTER we merge this so I am unblocking this right now. I like the heuristics and can't really suggest anything more.

@valerybugakov
Copy link
Member Author

valerybugakov commented Oct 14, 2024

Hey @kelsey-brown, I will merge this one because I plan to cut a patch release later today. If you have any suggestions, I can make additional changes in a follow-up PR.

@valerybugakov valerybugakov merged commit f9038ef into main Oct 14, 2024
41 of 42 checks passed
@valerybugakov valerybugakov deleted the vb/chars-logger branch October 14, 2024 06:56
valerybugakov added a commit that referenced this pull request Oct 15, 2024
This update fixes data discrepancies in calculating the percentage of
code written by Cody events by increasing the accuracy of the
`CharactersLogger`. Previously, the logger was counting character
changes not directly typed by the user. Now, document changes are
grouped by sources, providing flexibility in handling this data as
needed.

The new `cody.characters` telemetry event structure:

```json
{
  "normal_inserted": 0,
  "normal_deleted": 0,
  "undo_inserted": 0,
  "undo_deleted": 0,
  "redo_inserted": 0,
  "redo_deleted": 0,
  "windowNotFocused_inserted": 0,
  "windowNotFocused_deleted": 0,
  "nonVisibleDocument_inserted": 3,
  "nonVisibleDocument_deleted": 0,
  "inactiveSelection_inserted": 0,
  "inactiveSelection_deleted": 0,
  "rapidLargeChange_inserted": 0,
  "rapidLargeChange_deleted": 0
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants