VS Code: improve characters logger #5855

valerybugakov · 2024-10-10T04:05:01Z

This update fixes data discrepancies in the events used to calculate the percentage of code written by Cody by making the CharactersLogger more accurate. Previously, the logger counted character changes that were not typed directly by the user. Now, document changes are grouped by source, which leaves the handling of each group flexible.

The new cody.characters telemetry event structure:

{
  "normal_inserted": 0,
  "normal_deleted": 0,
  "undo_inserted": 0,
  "undo_deleted": 0,
  "redo_inserted": 0,
  "redo_deleted": 0,
  "windowNotFocused_inserted": 0,
  "windowNotFocused_deleted": 0,
  "nonVisibleDocument_inserted": 3,
  "nonVisibleDocument_deleted": 0,
  "inactiveSelection_inserted": 0,
  "inactiveSelection_deleted": 0,
  "rapidLargeChange_inserted": 0,
  "rapidLargeChange_deleted": 0
}
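To make the flat key layout concrete, here is a minimal sketch of how a consumer of this event might regroup the `<source>_<inserted|deleted>` keys by source. The `groupBySource` helper and the `ChangeTotals` type are hypothetical names used for illustration; they are not part of this PR.

```typescript
// Hypothetical consumer-side helper (not part of this PR).
type ChangeTotals = { inserted: number; deleted: number }

// Regroup flat keys such as "undo_inserted" into { undo: { inserted, deleted } }.
function groupBySource(metadata: Record<string, number>): Record<string, ChangeTotals> {
    const grouped: Record<string, ChangeTotals> = {}
    for (const [key, value] of Object.entries(metadata)) {
        const separatorIndex = key.lastIndexOf('_')
        if (separatorIndex === -1) {
            continue // Not a "<source>_<kind>" key; ignore it.
        }
        const source = key.slice(0, separatorIndex)
        const kind = key.slice(separatorIndex + 1)
        if (kind === 'inserted' || kind === 'deleted') {
            const totals = grouped[source] ?? { inserted: 0, deleted: 0 }
            totals[kind] += value
            grouped[source] = totals
        }
    }
    return grouped
}

// Usage with a subset of the event shown above.
const groupedSample = groupBySource({
    normal_inserted: 0,
    normal_deleted: 0,
    nonVisibleDocument_inserted: 3,
    nonVisibleDocument_deleted: 0,
})
console.log(groupedSample)
```

The event itself stays flat; any nesting like this would happen only on the consuming side.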

Test plan

CI with new unit tests.

valerybugakov · 2024-10-10T04:06:55Z

@kelsey-brown @Tobiryd, let me know if you think this is too restrictive. I plan to make a patch release once this is merged and observe changes in the events we receive while looking at what's required for the agent fix.

kelsey-brown · 2024-10-10T15:46:22Z

@valerybugakov thank you so much for working on this! My only concern with this approach is the "automated or bulk edits" restriction. I worry this will exclude changes Cody made. For example, 15-20% of fixups are >1000 characters. Would we not log that Cody-written code in the characters logger?

Probably not the end of the world if that's the only way to make this work; we could exclude it from our calculation of "Cody-written code" as well so we don't throw the percentages off. But I just want to understand what the implications are so we can account for them in the metric!

kelsey-brown · 2024-10-10T18:08:31Z

@valerybugakov another idea @dadlerj had is to log as much metadata as possible about the type of insertion (e.g. a bulk change, an undo action, a redo action) and then make a call on our end about which insertions to include in or exclude from the metric, based on how we want to define it, rather than doing it all in the code. That would give us as much information as possible to decide on the metric definition, and it would make the definition easy to change if we need to in the future. Not sure if that's easier or harder on your end, just another thing to consider.
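A minimal sketch of what "making the call on our end" could look like once per-source counts are logged: the set of included sources becomes a parameter of the analysis rather than logic in the extension. The `sumInserted` helper, the two source lists, and the sample numbers are hypothetical; how the resulting total feeds into the final percentage is deliberately left out, since that definition is not specified in this thread.

```typescript
// Hypothetical analysis-side sketch (not part of this PR).
type FlatCounters = Record<string, number>

// Sum inserted characters over only the sources a given metric definition includes.
function sumInserted(counters: FlatCounters, includedSources: readonly string[]): number {
    return includedSources.reduce(
        (total, source) => total + (counters[`${source}_inserted`] ?? 0),
        0
    )
}

// Two illustrative metric definitions; both lists are assumptions.
const STRICT_SOURCES = ['normal']
const BROAD_SOURCES = ['normal', 'rapidLargeChange']

// Made-up sample counts for illustration only.
const sampleCounters: FlatCounters = {
    normal_inserted: 120,
    rapidLargeChange_inserted: 1500,
    undo_inserted: 40,
}

console.log(sumInserted(sampleCounters, STRICT_SOURCES)) // 120
console.log(sumInserted(sampleCounters, BROAD_SOURCES)) // 1620
```

Changing the metric definition later would then mean editing a list like `BROAD_SOURCES` in the query layer, with no client release required.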

valerybugakov · 2024-10-11T05:49:09Z

vscode/src/services/CharactersLogger.ts

+const DOCUMENT_CHANGE_TYPES = [
+    'normal',
+    'undo',
+    'redo',
+    'windowNotFocused',
+    'nonVisibleDocument',
+    'inactiveSelection',
+    'rapidLargeChange',
+] as const
+
+type DocumentChangeType = (typeof DOCUMENT_CHANGE_TYPES)[number]
+
+// This flat structure is required by the 'metadata' field type in the telemetry event.
+export type DocumentChangeCounters = {
+    [K in `${DocumentChangeType}_${'inserted' | 'deleted'}`]: number
+}


> log as much metadata as possible about the type of insertion

@kelsey-brown, I love this idea. Here's the new event structure to capture all document change sources.
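As an illustration of how the mapped type in the diff above can be populated, here is a minimal sketch that builds a zeroed counters object from `DOCUMENT_CHANGE_TYPES` and increments it. The type declarations are repeated from the diff so the block is self-contained; `createEmptyCounters` and `recordChange` are hypothetical helpers and may not match the actual implementation in `CharactersLogger.ts`.

```typescript
const DOCUMENT_CHANGE_TYPES = [
    'normal',
    'undo',
    'redo',
    'windowNotFocused',
    'nonVisibleDocument',
    'inactiveSelection',
    'rapidLargeChange',
] as const

type DocumentChangeType = (typeof DOCUMENT_CHANGE_TYPES)[number]

type DocumentChangeCounters = {
    [K in `${DocumentChangeType}_${'inserted' | 'deleted'}`]: number
}

// Hypothetical helper: one zeroed "<type>_inserted" and "<type>_deleted" entry per change type.
function createEmptyCounters(): DocumentChangeCounters {
    const counters = {} as DocumentChangeCounters
    for (const changeType of DOCUMENT_CHANGE_TYPES) {
        counters[`${changeType}_inserted` as const] = 0
        counters[`${changeType}_deleted` as const] = 0
    }
    return counters
}

// Hypothetical helper: add the character counts of one document change to its bucket.
function recordChange(
    counters: DocumentChangeCounters,
    changeType: DocumentChangeType,
    inserted: number,
    deleted: number
): void {
    counters[`${changeType}_inserted` as const] += inserted
    counters[`${changeType}_deleted` as const] += deleted
}

const exampleCounters = createEmptyCounters()
recordChange(exampleCounters, 'nonVisibleDocument', 3, 0)
console.log(exampleCounters.nonVisibleDocument_inserted) // 3
```

Deriving the keys from the `as const` array means that adding a new change type to the list extends the counters type and the zeroed object together.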

arafatkatze

Such an interesting PR.

My main thought: when Kelsey and Tobi mentioned the skewed metrics in the original Linear issue, we were not able to make sense of them because VS Code doesn't really have a notion of "these changes were specifically made by Cody alone". Having these heuristics can be very helpful for telling apart "we legitimately added this using Cody" from "someone did a crazy git pull", or from a case where someone uses another tool, say Supermaven, for completions and Cody for chat (I am not even sure that's possible, just making up an example).

That being said, I think we have to be honest about taking this metric a little less seriously. Treating this as a decent first iteration that might require some tuning seems warranted, because certain kinds of paste/delete operations may be hard to differentiate from operations done by Cody.

Alternatively, we could do some overengineering on the query side: make an educated guess from the query over these events and then cross-check it against the autocomplete metrics separately. Just thinking out loud here.

I think we might learn the best from this AFTER we merge this so I am unblocking this right now. I like the heuristics and can't really suggest anything more.
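For readers unfamiliar with the heuristics being discussed, the following is a rough sketch of how a single document change might be assigned to one of the seven sources. It is not the code from this PR: the `ChangeContext` shape, the order of the checks, the meaning given to an "active" selection, and both thresholds are assumptions made purely for illustration.

```typescript
// Hypothetical sketch only; the real CharactersLogger may classify changes differently.
type ChangeSource =
    | 'normal'
    | 'undo'
    | 'redo'
    | 'windowNotFocused'
    | 'nonVisibleDocument'
    | 'inactiveSelection'
    | 'rapidLargeChange'

// Assumed inputs, gathered by the caller from editor state.
interface ChangeContext {
    reason?: 'undo' | 'redo'
    windowFocused: boolean
    documentVisible: boolean
    selectionRecentlyActive: boolean
    charsChanged: number
    msSincePreviousChange: number
}

// Both thresholds are made-up values, not taken from the PR.
const LARGE_CHANGE_CHARS = 1000
const RAPID_CHANGE_MS = 15

function classifyChange(context: ChangeContext): ChangeSource {
    if (context.reason === 'undo') return 'undo'
    if (context.reason === 'redo') return 'redo'
    if (!context.windowFocused) return 'windowNotFocused'
    if (!context.documentVisible) return 'nonVisibleDocument'
    if (!context.selectionRecentlyActive) return 'inactiveSelection'
    const isLarge = context.charsChanged >= LARGE_CHANGE_CHARS
    const isRapid = context.msSincePreviousChange <= RAPID_CHANGE_MS
    if (isLarge && isRapid) return 'rapidLargeChange'
    return 'normal'
}

// A small edit in a focused, visible editor is classified as a normal change.
const typedEdit: ChangeContext = {
    windowFocused: true,
    documentVisible: true,
    selectionRecentlyActive: true,
    charsChanged: 1,
    msSincePreviousChange: 200,
}
console.log(classifyChange(typedEdit)) // "normal"
```

Because each change lands in exactly one bucket under this sketch, the order of the checks matters, which is one of the things that might need the tuning mentioned above.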

valerybugakov · 2024-10-14T06:55:55Z

Hey @kelsey-brown, I will merge this one because I plan to cut a patch release later today. If you have any suggestions, I can make additional changes in a follow-up PR.

VS Code: improve characters logger

847f43b

valerybugakov added the autocomplete label Oct 10, 2024

valerybugakov self-assigned this Oct 10, 2024

valerybugakov added 2 commits October 11, 2024 13:10

Merge branch 'main' into vb/chars-logger

15c6587

Autocomplete: count all document change sources

6f85266

valerybugakov marked this pull request as ready for review October 11, 2024 05:54

valerybugakov requested review from dadlerj, arafatkatze, Tobiryd, abeatrix and kelsey-brown October 11, 2024 05:55

arafatkatze approved these changes Oct 13, 2024

valerybugakov merged commit f9038ef into main Oct 14, 2024
41 of 42 checks passed

valerybugakov deleted the vb/chars-logger branch October 14, 2024 06:56

This was referenced Oct 15, 2024

VS Code: Release v1.38.1 #5906

Merged

VS Code: log more data from characters logger #5931

Merged
