Track command usage with telemetry #68

Closed
Tracked by #58 ...
noklam opened this issue Aug 16, 2024 · 7 comments · Fixed by #84

@noklam
Contributor

noklam commented Aug 16, 2024

We have download stats, but we need more detail to help make decisions about feature development.

We will introduce a kedro viz command within the extension, and we can add telemetry to get some insight into its usage. From my understanding, Viz itself tracks UI clicks, but that wouldn't be sufficient to tell whether they come from VSCode.

Questions

Do we implement the telemetry in TS or Python? This may depend on what we want to track.

  • How many people are using the command (and how often)
  • Can we differentiate clicks from a browser versus inside vscode?

Python

Pro:

  • Leverage kedro-telemetry for consent
  • Use Heap to send the relevant event, since each event is triggered as a "command"
  • The click event is embedded in the flowchart (but is there a way to tell it's from VSCode?)

Con:

  • Limited, as not everything triggers the LSP server (e.g. the "restart environment" command)

TS

Pro:

  • Most flexible

Con:

  • Need to implement telemetry from scratch maybe?
@noklam noklam changed the title Optional - Track command usage with telemetry. Track command usage with telemetry. Aug 20, 2024
@noklam noklam changed the title Track command usage with telemetry. Track command usage with telemetry Aug 20, 2024
@jitu5
Contributor

jitu5 commented Aug 20, 2024

@noklam
In terms of Kedro-Viz, right now we have only two actions:

  1. Opening the Kedro-Viz flowchart from the command palette. This action involves running the LSP command kedro.getProjectData.
  2. When the user clicks on a dataset node in the flowchart, the extension runs the kedro.goToDefinitionFromFlowchart LSP command to open the relevant file.

So we can go with Python and use kedro-telemetry.

@noklam
Contributor Author

noklam commented Aug 27, 2024

For now, I have decided to first understand how kedro-viz tracks telemetry and how the consent flow works.

I realise kedro-telemetry is not a complete solution, since by default the extension will not execute the hook (of course, we can trigger it manually). We can borrow the consent logic from kedro-telemetry by vendoring the library.

For the actual telemetry tracking, I prefer to trigger it in TS, since we can track command usage directly (compared to tracking it indirectly from requests to the language server). This is also much easier to extend in the future.

In the meantime, I have had some conversations with @ravi-kumar-pilla to research Viz's telemetry. I aim to kick-start this work this week so we can finish it before the release (1st or 2nd week of September).

@ravi-kumar-pilla

In the meantime, I have had some conversations with @ravi-kumar-pilla to research Viz's telemetry. I aim to kick-start this work this week so we can finish it before the release (1st or 2nd week of September).

Some information on how telemetry works internally in Viz:

  1. FE: Kedro-Viz builds a telemetry.html by default (when you run make build, webpack does not touch telemetry.html because it is in the public folder. More info: https://create-react-app.dev/docs/using-the-public-folder/#adding-assets-outside-of-the-module-system)
  2. BE: The FastAPI app (apps.py file -> create_api_app_from_project) registers a root GET request (i.e., the initial document request for the home page and for experiment tracking, ET).
  3. That request does the following: if consent is true (in the .telemetry file of the Kedro project) and kedro_telemetry.plugin is available in the environment, it gets the heap_app_id (in dev, from the env variable HEAP_APPID_DEV; in prod, from HEAP_APPID_PROD, which is hardcoded in the plugin) and the heap_user_identity from the kedro-telemetry plugin.
  4. Once we have the heap_app_id and heap_user_identity, we inject them into telemetry.html and then append the telemetry.html content to the <head> section of the index.html file.
  5. This HTML content is then served at root (@app.get("/"), @app.get("/experiment-tracking")).
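
Steps 3 to 5 can be sketched roughly as below. This is a minimal illustration, not the actual Viz code: INDEX_HTML, TELEMETRY_TEMPLATE and render_index are hypothetical stand-ins, and placing the snippet just before the closing </head> tag is an assumption.

```python
from string import Template
from typing import Optional

# Hypothetical stand-ins for the real index.html and telemetry.html files.
INDEX_HTML = "<html><head><title>Kedro-Viz</title></head><body></body></html>"
TELEMETRY_TEMPLATE = Template(
    "<script>heap.load('$heap_app_id'); heap.identify('$heap_user_identity');</script>"
)


def render_index(heap_app_id: Optional[str], heap_user_identity: Optional[str]) -> str:
    """Return the page, injecting the telemetry snippet only when both IDs exist."""
    if not (heap_app_id and heap_user_identity):
        # No consent (or kedro-telemetry not installed): serve the page untouched.
        return INDEX_HTML
    snippet = TELEMETRY_TEMPLATE.substitute(
        heap_app_id=heap_app_id, heap_user_identity=heap_user_identity
    )
    # Assumption: the snippet is placed just before the closing </head> tag.
    return INDEX_HTML.replace("</head>", snippet + "</head>", 1)
```

The point of the sketch is that consent is enforced once, at the moment the page is rendered: without both IDs, no tracking script ever reaches the browser.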

@noklam let's connect to discuss if this is not clear. Thank you

@noklam
Contributor Author

noklam commented Aug 27, 2024

If I understand correctly, telemetry.html is only appended to the UI if heap_app_id and heap_user_identity are not null (that is, the user has consented). This is, by the way, out of date with how kedro-telemetry works today, since Viz re-implements some logic from kedro-telemetry; in the latest kedro-telemetry, we introduced a user ID that is stored in a different place. (This may be something that Viz needs to look into. Cc @DimedS)

So the main "consent" flow is still done in Python, by mimicking kedro-telemetry:

    @app.get("/")
    @app.get("/experiment-tracking")
    async def index():
        heap_app_id = kedro_telemetry.get_heap_app_id(project_path)
        heap_user_identity = kedro_telemetry.get_heap_identity()

How should I understand this code? Does that mean only the index page will check for consent but not the others?

@ravi-kumar-pilla

How should I understand this code? Does that mean only the index page will check for consent but not the others?

This includes everything on viz. All other routes are subpaths

@DimedS

DimedS commented Aug 27, 2024

After finishing the framework telemetry opt-out, I created a ticket about the consent check and UUID update in Viz; some discussion is also there:
kedro-org/kedro-viz#2020

@noklam
Contributor Author

noklam commented Aug 27, 2024

Thanks a lot @ravi-kumar-pilla ! Now I have the full picture of how telemetry would work in the extension. So Heap has 3 ways to track information.

  1. Via the webpage directly (this is how Viz currently tracks things).
  2. Via a server API (kedro-telemetry), which is essentially a POST request.
  3. Via a client API (JS/TS).

AFAIK, the flowchart currently does not collect telemetry, because it's not served via the kedro-viz FastAPI route (see comments above). The most important thing to track now is runs of the command "Kedro: Run Kedro Viz", to give us some sense of how many people are trying to launch that view.
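
For reference, tracking that command through the server API (option 2) could look roughly like this. It is only a sketch: the endpoint URL and field names reflect my understanding of Heap's server-side track API, while build_track_payload, send_event, the placeholder IDs and the "source" property are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumption: Heap's server-side track endpoint.
HEAP_TRACK_URL = "https://heapanalytics.com/api/track"


def build_track_payload(
    app_id: str,
    identity: str,
    event: str,
    properties: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for one custom event."""
    return {
        "app_id": app_id,
        "identity": identity,
        "event": event,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "properties": properties or {},
    }


def send_event(payload: Dict[str, Any], timeout: float = 2.0) -> int:
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        HEAP_TRACK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_track_payload(
    app_id="hypothetical-app-id",
    identity="hypothetical-user-id",
    event="Kedro: Run Kedro Viz",
    properties={"source": "vscode"},  # hypothetical property to tell VSCode apart
)
# send_event(payload)  # not called here: it needs network access and a real app ID
```

A property such as "source" would be one way to distinguish VSCode usage from browser usage, which was one of the original questions.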

To achieve that, we need to implement this in two places:

  1. Use kedro-telemetry for the consent flow and to get the user ID; this information needs to be returned to the extension later.
  2. The extension (client) needs to check consent whenever a command is triggered. If consent is given, it sends a custom event to Heap.
  3. (Not needed for now; only needed when we want to track flowchart clicks) The KedroViz React component needs a new prop that takes the consent and user ID information.
  • Inject telemetry.html into the webview in a similar way if consent is given.
  • We need a custom way to handle slicing, since the metrics are implemented separately. Cc @Huongg
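
The consent gate in steps 1 and 2 could be sketched like this on the Python side. All names here (read_consent, track_command, the send callback) are hypothetical, the .telemetry parsing is deliberately simplified rather than a full YAML parse, and treating a missing file as "no consent" is a conservative assumption that may not match kedro-telemetry's current opt-out behaviour.

```python
from pathlib import Path
from typing import Callable, Optional


def read_consent(project_path: Path) -> Optional[bool]:
    """Read `consent` from the project's .telemetry file; None if absent or unparseable."""
    telemetry_file = project_path / ".telemetry"
    if not telemetry_file.is_file():
        return None
    for line in telemetry_file.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "consent":
            value = value.strip().lower()
            if value in ("true", "false"):
                return value == "true"
    return None


def track_command(project_path: Path, command: str, send: Callable[[str], None]) -> bool:
    """Send the event only on explicit consent; return whether it was sent."""
    if read_consent(project_path) is not True:
        return False
    send(command)
    return True
```

Passing send in as a callback keeps the consent check separate from the transport, so the same gate would work whether the event ends up being sent from Python or handed back to the extension.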

Cc @jitu5

@noklam noklam self-assigned this Aug 27, 2024
This was referenced Aug 30, 2024
@jitu5 jitu5 closed this as completed in #84 Sep 13, 2024