Skip to content

Conversation

@filiptronicek
Copy link
Member

@filiptronicek filiptronicek commented Aug 19, 2022

Description

This PR adds error counting of failed VS Code Web resources to track down #12107. We investigated one such case at level of GCP Load Balancer and found out that resources are successfully returned by blobserve. [1] Something goes wrong in the browser.

gitpod-io/openvscode-server#417 is a PR to instrument critical resources with error handler.

Related Issue(s)

Related to #12107

How to test

  • Open a preview environment workspace
  • Cause a load error by for example blocking workbench.web.main.css as a source
  • Check if error count increased (with label)
  • Exec throw new Error('1') in browser devTool terminal
  • Check if error count increased

How to monitor Prometheus data

  • Start a new workspace with this PR
  • Run ./dev/preview/portforward-monitoring-satellite.sh -c harvester
  • Open in Browser

Release Notes

NONE

Documentation

Werft options:

  • /werft with-preview

@akosyakov
Copy link
Member

@filiptronicek you need to change installer configuration to add resource and error labels here:

{
Name: "gitpod_supervisor_frontend_error_total",
Help: "Total count of supervisor frontend client errors",
},

@akosyakov
Copy link
Member

@filiptronicek there is a PR in openvscode server we need to review it as well?

@filiptronicek filiptronicek force-pushed the ft/capture-errors-frontend branch 2 times, most recently from 6448f8b to 0fc1bc4 Compare August 19, 2022 17:16
@filiptronicek
Copy link
Member Author

It seems that if I block the CSS from responding the error gets captured and if I don't there are no errors to report. I have to check next whether they actually get reported.
image

@filiptronicek
Copy link
Member Author

Updated the configmap with the labels in 3cfa878 (#12222).

@filiptronicek filiptronicek marked this pull request as ready for review August 20, 2022 14:09
@filiptronicek filiptronicek requested a review from a team August 20, 2022 14:09
@akosyakov
Copy link
Member

@iQQBot is going to create a PR which allow to post metrics via usual fetch directly to IDE proxy.
@mustard-mh Coudl you please:

Copy link
Contributor

@iQQBot iQQBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prometheus don't allow dynamic change labels

@akosyakov
Copy link
Member

prometheus don't allow dynamic change labels

@iQQBot I'm not sure I understand it. We for sure would like to add new labels as we learn about new error kinds we don't want to create a new metric each time. Can we restart metrics component to pick up new labels? I think it is rather deployment issue or metrics component, but not this PR.

@iQQBot
Copy link
Contributor

iQQBot commented Aug 23, 2022

prometheus don't allow dynamic change labels

@iQQBot I'm not sure I understand it. We for sure would like to add new labels as we learn about new error kinds we don't want to create a new metric each time. Can we restart metrics component to pick up new labels? I think it is rather deployment issue or metrics component, but not this PR.

This is a bit complicated, for example, if you add a label resource for now, then all subsequent clients must fill in this label when reporting this metrics, otherwise prometheus will not accept the request, unless we design a default value

And this can also lead to a series of problems when merging queries, For example, if we add the label resource and restart the metrics-server, there will be two sets of data on prometheus, and when you do sum or something like that, there will be some strange problems, such as their meaning changing, especially when changing the name

@akosyakov
Copy link
Member

This is a bit complicated, for example, if you add a label resource for now, then all subsequent clients must fill in this label when reporting this metrics, otherwise prometheus will not accept the request, unless we design a default value

default value like unknown sounds good to me for old clients

@akosyakov
Copy link
Member

And this can also lead to a series of problems when merging queries, For example, if we add the label resource and restart the metrics-server, there will be two sets of data on prometheus, and when you do sum or something like that, there will be some strange problems, such as their meaning changing, especially when changing the name

Is not it the same with other components like if someone add a new error code in server for http request?

@iQQBot
Copy link
Contributor

iQQBot commented Aug 23, 2022

Is not it the same with other components like if someone add a new error code in server for http request?

Not very clear, can you elaborate?

@mustard-mh mustard-mh force-pushed the ft/capture-errors-frontend branch from 2992972 to f5d23a9 Compare August 25, 2022 11:04
@mustard-mh
Copy link
Contributor

I'll test and post images here after werft build succeed

@iQQBot iQQBot self-requested a review August 25, 2022 14:34
@mustard-mh mustard-mh force-pushed the ft/capture-errors-frontend branch 3 times, most recently from e63ad70 to 70806fd Compare August 26, 2022 13:40
@mustard-mh
Copy link
Contributor

Force push to use Record<string, string> instead of Map (tested and found labels not worked (not sent) in preview env, because Map stringify not works)

@filiptronicek
Copy link
Member Author

Force push to use Record<string, string> instead of Map (tested and found labels not worked (not sent) in preview env, because Map stringify not works)

@mustard-mh have you investigated using Maps and then just converting to JSON with the following? I personally do not mind using objects, but this could be an alternative (this should work every time if the map keys are all strings, which in our case are [1]).

JSON.stringify(Object.fromEntries(labels))

@mustard-mh
Copy link
Contributor

mustard-mh commented Aug 26, 2022

Force push to use Record<string, string> instead of Map (tested and found labels not worked (not sent) in preview env, because Map stringify not works)

@mustard-mh have you investigated using Maps and then just converting to JSON with the following? I personally do not mind using objects, but this could be an alternative (this should work every time if the map keys are all strings, which in our case are [1]).

JSON.stringify(Object.fromEntries(labels))

💡 Good to know this. Record is enough and easier for us ( and don't want to wait another 30+ minutes 🙈)

@mustard-mh mustard-mh force-pushed the ft/capture-errors-frontend branch from 70806fd to 169d65b Compare August 26, 2022 14:27
@mustard-mh
Copy link
Contributor

Force push to

  • Remove content-type in fetch function to reduce an OPTION request (same with openvscode-server's PR)
  • Update changes of configmap to correct position

Copy link
Member Author

@filiptronicek filiptronicek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codewise LGTM. Tested and it seems like the load errors get reported correctly (see screenshot). I am not sure whether the script errors get reported as well though, because it seems like from the devtools console they do not get caught - maybe I just looked in the wrong place.

image

Copy link
Contributor

@iQQBot iQQBot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and it works well 🎉

image

image

/hold

in case somebody want to have a look again, feel free to unhold it

@iQQBot
Copy link
Contributor

iQQBot commented Aug 26, 2022

Codewise LGTM. Tested and it seems like the load errors get reported correctly (see screenshot). I am not sure whether the script errors get reported as well though, because it seems like from the devtools console they do not get caught - maybe I just looked in the wrong place.

image

if you try throw new Error("xxx") and monitor network tab, you will see it

@akosyakov
Copy link
Member

Most important that we get errors on load of VS Code resources. If we can track it now. Let's bring it in. We need it in both insider and stable. It would be ideally to have a label as well whether error is from stable or latest, but for now let's skip it.

JS errors is interesting but I think it would be more important to report them to GCP first that we can categorize and do something about them. It is for later as well.

@mustard-mh
Copy link
Contributor

I'm going to unhold it

@mustard-mh
Copy link
Contributor

/unhold

@roboquat roboquat merged commit cbe5665 into main Aug 26, 2022
@roboquat roboquat deleted the ft/capture-errors-frontend branch August 26, 2022 16:35
@mads-hartmann mads-hartmann mentioned this pull request Aug 26, 2022
1 task
@roboquat roboquat added deployed: IDE IDE change is running in production deployed Change is completely running in production labels Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deployed: IDE IDE change is running in production deployed Change is completely running in production release-note-none size/L team: IDE

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants