Epic: Prebuilds stability #10361

Closed
11 tasks done
geropl opened this issue May 30, 2022 · 10 comments
Labels: team: webapp (Issue belongs to the WebApp team), type: epic

Comments

@geropl
Member

geropl commented May 30, 2022

Summary

The goal of this epic is to focus on two things:

  1. fix the most glaring issues we currently have with prebuilds
  2. ensure quality beyond that by adding tests and metrics as we see fit

Context

Historically, the "prebuild experience" has been really shaky (cf. #7812, for instance), mostly because:

  • it's a rather high-level feature that involves a lot of moving parts that have to mesh ⚙️ with each other
  • it has close to zero test coverage
  • it's opaque to users a) why and b) whether a prebuild is executed after they "triggered" it: some of the perceived instability is actually working as intended, but a) we're doing a bad job at explaining it and b) it's not observable for users (not exactly part of this epic, but it touches parts of it)
  • we do not have proper metrics and alerts set up (SLI, anyone?)

As one of our quarterly goals is to improve perceived reliability, and we're driving usage-based pricing now, it feels like the right time to step up our game in this area. 💪

We already have two other related epics that have a certain overlap:

We might tackle those as well if there's time. But we start out with the issues listed here and work our way towards those. Also, I expect that the individual issues are a) outdated and b) overlap with each other, so we'll have to draw and move the line as we go.

Value

  • we fix some immediate issues that plague our customers
  • we ensure those do not happen again, or are detected earlier if they do

Acceptance Criteria

  • all issues referenced in this epic are done ✔️
  • we have metrics from which we can derive a "success rate" (bonus points for having an SLI dashboard)
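As a rough illustration of the second criterion, here is a minimal sketch of how a "success rate" could be derived from prebuild outcome counters. All names (`PrebuildCounts`, `success_rate`, `meets_target`), the split into system and user failures, and the 0.99 target are assumptions made for this example; they are not taken from Gitpod's actual metrics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrebuildCounts:
    """Hypothetical outcome counters for one reporting window."""
    succeeded: int
    failed_system: int  # failures attributed to the platform
    failed_user: int    # failures attributed to user config (e.g. a wrong image path)


def success_rate(counts: PrebuildCounts, exclude_user_errors: bool = True) -> Optional[float]:
    """Fraction of prebuilds that succeeded, or None if nothing was counted.

    With exclude_user_errors=True, failures caused by user configuration
    do not count against the rate (an assumption of this sketch).
    """
    total = counts.succeeded + counts.failed_system
    if not exclude_user_errors:
        total += counts.failed_user
    if total == 0:
        return None
    return counts.succeeded / total


def meets_target(counts: PrebuildCounts, target: float = 0.99) -> bool:
    """True if the rate meets the (assumed) target; an empty window passes."""
    rate = success_rate(counts)
    return rate is None or rate >= target
```

Whether user-caused failures should count against the rate is a design choice: excluding them keeps the number focused on platform reliability, while including them is closer to what users perceive.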

Measurement

User-perceived reliability has increased.

Tasks:

Issues

deferring

Observability/Metrics

@geropl geropl added team: webapp Issue belongs to the WebApp team type: epic labels May 30, 2022
@geropl geropl moved this to Epic in Progress in 🍎 WebApp Team May 30, 2022
@lucasvaltl
Contributor

👋 Other prebuild related issues that should also be included imo: #8942 and #10024

@geropl
Member Author

geropl commented Jun 1, 2022

@lucasvaltl Thanks for sharing/chiming in! I added #10024 to the list ☝️

#8942, however, looks like a runtime issue, and is more in Team Workspace's area of expertise. 👍

@AlexTugarev
Member

Also, #10024 might not be a bug, at least not a functional one.

@AlexTugarev
Member

@geropl, w.r.t. acceptance criteria, let's set a limit or define when to freeze this, otherwise this isn't valid.

@jldec
Contributor

jldec commented Jun 1, 2022

I put a comment into #10024 pointing to the relevant docs issue (assuming it's just a case of a missing init task).

@jldec
Contributor

jldec commented Jun 1, 2022

@AlexTugarev, @geropl, I also added #10341 and #8452 after discussing with @laushinka. Both of these are important for debugging and support.

@geropl
Member Author

geropl commented Jun 1, 2022

@jldec I left those out intentionally. But let's see if we find time to have a look. 🙃

> @geropl, w.r.t. acceptance criteria, let's set a limit or define when to freeze this, otherwise this isn't valid.

💯

Please, do not add any more issues! ❄️

@axonasif
Member

axonasif commented Jun 3, 2022

Not sure if it's appropriate to put it here but hoping it is 😄
A user reported repeated SYSTEM ERROR for prebuilds.
IDs:

1. 4084cff8-5a8a-4505-a87d-88ce26d28023
2. d0960544-7e89-48de-9f3e-e5e6127767d3

Ref: https://discord.com/channels/816244985187008514/816246578594840586/982343828105211934

@woss

woss commented Jun 3, 2022

> Not sure if it's appropriate to put it here but hoping it is 😄 A user reported repeated SYSTEM ERROR for prebuilds. IDs:
>
> 1. 4084cff8-5a8a-4505-a87d-88ce26d28023
> 2. d0960544-7e89-48de-9f3e-e5e6127767d3
>
> Ref: https://discord.com/channels/816244985187008514/816246578594840586/982343828105211934

That was me. I checked my files and realized that the path for the image inside the .gitpod.yml was not correct. Once I corrected it, the manual rebuild started and finished successfully.

I hope this will help you in building an even better product!

@geropl
Member Author

geropl commented Jul 4, 2022

Closing as all boxes are ticked ✔️ 🙃

@geropl geropl closed this as completed Jul 4, 2022
Repository owner moved this from Epic in Progress to Done in 🍎 WebApp Team Jul 4, 2022
6 participants