Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HOLD for payment 2024-02-19] [$500] Create Automation for when main fails in /App #34110

Closed
blimpich opened this issue Jan 8, 2024 · 29 comments
Assignees
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Daily KSv2 External Added to denote the issue can be worked on by a contributor Improvement Item broken or needs improvement.

Comments

@blimpich
Copy link
Contributor

blimpich commented Jan 8, 2024

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Problem:

We have a Github Action defined in App/.github/workflows/preDeploy.yml that can fail without being noticed. It runs every time a PR is merged into our main branch. When it fails there is no process for notifying engineers of its failure other than an email to the person who merged the PR, which can easily get lost or ignored. Previous examples of this happening and being handled manually are here and here.

Allowing this test suite to break and go unnoticed can make it difficult to diagnose what caused the issue in the first place, and opens us up to performance regressions that could negatively impact the user experience.

Solution:

Create an automation that produces a new Github issue when main fails. It should only create an issue if there isn't already an open issue for the associated breakage already. The issue should be labelled as daily and should automatically assign both the the author of the PR that contains the commit that kicked off the failed workflow run and the person who merged the PR into main. The issue should also link to the PR that is associated with the failed run.

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~018b564dd37aa88e42
  • Upwork Job ID: 1744445606630244352
  • Last Price Increase: 2024-01-08
  • Automatic offers:
    • jjcoffee | Reviewer | 28091182
    • rayane-djouah | Contributor | 28091183
Issue OwnerCurrent Issue Owner: @muttmuure
@blimpich blimpich added External Added to denote the issue can be worked on by a contributor Daily KSv2 Improvement Item broken or needs improvement. labels Jan 8, 2024
@melvin-bot melvin-bot bot changed the title Create Automation for when main fails in /App [$500] Create Automation for when main fails in /App Jan 8, 2024
Copy link

melvin-bot bot commented Jan 8, 2024

Job added to Upwork: https://www.upwork.com/jobs/~018b564dd37aa88e42

@melvin-bot melvin-bot bot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Jan 8, 2024
Copy link

melvin-bot bot commented Jan 8, 2024

Triggered auto assignment to Contributor-plus team member for initial proposal review - @jjcoffee (External)

@rayane-djouah
Copy link
Contributor

rayane-djouah commented Jan 8, 2024

Proposal

Please re-state the problem that we are trying to solve in this issue.

The current Process new code merged to main GitHub Action lacks a notification system for workflow failures, risking unnoticed issues and impacting user experience.

What is the root cause of that problem?

The absence of automated alerts delays issue detection as notifications rely on emails, often overlooked or lost.

What changes do you think we should make in order to solve the problem?

Implement a new workflow, failureNotifier.yml, triggered on Process new code merged to main completion. Create a GitHub issue on failure if there is no open failure notifier issue for the failed workflow, labeling it 'daily' and assigning it to the author and PR merger.

What alternative solutions did you explore? (Optional)

@jjcoffee
Copy link
Contributor

jjcoffee commented Jan 9, 2024

@rayane-djouah's proposal LGTM! I'm guessing we'll dig into the details on the PR.

🎀👀🎀 C+ reviewed

Copy link

melvin-bot bot commented Jan 9, 2024

Triggered auto assignment to @pecanoro, see https://stackoverflow.com/c/expensify/questions/7972 for more details.

@pecanoro
Copy link
Contributor

pecanoro commented Jan 9, 2024

The solution is pretty vague, but I am hoping you guys will handle it in the PR. Assigning @rayane-djouah

@melvin-bot melvin-bot bot removed the Help Wanted Apply this label when an issue is open to proposals by contributors label Jan 9, 2024
Copy link

melvin-bot bot commented Jan 9, 2024

📣 @jjcoffee 🎉 An offer has been automatically sent to your Upwork account for the Reviewer role 🎉 Thanks for contributing to the Expensify app!

Offer link
Upwork job

Copy link

melvin-bot bot commented Jan 9, 2024

📣 @rayane-djouah 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!

Offer link
Upwork job
Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑‍💻
Keep in mind: Code of Conduct | Contributing 📖

@pecanoro pecanoro added the Bug Something is broken. Auto assigns a BugZero manager. label Jan 9, 2024
Copy link

melvin-bot bot commented Jan 9, 2024

Triggered auto assignment to @muttmuure (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

@pecanoro pecanoro removed the Bug Something is broken. Auto assigns a BugZero manager. label Jan 9, 2024
@pecanoro
Copy link
Contributor

pecanoro commented Jan 9, 2024

I added the bug label so we can get someone from the Bug Zero team to handle payments later

@ishpaul777
Copy link
Contributor

ishpaul777 commented Jan 9, 2024

Hey @pecanoro i know i am late to party, would that be ok if i add a Proposal with implementation detail now, as selected proposal shares no Plan of action or any details on the solution

@rayane-djouah
Copy link
Contributor

I started to work on it already. The PR will be up soon

@ishpaul777
Copy link
Contributor

Sorry to say, but the proposal seems incomplete and I think the general guidelines is to share a Plan of action for a fair selection. But thats ok if you started working on a fix. Good luck!

@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Daily KSv2 labels Jan 10, 2024
@jjcoffee
Copy link
Contributor

@blimpich Sorry, I should've clarified this earlier but I realised I'm not 100% on the expected result. Should a new issue be created for any job that's part of the Process new code merged to main workflow failing? I see that we already have a Slack notification for the build job, for example:

- name: Announce failed workflow in Slack
if: ${{ needs.typecheck.result == 'failure' || needs.lint.result == 'failure' || needs.test.result == 'failure' }}

But no notification for the E2E tests, which seems to be the main issue here from the examples you've linked to.

@rayane-djouah
Copy link
Contributor

@blimpich, yes, it will create a single issue for the e2ePerformanceTests job failure because both of the failed jobs (Build apk from latest release as a baseline and Build apk from delta ref) are part of the e2ePerformanceTests job in the Process new code merged to main workflow. Do you think we should create separate issues for them?

@blimpich
Copy link
Contributor Author

No I like the way it currently works, just confirming 🙂👍

@melvin-bot melvin-bot bot added Monthly KSv2 and removed Weekly KSv2 labels Feb 5, 2024
Copy link

melvin-bot bot commented Feb 5, 2024

This issue has not been updated in over 15 days. @pecanoro, @jjcoffee, @muttmuure, @rayane-djouah eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

@rayane-djouah
Copy link
Contributor

updating PR by EOD

@muttmuure
Copy link
Contributor

I think we're just waiting for the above PR to pass the regression period? Or is there more to do here? @blimpich

@pecanoro
Copy link
Contributor

pecanoro commented Feb 9, 2024

The issue will get updated when the PR gets deployed to prod, it's still on staging

@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Weekly KSv2 labels Feb 12, 2024
@melvin-bot melvin-bot bot changed the title [$500] Create Automation for when main fails in /App [HOLD for payment 2024-02-19] [$500] Create Automation for when main fails in /App Feb 12, 2024
Copy link

melvin-bot bot commented Feb 12, 2024

Reviewing label has been removed, please complete the "BugZero Checklist".

@melvin-bot melvin-bot bot removed the Reviewing Has a PR in review label Feb 12, 2024
Copy link

melvin-bot bot commented Feb 12, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.4.39-8 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-02-19. 🎊

For reference, here are some details about the assignees on this issue:

@jjcoffee
Copy link
Contributor

@muttmuure Friendly bump for payment 🙇

@blimpich
Copy link
Contributor Author

Just wanted to say I say this working for the first time in the wild today (link). Nice job!! 🙂

@melvin-bot melvin-bot bot added the Overdue label Feb 22, 2024
Copy link

melvin-bot bot commented Feb 23, 2024

@pecanoro, @jjcoffee, @muttmuure, @rayane-djouah Whoops! This issue is 2 days overdue. Let's get this updated quick!

@muttmuure
Copy link
Contributor

Everyone is paid up

@melvin-bot melvin-bot bot removed the Overdue label Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Daily KSv2 External Added to denote the issue can be worked on by a contributor Improvement Item broken or needs improvement.
Projects
None yet
Development

No branches or pull requests

6 participants