
[HOLD for payment 2023-01-09] [Tracking] Add performance CI tests #11711

Closed
roryabraham opened this issue Oct 10, 2022 · 55 comments
Labels
Awaiting Payment (auto-added when associated PR is deployed to production), Engineering, NewFeature (something to build that is a new item), Reviewing (has a PR in review), Weekly KSv2

Comments

@roryabraham
Contributor

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

Open a pull request in NewDot.

Expected Result:

A suite of performance tests should run, so that we know, before a PR is merged, whether it creates a statistically significant performance regression.

Actual Result:

We don't have performance CI tests

Workaround:

Just don't break stuff™

Platform:

n/a

View all open jobs on GitHub

@roryabraham added the Engineering, Daily KSv2, and NewFeature labels on Oct 10, 2022
@roryabraham
Contributor Author

cc @hannojg, can you please comment on this so I can assign it to you?

@hannojg
Contributor

hannojg commented Oct 11, 2022

Yes 👋

@melvin-bot

melvin-bot bot commented Oct 14, 2022

@AndrewGable, @hannojg Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

@AndrewGable added the Reviewing label on Oct 17, 2022
@melvin-bot removed the Overdue label on Oct 17, 2022
@AndrewGable
Contributor

PR in review

@AndrewGable
Contributor

The PR was merged; however, we still had a lot of testing to do to answer a few questions:

  1. Does the PR work?
  2. Can we get the "Large Runners" working?
  3. Do the "Large Runners" speed things up?
  4. Do the "Large Runners" have KVM support to allow hardware acceleration?

@AndrewGable
Contributor

After testing via this PR:

  1. The PR works, but required some small edits, which I made here: [No QA] Remove baseline branch as it doesn't exist on remote #11936
  2. Large runners work!
  3. Large runners seem to speed up most things, but not the tests.
  4. Large runners do not seem to have KVM support out of the box, but @hannojg might be able to enable it. We will discuss further.

@JmillsExpensify changed the title from "Add performance CI tests" to "[Tracking] Add performance CI tests" on Oct 19, 2022
@AndrewGable
Contributor

Problem: We are trying to configure cloud machines to run an emulator very quickly to run performance tests. However:

Emulators can be fast, but require hardware acceleration (HAXM on Mac & Windows, KVM on Linux) from the host. This presents a challenge on CI especially when running hardware accelerated emulators within a docker container, because Nested Virtualization must be supported by the host VM which isn't the case for most cloud-based CI providers due to infrastructural limits. (source)

Solution: Skip running the tests on emulators on CI machines and run the tests on physical devices via AWS Device Farm.

I'm still validating that this idea is possible, but I've confirmed we can at least start the tests via AWS Device Farm. I think the real question will be whether they are fast enough to give reliable results.

[Screenshot: Screen Shot 2022-10-19 at 7.27.29 PM]

Once we prove out the POC, we can hook it up to realm/aws-devicefarm, which provides an easy-to-use GitHub Actions workflow.
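For reference, scheduling a Device Farm run programmatically looks roughly like the sketch below, using the AWS SDK for JavaScript v3. This is an illustrative sketch only; all ARNs are placeholders rather than our real configuration, and realm/aws-devicefarm wraps essentially this flow in a GitHub Actions workflow:

```ts
import {
    DeviceFarmClient,
    ScheduleRunCommand,
    GetRunCommand,
} from '@aws-sdk/client-device-farm';

// Device Farm is only available in us-west-2.
const client = new DeviceFarmClient({region: 'us-west-2'});

// All ARNs below are placeholders for illustration, not our real project configuration.
async function runPerformanceTests(): Promise<string | undefined> {
    const {run} = await client.send(new ScheduleRunCommand({
        projectArn: 'arn:aws:devicefarm:us-west-2:123456789012:project:example',
        appArn: 'arn:aws:devicefarm:us-west-2:123456789012:upload:app-apk',
        devicePoolArn: 'arn:aws:devicefarm:us-west-2:123456789012:devicepool:android-pool',
        name: 'e2e-performance-tests',
        test: {
            type: 'APPIUM_NODE',
            testPackageArn: 'arn:aws:devicefarm:us-west-2:123456789012:upload:test-package',
        },
    }));

    // Poll until the run finishes, then return the overall result (PASSED, FAILED, ...).
    let current = run;
    while (current?.status !== 'COMPLETED') {
        await new Promise((resolve) => setTimeout(resolve, 60_000));
        current = (await client.send(new GetRunCommand({arn: run?.arn}))).run;
    }
    return current?.result;
}
```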

@AndrewGable
Contributor

Discussing in Slack with @hannojg, but I was unable to get the custom Node HTTP server working on the AWS Device Farm machines. I think seeing whether Appium can support our tests might be the best next move, since Appium tests are supported out of the box by AWS Device Farm.
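To make the Appium direction concrete, here is a minimal sketch of what one measurement could look like with WebdriverIO; the capabilities, package name, and "app-ready" marker are illustrative assumptions, not the actual test suite:

```ts
import {remote} from 'webdriverio';

// A rough TTI-style measurement: launch the app and time how long it takes
// for a marker element to appear. Everything below is a placeholder example.
async function measureAppStart(): Promise<number> {
    const driver = await remote({
        hostname: 'localhost',
        port: 4723,
        capabilities: {
            platformName: 'Android',
            'appium:automationName': 'UiAutomator2',
            'appium:appPackage': 'com.example.app', // placeholder package
            'appium:appActivity': '.MainActivity',  // placeholder activity
        },
    });

    const start = Date.now();
    const marker = await driver.$('~app-ready-marker'); // hypothetical accessibility id
    await marker.waitForExist({timeout: 60_000});
    const durationMs = Date.now() - start;

    await driver.deleteSession();
    return durationMs;
}
```

The appeal of this route is that Device Farm can run an Appium test package directly, so the device would not need to reach back to a custom server on the host.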

@AndrewGable AndrewGable added Weekly KSv2 and removed Reviewing Has a PR in review Daily KSv2 labels Oct 21, 2022
@AndrewGable
Contributor

I've reached out to AWS Device Farm support via a support case, chatted with them on the phone, and provided code details. They are circling back internally, and I will let you know when I hear back from them.

@hannojg
Contributor

hannojg commented Oct 30, 2022

My guess, after trying hard yesterday to make our simple HTTP server implementation work, is that it would work with a "private device" for which we can set up a VPC: https://docs.aws.amazon.com/devicefarm/latest/developerguide/amazon-vpc-endpoints.html

However, that might not be a feasible setup, so I am currently migrating to Appium.
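For reference, the "simple HTTP server" approach boils down to something like the sketch below: the test runner listens for measurements that the app posts over the local network, which is exactly the traffic that doesn't get through on Device Farm's default network setup without a private device/VPC. The port and route here are illustrative, not the actual implementation:

```ts
import * as http from 'http';

type Measurement = {name: string; duration: number};
const results: Measurement[] = [];

// Minimal results collector: the app under test POSTs timing data here.
const server = http.createServer((req, res) => {
    if (req.method === 'POST' && req.url === '/results') {
        let body = '';
        req.on('data', (chunk) => {
            body += chunk;
        });
        req.on('end', () => {
            results.push(JSON.parse(body) as Measurement);
            res.writeHead(200, {'Content-Type': 'text/plain'});
            res.end('ok');
        });
        return;
    }
    res.writeHead(404);
    res.end();
});

// Port is an arbitrary example; the device has to be able to reach this host,
// which is the part that fails on Device Farm's isolated network.
server.listen(8080, () => console.log('Listening for test results on :8080'));
```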

@AndrewGable
Contributor

@hannojg and I are getting on a call with the AWS Device Farm team today.

@AndrewGable
Contributor

Update from our call with AWS Device Farm:

  1. They gave us a code snippet that enables us to run the Android tests as we'd like 🎉 I am working on getting the tests working as originally written now. If I cannot get them to work today, I will hand this off to @hannojg for tomorrow.
  2. AWS Device Farm let us know, however, that the way we are running these tests will not work on iOS devices, because iOS restricts the ability to manipulate the network starting with iOS 12 (a very old OS version).
  3. I think we can start with Android devices, as they tend to be slower; then, if we want to build this out further, we can look at their recommended solution (a VPC).

@AndrewGable
Contributor

We had our first successful run of the Android e2e tests via AWS Device Farm 🎉 Now we just need to improve and tweak things.

❇️  Performance comparison results:

➡️  Significant changes to duration

➡️  Meaningless changes to duration
 - TTI: 1560.9 ms → 1605.7 ms (+44.8 ms, +2.9%) 
 - runJsBundle: 232.0 ms → 241.4 ms (+9.4 ms, +4.0%) 
 - regularAppStart: 0.0 ms → 0.0 ms (+0.0 ms, +8.5%) 
 - nativeLaunch: 16.7 ms → 15.9 ms (-0.8 ms, -4.7%) 
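For context on how results get bucketed into "significant" vs. "meaningless" changes: each measure is sampled many times on both the baseline build and the PR build, and a statistical test decides whether the difference in means is larger than the run-to-run noise. A rough sketch of that idea (not the exact implementation in the repo), using Welch's t-test and made-up sample data that roughly matches the TTI means above:

```ts
type Stats = {mean: number; variance: number; n: number};

function stats(samples: number[]): Stats {
    const n = samples.length;
    const mean = samples.reduce((sum, x) => sum + x, 0) / n;
    const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
    return {mean, variance, n};
}

// Welch's t-statistic for two independent samples with unequal variances.
function welchT(a: Stats, b: Stats): number {
    return (a.mean - b.mean) / Math.sqrt(a.variance / a.n + b.variance / b.n);
}

// Rough threshold: |t| > ~1.96 approximates p < 0.05 for large samples; a real
// implementation would use the proper degrees of freedom for small sample counts.
function isSignificant(baseline: number[], candidate: number[]): boolean {
    return Math.abs(welchT(stats(baseline), stats(candidate))) > 1.96;
}

// Illustrative data only (means roughly match the TTI numbers reported above).
const baselineTTI = [1490, 1555, 1640, 1520, 1600];  // mean ≈ 1561 ms
const candidateTTI = [1530, 1610, 1680, 1560, 1650]; // mean ≈ 1606 ms
console.log(isSignificant(baselineTTI, candidateTTI)); // false: the ~+45 ms shift is within the noise
```

Whether a measure lands under "Significant changes to duration" or "Meaningless changes to duration" in the report corresponds to that kind of pass/fail decision.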

@AndrewGable
Contributor

AndrewGable commented Nov 3, 2022

Here is our working list:


I've done 1 & 2 on this branch/fork: https://github.com/AndrewGable/App/pull/1

I think the last part of this will be figuring out how to configure AWS to report back the results. I believe this can be done using the "artifacts", as mentioned here: https://github.com/realm/aws-devicefarm#download-artifacts

@AndrewGable
Contributor

Here is an example of a passing test run: https://github.com/AndrewGable/App/actions/runs/3382629912/jobs/5617747052

It took 45 minutes. I think we can definitely figure out how to get this working more efficiently, but we can focus on that after steps 3 & 4 are done.

@AndrewGable
Contributor

@rafecolton enabled the large runners again for us; now we're seeing how fast the tests run 🚀

@AndrewGable
Contributor

Large runner results:

  1. Building the first APK took ~9 minutes (compared to ~17 minutes before)
  2. Building the second APK took ~4 minutes (compared to ~7 minutes before)
  3. The tests take about ~14 minutes (compared to ~15 minutes before)

Total time ~30 minutes vs ~45 minutes

@JmillsExpensify

Nice improvement!

@AndrewGable
Contributor

I still worry this is too long to roll out in its current iteration; our current tests run in under 5 minutes. I don't think we can expect these tests to run in that time, but I think we need to get them under a certain amount of time or increase their value.

@JmillsExpensify

Totally. So, out of curiosity, what do you think that number is? 10? 15? Something else?

@AndrewGable
Contributor

If we are going to run it on every commit, I think ~10 minutes would be OK. I don't know if it's realistic that we will ever be able to speed these tests up that much, so I think we might have to do something "out of the box".

One idea is we could run these tests on merge, then report the results back to the PR. If they introduce a performance regression then we could revert the PR, or something similar. Otherwise, we could run these via a label, only on large PRs, etc. I don't have an exact proposed solution yet, but I'm curious for thoughts from @Expensify/mobile-deployers @marcaaron?

@JmillsExpensify

One idea is we could run these tests on merge, then report the results back to the PR. If they introduce a performance regression then we could revert the PR, or something similar.

I would love to have the most coverage possible, so this is an ideal place to start. Given the time between merge and QA in most cases, the 10-minute "delay" also feels OK. In any case, I'm not the real decision maker in this department. 😄

@AndrewGable
Contributor

Maybe a more refined proposal would be:

  1. Run any long running tests after merge in the "Process new code merged to main" workflow
  2. When the perf tests finish, add a comment to the PR mentioning the results
  3. If there is a regression found ("Significant Changes To Duration") then add the deploy blocker label to the PR with a comment like "There was a performance regression found on this PR, please investigate with priority"
  4. This will be added to the deploy checklist and we will not deploy until it's resolved
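A rough sketch (not the actual workflow code) of what steps 2 and 3 could look like from a post-merge job using Octokit; the label name, comment text, and how the results and PR number reach this function are assumptions:

```ts
import {Octokit} from '@octokit/rest';

const octokit = new Octokit({auth: process.env.GITHUB_TOKEN});
const owner = 'Expensify';
const repo = 'App';

// prNumber and resultsMarkdown would come from the post-merge workflow;
// hasRegression reflects whether the comparison found "Significant Changes To Duration".
async function reportResults(prNumber: number, resultsMarkdown: string, hasRegression: boolean): Promise<void> {
    // Step 2: comment the performance comparison results on the merged PR.
    await octokit.rest.issues.createComment({
        owner,
        repo,
        issue_number: prNumber,
        body: `⏱ Performance test results:\n\n${resultsMarkdown}`,
    });

    // Step 3: flag a regression as a deploy blocker (label name here is a placeholder).
    if (hasRegression) {
        await octokit.rest.issues.addLabels({
            owner,
            repo,
            issue_number: prNumber,
            labels: ['DeployBlocker'],
        });
        await octokit.rest.issues.createComment({
            owner,
            repo,
            issue_number: prNumber,
            body: 'There was a performance regression found on this PR, please investigate with priority.',
        });
    }
}
```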

@roryabraham
Contributor Author

Sounds good @AndrewGable, but can we please make sure that, unlike the post-merge test and lint jobs we already have, the long-running tests do not block the deploy until they finish? i.e., assume that PRs do not introduce a performance regression, deploy them to staging, and then, only if the tests do not pass, post a comment and mark them as a deploy blocker.

@AndrewGable
Contributor

End-of-week update: we have a workable solution here, but we're still talking with AWS to juice up the Device Farm side of things.

@AndrewGable
Contributor

I am attempting to split the e2e tests up from the rest of the source code to reduce the size of the ZIP we send to the Android phone via AWS. However, I have been blocked today by a broken Android build.

@AndrewGable
Contributor

I was able to squeeze some more performance improvements out of the tests; total time is now ~23 minutes:

  1. Building the first APK took ~9 minutes (compared to ~17 minutes before)
  2. Building the second APK took ~4 minutes (compared to ~7 minutes before)
  3. The tests take about ~9 minutes (compared to ~15 minutes before)

I will work on the final piece which is running this in the "Process new code merged to main" workflow.

@marcaaron
Contributor

One idea is we could run these tests on merge, then report the results back to the PR. If they introduce a performance regression then we could revert the PR, or something similar.

Chiming in late, but this seems reasonable to me. Another option could be to require the test in CI but manually trigger it with a comment like "I am ready for my performance test now". Maybe that is too manual 😂

@AndrewGable
Contributor

Running into an issue with linting as described here: SchemaStore/schemastore#2579

@melvin-bot added the Overdue label on Nov 24, 2022
@AndrewGable
Contributor

AndrewGable commented Nov 29, 2022

Trying to finish this right now; next steps are:

  • Call from an external workflow (preDeploy.yml)
  • Leave the deploy blocker label when there are "Significant Changes To Duration"

@AndrewGable
Contributor

PR finally in review 😮‍💨

@AndrewGable added the Reviewing label on Nov 29, 2022
@JmillsExpensify

Woo! P.S. I didn't cover this in the regular open source update today but was going to profile it in the margelo room tomorrow.

@AndrewGable
Contributor

Nice, confirmed these are now running on merge into main: #13570 (comment)

@AndrewGable
Contributor

These are running; there are some further, separate improvements to make (e.g., adding more tests and mocking API responses), but I am going to consider this one done for now.

@melvin-bot added the Weekly KSv2 and Awaiting Payment labels and removed the Weekly KSv2 label on Jan 2, 2023
@melvin-bot changed the title from "[Tracking] Add performance CI tests" to "[HOLD for payment 2023-01-09] [Tracking] Add performance CI tests" on Jan 2, 2023
@melvin-bot

melvin-bot bot commented Jan 2, 2023

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.2.46-0 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2023-01-09. 🎊

After the hold period, please check if any of the following need payment for this issue, and if so check them off after paying:

  • External issue reporter
  • Contributor that fixed the issue
  • Contributor+ that helped on the issue and/or PR

As a reminder, here are the bonuses/penalties that should be applied for any External issue:

  • Merged PR within 3 business days of assignment - 50% bonus
  • Merged PR more than 9 business days after assignment - 50% penalty
