[test] Remove karma #16263
Conversation
(On a side note, browser support for Jest is the most requested feature of the project.)
No bundle size changes comparing 4909bd8...19dec9e
@eps1lon Thanks for the Facebook link. When I read this section:
I'm wondering, how will we cover browser testing? I could think of adding Sentry to the documentation website, but most of our traffic comes from Chrome. Ultimately, will it come down to GitHub issues and release reactivity on our part?
I'm confused. Aren't the interaction tests the flakiest ones? So is this about replacing karma with cypress then? If the pain point is flaky tests, I can think of a simple approach: automatic retries. We faced this problem at Doctolib: hundreds of tests, a lot of flaky ones. At first we tried to make the tests less flaky, but it was very time-consuming. We fixed quite a few flaky tests, but at some point we introduced auto-retry logic (while monitoring the tests that were retried most often). It was exactly the solution we needed at Doctolib. Is this something we can reproduce here? I only know of two cases of flaky browser tests; we could apply the following fixes:
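As an aside, here is a minimal sketch of what such auto-retry logic could look like with mocha (the runner this repo already uses); the component and test body are hypothetical:

```js
// Minimal sketch, assuming a mocha runner (which this repo uses).
// mocha's built-in retries() re-runs a failing test up to n extra times
// before it is reported as a failure.
describe('<Menu /> (hypothetical example)', () => {
  // A regular function is required so that `this` is mocha's test context.
  it('closes when pressing Escape', function () {
    this.retries(2); // tolerate up to 2 flaky failures of this one test

    // ...render the component and assert on its behavior (omitted)
  });
});
```

`retries` can also be set per suite, or globally via mocha's `--retries` CLI flag, which makes it easy to trial on just the flakiest suites.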
cc @nathanmarks (who first introduced karma)
The whole point of all this data is that we don't do browser integration tests at the moment. karma does not test anything jsdom doesn't test. Yes, I am working on tests with cypress, but those would be new tests. If your takeaway here is that we lose some testing, then you haven't looked at the data.
I think I have been helped by karma twice; most of the time it failed due to different CSS syntax or flaky tests. I think at the very least we could trial not having them. Nothing stops us from adding them back if we notice a regression.
Yeah, if I say it doesn't do anything, I mean the number of true positives is near zero. Compared with a high number of false positives, it really hurts productivity.
I'm not challenging what we will gain. I'm questioning whether cypress will be better than some of the other options we could try at a cheaper cost:
@dmtrKovalenko works for Cypress, I'm sure he can provide his perspective (biased, of course :).
From my experience solving flaky tests at Doctolib, the problems came 50% of the time from tests not written correctly and 50% from wrong core logic. Maybe investigating the flaky tests would be a valid direction too? I have seen one recently on a Menu timeout.
I need to write it down at some point that this distinction makes no sense for a component library. Every component is a unit. There is not a single integration test here. Just unit tests with different APIs.
The inherent issue with these is that they are designed for jsdom, not an actual DOM. They are not fit for browser tests because in a browser environment we don't mock events; we use the browser API for that.
Unsourced and therefore ignored in this argument.
Also: all of these still have karma in mind, so the original statement seems still not accepted: karma's value is dwarfed by the reduction in productivity. The cost is multiple orders of magnitude higher than the value gained. That is what I'm expressing in this PR, not that there are other options we can explore. If we don't acknowledge this issue, we don't have to explore other solutions.
I am working for Cypress only because it is the only tool (really, the only one) that enables non-flaky tests in the browser. Anyway, Cypress is a really good choice, especially because of the entirely free plan for open source, which includes unlimited parallelization and test recordings.
@dmtrKovalenko Thanks for the feedback!
Coming back to the pain points mentioned in the description of the pull request:
If I read your methodology correctly, we have a flakiness rate of over 2%. We would like to go below 1%. If we agree, then I think we should categorize where these flaky tests come from. With this data, we can prioritize. If 80% of the flakiness comes from BrowserStack, we can kill it: problem solved. If 80% of the flakiness comes from the Menu or SwipeableDrawer tests, we can auto-retry them: problem solved.
I believe that removing BrowserStack would yield 80% of the value. At Doctolib, we killed it for this reason. It was initially introduced for screen recording, but that turned out to be useless.
Will cypress be less code to maintain?
50% of the evaluated test failures were caused by flaky tests.
And what is the purpose of karma at that point? It's just running the tests twice.
No data, no rationale provided. Also not part of this argument.
It will further reduce CI times, only worsening the problem. It's not as easy as you think it is.
It's starting again. This is not the scope of the PR. Please don't move the goalposts and stay on topic. I urge you again: don't put your opinion on a pedestal and expect everyone else to come to you. What actual value does the karma test provide? Not just your beliefs or feelings, but actual hard data. Please do the extra work instead of just discrediting the work of others.
So over the last 6,000 pull requests, we had 129 failures, and 50% of the failures were because of flakiness. So the current "flakiness rate" is 129 / 2 / 6,000 ≈ 1.075%. What number should we target? Is 1% a problem?
It provides the value that you have highlighted in the pull request description: "Actual interactions should be tested in the browsers" (Chrome headless, thanks to karma).
Let's break down the "Tests real browsers" job. It takes 20s to build the assets, ChromeHeadless takes about 10s, and the BrowserStack tests take about 1 minute. Assuming another 10s of miscellaneous work, the "Tests real browsers" task takes about 20 + 10 + 60 + 10 = 100s.
Will it reduce CI times 1% of the time? Why will it worsen the problem?
I think that we should have a holistic view of the problems we solve. I believe that how we test interactions in the browsers is key to the issue, hence we should consider Cypress part of the solution.
As long as we carefully consider them, I'm happy.
I would also add Jest as part of the equation. I believe that karma is preventing us from using it. It's worth mentioning.
Alright, different proposal: jsdom for "unit" tests, i.e. the tests that live next to the component, and karma for integration tests where jsdom is insufficient. It was a common pattern I observed when I evaluated the failures: some assertion was made based on jsdom implementations that don't hold up in the browser. Basically, let test:unit only run the tests that live next to the components.
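For illustration, a sketch of what that split could look like on the karma side; the glob pattern and file layout are assumptions, not the repo's actual structure:

```js
// karma.conf.js (sketch): only pick up dedicated browser/integration tests,
// leaving the unit tests that live next to the components to mocha + jsdom.
module.exports = function (config) {
  config.set({
    frameworks: ['mocha'],
    browsers: ['ChromeHeadless'],
    files: [
      // Hypothetical location for tests that genuinely need a real DOM.
      { pattern: 'test/integration/**/*.test.js', watched: false },
    ],
  });
};
```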
@eps1lon Do you have an example of this distinction between a unit and an integration test?
I am sorry for getting into your discussion, but I am sure that for a component library, any test that runs in the browser will be an integration test. This is kind of a philosophical theory, the distinction between test types :D
I'm not a fan of this distinction for a component library, because components are already a unit (or the whole app, if you think of it in terms of public vs. private). My main concern is more that anything that heavily relies on jsdom should only run in the browser. Especially anything related to event dispatching tends to be very far removed from what's actually happening in the browser. I'd probably go with: render output¹ => jsdom tests, interactions => browser. And for browser tests you can make some trade-offs if you don't have the infra for actual e2e tests, e.g. dispatch click events or keydown events.

¹ Basically markup only. Styles support is very limited in jsdom.
As soon as you're using imperative DOM APIs to act, you're not doing e2e tests. The user isn't using the DOM API but the browser. For example, any time you write a test where you dispatch a change event, you no longer have a full picture of what's happening. In an actual e2e test you would click into the text field and start typing.
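To make the difference concrete, a sketch (the selector and values are made up): dispatching a synthetic event through the DOM API skips most of what a real interaction triggers, while a Cypress command drives the browser itself:

```js
// DOM-API style: imperatively set a value and dispatch a change event.
// No focus handling, no keydown/keypress/input sequence, no browser quirks.
const input = document.querySelector('input');
input.value = 'hello';
input.dispatchEvent(new Event('change', { bubbles: true }));

// Cypress style (inside a Cypress spec): the browser generates the full
// event sequence, the same way a user clicking and typing would.
cy.get('input').click().type('hello');
```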
From my perspective, we shouldn't make an empirical distinction between when a test should run in a browser or not. I think the simplest option is to run everything in the browser. Our 2,000+ tests run in 10s in Chrome. That's nothing: we had 65 minutes of e2e tests at Doctolib, tests we had to split across 15 parallel workers.
We could do that, but we still only have jsdom-oriented tests. Until something better comes along, what's important in the browser will always be brittle; the rest is redundant. Still missing an argument for browser tests. It's never been made, neither theoretically nor empirically.
@eps1lon There shouldn't be any argument needed for browser tests: we need them, it's not optional. The argument should be about whether jsdom is good enough to approximate browser tests. Not always, but is the tradeoff worth it? From my perspective, running the browser tests is a low-hanging fruit. It doesn't cost much (<1% rate of flaky tests right now, plus the potential to improve it with retries or by dropping BrowserStack).
Says who?
Our users, using the components in a browser?
I've been thinking about the browser test situation; I think we should probably keep them. Stuff like style-specific testing should probably only be tested in jsdom or removed, and the rest tested in actual browsers. As a UI library, it makes sense to have browser tests.
There is probably a better test architecture that we could deploy. There is no question that, by investing more time, we could improve our tests.
Thanks, guys, for the discussion!
There was none. There was evidence from me and some claims from other people that weren't justified. Dangerous precedent, but maybe I can do the same.
`test_browser` looks nice in theory but in practice we don't get any value from it. Even Facebook does not run unit tests in the browser (hence why Jest isn't supported in the browser). If a website with over a billion users is still functional without unit tests in the browser, I'd say we can live without them as well. Removing it reduces the number of false-positive failures, CI runtime in general, and the amount of code we have to maintain. All of this should improve productivity. Just the commit messages in the evaluated PRs speak for themselves ("fix test (hopefully)" or "give up on test").
When does a test suite provide value?
The test suite has value if it finds bugs that other tests didn't find.
Methodology:
Results
In the last 6,000 PRs, 129 had different results for `test_browser` and `test_unit`. I evaluated the 17 most recent of those 129 and found only 1 actual issue. Most of them were either flaky (a restart fixed it) or the test needed to be implemented differently. I can evaluate more if the removal is contested.
Actual issue (react-testing-library):
react-testing-library was introduced after this PR. Can't use this as a pro for karma if you're against rtl.
Actual issue:
Indeterminate (e.g. force-push?)
Fixed by changing test:
Test skipped:
Flaky:
Not evaluated (yet; in those PRs docs:api is still part of test_browser):
Notes
With our methodology we will not find all builds with a flaky `test_browser`, because those are usually restarted manually from the CircleCI dashboard. Unless we can query failed CircleCI builds, we have to sift through PRs.
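As a sketch of how that sifting could be automated: CircleCI's v1.1 REST API can filter recent builds by outcome (the project slug and response fields here are assumptions based on the v1.1 docs):

```js
// Sketch: list recently failed CircleCI builds for the project.
// GET /project/:vcs/:org/:repo?filter=failed is part of CircleCI's v1.1 API.
const url =
  'https://circleci.com/api/v1.1/project/github/mui-org/material-ui' +
  '?filter=failed&limit=100';

fetch(url, { headers: { Accept: 'application/json' } })
  .then((res) => res.json())
  .then((builds) => {
    for (const build of builds) {
      // workflows.job_name should distinguish test_browser from test_unit.
      console.log(build.build_num, build.workflows && build.workflows.job_name);
    }
  });
```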
It also does not catch cancelled `test_browser` jobs where `test:karma` failed but the job couldn't finish (should be very rare).

/cc @mui-org/core-contributors
Future work
Actual interactions should be tested in the browsers (e.g. keyboard a11y). I'm looking into cypress to test these more advanced scenarios.
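For instance, a keyboard a11y scenario in Cypress could look roughly like this (a sketch; the URL, selectors, and assertion are placeholders):

```js
// Sketch of a Cypress keyboard-accessibility test.
describe('Menu keyboard a11y (sketch)', () => {
  it('moves focus with the arrow keys', () => {
    cy.visit('/menu-demo'); // placeholder URL
    cy.get('[data-testid="menu-button"]').click();
    // {downarrow} is one of Cypress' special .type() key sequences.
    cy.focused().type('{downarrow}');
    cy.focused().should('have.text', 'Second item'); // placeholder assertion
  });
});
```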
TODO: remove `test_browser` from the required GitHub checks