Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement --shard option #12546

Merged
merged 54 commits into from
Mar 6, 2022
Merged

feat: implement --shard option #12546

merged 54 commits into from
Mar 6, 2022

Conversation

marionebl
Copy link
Contributor

@marionebl marionebl commented Mar 4, 2022

Summary

  • Add a --shard=n/m option, allowing for parallel execution on multiple machines
  • Use new option in jest's Github workflows
  • Use new option in jest's CircleCI jobs
  • More context in Parallelizing across machines in CI #6270

Fixes #6270

Test plan

@marionebl marionebl force-pushed the 6270 branch 2 times, most recently from 7934540 to 3052d3b Compare March 4, 2022 05:34
Copy link
Member

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super exciting!

can you add an integration test using this?

and a changelog entry 🙂

packages/jest-config/src/normalize.ts Outdated Show resolved Hide resolved
packages/jest-config/src/normalize.ts Outdated Show resolved Hide resolved
packages/jest-core/src/runJest.ts Outdated Show resolved Hide resolved
packages/jest-test-sequencer/src/index.ts Outdated Show resolved Hide resolved
Co-authored-by: Simen Bekkhus <sbekkhus91@gmail.com>
@marionebl
Copy link
Contributor Author

marionebl commented Mar 4, 2022

Todo

  • Changelog
  • Integration test
  • Validate in normalize.ts
  • Throw if testSequencer.shard is not defined
  • Remove .shard default parameter

@marionebl
Copy link
Contributor Author

@SimenB I applied all requested changes and changed the CircleCI config to make use of the new feature. Let me know :)

Copy link
Contributor

@nickofthyme nickofthyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, this is incredibly thorough and quick!

From a quick review, it appears that this only shards the spec files themselves and not the tests within the suites. This implementation would be better than nothing, but to me the true solution would be to shard the total test count between machines. This is how the --shard option works in @playwright/test for single or many test suites.

# shards all tests within a single test file
npx playwright test --shard=1/3 tests/one.test.ts
# shards all tests across 2 test files
npx playwright test --shard=1/3 tests/one.test.ts tests/two.test.ts
# shards all tests across all project test files
npx playwright test --shard=1/3

So if you had 2 test files one with 20 tests and the other with 80, both commands below would run 50 tests, never the same test twice.

 npx playwright test --shard=1/2 # 50 tests run
 npx playwright test --shard=2/2 # 50 tests run

I imagine most jest setups have good distribution and separation of test files such that the test suite sharding solution would still be incredibly useful. I'm not aware if the jest runner setup is suitable for sharding on a per-test basis (i.e. running a consistent subset of tests for for a single test file/suite), maybe a jest maintainer can opine.

I'm just a spectator so feel free to ignore my thoughts. 👍🏼

@marionebl
Copy link
Contributor Author

This implementation would be better than nothing, but to me the true solution would be to shard the total test count between machines.

Good shout! I've toyed with the concept and discarded if after initial tests in a code base I maintain. My observations:

  • For Jest transpilation cost is proportional to the amount of code under test
  • For Playwright transpilation cost is proportional to the amount of test code

This assumes we need full blown transpilation before listing test cases which I'm not 100% confident about maybe @SimenB can chime in.

@marionebl marionebl requested a review from SimenB March 4, 2022 23:58
@SimenB
Copy link
Member

SimenB commented Mar 5, 2022

Yeah, we don't want to execute the test files before splitting up as that would require transpilation as mentioned, but also actually running the test file (but not the describes and its) which will cause imports etc. negating much, if not all, of the benefit. That just doesn't fit with Jest's model. The approach made in this PR is the correct one.

@SimenB
Copy link
Member

SimenB commented Mar 5, 2022

One thing to explore for this repo is to shard in a way that mostly splits off e2e/** to separate runs as those are the slow ones

image

So we should probably, funnily enough, implement our of test scheduler and not use the default one. That said of course, the gains here are already quite massive 😅

Copy link
Member

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the numbers reported from CI makes me quite excited about this! 😀

.circleci/config.yml Outdated Show resolved Hide resolved
.github/workflows/nodejs.yml Outdated Show resolved Hide resolved
docs/CLI.md Outdated Show resolved Hide resolved
docs/CLI.md Show resolved Hide resolved
e2e/__tests__/shard.test.ts Show resolved Hide resolved
packages/jest-cli/src/cli/args.ts Outdated Show resolved Hide resolved
packages/jest-config/src/parseShardPair.ts Show resolved Hide resolved
packages/jest-core/src/runJest.ts Outdated Show resolved Hide resolved
packages/jest-core/src/runJest.ts Outdated Show resolved Hide resolved
const shardEnd = shardSize * options.shardIndex;

return [...tests]
.sort((a, b) => (a.path > b.path ? 1 : -1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be more clever here. Thoughts on calling await this.sort(tests) and then splitting off from there? and then using index % options.shardIndex === 0 or something (not that exactly since it won't work 😅) since sort by default tries to schedule slowest/failing/largest first.

Might be overkill, dunno

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this one a bit - my team will use a jump consistent hash (https://arxiv.org/pdf/1406.2294.pdf) based on the filename, sort based on that and then apply modulo. That should make for nicely distributed and very stable shards.

In a later iteration I believe we could try and balance the shards to contain a similar amount of predicted test run time based on historical data, not sure yet what that would like yet though 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR If you don't object I'll give a smarter shard algorithm a crack; thoughts on pulling in a jump consistent hashing lib as dependency vs. having it in source?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a go! Might be best to do that in a separate PR though, just to not block this one 🙂 I'm very interested in seeing that happen, though!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added simple hashing of the test path, results are encouraging. The distribution of e2e test went from

108
102
0
0

to

51
48
62
49

Command used:

λ yarn jest --listTests --shard='1/4' | grep e2e | wc -l && yarn jest --listTests --shard='2/4' | grep e2e | wc -l && yarn jest --listTests --shard='3/4' | grep e2e | wc -l && yarn jest --listTests --shard='4/4' | grep e2e | wc -l

For CI this means:

Screen Shot 2022-03-06 at 12 02 11 am

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, this is awesome! 8 minutes CI is a dream 😀

marionebl and others added 4 commits March 5, 2022 20:29
Co-authored-by: Simen Bekkhus <sbekkhus91@gmail.com>
Co-authored-by: Simen Bekkhus <sbekkhus91@gmail.com>
Comment on lines 107 to 110
const relativeTestPath = path.relative(
test.context.config.rootDir,
test.path,
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a change in the logic to make the sharding stable between machines with different paths (test.path is an absolute path, so the hash differs between machines). Not really an issue except that we test the sharding in this repo, so we need to be stable 🙂

@SimenB
Copy link
Member

SimenB commented Mar 5, 2022

image

4 minute CI, gotta love it

EDIT: or even 3 😅

image

@SimenB SimenB changed the title Implement shard option #6270 Implement shard option Mar 5, 2022
@SimenB
Copy link
Member

SimenB commented Mar 5, 2022

Huh, test is failing on windows node@14, but not 12, 16 or 17...

test,
};
})
.sort((a, b) => a.hash.localeCompare(b.hash))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will produce inconsistent results across machines with different locale settings - maybe forcing it to a specific locale would be the best option here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would have thought that didn't impact file paths, but I've been surprised by FS behaviour before 😛

@marionebl
Copy link
Contributor Author

Test failures:


  • 4 seem to be a recurring timeout issue that only affects macOS. (typescript › reads config)
  • 1 needs investigation but I can't see how it relates to the changes at hand (notify › does not report)
  • 1 seems to be related, possibly caused by localCompare differences (--shard=1/2 custom sharding test sequencer)

@SimenB
Copy link
Member

SimenB commented Mar 6, 2022

This time it failed with jest jasmine, not circus... something is off. I only have access to node 12 on windows, but I'll try dig

@SimenB SimenB changed the title Implement shard option feat: implement --shard option Mar 6, 2022
@SimenB SimenB merged commit 54eab57 into jestjs:main Mar 6, 2022
@SimenB
Copy link
Member

SimenB commented Mar 6, 2022

Thank you again @marionebl!

@marionebl
Copy link
Contributor Author

Thank you for getting it across the line! 🎉

@therynamo
Copy link

QQ for you both @SimenB @marionebl - when running coverage with shards there is a discrepancy because coverage doesn't account for the tests being sliced. So the coverage report comes back with a lot of failed coverage. Is there a way to use shards with coverage right now? Or should I raise an issue that can be worked on?

@llwt
Copy link

llwt commented Mar 15, 2022

@therynamo This package is old, but still does the trick for me https://www.npmjs.com/package/istanbul-combine

Istanbul report supports it natively now: jamestalmage/istanbul-combine#2

istanbul report --dir /path/to/coverage/dir --include **/*coverage.json json

@therynamo
Copy link

@llwt - woah! Nice! So I just need to make sure that jest allows for the --include flag on --coverage. If it isn't there - that'd likely be an easy add to the CLI/Config options.

Wow - great find! Thank you for the reply too 👍

@SimenB
Copy link
Member

SimenB commented Mar 15, 2022

You'll need to merge the reports manually, yeah. Or use a CI/reporter that does it for you (e.g. coveralls)

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelizing across machines in CI
6 participants