Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the default number of CPUs #1839

Merged
merged 15 commits into from
Nov 13, 2015
Merged

Use the default number of CPUs #1839

merged 15 commits into from
Nov 13, 2015

Conversation

tamird
Copy link
Contributor

@tamird tamird commented Jul 28, 2015

This reverts commit 30b6050.

Review on Reviewable

@@ -101,7 +101,7 @@ func TestClientGossip(t *testing.T) {
rpcContext := rpc.NewContext(&base.Context{Insecure: true}, lclock, stopper)
client.start(local, disconnected, rpcContext, stopper)

util.SucceedsWithin(t, 500*time.Millisecond, func() error {
util.SucceedsWithin(t, 5*time.Minute, func() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why that high?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was debugging, see second commit

@tamird
Copy link
Contributor Author

tamird commented Nov 13, 2015

OK, I think this once again ready for prime time. There are 5 tests being suppressed here.
I've run 10 builds with this in CI (results):

  • 6 successes
  • 2 TestTxnDBLostUpdateAnomaly failure (these tests are somewhat flaky by their nature)
  • 1 Too many goroutines running after tests failure in cli; looks like slow or bad shutdown
  • 1 TestRaftLogQueue failure (looks like it just took too long. can't repro locally after 4000 runs)

I think this is about as good as we're gonna get, assuming we're ok with the 5 suppressions.

@tschottdorf @bdarnell could you both please weigh in? everyone else is welcome to comment as well, of course.

@tbg
Copy link
Member

tbg commented Nov 13, 2015

I think we should merge them. Ultimately the "big ones" here have @bdarnell's name on it so he should weigh in, but according to @tamird all test errors are reproducible locally (with sufficient iteration), so we're not losing much by skipping the tests in CI as long as we prioritize the corresponding issues accordingly. Plus, the introduction of new flaky tests has pretty much kept up with the rate at which we can fix them, and @tamird has sisyphus'ed it for long enough.

@bdarnell
Copy link
Contributor

I agree we should go ahead and merge this even if we have to skip some tests, but four failures out of ten is still unacceptably high. Looks like TestTxnDBLostUpdateAnomaly should also be skipped, and maybe something in cli. I'd like to see this branch pass 9 of its last 10 builds before merging, even if that means expanding the list of skipped tests.

@tamird
Copy link
Contributor Author

tamird commented Nov 13, 2015

I don't think these are representative results, so I'm running 10 more builds (prior to this batch of 10, I had hardly seen TestTxnDBLostUpdateAnomaly fail, and had never seen the other two failures).

@tamird
Copy link
Contributor Author

tamird commented Nov 13, 2015

OK, 6 of those 10 failed:

  • 1 circleci: cache the docker-spy image #3025
  • 1 legit new race in gossip, fixed here
  • 4 other failures, all of which are now skipped (2 of these in the same run)
  • 1 timeout on Stopper.Quiesce in TestLogic, when TestLogic was panicking (after a call to t.Fatalf). This one is fixed by re-panicking in Stopper.Stop. The underlying Fatalf was unfortunately not printed.

10 more builds are now running.

@tamird
Copy link
Contributor Author

tamird commented Nov 13, 2015

10 for 10, merging

tamird added a commit that referenced this pull request Nov 13, 2015
@tamird tamird merged commit 613e2bd into cockroachdb:master Nov 13, 2015
@tamird tamird deleted the revert-revert-multicpu branch November 13, 2015 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants