Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roach{prod,test}: build roach{prod,test} from the SHA being tested #51897

Closed
irfansharif opened this issue Jul 25, 2020 · 4 comments
Closed

roach{prod,test}: build roach{prod,test} from the SHA being tested #51897

irfansharif opened this issue Jul 25, 2020 · 4 comments
Assignees
Labels
A-build-system C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-wishlist A wishlist feature.

Comments

@irfansharif
Copy link
Contributor

In our CI, roach{prod,test} is built from HEAD and run against all our major release branches. It's also run from HEAD when testing out our alpha releases. #51535 (comment) is an example of what can go wrong with this approach.

The roachtest code running here is from master, and landed alongside changes that introduced this membership column (#50329). In this SHA, that we're attempting to cut a release from, we didn't pick up the CRDB changes, but we are running it using roachtest code on master, which is expecting to find this column.

This "temporary mismatch" between roachprod code and CRDB code, where roachprod expects to find certain commits as part of the 20.2 release and does not find them because of when the alpha was cut doesn't seem like something roachprod should be taught to handle. If anything, we should start considering running roachprod from the SHA being tested. Asking around, we've been running it from a version built off of HEAD because roach{prod,test} used to change quite a lot, which I don't think is any longer the case. In fact, making roachprod compatible across multiple versions is increasingly cumbersome and fragile, not to mention largely untested.

If we don't run our unit tests from HEAD, we probably shouldn't do the same for our integration tests. Admittedly doing so forces us to think about compatibility issues, but I'm not sure that's enough of a reason to do such a thing.

@irfansharif irfansharif added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-build-system C-wishlist A wishlist feature. labels Jul 25, 2020
@petermattis
Copy link
Collaborator

I'm thumbs-up on making this change, though we need to be quite careful in doing so. We haven't historically backported changes/fixes to roachtests to release branches. So we'll need to make sure to backport current master roachtests to the releases we're still making patch releases for (19.1, 19.2, and 20.1).

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jul 28, 2020
Fixes cockroachdb#51965 (and all referencing issues).

Roachprod clusters running v20.1+ crdb nodes persist this
`cluster-bootstrapped` file on disk after explicitly bootstrapping the
cluster. Roachprod then uses the existence of this file to avoid doubly
bootstrapping the cluster.

Given cockroachdb#51897 remains unresolved, master-built roachprod is used to run
roachtests against the 20.1 branch. Some of those roachtests test
mixed-version clusters that start off at 19.2. Consequently, we manually
add this file where roachprod expects to find it for already-initialized
clusters. (This is a pretty gross hack, that we should address by
addressing cockroachdb#51897.)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jul 28, 2020
Fixes cockroachdb#51965 (and all referencing issues).

Roachprod clusters running v20.1+ crdb nodes persist this
`cluster-bootstrapped` file on disk after explicitly bootstrapping the
cluster. Roachprod then uses the existence of this file to avoid doubly
bootstrapping the cluster.

Given cockroachdb#51897 remains unresolved, master-built roachprod is used to run
roachtests against the 20.1 branch. Some of those roachtests test
mixed-version clusters that start off at 19.2. Consequently, we manually
add this file where roachprod expects to find it for already-initialized
clusters. (This is a pretty gross hack, that we should address by
addressing cockroachdb#51897.)

Release note: None
craig bot pushed a commit that referenced this issue Jul 29, 2020
52040: roachtest: fix release-20.1 roachtests failing due to double-init r=RaduBerinde a=irfansharif

Fixes #51965 (and all referencing issues).

Roachprod clusters running v20.1+ crdb nodes persist this
`cluster-bootstrapped` file on disk after explicitly bootstrapping the
cluster. Roachprod then uses the existence of this file to avoid doubly
bootstrapping the cluster.

Given #51897 remains unresolved, master-built roachprod is used to run
roachtests against the 20.1 branch. Some of those roachtests test
mixed-version clusters that start off at 19.2. Consequently, we manually
add this file where roachprod expects to find it for already-initialized
clusters. (This is a pretty gross hack, that we should address by
addressing #51897.)

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
craig bot pushed a commit that referenced this issue Jul 29, 2020
52015: sql: use a structured error to detect roachpb.BatchTimestampBeforeGCE… r=ajwerner a=ajwerner

…rror

Fixes #50198.

See the issue for more details. This issue is so obscure it does not deserve
a release note.

Release note: None

52040: roachtest: fix release-20.1 roachtests failing due to double-init r=irfansharif a=irfansharif

Fixes #51965 (and all referencing issues).

Roachprod clusters running v20.1+ crdb nodes persist this
`cluster-bootstrapped` file on disk after explicitly bootstrapping the
cluster. Roachprod then uses the existence of this file to avoid doubly
bootstrapping the cluster.

Given #51897 remains unresolved, master-built roachprod is used to run
roachtests against the 20.1 branch. Some of those roachtests test
mixed-version clusters that start off at 19.2. Consequently, we manually
add this file where roachprod expects to find it for already-initialized
clusters. (This is a pretty gross hack, that we should address by
addressing #51897.)

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 4, 2020
Fixes cockroachdb#52181. This test doesn't make use of all the `roachtest start`
smartness to figure out what sub-command to use and instead constructs
the raw start command by hand.

Given our support policy, let's just bump the minimum version this test
expects to run. This feels like another instance where
cockroachdb#51897 would be nice to
have.

Release note: None
craig bot pushed a commit that referenced this issue Aug 4, 2020
51959: backupccl: distribute RESTORE work using DistSQL r=dt,yuzefovich a=pbardea

This commit replaces restore's old method of coordinating work on a
single node to instead use DistSQL. It creates a 2 stage DistSQL flow.
The first stage assigns chunks to SplitAndScatter processors which
forward spans that they scattered to the second stage made up of the
RestoreData processors which ingest the data. The SplitAndScatter
processors attempt to send the work to the RestoreData colocated with
the leaseholder of the span after the scattering.

Release note: None.

52269: roachtest: correctly start crdb in acceptance/rapid_restart r=andreimatei a=andreimatei

--start-single-node was needed. Without it, I think the test's killing
of a node raced with that node dieing by itself, and sometimes the race
resulted in `cockroach stop` first seeing the process but then
`/bin/bash: line 8: kill: (16016) - No such proces`

Fixes #52060

Release note: None

52273: roachtest: remove confusing "--- PASS" log lines r=andreimatei a=andreimatei

Two workloads were printing PASS/FAIL lines from inside the predicates
passed to Searcher.Search. This is really confusing when reading the
test's output, because they don't correspond to the test's disposition.

Release note: None

52343: roachtest: reflake (skip) disk-stalled test for release-19.1 r=irfansharif a=irfansharif

Fixes #52181. This test doesn't make use of all the `roachtest start`
smartness to figure out what sub-command to use and instead constructs
the raw start command by hand.

Given our support policy, let's just bump the minimum version this test
expects to run. This feels like another instance where #51897 would be 
nice to have.

Release note: None

Co-authored-by: Paul Bardea <pbardea@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@irfansharif
Copy link
Contributor Author

We should do the same for workload too. Would avoid failures like #53136.

@jlinder
Copy link
Collaborator

jlinder commented Oct 21, 2020

Slack thread about this subject.

@adityamaru is taking on this issue. Targeting making these changes post 20.2.0 being cut.

adityamaru added a commit to adityamaru/cockroach that referenced this issue Oct 23, 2020
This is in preparation for the work being done in cockroachdb#51897

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 26, 2020
This reverts commit 6b11918.

Fixes cockroachdb#55954.
Fixes cockroachdb#55953.
Fixes cockroachdb#55952.
Fixes cockroachdb#55951.
Fixes cockroachdb#55949.

In cockroachdb#30624, we split various column families up in the TPC-C workload. This
had the unintended consequence of breaking CDC roachtests, because CDC
does not support multiple column families per row:
```
CHANGEFEEDs are currently supported on tables with exactly 1 column family: warehouse has 2
```

This PR reverts that commit. I intend to come back to this and introduce
a `--families` flag to TPC-C (like we have to YCSB) that can be used to
selectively toggle column families on or off. However, I'll do this
after cockroachdb#51897 lands, because it's going to be more effort than it's worth
to conditionally set this flag based on whether the version of workload
used in the test supports it or not.
craig bot pushed a commit that referenced this issue Oct 26, 2020
55983: Revert "tpcc: use multiple column families to avoid contention" r=nvanbenschoten a=nvanbenschoten

This reverts commit 6b11918.

Fixes #55954.
Fixes #55953.
Fixes #55952.
Fixes #55951.
Fixes #55949.

In #30624, we split various column families up in the TPC-C workload. This
had the unintended consequence of breaking CDC roachtests, because CDC
does not support multiple column families per row:
```
CHANGEFEEDs are currently supported on tables with exactly 1 column family: warehouse has 2
```

This PR reverts that commit. I intend to come back to this and introduce
a `--families` flag to TPC-C (like we have to YCSB) that can be used to
selectively toggle column families on or off. However, I'll do this
after #51897 lands, because it's going to be more effort than it's worth
to conditionally set this flag based on whether the version of workload
used in the test supports it or not.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
adityamaru added a commit to adityamaru/cockroach that referenced this issue Oct 30, 2020
@irfansharif
Copy link
Contributor Author

@adityamaru, going to close this out now. Thanks for diligently chewing through this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-wishlist A wishlist feature.
Projects
None yet
Development

No branches or pull requests

4 participants