-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importccl benchshort times out #59126
Comments
Logs suggest that the problematic benchmark |
This benchmark causes repeated CI failures. Informs cockroachdb#59126. Release note: None
While investigating cockroachdb#59126, I realized we were still going through the engine for listing the files. I rectified this and as a result, completely removed the in-mem implementation of the sideloaded storage that we had in place: instead of it, we simply use an in-mem engine, which now works like a charm thanks to not bypassing the engine for listing the file contents (`filepath.Glob` --> `eng.List`). While I was here, I added some defense in depth: it was unreasonable that `TruncateTo` was returning an error as a result of failing to perform best-effort cleanup on its directory. I am no longer seeing the spurious errors observed in cockroachdb#59126 with this change. This makes sense, as in this test the sideloaded storage is set up with an in-memory engine. The old code, which read from actual disk, would try to glob `auxiliary/...` (i.e. some path relative to the cwd - oops), which would return no results. As a result, the sideloaded storage would try to invoke `eng.Remove("auxiliary/...")` which the engine would refuse, since there would actually be (in-mem) files stored in the engine. This all resolves now that everyone is looking at the same files. It's possible that this fixes the issue with BenchmarkUserFile, as previously errors from TruncateTo could block the raft log queue, which in turn could at least slow down the test, but it's hard to say for sure. Touches cockroachdb#59126. Touches cockroachdb#31913. Release note: None
59028: coldata: fix Bytes invariant in some cases r=yuzefovich a=yuzefovich **execgen: remove SLICE method** `execgen.SLICE` directive is used only in one place, and we can use `execgen.WINDOW` there instead (which will have the same effect). Release note: None **coldata: fix updating offsets of bytes in Batch.SetLength** In `SetLength` method we are maintaining the invariant of `Bytes` vectors that the offsets are non-decreasing sequences. Previously, this was done incorrectly when a selection vector is present on the batch which could lead to out of bounds errors (caught by our panic-catcher) some time later. This is now fixed by correctly paying attention to the selection vector. I neither can easily come up with an example query that would trigger this condition nor can I prove that it won't occur, but I think we have seen a single sentry report that could be explained by this bug, so I think it's worth backporting. Additionally, this commit uses the assumption that the selection vectors are increasing sequences in order to calculate the largest index accessed by the batch. Fixes: #57297. Release note (bug fix): Previously, CockroachDB could encounter an internal error when executing queries with BYTES or STRING types via the vectorized engine in rare circumstances, and now this is fixed. 59127: importccl: skip BenchmarkUserFileImport r=RaduBerinde a=RaduBerinde This benchmark causes repeated CI failures. Informs #59126. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
While investigating cockroachdb#59126, I realized we were still going through the engine for listing the files. I rectified this and as a result, completely removed the in-mem implementation of the sideloaded storage that we had in place: instead of it, we simply use an in-mem engine, which now works like a charm thanks to not bypassing the engine for listing the file contents (`filepath.Glob` --> `eng.List`). While I was here, I added some defense in depth: it was unreasonable that `TruncateTo` was returning an error as a result of failing to perform best-effort cleanup on its directory. I am no longer seeing the spurious errors observed in cockroachdb#59126 with this change. This makes sense, as in this test the sideloaded storage is set up with an in-memory engine. The old code, which read from actual disk, would try to glob `auxiliary/...` (i.e. some path relative to the cwd - oops), which would return no results. As a result, the sideloaded storage would try to invoke `eng.Remove("auxiliary/...")` which the engine would refuse, since there would actually be (in-mem) files stored in the engine. This all resolves now that everyone is looking at the same files. It's possible that this fixes the issue with BenchmarkUserFile, as previously errors from TruncateTo could block the raft log queue, which in turn could at least slow down the test, but it's hard to say for sure. Touches cockroachdb#59126. Touches cockroachdb#31913. Release note: None
59103: tracing: link child into parent, even if not verbose r=irfansharif a=tbg Prior to this change, when a child was derived from a local parent, we would not register the child with the parent. In effect, this meant that any payloads in the child would not be collected. Release note: None 59134: kvserver: avoid engine bypass in sideload storage r=jbowens a=tbg While investigating #59126, I realized we were still going through the engine for listing the files. I rectified this and as a result, completely removed the in-mem implementation of the sideloaded storage that we had in place: instead of it, we simply use an in-mem engine, which now works like a charm thanks to not bypassing the engine for listing the file contents (`filepath.Glob` --> `eng.List`). While I was here, I added some defense in depth: it was unreasonable that `TruncateTo` was returning an error as a result of failing to perform best-effort cleanup on its directory. I am no longer seeing the spurious errors observed in #59126 with this change. Touches #59126. Touches #31913. Fixes #58948. Release note: None Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
While investigating cockroachdb#59126, I realized we were still going through the engine for listing the files. I rectified this and as a result, completely removed the in-mem implementation of the sideloaded storage that we had in place: instead of it, we simply use an in-mem engine, which now works like a charm thanks to not bypassing the engine for listing the file contents (`filepath.Glob` --> `eng.List`). While I was here, I added some defense in depth: it was unreasonable that `TruncateTo` was returning an error as a result of failing to perform best-effort cleanup on its directory. I am no longer seeing the spurious errors observed in cockroachdb#59126 with this change. This makes sense, as in this test the sideloaded storage is set up with an in-memory engine. The old code, which read from actual disk, would try to glob `auxiliary/...` (i.e. some path relative to the cwd - oops), which would return no results. As a result, the sideloaded storage would try to invoke `eng.Remove("auxiliary/...")` which the engine would refuse, since there would actually be (in-mem) files stored in the engine. This all resolves now that everyone is looking at the same files. It's possible that this fixes the issue with BenchmarkUserFile, as previously errors from TruncateTo could block the raft log queue, which in turn could at least slow down the test, but it's hard to say for sure. Touches cockroachdb#59126. Touches cockroachdb#31913. Release note: None
I've gotten this too, but in normal unit tests... https://teamcity.cockroachdb.com/viewLog.html?buildId=2602625&buildTypeId=Cockroach_UnitTests |
I've run this ~20 times and each run averages ~13s. Going to unskip this under the assumption that the above fix has mitigated this since I can't reproduce the timeout. |
61406: sql: validate against array type usages when dropping enum values r=ajwerner a=arulajmani Previously, when dropping an enum value, we weren't validating if the enum value was used in a column of array type. This patch fixes the bug. Fixes #60004 Release justification: bug fix to new functionality Release note: None 61650: roachtest: exclude `lost+found` directory r=knz a=rickystewart This directory is created by the filesystem and unowned file chunks are put there by `fsck`. The directory and its contents aren't readable by anyone except `root`, so this can cause the `du -c /mnt/data1` that `roachtest` performs to fail -- add an `--exclude` to handle this. We already ignore this directory in other contexts (for example, see `pkg/storage/mvcc.go`). Fixes #53663. Release justification: Non-production code change Release note: None 61788: importccl: unskip userfile benchmark r=pbardea a=adityamaru I've run this ~20 times and it averages ~13s to run. I suspect the fixes to linked issues mentioned in #59126 might have mitigated this. Going to unskip due to lack of reproducibility. Fixes: #59126 Release note: None 61828: contention: store contention events on non-SQL keys r=yuzefovich a=yuzefovich Previously, whenever we tried to add a contention event on a non-SQL key, it would encounter an error during decoding tableID/indexID pair, and the event was dropped. This commit extends the contention registry to additionally store information about contention on non-SQL keys. That information is stored in two levels: - on the top level, all `SingleNonSQLKeyContention` objects are ordered by their keys - on the bottom level, all `SingleTxnContention` objects are ordered by the number of times that transaction was observed to contend with other transactions. `SingleTxnContention` protobuf message is moved out of `SingleKeyContention` and is reused for non-SQL keys. This commit also updates the status server API response. I assume that no changes are needed with regards to backwards compatibility since the original version was merged just a few weeks ago, and we haven't had a beta released since then. Fixes: #60669. Release note (sql change): CockroachDB now also stores the information about contention on non-SQL keys. 61862: cliccl: add `load show backups` to display backup collection r=pbardea a=Elliebababa Previously, users can list backups created by `BACKUP INTO` with `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. This PR updates `load show` with `backups` subcommand, which allows users to list backups in a backup collection created by `BACKUP INTO`. With the same purpose as other `load show` subcommands, this update allows users to list backups without running `SHOW BACKUP IN` in a sql session. see #61131 #61829 to checkout other `load show` subcommand. Release note (cli change): Add `load show backups` to display backup collection. Previously, users can list backups created by `BACKUP INTO` via `SHOW BACKUP IN`in a sql session. But this listing task can be also done offline without a running cluster. Now, users are able to list backups in a collection with `cockroach load show backups <collection_url>. 61877: bench/ddl_analysis: fix test for real r=ajwerner a=ajwerner See individual commits. Last is the critical one. Fixes #61856. 61937: colexec: make vectorized stats concurrency safe r=yuzefovich a=yuzefovich **colflow: clean up vectorized stats for rowexec processors** Previously, the wrapped row-execution KV reading processors were implementing `execinfra.KVReader` interface, but they were never used as such, only the ColBatchScans would get used to retrieve the KV stats. This is the case because the row-execution processors report their execution stats themselves, and we don't want to duplicate that info. This commit moves `KVReader` interface into `colexecop` package and now only the ColBatchScans implement it. This allowed for some cleanup around the vectorized stats code, but the main reason for performing this change is that the contract of the interface will be modified by the follow-up commit to mention the safety under concurrent usage, and I didn't want to change the row-execution processors for that since the relevant methods never get called anyway. Additionally, this commit begins emitting of rows-read and bytes-read by the zigzagJoiners and invertedJoiners to complete the metrics picture. Release note: None **colexec: make vectorized stats concurrency safe** Previously, the collection of vectorized stats was not synchronized with the operators themselves. Namely, it was possible to call methods like `GetBytesRead` on the ColBatchScans and Inboxes from a different goroutine (the root materializer or the outbox) than from the main goroutine of the operator. This is now fixed by putting mutexes in place and updating `colexecop.KVReader` interface to require concurrency-safe implementations. Fixes: #61899. Release note: None Co-authored-by: arulajmani <arulajmani@gmail.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Aditya Maru <adityamaru@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: elliebababa <ellie24.huang@gmail.com> Co-authored-by: Andrew Werner <awerner32@gmail.com>
Fixes: cockroachdb#59126 Release note: None
I've seen this CI failure a few times. Full log in build artifacts https://teamcity.cockroachdb.com/viewLog.html?buildId=2591863&buildTypeId=Cockroach_UnitTests_Bench&tab=buildResultsDiv
The text was updated successfully, but these errors were encountered: