Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl benchshort times out #59126

Closed
RaduBerinde opened this issue Jan 19, 2021 · 5 comments · Fixed by #61788
Closed

importccl benchshort times out #59126

RaduBerinde opened this issue Jan 19, 2021 · 5 comments · Fixed by #61788
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered).

Comments

@RaduBerinde
Copy link
Member

I've seen this CI failure a few times. Full log in build artifacts https://teamcity.cockroachdb.com/viewLog.html?buildId=2591863&buildTypeId=Cockroach_UnitTests_Bench&tab=buildResultsDiv

*** Test killed with quit: ran too long (6m0s).
exit status 2
FAIL	github.com/cockroachdb/cockroach/pkg/ccl/importccl	360.203s
@RaduBerinde RaduBerinde added the C-test-failure Broken test (automatically or manually discovered). label Jan 19, 2021
@RaduBerinde RaduBerinde changed the title importccl bench times out importccl benchshort times out Jan 19, 2021
@RaduBerinde
Copy link
Member Author

RaduBerinde commented Jan 19, 2021

Logs suggest that the problematic benchmark BenchmarkUserfileImport. May be the same issue as #58948 (I see the same error in the log).

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 19, 2021
This benchmark causes repeated CI failures.

Informs cockroachdb#59126.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Jan 19, 2021
While investigating cockroachdb#59126, I realized we were still going through the
engine for listing the files. I rectified this and as a result,
completely removed the in-mem implementation of the sideloaded storage
that we had in place: instead of it, we simply use an in-mem engine,
which now works like a charm thanks to not bypassing the engine for
listing the file contents (`filepath.Glob` --> `eng.List`).

While I was here, I added some defense in depth: it was unreasonable
that `TruncateTo` was returning an error as a result of failing to
perform best-effort cleanup on its directory.

I am no longer seeing the spurious errors observed in cockroachdb#59126 with
this change. This makes sense, as in this test the sideloaded storage
is set up with an in-memory engine. The old code, which read from
actual disk, would try to glob `auxiliary/...` (i.e. some path relative
to the cwd - oops), which would return no results. As a result, the
sideloaded storage would try to invoke `eng.Remove("auxiliary/...")`
which the engine would refuse, since there would actually be (in-mem)
files stored in the engine. This all resolves now that everyone is
looking at the same files.

It's possible that this fixes the issue with BenchmarkUserFile, as
previously errors from TruncateTo could block the raft log queue,
which in turn could at least slow down the test, but it's hard to
say for sure.

Touches cockroachdb#59126.
Touches cockroachdb#31913.

Release note: None
craig bot pushed a commit that referenced this issue Jan 19, 2021
59028: coldata: fix Bytes invariant in some cases r=yuzefovich a=yuzefovich

**execgen: remove SLICE method**

`execgen.SLICE` directive is used only in one place, and we can use
`execgen.WINDOW` there instead (which will have the same effect).

Release note: None

**coldata: fix updating offsets of bytes in Batch.SetLength**

In `SetLength` method we are maintaining the invariant of `Bytes`
vectors that the offsets are non-decreasing sequences. Previously, this
was done incorrectly when a selection vector is present on the batch
which could lead to out of bounds errors (caught by our panic-catcher)
some time later. This is now fixed by correctly paying attention to the
selection vector.

I neither can easily come up with an example query that would trigger
this condition nor can I prove that it won't occur, but I think we have
seen a single sentry report that could be explained by this bug, so I
think it's worth backporting.

Additionally, this commit uses the assumption that the selection vectors
are increasing sequences in order to calculate the largest index
accessed by the batch.

Fixes: #57297.

Release note (bug fix): Previously, CockroachDB could encounter an
internal error when executing queries with BYTES or STRING types via the
vectorized engine in rare circumstances, and now this is fixed.

59127: importccl: skip BenchmarkUserFileImport r=RaduBerinde a=RaduBerinde

This benchmark causes repeated CI failures.

Informs #59126.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this issue Jan 20, 2021
While investigating cockroachdb#59126, I realized we were still going through the
engine for listing the files. I rectified this and as a result,
completely removed the in-mem implementation of the sideloaded storage
that we had in place: instead of it, we simply use an in-mem engine,
which now works like a charm thanks to not bypassing the engine for
listing the file contents (`filepath.Glob` --> `eng.List`).

While I was here, I added some defense in depth: it was unreasonable
that `TruncateTo` was returning an error as a result of failing to
perform best-effort cleanup on its directory.

I am no longer seeing the spurious errors observed in cockroachdb#59126 with
this change. This makes sense, as in this test the sideloaded storage
is set up with an in-memory engine. The old code, which read from
actual disk, would try to glob `auxiliary/...` (i.e. some path relative
to the cwd - oops), which would return no results. As a result, the
sideloaded storage would try to invoke `eng.Remove("auxiliary/...")`
which the engine would refuse, since there would actually be (in-mem)
files stored in the engine. This all resolves now that everyone is
looking at the same files.

It's possible that this fixes the issue with BenchmarkUserFile, as
previously errors from TruncateTo could block the raft log queue,
which in turn could at least slow down the test, but it's hard to
say for sure.

Touches cockroachdb#59126.
Touches cockroachdb#31913.

Release note: None
craig bot pushed a commit that referenced this issue Jan 21, 2021
59103: tracing: link child into parent, even if not verbose r=irfansharif a=tbg

Prior to this change, when a child was derived from a local parent, we
would not register the child with the parent. In effect, this meant that
any payloads in the child would not be collected.

Release note: None


59134: kvserver: avoid engine bypass in sideload storage r=jbowens a=tbg

While investigating #59126, I realized we were still going through the
engine for listing the files. I rectified this and as a result,
completely removed the in-mem implementation of the sideloaded storage
that we had in place: instead of it, we simply use an in-mem engine,
which now works like a charm thanks to not bypassing the engine for
listing the file contents (`filepath.Glob` --> `eng.List`).

While I was here, I added some defense in depth: it was unreasonable
that `TruncateTo` was returning an error as a result of failing to
perform best-effort cleanup on its directory.

I am no longer seeing the spurious errors observed in #59126 with
this change.

Touches #59126.
Touches #31913.
Fixes #58948.

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
pbardea pushed a commit to pbardea/cockroach that referenced this issue Jan 21, 2021
While investigating cockroachdb#59126, I realized we were still going through the
engine for listing the files. I rectified this and as a result,
completely removed the in-mem implementation of the sideloaded storage
that we had in place: instead of it, we simply use an in-mem engine,
which now works like a charm thanks to not bypassing the engine for
listing the file contents (`filepath.Glob` --> `eng.List`).

While I was here, I added some defense in depth: it was unreasonable
that `TruncateTo` was returning an error as a result of failing to
perform best-effort cleanup on its directory.

I am no longer seeing the spurious errors observed in cockroachdb#59126 with
this change. This makes sense, as in this test the sideloaded storage
is set up with an in-memory engine. The old code, which read from
actual disk, would try to glob `auxiliary/...` (i.e. some path relative
to the cwd - oops), which would return no results. As a result, the
sideloaded storage would try to invoke `eng.Remove("auxiliary/...")`
which the engine would refuse, since there would actually be (in-mem)
files stored in the engine. This all resolves now that everyone is
looking at the same files.

It's possible that this fixes the issue with BenchmarkUserFile, as
previously errors from TruncateTo could block the raft log queue,
which in turn could at least slow down the test, but it's hard to
say for sure.

Touches cockroachdb#59126.
Touches cockroachdb#31913.

Release note: None
@jordanlewis
Copy link
Member

I've gotten this too, but in normal unit tests...

https://teamcity.cockroachdb.com/viewLog.html?buildId=2602625&buildTypeId=Cockroach_UnitTests

@adityamaru
Copy link
Contributor

I've run this ~20 times and each run averages ~13s. Going to unskip this under the assumption that the above fix has mitigated this since I can't reproduce the timeout.

craig bot pushed a commit that referenced this issue Mar 15, 2021
61406: sql: validate against array type usages when dropping enum values r=ajwerner a=arulajmani

Previously, when dropping an enum value, we weren't validating if the
enum value was used in a column of array type. This patch fixes the bug.

Fixes #60004

Release justification: bug fix to new functionality
Release note: None

61650: roachtest: exclude `lost+found` directory r=knz a=rickystewart

This directory is created by the filesystem and unowned file chunks are
put there by `fsck`. The directory and its contents aren't readable by
anyone except `root`, so this can cause the `du -c /mnt/data1` that
`roachtest` performs to fail -- add an `--exclude` to handle this.

We already ignore this directory in other contexts (for example, see
`pkg/storage/mvcc.go`).

Fixes #53663.

Release justification: Non-production code change
Release note: None

61788: importccl: unskip userfile benchmark r=pbardea a=adityamaru

I've run this ~20 times and it averages ~13s to run. I suspect the fixes to linked issues mentioned in #59126 might have mitigated this. Going to unskip due to lack of reproducibility.

Fixes: #59126

Release note: None

61828: contention: store contention events on non-SQL keys r=yuzefovich a=yuzefovich

Previously, whenever we tried to add a contention event on a non-SQL
key, it would encounter an error during decoding tableID/indexID pair,
and the event was dropped. This commit extends the contention registry
to additionally store information about contention on non-SQL keys. That
information is stored in two levels:
- on the top level, all `SingleNonSQLKeyContention` objects are ordered
  by their keys
- on the bottom level, all `SingleTxnContention` objects are ordered by the
  number of times that transaction was observed to contend with other
  transactions.

`SingleTxnContention` protobuf message is moved out of
`SingleKeyContention` and is reused for non-SQL keys. This commit also
updates the status server API response. I assume that no changes are
needed with regards to backwards compatibility since the original
version was merged just a few weeks ago, and we haven't had a beta
released since then.

Fixes: #60669.

Release note (sql change): CockroachDB now also stores the information
about contention on non-SQL keys.

61862: cliccl: add `load show backups` to display backup collection r=pbardea a=Elliebababa

Previously, users can list backups created by `BACKUP INTO`
with `SHOW BACKUP IN`in a sql session. But this listing task
can be also done offline without a running cluster.

This PR updates `load show` with `backups` subcommand, 
which allows users to list backups in a backup collection 
created by `BACKUP INTO`. 
With the same purpose as other `load show` subcommands, 
this update allows users to list backups without running 
`SHOW BACKUP IN` in a sql session.

see #61131 #61829 to checkout other `load show` subcommand.

Release note (cli change): Add `load show backups` to
display backup collection. Previously, users can list backups
created by `BACKUP INTO` via `SHOW BACKUP IN`in a sql
session. But this listing task can be also done offline without a
running cluster. Now, users are able to list backups in a collection
with `cockroach load show backups <collection_url>.

61877: bench/ddl_analysis: fix test for real r=ajwerner a=ajwerner

See individual commits. Last is the critical one. 

Fixes #61856.

61937: colexec: make vectorized stats concurrency safe r=yuzefovich a=yuzefovich

**colflow: clean up vectorized stats for rowexec processors**

Previously, the wrapped row-execution KV reading processors were
implementing `execinfra.KVReader` interface, but they were never used as
such, only the ColBatchScans would get used to retrieve the KV stats.
This is the case because the row-execution processors report their
execution stats themselves, and we don't want to duplicate that info.
This commit moves `KVReader` interface into `colexecop` package and now
only the ColBatchScans implement it. This allowed for some cleanup
around the vectorized stats code, but the main reason for performing
this change is that the contract of the interface will be modified by
the follow-up commit to mention the safety under concurrent usage, and
I didn't want to change the row-execution processors for that since the
relevant methods never get called anyway.

Additionally, this commit begins emitting of rows-read and bytes-read by
the zigzagJoiners and invertedJoiners to complete the metrics picture.

Release note: None

**colexec: make vectorized stats concurrency safe**

Previously, the collection of vectorized stats was not synchronized with
the operators themselves. Namely, it was possible to call methods like
`GetBytesRead` on the ColBatchScans and Inboxes from a different
goroutine (the root materializer or the outbox) than from the main
goroutine of the operator. This is now fixed by putting mutexes in place
and updating `colexecop.KVReader` interface to require concurrency-safe
implementations.

Fixes: #61899.

Release note: None

Co-authored-by: arulajmani <arulajmani@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: elliebababa <ellie24.huang@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
@craig craig bot closed this as completed in 32c6608 Mar 15, 2021
adityamaru added a commit to adityamaru/cockroach that referenced this issue May 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered).
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants