roachtest: open_ports failed #123887

Open
cockroach-teamcity opened this issue May 9, 2024 · 3 comments
Labels
branch-release-24.1: Used to mark GA and release blockers, technical advisories, and bugs for 24.1
O-roachtest
O-robot: Originated from a bot.
T-testeng: TestEng Team
X-infra-flake: The automatically generated issue was closed due to an infrastructure problem, not a product issue
@cockroach-teamcity
Member

cockroach-teamcity commented May 9, 2024

roachtest.open_ports failed with artifacts on release-24.1 @ 4665969fb6f4a4c1b58f10d9f99799f1939a0d62:

test import/tpcc/warehouses=4000/geo failed: (cluster.go:2235).Start: failed to find services to register: failed to find 1 open ports: TRANSIENT_ERROR(open_ports): expected 1 ports, got 0
test artifacts and logs in: /artifacts/import/tpcc/warehouses=4000/geo/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=16
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-38588

@cockroach-teamcity added the branch-release-24.1, O-roachtest, O-robot, T-testeng, and X-infra-flake labels on May 9, 2024
@cockroach-teamcity added this to the 24.1 milestone on May 9, 2024
@renatolabs
Contributor

@herkolategan curious if you have any clues about what could have happened here.

open_ports.sh should have failed (exit 1) if it didn't find any ports, but it didn't. Which means ports_found is > 0 but strings.Fields(out) is empty. Not sure what to make of this.
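
To make the puzzle concrete, here is a minimal sketch of how a port list might be parsed from the script's stdout. It assumes the caller splits the output with `strings.Fields` and compares the count against the number requested; `parsePorts` is a hypothetical helper for illustration, not the actual roachprod code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePorts is a hypothetical helper (not roachprod's real code). It splits
// the remote command's stdout on whitespace and checks the port count.
func parsePorts(out string, want int) ([]int, error) {
	fields := strings.Fields(out)
	if len(fields) != want {
		return nil, fmt.Errorf("expected %d ports, got %d", want, len(fields))
	}
	ports := make([]int, 0, len(fields))
	for _, f := range fields {
		p, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid port %q: %w", f, err)
		}
		ports = append(ports, p)
	}
	return ports, nil
}

func main() {
	// Empty stdout: the count check fails even though no script reported exit 1.
	_, err := parsePorts("", 1)
	fmt.Println(err)

	ports, err := parsePorts("29000\n", 1)
	fmt.Println(ports, err)
}
```

Under this assumption, an empty stdout yields the same "expected 1 ports, got 0" message as in the report, regardless of what the script itself would have printed had it run.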

@herkolategan
Collaborator

This was an SSH failure:

OpenSSH_8.2p1 Ubuntu-4ubuntu0.11, OpenSSL 1.1.1f  31 Mar 2020
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname 34.91.33.240 is address
debug2: ssh_connect_direct
debug1: Connecting to 34.91.33.240 [34.91.33.240] port 22.
debug2: fd 4 setting O_NONBLOCK
debug1: connect to address 34.91.33.240 port 22: Connection timed out
ssh: connect to host 34.91.33.240 port 22: Connection timed out
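
One way to tell this kind of failure apart from a failing remote script is the exit status: `ssh` returns the remote command's exit status, or 255 if an error occurred in `ssh` itself (such as the connection timeout above). The sketch below assumes the command is run through `os/exec`; `classifyExit` and `exitCode` are hypothetical helpers, and a remote command that itself exits with 255 would be misclassified.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// classifyExit is a hypothetical helper that maps an ssh exit status to a
// coarse failure category. It assumes the remote command never exits 255.
func classifyExit(code int) string {
	switch code {
	case 0:
		return "ok"
	case 255:
		return "ssh transport failure"
	default:
		return "remote command failure"
	}
}

// exitCode extracts the exit status from an error returned by (*exec.Cmd).Run.
// The boolean is false if err does not carry an exit status.
func exitCode(err error) (int, bool) {
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		return ee.ExitCode(), true
	}
	return 0, false
}

func main() {
	fmt.Println(classifyExit(255)) // ssh could not connect
	fmt.Println(classifyExit(1))   // e.g. the script found no ports
}
```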

renatolabs added a commit to renatolabs/cockroach that referenced this issue May 20, 2024
Previously, roachtest would only look at the topmost error in a chain
that matched a `TransientError` (or `ErrorWithOwnership`) when
checking for flakes. However, that is in most cases *not* what we
want: if a transient error wraps another transient error, the actual
reason for the failure is the original (wrapped) error.

Informs: cockroachdb#123887

Release note: None
renatolabs added a commit to renatolabs/cockroach that referenced this issue May 21, 2024
Previously, roachtest would only look at the outermost error in a
chain that matched a `TransientError` (or `ErrorWithOwnership`) when
checking for flakes. However, that is in most cases *not* what we
want: if a transient error wraps another transient error, the actual
reason for the failure is the original (wrapped) error.

Informs: cockroachdb#123887

Release note: None
craig bot pushed a commit that referenced this issue May 21, 2024
119416: pkg/util/eventagg: general aggregation framework for reduction of event cardinality r=dhartunian a=abarganier

**Reviewer note: review commit-wise**

The eventagg package is (currently) a proof of concept ("POC") that aims to provide an easy-to-use library that standardizes the way in which we aggregate Observability event data in CRDB. The goal is to eventually emit that data as "exhaust" from CRDB, which downstream systems can consume to build Observability features that do not rely on CRDB's own availability to aid in debugging & investigations. Additionally, we want to provide facilities for code within CRDB to consume this same data, such that it can also power features internally.

This pull request contains work to create the aggregation mechanism in `pkg/util/eventagg`.

These facilities provide a way of aggregating notable events to reduce cardinality before performing further processing and/or structured logging.

In addition to the framework, a toy SQL Stats example is provided in `pkg/sql/sqlstats/aggregate.go`, which shows the current developer experience when using the APIs.

See `pkg/util/eventagg/doc.go` for more details

Since this feature is currently experimental, it's gated by the `COCKROACH_ENABLE_STRUCTURED_EVENTS` environment variable, which is disabled by default.

---

Release note: none

Epic: CRDB-35919

123120: ui: Highlight unavailable ranges in red on the summary bar with nonzero r=abarganier a=theloneexplorerquest

Modify the summary bar to change the color of the unavailable ranges count. When the number of unavailable ranges is greater than zero, it is displayed in red; when it is zero, it is displayed in green.

Fix: #122014

Release note (ui): Changed the color of unavailable ranges on the summary bar to red when nonzero; ranges are green when zero.

124160: roachtest: add test for admission control disk bandwidth  r=sumeerbhola a=aadityasondhi

This test runs a single node target cluster that has two workloads
running on it. The lower priority (qos=background) is very bandwidth
intensive, and without the AC bandwidth limiter would saturate the
provisioned bandwidth (controlled using cgroups).

This test shows how setting the cluster setting
`kvadmission.store.provisioned-bandwidth` limits the disk bandwidth
usage of lower priority work and shapes it at the value set in the
setting.

Fixes #121576.

Release note: None


124293: tools: switch md5 cmd name based on existence  r=dt a=dt

Release note: none.
Epic: none.

124348: backupccl: download pre restore data in cluster restore r=dt a=msbutler

This patch adds the pre restore data spans to the list of spans to download.
While these pre restore spans map to data in the temporary system table
database that are then rewritten to the actual system table, the download job
ought to download all external data linked into the cluster out of principle.

Fixes #124330

Release note: none

124403: roachtest: use first transient error when checking for flakes r=srosenberg a=renatolabs

Previously, roachtest would only look at the outermost error in a chain that matched a `TransientError` (or `ErrorWithOwnership`) when checking for flakes. However, that is in most cases *not* what we want: if a transient error wraps another transient error, the actual reason for the failure is the original (wrapped) error.

Informs: #123887

Release note: None

124486: kvclient: add WithFiltering option to rangefeed client r=nvanbenschoten,msbutler a=stevendanna

This adds a WithFiltering option to the rangefeed client that passes through the option to the underlying rangefeed.

Epic: none
Release note: None

124491: raft: remove RawNode.TickQuiesced r=pav-kv a=nvanbenschoten

This commit removes the `(*RawNode).TickQuiesced` method. The method was deprecated back in etcd-io/raft#62 and has not been in use since 2018.

Epic: None
Release note: None

Co-authored-by: Alex Barganier <abarganier@cockroachlabs.com>
Co-authored-by: theloneexplorerquest <theloneexplorerquest@gmail.com>
Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
blathers-crl bot pushed a commit that referenced this issue May 23, 2024
Previously, roachtest would only look at the outermost error in a
chain that matched a `TransientError` (or `ErrorWithOwnership`) when
checking for flakes. However, that is in most cases *not* what we
want: if a transient error wraps another transient error, the actual
reason for the failure is the original (wrapped) error.

Informs: #123887

Release note: None
renatolabs added a commit to renatolabs/cockroach that referenced this issue May 23, 2024
Previously, roachtest would only look at the outermost error in a
chain that matched a `TransientError` (or `ErrorWithOwnership`) when
checking for flakes. However, that is in most cases *not* what we
want: if a transient error wraps another transient error, the actual
reason for the failure is the original (wrapped) error.

Informs: cockroachdb#123887

Release note: None
@cockroach-teamcity
Member Author

roachtest.open_ports failed with artifacts on release-24.1 @ 050227d6cc528b8877a15bb7cfc312632b0e3f5c:

test c2c/shutdown/dest/coordinator failed: (cluster.go:2290).Start: failed to find services to register: failed to find 1 open ports: TRANSIENT_ERROR(open_ports): expected 1 ports, got 0
test artifacts and logs in: /artifacts/c2c/shutdown/dest/coordinator/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=8
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

This test on roachdash | Improve this report!
