Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: unify and harden crdb_internal.cluster_inflight_traces #106326

Merged
merged 1 commit into from
Jul 11, 2023

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jul 6, 2023

This commit unifies population of crdb_internal.cluster_inflight_traces between single-tenant and multi-tenant modes - from now on, in all deployment modes we use all available SQL instances to fetch span recordings from. The trace collector has been refactored accordingly.

Additionally, the trace collector is hardened to ignore any errors that are encountered when retrieving recordings from a single SQL instance (previously, it would fail if we encountered an error when dialing the SQL instance or when receiving GetSpanRecordingResponse, and now we will log a warning and proceed silently). The rationale for making this change is that instances might no longer be "ready" when the iteration reaches them, so this should make the virtual table generation more bullet-proof.

Epic: CRDB-26691
Touches: #100826

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

This commit unifies population of
`crdb_internal.cluster_inflight_traces` between single-tenant and
multi-tenant modes - from now on, in all deployment modes we use all
available SQL instances to fetch span recordings from. The trace
collector has been refactored accordingly.

Additionally, the trace collector is hardened to ignore any errors
that are encountered when retrieving recordings from a single SQL
instance (previously, it would fail if we encountered an error when
dialing the SQL instance or when receiving GetSpanRecordingResponse,
and now we will log a warning and proceed silently). The rationale
for making this change is that instances might no longer be "ready"
when the iteration reaches them, so this should make the virtual
table generation more bullet-proof.

Release note: None
@yuzefovich yuzefovich marked this pull request as ready for review July 7, 2023 23:20
@yuzefovich yuzefovich requested a review from a team July 7, 2023 23:20
@yuzefovich yuzefovich requested a review from a team as a code owner July 7, 2023 23:20
@yuzefovich yuzefovich requested review from dhartunian, knz and a team and removed request for a team July 7, 2023 23:20
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)


pkg/sql/crdb_internal.go line 1928 at r1 (raw file):

CREATE TABLE crdb_internal.cluster_inflight_traces (
  trace_id     INT NOT NULL,    -- The trace's ID.
  node_id      INT NOT NULL,    -- The node's ID.

Do we want to rename node_id to instance_id? I'm leaning towards "no" given that "instance" seems to be internal-only concept.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, and @yuzefovich)

a discussion (no related file):
Overall this LGTM.

I agree with not renaming the node_id column for now. Until the operator experience for MT clusters is further understood, SQL instance ID does currently seem like an internal concept.

I suppose an alternative to completely dropping errors when dialing SQL instances, would be to send a notice or something to indicate that you might be viewing partial results. But, since we include the node_id in the output and since this is an internal function, I'm not sure that is worth it.


@yuzefovich
Copy link
Member Author

But, since we include the node_id in the output and since this is an internal function, I'm not sure that is worth it.

This was exactly my thinking.

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 11, 2023

Build succeeded:

@craig craig bot merged commit 76e60d8 into cockroachdb:master Jul 11, 2023
2 checks passed
@yuzefovich yuzefovich deleted the mt-traces-followup branch July 11, 2023 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants