Skip to content

rpc: avoid allocation of TracingInternalClient when tracing disabled#136941

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/internalClientTracingWrap
Dec 10, 2024
Merged

rpc: avoid allocation of TracingInternalClient when tracing disabled#136941
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/internalClientTracingWrap

Conversation

@nvb
Copy link
Contributor

@nvb nvb commented Dec 7, 2024

Small performance improvement in the common case where tracing is disabled.

name                                            old time/op    new time/op    delta
Sysbench/SQL/1node_remote/oltp_read_only-30       6.84ms ± 2%    6.86ms ± 3%    ~     (p=0.912 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-30      4.16ms ± 2%    4.17ms ± 2%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-30      11.2ms ± 2%    11.2ms ± 2%    ~     (p=0.912 n=10+10)
Sysbench/SQL/1node_remote/oltp_point_select-30     458µs ± 2%     459µs ± 3%    ~     (p=0.971 n=10+10)
Sysbench/SQL/3node/oltp_read_only-30              7.02ms ± 2%    7.06ms ± 2%    ~     (p=0.529 n=10+10)
Sysbench/SQL/3node/oltp_write_only-30             4.42ms ± 2%    4.41ms ± 2%    ~     (p=0.604 n=10+9)
Sysbench/SQL/3node/oltp_read_write-30             11.7ms ± 2%    11.7ms ± 2%    ~     (p=0.684 n=10+10)
Sysbench/SQL/3node/oltp_point_select-30            473µs ± 2%     474µs ± 2%    ~     (p=0.796 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-30        2.67ms ± 2%    2.68ms ± 2%    ~     (p=0.243 n=9+10)
Sysbench/KV/1node_remote/oltp_write_only-30       2.20ms ± 3%    2.19ms ± 2%    ~     (p=0.739 n=10+10)
Sysbench/KV/1node_remote/oltp_read_write-30       5.04ms ± 2%    5.05ms ± 2%    ~     (p=0.853 n=10+10)
Sysbench/KV/1node_remote/oltp_point_select-30      125µs ± 2%     125µs ± 1%    ~     (p=0.278 n=10+9)
Sysbench/KV/3node/oltp_read_only-30               2.79ms ± 2%    2.79ms ± 2%    ~     (p=0.579 n=10+10)
Sysbench/KV/3node/oltp_write_only-30              2.51ms ± 2%    2.50ms ± 1%    ~     (p=0.762 n=10+8)
Sysbench/KV/3node/oltp_read_write-30              5.57ms ± 2%    5.55ms ± 1%    ~     (p=0.968 n=10+9)
Sysbench/KV/3node/oltp_point_select-30             131µs ± 4%     132µs ± 3%    ~     (p=0.481 n=10+10)

name                                            old alloc/op   new alloc/op   delta
Sysbench/SQL/1node_remote/oltp_read_only-30       1.35MB ± 0%    1.35MB ± 0%    ~     (p=0.063 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-30       670kB ± 0%     670kB ± 0%    ~     (p=0.739 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-30      2.11MB ± 0%    2.11MB ± 0%    ~     (p=0.063 n=10+10)
Sysbench/SQL/1node_remote/oltp_point_select-30    42.2kB ± 0%    42.2kB ± 0%    ~     (p=0.540 n=10+10)
Sysbench/SQL/3node/oltp_read_only-30              1.42MB ± 4%    1.40MB ± 1%    ~     (p=0.965 n=10+8)
Sysbench/SQL/3node/oltp_write_only-30             1.15MB ± 3%    1.13MB ± 8%    ~     (p=0.165 n=10+10)
Sysbench/SQL/3node/oltp_read_write-30             2.77MB ± 1%    2.76MB ± 1%    ~     (p=0.190 n=10+10)
Sysbench/SQL/3node/oltp_point_select-30           44.9kB ± 1%    47.2kB ±12%    ~     (p=0.101 n=8+10)
Sysbench/KV/1node_remote/oltp_read_only-30         791kB ± 0%     790kB ± 0%    ~     (p=0.280 n=10+10)
Sysbench/KV/1node_remote/oltp_write_only-30        334kB ± 2%     334kB ± 2%    ~     (p=0.971 n=10+10)
Sysbench/KV/1node_remote/oltp_read_write-30       1.12MB ± 0%    1.12MB ± 0%    ~     (p=0.447 n=10+9)
Sysbench/KV/1node_remote/oltp_point_select-30     16.4kB ± 0%    16.4kB ± 0%    ~     (p=0.956 n=10+10)
Sysbench/KV/3node/oltp_read_only-30                808kB ± 1%     809kB ± 0%    ~     (p=0.211 n=10+9)
Sysbench/KV/3node/oltp_write_only-30               618kB ± 1%     621kB ± 1%    ~     (p=0.075 n=10+10)
Sysbench/KV/3node/oltp_read_write-30              1.45MB ± 1%    1.45MB ± 0%    ~     (p=0.315 n=10+8)
Sysbench/KV/3node/oltp_point_select-30            17.1kB ± 1%    17.0kB ± 1%    ~     (p=0.382 n=10+10)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-30        211 ± 0%       210 ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/KV/3node/oltp_point_select-30               213 ± 0%       212 ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/3node/oltp_read_write-30              17.5k ± 1%     17.4k ± 1%  -0.46%  (p=0.029 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-30         3.99k ± 0%     3.97k ± 0%  -0.35%  (p=0.000 n=10+10)
Sysbench/KV/3node/oltp_read_only-30                4.02k ± 0%     4.01k ± 0%  -0.34%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_write_only-30        3.15k ± 0%     3.14k ± 0%  -0.32%  (p=0.000 n=10+10)
Sysbench/KV/3node/oltp_read_write-30               8.63k ± 0%     8.61k ± 0%  -0.31%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_read_write-30        7.20k ± 0%     7.18k ± 0%  -0.29%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_only-30        7.70k ± 0%     7.67k ± 0%  -0.27%  (p=0.000 n=10+10)
Sysbench/SQL/3node/oltp_point_select-30              459 ± 0%       458 ± 0%  -0.27%  (p=0.013 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-30       440 ± 0%       439 ± 0%  -0.23%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-30       6.97k ± 0%     6.95k ± 0%  -0.20%  (p=0.000 n=10+9)
Sysbench/KV/3node/oltp_write_only-30               4.41k ± 0%     4.40k ± 0%  -0.18%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-30       14.7k ± 0%     14.7k ± 0%  -0.16%  (p=0.002 n=10+9)
Sysbench/SQL/3node/oltp_read_only-30               7.96k ± 1%     7.98k ± 1%    ~     (p=0.618 n=9+10)
Sysbench/SQL/3node/oltp_write_only-30              9.32k ± 1%     9.24k ± 2%    ~     (p=0.211 n=9+10)

Epic: None
Release note: None

@nvb nvb added the o-perf-efficiency Related to performance efficiency label Dec 7, 2024
@nvb nvb requested a review from tbg December 7, 2024 01:30
@nvb nvb requested a review from a team as a code owner December 7, 2024 01:30
@blathers-crl
Copy link

blathers-crl bot commented Dec 7, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to go further, you can move TracingInternalClient and associated logic to kvpb, make it type tracingInternalClient internalClient, and implement all methods on it (each one just calls (*iternalClient)(c).Method()). That would avoid the allocation and the virtual calls.

Or better yet, make it type tracingInternalClient grpc.ClientConn and implement all calls as ic := internalClient{(*grpc.ClientConn)(c)}; ic.Method(). That also avoids the allocation in NewInternalClient.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/rpc/nodedialer/nodedialer.go line 288 at r1 (raw file):

func maybeWrapInTracingClient(ctx context.Context, client kvpb.InternalClient) kvpb.InternalClient {
	sp := tracing.SpanFromContext(ctx)

Is this ctx the same with the one that goes to Batch. If no, is it possible that this doesn't have a span but the Batch one will? If yes, can we store the span in TracingInternalClient so we don't obtain it from the context again?

@nvb nvb requested a review from RaduBerinde December 9, 2024 21:38
Copy link
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestions! I agree that we could do more to eliminate these heap allocations. I'll revisit this once we land #136648, as that changes the types of optimizations we'll have available to us.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)


pkg/rpc/nodedialer/nodedialer.go line 288 at r1 (raw file):

Previously, RaduBerinde wrote…

Is this ctx the same with the one that goes to Batch. If no, is it possible that this doesn't have a span but the Batch one will? If yes, can we store the span in TracingInternalClient so we don't obtain it from the context again?

Yes, this will be the same ctx. In isolation, I agree with storing the span on the client, but hanging state off of TracingInternalClient will prevent us from making some of the other optimizations you're proposing above, so I'll hold off on this for now.

@craig
Copy link
Contributor

craig bot commented Dec 9, 2024

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 10, 2024

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 10, 2024

Build failed:

Small performance improvement in the common case where tracing is disabled.

```
name                                            old time/op    new time/op    delta
Sysbench/SQL/1node_remote/oltp_read_only-30       6.84ms ± 2%    6.86ms ± 3%    ~     (p=0.912 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-30      4.16ms ± 2%    4.17ms ± 2%    ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-30      11.2ms ± 2%    11.2ms ± 2%    ~     (p=0.912 n=10+10)
Sysbench/SQL/1node_remote/oltp_point_select-30     458µs ± 2%     459µs ± 3%    ~     (p=0.971 n=10+10)
Sysbench/SQL/3node/oltp_read_only-30              7.02ms ± 2%    7.06ms ± 2%    ~     (p=0.529 n=10+10)
Sysbench/SQL/3node/oltp_write_only-30             4.42ms ± 2%    4.41ms ± 2%    ~     (p=0.604 n=10+9)
Sysbench/SQL/3node/oltp_read_write-30             11.7ms ± 2%    11.7ms ± 2%    ~     (p=0.684 n=10+10)
Sysbench/SQL/3node/oltp_point_select-30            473µs ± 2%     474µs ± 2%    ~     (p=0.796 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-30        2.67ms ± 2%    2.68ms ± 2%    ~     (p=0.243 n=9+10)
Sysbench/KV/1node_remote/oltp_write_only-30       2.20ms ± 3%    2.19ms ± 2%    ~     (p=0.739 n=10+10)
Sysbench/KV/1node_remote/oltp_read_write-30       5.04ms ± 2%    5.05ms ± 2%    ~     (p=0.853 n=10+10)
Sysbench/KV/1node_remote/oltp_point_select-30      125µs ± 2%     125µs ± 1%    ~     (p=0.278 n=10+9)
Sysbench/KV/3node/oltp_read_only-30               2.79ms ± 2%    2.79ms ± 2%    ~     (p=0.579 n=10+10)
Sysbench/KV/3node/oltp_write_only-30              2.51ms ± 2%    2.50ms ± 1%    ~     (p=0.762 n=10+8)
Sysbench/KV/3node/oltp_read_write-30              5.57ms ± 2%    5.55ms ± 1%    ~     (p=0.968 n=10+9)
Sysbench/KV/3node/oltp_point_select-30             131µs ± 4%     132µs ± 3%    ~     (p=0.481 n=10+10)

name                                            old alloc/op   new alloc/op   delta
Sysbench/SQL/1node_remote/oltp_read_only-30       1.35MB ± 0%    1.35MB ± 0%    ~     (p=0.063 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-30       670kB ± 0%     670kB ± 0%    ~     (p=0.739 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-30      2.11MB ± 0%    2.11MB ± 0%    ~     (p=0.063 n=10+10)
Sysbench/SQL/1node_remote/oltp_point_select-30    42.2kB ± 0%    42.2kB ± 0%    ~     (p=0.540 n=10+10)
Sysbench/SQL/3node/oltp_read_only-30              1.42MB ± 4%    1.40MB ± 1%    ~     (p=0.965 n=10+8)
Sysbench/SQL/3node/oltp_write_only-30             1.15MB ± 3%    1.13MB ± 8%    ~     (p=0.165 n=10+10)
Sysbench/SQL/3node/oltp_read_write-30             2.77MB ± 1%    2.76MB ± 1%    ~     (p=0.190 n=10+10)
Sysbench/SQL/3node/oltp_point_select-30           44.9kB ± 1%    47.2kB ±12%    ~     (p=0.101 n=8+10)
Sysbench/KV/1node_remote/oltp_read_only-30         791kB ± 0%     790kB ± 0%    ~     (p=0.280 n=10+10)
Sysbench/KV/1node_remote/oltp_write_only-30        334kB ± 2%     334kB ± 2%    ~     (p=0.971 n=10+10)
Sysbench/KV/1node_remote/oltp_read_write-30       1.12MB ± 0%    1.12MB ± 0%    ~     (p=0.447 n=10+9)
Sysbench/KV/1node_remote/oltp_point_select-30     16.4kB ± 0%    16.4kB ± 0%    ~     (p=0.956 n=10+10)
Sysbench/KV/3node/oltp_read_only-30                808kB ± 1%     809kB ± 0%    ~     (p=0.211 n=10+9)
Sysbench/KV/3node/oltp_write_only-30               618kB ± 1%     621kB ± 1%    ~     (p=0.075 n=10+10)
Sysbench/KV/3node/oltp_read_write-30              1.45MB ± 1%    1.45MB ± 0%    ~     (p=0.315 n=10+8)
Sysbench/KV/3node/oltp_point_select-30            17.1kB ± 1%    17.0kB ± 1%    ~     (p=0.382 n=10+10)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-30        211 ± 0%       210 ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/KV/3node/oltp_point_select-30               213 ± 0%       212 ± 0%  -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/3node/oltp_read_write-30              17.5k ± 1%     17.4k ± 1%  -0.46%  (p=0.029 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-30         3.99k ± 0%     3.97k ± 0%  -0.35%  (p=0.000 n=10+10)
Sysbench/KV/3node/oltp_read_only-30                4.02k ± 0%     4.01k ± 0%  -0.34%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_write_only-30        3.15k ± 0%     3.14k ± 0%  -0.32%  (p=0.000 n=10+10)
Sysbench/KV/3node/oltp_read_write-30               8.63k ± 0%     8.61k ± 0%  -0.31%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_read_write-30        7.20k ± 0%     7.18k ± 0%  -0.29%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_only-30        7.70k ± 0%     7.67k ± 0%  -0.27%  (p=0.000 n=10+10)
Sysbench/SQL/3node/oltp_point_select-30              459 ± 0%       458 ± 0%  -0.27%  (p=0.013 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-30       440 ± 0%       439 ± 0%  -0.23%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-30       6.97k ± 0%     6.95k ± 0%  -0.20%  (p=0.000 n=10+9)
Sysbench/KV/3node/oltp_write_only-30               4.41k ± 0%     4.40k ± 0%  -0.18%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-30       14.7k ± 0%     14.7k ± 0%  -0.16%  (p=0.002 n=10+9)
Sysbench/SQL/3node/oltp_read_only-30               7.96k ± 1%     7.98k ± 1%    ~     (p=0.618 n=9+10)
Sysbench/SQL/3node/oltp_write_only-30              9.32k ± 1%     9.24k ± 2%    ~     (p=0.211 n=9+10)
```

Epic: None
Release note: None
@nvb nvb force-pushed the nvanbenschoten/internalClientTracingWrap branch from 44df9c6 to f1c04dd Compare December 10, 2024 00:49
@nvb
Copy link
Contributor Author

nvb commented Dec 10, 2024

bors r+

@craig craig bot merged commit de3b122 into cockroachdb:master Dec 10, 2024
@nvb nvb deleted the nvanbenschoten/internalClientTracingWrap branch December 10, 2024 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

o-perf-efficiency Related to performance efficiency

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants