Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't tokio::spawn Subscriber::unsubscribe #1061

Closed
wants to merge 1 commit into from

Conversation

paolobarbolini
Copy link
Contributor

This changes Subscriber to use try_send instead of send to send the Unsubscribe command.
Considering that the only advantage of send is that it waits until it finds an empty position in the channel before writing, while try_send fails if the channel is full, it didn't make sense to use send as it required the Future to be tokio::spawned, sort of like using an unbounded queue to write into a bounded one.

Considering that the ConnectionHandler already catches missed unsubscriptions, I think it's fair to rely more on it.

Err(mpsc::error::TrySendError::Closed(_)) => {
self.subscriptions.remove(&sid);
self.connection
.write_op(&ClientOp::Unsubscribe { sid, max: None })
.await?;
self.handle_flush().await?;
}

@caspervonb
Copy link
Collaborator

Considering that the ConnectionHandler already catches missed unsubscriptions, I think it's fair to rely more on it.

We only kill off the channels on reconnect tho (which doesn't send unsub because it's re-establishing the state) so this could lead to subscriptions hanging around longer than intended.

@Jarema
Copy link
Member

Jarema commented Jul 30, 2023

One of the things we really care about in the NATS org are idle resources.

Your reasoning is correct - if anything for a deleted subscription would come in, it would delete it and unsubscribe. However if nothing comes in - that subscription is still registered on the server and becomes a zombie.

In global topologies containing millions of subscriptions and idle resources, such things can add up.

I do realize there is a small possibility for it to happen and that the current approach is not perfect either, but I wanted to give you context why we're not merging it as of yet.

@caspervonb
Copy link
Collaborator

Alternative approach could be to poll the closed future on the subscriptions themselves.

@paolobarbolini
Copy link
Contributor Author

Alternative approach could be to poll the closed future on the subscriptions themselves.

I have a similar idea but it will have to wait for #1060

@paolobarbolini
Copy link
Contributor Author

paolobarbolini commented Sep 2, 2023

This will make sure we don't loose unsubscriptions. It'll be fun to fix the merge conflict with #1060 🙄

@paolobarbolini
Copy link
Contributor Author

I'm trying to figure out why some benchmarks got worse.

@paolobarbolini
Copy link
Contributor Author

I redid the benchmarks and the regressions seem to have gone away

nats::publish_throughput/32
                        time:   [136.75 ms 164.08 ms 180.37 ms]
                        thrpt:  [84.596 MiB/s 92.999 MiB/s 111.58 MiB/s]
                 change:
                        time:   [-15.230% +5.8552% +33.107%] (p = 0.63 > 0.05)
                        thrpt:  [-24.873% -5.5313% +17.966%]
                        No change in performance detected.
nats::publish_throughput/1024
                        time:   [228.68 ms 234.13 ms 238.97 ms]
                        thrpt:  [1.9954 GiB/s 2.0367 GiB/s 2.0852 GiB/s]
                 change:
                        time:   [-1.6376% +2.1682% +5.5261%] (p = 0.26 > 0.05)
                        thrpt:  [-5.2367% -2.1221% +1.6648%]
                        No change in performance detected.
Benchmarking nats::publish_throughput/8192: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.4s.
nats::publish_throughput/8192
                        time:   [1.2324 s 1.3206 s 1.4001 s]
                        thrpt:  [2.7245 GiB/s 2.8886 GiB/s 3.0954 GiB/s]
                 change:
                        time:   [-11.122% -4.4919% +2.6441%] (p = 0.25 > 0.05)
                        thrpt:  [-2.5759% +4.7032% +12.513%]
                        No change in performance detected.

Benchmarking nats::publish_amount/32: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.3s or enable flat sampling.
nats::publish_amount/32 time:   [117.72 ms 142.13 ms 167.59 ms]
                        thrpt:  [2.9834 Melem/s 3.5178 Melem/s 4.2472 Melem/s]
                 change:
                        time:   [-22.367% -3.7361% +18.525%] (p = 0.74 > 0.05)
                        thrpt:  [-15.629% +3.8811% +28.812%]
                        No change in performance detected.
nats::publish_amount/1024
                        time:   [224.45 ms 232.34 ms 240.86 ms]
                        thrpt:  [2.0759 Melem/s 2.1521 Melem/s 2.2277 Melem/s]
                 change:
                        time:   [-2.9379% +1.0238% +5.8220%] (p = 0.66 > 0.05)
                        thrpt:  [-5.5017% -1.0135% +3.0268%]
                        No change in performance detected.
Benchmarking nats::publish_amount/8192: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 14.1s.
nats::publish_amount/8192
                        time:   [1.2625 s 1.3439 s 1.4266 s]
                        thrpt:  [350.48 Kelem/s 372.04 Kelem/s 396.05 Kelem/s]
                 change:
                        time:   [-12.099% -5.2984% +2.1473%] (p = 0.18 > 0.05)
                        thrpt:  [-2.1022% +5.5948% +13.764%]
                        No change in performance detected.

nats::subscribe_amount/32
                        time:   [309.71 ms 322.87 ms 335.49 ms]
                        thrpt:  [1.4903 Melem/s 1.5486 Melem/s 1.6144 Melem/s]
                 change:
                        time:   [+3.9207% +15.962% +29.704%] (p = 0.02 < 0.05)
                        thrpt:  [-22.901% -13.765% -3.7727%]
                        Performance has regressed.
nats::subscribe_amount/1024
                        time:   [374.71 ms 387.17 ms 399.08 ms]
                        thrpt:  [1.2529 Melem/s 1.2914 Melem/s 1.3344 Melem/s]
                 change:
                        time:   [-6.4008% -1.9310% +3.0671%] (p = 0.45 > 0.05)
                        thrpt:  [-2.9759% +1.9691% +6.8385%]
                        No change in performance detected.
Benchmarking nats::subscribe_amount/8192: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 16.1s.
nats::subscribe_amount/8192
                        time:   [1.5909 s 1.6243 s 1.6562 s]
                        thrpt:  [301.90 Kelem/s 307.82 Kelem/s 314.29 Kelem/s]
                 change:
                        time:   [-3.6698% -0.7660% +2.1296%] (p = 0.61 > 0.05)
                        thrpt:  [-2.0852% +0.7719% +3.8096%]
                        No change in performance detected.

Benchmarking nats::request_amount/32: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.2s.
nats::request_amount/32 time:   [772.13 ms 797.02 ms 820.14 ms]
                        thrpt:  [12.193 Kelem/s 12.547 Kelem/s 12.951 Kelem/s]
                 change:
                        time:   [-6.7241% -2.4372% +1.9819%] (p = 0.32 > 0.05)
                        thrpt:  [-1.9434% +2.4981% +7.2088%]
                        No change in performance detected.
Benchmarking nats::request_amount/1024: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.5s.
nats::request_amount/1024
                        time:   [776.12 ms 796.75 ms 818.48 ms]
                        thrpt:  [12.218 Kelem/s 12.551 Kelem/s 12.885 Kelem/s]
                 change:
                        time:   [-6.3650% -2.9473% +0.9011%] (p = 0.15 > 0.05)
                        thrpt:  [-0.8931% +3.0368% +6.7976%]
                        No change in performance detected.
Benchmarking nats::request_amount/8192: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s.
nats::request_amount/8192
                        time:   [838.39 ms 862.62 ms 885.42 ms]
                        thrpt:  [11.294 Kelem/s 11.593 Kelem/s 11.928 Kelem/s]
                 change:
                        time:   [-1.9536% +3.4236% +10.308%] (p = 0.31 > 0.05)
                        thrpt:  [-9.3448% -3.3103% +1.9926%]
                        No change in performance detected.

Benchmarking jetstream::sync_publish_throughput/32: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 33.6s.
jetstream::sync_publish_throughput/32
                        time:   [3.1860 s 3.2722 s 3.3603 s]
                        thrpt:  [464.99 KiB/s 477.50 KiB/s 490.43 KiB/s]
                 change:
                        time:   [-5.9398% -2.2136% +1.6111%] (p = 0.29 > 0.05)
                        thrpt:  [-1.5856% +2.2637% +6.3149%]
                        No change in performance detected.
Benchmarking jetstream::sync_publish_throughput/1024: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 31.9s.
jetstream::sync_publish_throughput/1024
                        time:   [3.4416 s 3.5343 s 3.6185 s]
                        thrpt:  [13.494 MiB/s 13.816 MiB/s 14.188 MiB/s]
                 change:
                        time:   [-4.0616% -1.2686% +1.1776%] (p = 0.41 > 0.05)
                        thrpt:  [-1.1639% +1.2849% +4.2336%]
                        No change in performance detected.
Benchmarking jetstream::sync_publish_throughput/8192: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 46.1s.
jetstream::sync_publish_throughput/8192
                        time:   [4.4581 s 4.5115 s 4.5617 s]
                        thrpt:  [85.632 MiB/s 86.585 MiB/s 87.622 MiB/s]
                 change:
                        time:   [-2.0275% -0.5654% +1.0933%] (p = 0.51 > 0.05)
                        thrpt:  [-1.0815% +0.5687% +2.0695%]
                        No change in performance detected.

Benchmarking jetstream sync publish messages amount/32: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 33.3s.
jetstream sync publish messages amount/32
                        time:   [3.2561 s 3.3360 s 3.4192 s]
                        thrpt:  [14.623 Kelem/s 14.988 Kelem/s 15.356 Kelem/s]
                 change:
                        time:   [-5.0695% -1.8916% +1.7689%] (p = 0.33 > 0.05)
                        thrpt:  [-1.7381% +1.9281% +5.3402%]
                        No change in performance detected.
Benchmarking jetstream sync publish messages amount/1024: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 36.0s.
jetstream sync publish messages amount/1024
                        time:   [3.5616 s 3.6140 s 3.6610 s]
                        thrpt:  [13.658 Kelem/s 13.835 Kelem/s 14.039 Kelem/s]
                 change:
                        time:   [-1.5310% +0.3083% +2.1799%] (p = 0.76 > 0.05)
                        thrpt:  [-2.1334% -0.3073% +1.5548%]
                        No change in performance detected.
Benchmarking jetstream sync publish messages amount/8192: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 45.3s.
jetstream sync publish messages amount/8192
                        time:   [4.4275 s 4.4633 s 4.5006 s]
                        thrpt:  [11.110 Kelem/s 11.202 Kelem/s 11.293 Kelem/s]
                 change:
                        time:   [-4.2553% -3.3777% -2.3956%] (p = 0.00 < 0.05)
                        thrpt:  [+2.4544% +3.4958% +4.4445%]
                        Performance has improved.

Benchmarking jetstream async publish throughput/32: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.6s or enable flat sampling.
jetstream async publish throughput/32
                        time:   [76.092 ms 97.916 ms 112.25 ms]
                        thrpt:  [13.593 MiB/s 15.583 MiB/s 20.053 MiB/s]
                 change:
                        time:   [-11.334% +11.301% +39.204%] (p = 0.37 > 0.05)
                        thrpt:  [-28.163% -10.154% +12.783%]
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
Benchmarking jetstream async publish throughput/1024: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.3s or enable flat sampling.
jetstream async publish throughput/1024
                        time:   [72.199 ms 86.251 ms 101.43 ms]
                        thrpt:  [481.39 MiB/s 566.11 MiB/s 676.30 MiB/s]
                 change:
                        time:   [-25.717% -8.2061% +13.250%] (p = 0.45 > 0.05)
                        thrpt:  [-11.700% +8.9397% +34.621%]
                        No change in performance detected.
jetstream async publish throughput/8192
                        time:   [209.15 ms 229.25 ms 252.61 ms]
                        thrpt:  [1.5101 GiB/s 1.6640 GiB/s 1.8239 GiB/s]
                 change:
                        time:   [-12.408% -2.4269% +8.3355%] (p = 0.68 > 0.05)
                        thrpt:  [-7.6942% +2.4872% +14.166%]
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

jetstream::async_publish_messages_amount/32
                        time:   [102.75 ms 108.50 ms 116.83 ms]
                        thrpt:  [427.98 Kelem/s 460.81 Kelem/s 486.64 Kelem/s]
                 change:
                        time:   [-4.3137% +10.699% +31.575%] (p = 0.23 > 0.05)
                        thrpt:  [-23.998% -9.6652% +4.5082%]
                        No change in performance detected.
jetstream::async_publish_messages_amount/1024
                        time:   [100.76 ms 108.84 ms 118.98 ms]
                        thrpt:  [420.22 Kelem/s 459.39 Kelem/s 496.24 Kelem/s]
                 change:
                        time:   [-19.198% -6.4008% +6.5653%] (p = 0.39 > 0.05)
                        thrpt:  [-6.1609% +6.8385% +23.759%]
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) high mild
jetstream::async_publish_messages_amount/8192
                        time:   [205.99 ms 226.23 ms 247.16 ms]
                        thrpt:  [202.30 Kelem/s 221.02 Kelem/s 242.73 Kelem/s]
                 change:
                        time:   [-21.757% -12.217% -2.8450%] (p = 0.04 < 0.05)
                        thrpt:  [+2.9283% +13.917% +27.807%]
                        Performance has improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants